Xceptance Monitoring

Website Monitoring Done Differently

René Schwietzke, Xceptance

What is Monitoring?

A short explanation

Website monitoring is the process of testing and verifying that end-users can interact with a website or web application as expected. Businesses often use website monitoring to ensure that website uptime, performance, and functionality behave as expected.

Monitoring can also be Business Transaction Monitoring: "...a tool for tracking the flow of transactions across IT infrastructure, in addition to detection, alerting, and correction of unexpected changes in business or technical conditions..."

Monitoring

Monitoring Goals

Direct Goals

  • Verification of Availability
  • Functional Verification
  • Performance Monitoring
  • Issue Notification
  • Location Comparison
  • First and Third Party Checks

Indirect Goals

  • Historical Tracking
  • Failure Investigation
  • Event Correlation
  • Find the unknown

Request

Simplest form of monitoring

  • Single URL monitoring
  • Does not use a browser

Advantages

  • Simple
  • Quick to set up
  • No external dependencies such as a browser

Disadvantages

  • No third parties
  • No page performance
  • Isolated request and not a page state
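A request-based check can be sketched in a few lines of plain Java. This is only an illustration under assumptions: `RequestCheck`, its method names, and the idea of checking for one expected text are hypothetical and not part of XLT or XM. The decision logic is kept separate from the network call so it can be reasoned about on its own.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper for illustration, not part of XLT or XM.
final class RequestCheck
{
    // Pure decision: passes only for the expected status code and body text.
    static boolean isHealthy(int statusCode, String body, int expectedStatus, String expectedText)
    {
        return statusCode == expectedStatus && body != null && body.contains(expectedText);
    }

    // Fetches a single URL without a browser and applies the decision above.
    // Redirects are not followed by default, so a redirect counts as a failure here.
    static boolean check(String url, String expectedText) throws Exception
    {
        HttpClient client = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(10))
            .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
            .timeout(Duration.ofSeconds(30))
            .GET()
            .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return isHealthy(response.statusCode(), response.body(), 200, expectedText);
    }
}
```

Such a check sees exactly one response. It loads no images, scripts, or third parties, which is the limitation listed above.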

Page Load

Simple but more realistic

  • Single URL monitoring
  • Uses a real browser
  • Typically only checks for response code and some text
  • Tracking of page load performance possible

Advantages

  • Simple
  • Mostly stable
  • Easy to code or set up
  • Includes third parties

Disadvantages

  • Verification is limited
  • Problematic when the load event fires before the page reaches its finished state (React/Vue)
  • Problematic when the URL is not a page load at all
  • No business flow

Synthetic Monitoring

Execute business flows

  • Run a business flow
  • Scripted or URL approach
  • Can cover activities that require interactions

Advantages

  • Captures user interactions
  • Reaches into areas that can only be reached with state (cart, order)
  • Delivers consistent data

Disadvantages

  • Modern web is highly interactive, simple scripts might be too limiting
  • Data is not static
  • Works easily only with real browsers
  • URL-only approach is difficult due to the modern web

RUM

Real User Monitoring

  • Don't code, just measure
  • JavaScript client extension to measure in a real browser
  • Use real users and real traffic to measure and monitor

Advantages

  • Discovers browser-dependent oddities
  • High variance of data due to natural variance
  • Includes last mile

Disadvantages

  • Third party tracking can be blocked
  • When there is no traffic, there is no data
  • Broken functionality might be found late
  • Missing functionality is not discovered easily
  • Sales and marketing activities can change the data
  • Requires client instrumentation

All Combined

Synthetic Monitoring and RUM

  • Use a real flow to trigger RUM capturing
  • Permits measurement on test instances

Advantages

  • Guaranteed measurement of functionality

Disadvantages

  • Hard to keep real data and synthetic data apart
  • No measurement when there is no instrumentation
  • Needs client lib support similar to RUM

Advanced Synthetic Monitoring

Use real test cases with full test stack

  • Use regular test automation approach
  • Use a DevOps approach
  • Connect test automation and monitoring
  • Maintain the test case as part of the test automation
  • Real code possible

Advantages

  • Full control over flow
  • Covers non-page loads as well
  • Able to react to state
  • Use WebDriver automation stack
  • No third parties or instrumentation needed

Disadvantages

  • Developer knowledge needed
  • Stable test case required
  • Timing issues can be a challenge
  • It is not semi-automatic like RUM

Performance Details

Most Important Events During a Page's Life Cycle

Navigation Timing API

Events - TTFB/TTLB

When do we get things

  • Time to first byte
  • Time to last byte
  • Interesting because we know how long server processing and networking took
  • Interesting because we know how long downloading took
  • The following technical details are involved:
    • DNS
    • TCP handshake
    • TLS handshake
    • Send time
    • Server processing time
    • Time to first byte delivered (contains latency)
    • Time to last byte downloaded (also with latency)
    • Network capacity
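The phases listed above can be derived from Navigation Timing timestamps by simple subtraction. The following is a minimal sketch, assuming the timestamps have already been read from the browser. The record `NavigationTimestamps` and its method names are hypothetical; the field names follow the attributes of the Navigation Timing API.

```java
// Hypothetical value type; field names follow the Navigation Timing API attributes.
// All values are timestamps in milliseconds.
record NavigationTimestamps(long navigationStart, long domainLookupStart, long domainLookupEnd,
                            long connectStart, long secureConnectionStart, long connectEnd,
                            long requestStart, long responseStart, long responseEnd)
{
    // DNS lookup
    long dnsMillis() { return domainLookupEnd - domainLookupStart; }

    // connectEnd includes the TLS handshake, so TCP alone ends where TLS starts.
    // secureConnectionStart is 0 when no TLS is used.
    long tcpMillis()
    {
        return (secureConnectionStart > 0 ? secureConnectionStart : connectEnd) - connectStart;
    }

    // TLS handshake, 0 for plain HTTP
    long tlsMillis() { return secureConnectionStart > 0 ? connectEnd - secureConnectionStart : 0; }

    // Request sent until first byte received: send time, server processing, and latency
    long waitMillis() { return responseStart - requestStart; }

    // First byte until last byte: download time
    long downloadMillis() { return responseEnd - responseStart; }

    // Measured from the start of the navigation
    long timeToFirstByte() { return responseStart - navigationStart; }
    long timeToLastByte() { return responseEnd - navigationStart; }
}
```

With reused connections, DNS and connect phases collapse to zero, so these numbers are most telling on a cold first request.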

DOM Events

A series of timestamps and events, measured by the Performance Timing API

  • domLoading: Got the first bytes and started parsing
  • domInteractive: Got all HTML, finished parsing, finished blocking JS, starting deferred JS processing (async JS may still be running)
  • domContentLoaded: Deferred JS was executed, DomContentLoaded event fires and triggers event handler for JS
  • domComplete: All content has been loaded (aka images and more), DomContentLoaded event was fully processed (attached JS), fire onload event and start processing JS
  • loadEventEnd: All JS attached to onload was executed, dust should have settled

Events - Painting

Users don't care about technical details

  • Users judge by impression
  • No interest in DOM numbers
  • Browsers expose Paint Timing API: first-paint, first-contentful-paint
  • Alternative measurements are "first meaningful paint" aka something the user waited for, not just content (no API)
  • The page is complete and does not load or move anymore ("visual complete") (no API)

[Figure: First Paint, First Contentful Paint, First Meaningful Paint, Visual Complete]

Xceptance Monitoring

Ideas, Concepts, Features

Features

The Features and Ideas behind Xceptance Monitoring (XM)

  • SaaS Offering
  • Multiple Projects
  • Roles and Permissions
  • WebDriver based test automation or request based
  • API testing possible as well
  • Can instrument a browser for measurement
  • Automatic metric capturing
  • Browser based or request based, mix possible
  • Service testing possible as well
  • Ability to develop and test locally
  • Uses Git to centralize and communicate code

Basics

What components you will see and use

Tools

  • XTC: the SaaS UI
  • XM: the monitoring core
  • XLT: our load testing framework for the automation and later measurement

XTC

  • User Management
  • Organization Management
  • Project Management

Users and Organizations

The basic building blocks

Users

  • Users are independent of an organization
  • Users are identified by e-mail and authenticated by password
  • Users can be project and organization members
  • Users can have different roles in each project

Organizations and Projects

  • An organization is a kind of pool of projects
  • It has members and projects
  • Memberships don't have meaning yet
  • You can assign anyone to any project as long as you know the email and there is an account

The Monitoring Project

The Modules of XM

The Project

Some project terms first

  • Metrics: Dashboards of the collected data
  • History: The test executions in detail for diagnostics
  • Configuration: Where the basic setup is done

Project: General

The little details you can set

  • Name: shown everywhere
  • Short Name: used internally (think JIRA short code)
  • Description: A short statement of the project's purpose
  • Avatar: A logo for the project used here and there

Project: Repository

The way XM gets its code

  • Git is the source of truth
  • Holds test suite and basic(!) test config
  • Code is automatically updated from Git and recompiled
  • Ability to switch branches dynamically
    • Site deployment has a text file
    • File defines branch
    • Site change will automatically switch monitoring

Project: Scenarios

The core of monitoring

  • A scenario is a test case setup
  • It is not a test case, because you can map a test case multiple times and use it in many scenarios
  • Each project has defaults that are inherited by each scenario
  • Defaults simplify the setup

Project: Scenario Setup

  • A name for the scenario
  • The test class - JUnit
  • A short description
  • An execution schedule
    • How often such as every 5 min
    • In case of a failure, should it be retried quicker?
  • Custom properties
  • Notifications to send, text and e-mail
  • Criteria to evaluate
    • Request runtime and errors
    • Transaction, action runtime
    • DomContent, OnLoad, First Paint, First Contentful Paint
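The execution schedule with a quicker retry after a failure can be pictured as a choice between two intervals. This is a hypothetical sketch only: the slides do not describe how XM schedules runs internally, and `ScenarioSchedule` and the one-minute retry interval in the usage note are assumptions.

```java
import java.time.Duration;

// Hypothetical sketch; XM's actual scheduler is not described in the slides.
record ScenarioSchedule(Duration regularInterval, Duration retryInterval)
{
    // After a failed run the scenario is retried sooner than usual.
    Duration delayUntilNextRun(boolean lastRunFailed)
    {
        return lastRunFailed ? retryInterval : regularInterval;
    }
}
```

For example, `new ScenarioSchedule(Duration.ofMinutes(5), Duration.ofMinutes(1))` runs every 5 minutes and rechecks after 1 minute when the last run failed, which confirms or clears a problem faster.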

Project: Memberships

Who can see and manage

  • Project members and their roles
  • Not all roles are fully utilized yet
  • Membership is not needed for a notification (but to diagnose it)
  • Membership is explicit by project

Project: Data Persistence

How long does data live

  • Informs about the current data retention
  • For result storage only, does not include metrics
  • Good runs: No failures or criteria misses
  • Bad runs: Anything that raised an alarm
  • Read-only view

Planning and Setup

Let's go over the Project Basics

Plan

What do we want to do?

Questions

  • What do we want to monitor?
  • Why do we monitor?
  • What can we use to prove it?
  • What states can we have?
  • When do we want to know a state change?
  • Who should know?

Answers

  • Website
  • Availability, correctness, speed
  • Criteria
    • Response Code - 200
    • Basic Elements - Empty cart, logged out, navigation, footer
    • FCP max 5,000 ms, OnLoad max 20,000 ms
  • Homepage correct, incorrect, different page, no response
  • Within three minutes, after second failure
  • Operations team
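The criteria from the answers above can be written down as a small evaluation routine. This is a sketch under assumptions: in XM the criteria are configured per scenario rather than coded, and `HomepageCriteria` and the metric keys are hypothetical names. Only the limits (response code 200, FCP max 5,000 ms, OnLoad max 20,000 ms) come from the plan.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; in XM these criteria are part of the scenario configuration.
final class HomepageCriteria
{
    static final int EXPECTED_STATUS = 200;

    // Upper limits in milliseconds, taken from the plan.
    static final Map<String, Long> MAX_MILLIS = limits();

    private static Map<String, Long> limits()
    {
        Map<String, Long> limits = new LinkedHashMap<>();
        limits.put("FirstContentfulPaint", 5_000L);
        limits.put("OnLoad", 20_000L);
        return limits;
    }

    // Returns one message per missed criterion; an empty list means a good run.
    static List<String> violations(int statusCode, Map<String, Long> measuredMillis)
    {
        List<String> result = new ArrayList<>();
        if (statusCode != EXPECTED_STATUS)
        {
            result.add("Response code " + statusCode);
        }
        for (Map.Entry<String, Long> limit : MAX_MILLIS.entrySet())
        {
            Long measured = measuredMillis.get(limit.getKey());
            if (measured == null)
            {
                result.add(limit.getKey() + " missing");
            }
            else if (measured > limit.getValue())
            {
                result.add(limit.getKey() + " took " + measured + " ms, limit " + limit.getValue() + " ms");
            }
        }
        return result;
    }
}
```

A missing measurement is treated as a violation here, on the assumption that a page that never reports OnLoad should not count as healthy.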

Implement

The minimal code

  • JUnit style test
  • XLT interface to WebDriver
  • Action block for proper naming
  • Opens a URL and waits for the privacy dialog to show
  • Verifies that we have not been redirected

public class TMinimalHomepage extends AbstractWebDriverScriptTestCase
{
    @Test
    public void minimalHomepage() throws Throwable
    {
        // ok, let's open the homepage, this is without tracking
        Action.run("Homepage", () ->
        {
            // hardcoded homepage
            Commands.open("https://www.foobar.com/en-us/");

            // make sure the page came up just fine and is fully loaded with 
            // its async JS, if the privacy stuff is missing, 
            // we will already fail here
            Commands.waitForVisible("css=#js-data-privacy-save-button");

            // just make sure we have not been redirected
            Assert.assertTrue("Url redirect happened", 
                Commands.getWebDriver().getCurrentUrl().endsWith("/en-us/"));
        });
    }
}

Setup

Demo of Setup

Verify

Use activity logging and storing

Verify Details

Verify Additional Details, such as Screenshots and HAR data

Notifications

How to Get Notified

Basics

Notification Basics

  • Notifications are optional
  • Via e-mail and text
  • Can be set up per scenario or inherited from project defaults
  • Can be deactivated per entry or altogether
  • Notifications can be postponed until an event has happened several times in a row
  • You can set a reply-to if needed

Criteria

When do Notifications fire?

  • When something bad happens
    • Cannot execute at all
    • Any unknown error came up
    • When an assertion failed
  • When a criterion is triggered
  • When the threshold is reached: How often must a failure occur before a notification is sent (default: 1)
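The threshold can be pictured as a counter of failures in a row. This is a minimal sketch with assumptions: that failures must be consecutive, that a good run resets the counter, and that only one notification is sent when the threshold is reached. `FailureThreshold` is a hypothetical name, not an XM class.

```java
// Hypothetical sketch of a notification threshold; assumes consecutive failures
// and a single notification per failure streak.
final class FailureThreshold
{
    private final int threshold;
    private int consecutiveFailures;

    FailureThreshold(int threshold)
    {
        if (threshold < 1)
        {
            throw new IllegalArgumentException("threshold must be at least 1");
        }
        this.threshold = threshold;
    }

    // Records one run result and returns true when a notification should be sent.
    boolean record(boolean runFailed)
    {
        if (!runFailed)
        {
            consecutiveFailures = 0; // a good run ends the streak
            return false;
        }
        consecutiveFailures++;
        return consecutiveFailures == threshold;
    }
}
```

With a threshold of 2, a single failed run stays silent and the second failure in a row notifies, which filters out one-off glitches at the cost of a later alert.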

Event Types

  • Unexpected events
  • Expected events

Information

What a Notification tells you

  • Which project
  • What scenario
  • What time
  • What location
  • What is the reason
  • Additional details when available

Another Example

Examples of more general problems

Metrics

Long Term Observations

Metric Overview

A General Overview

  • Time-series based data storage
  • Captures action and page metrics as well as failure count and total test runtime
  • Supports time based view, filtering, and split by location

Action Timings

Actions are wrappers

  • Actions group logical activities
  • They run their own timers
  • Useful for non-page load activities

// open homepage
Action.run("Homepage", () ->
{
    Commands.open("https://www.foobar.com");
    Commands.waitForVisible("css=#js-data-privacy-save-button");
    
    Assert.assertTrue("Redirect missing", 
        Commands.getWebDriver().getCurrentUrl().endsWith("/"));
});
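What "actions run their own timers" means can be shown with a small stand-in. This is not the XLT `Action` class; `TimedAction` and its `Timing` record are hypothetical and only illustrate how a wrapper can measure a named block and note whether it failed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in to illustrate the idea; this is not the XLT Action class.
final class TimedAction
{
    record Timing(String name, long runtimeMillis, boolean failed) {}

    // Collected measurements, in execution order.
    static final List<Timing> TIMINGS = new ArrayList<>();

    // Runs the block, measures its runtime, and records the result even on failure.
    static void run(String name, Runnable body)
    {
        long start = System.nanoTime();
        boolean failed = true;
        try
        {
            body.run();
            failed = false;
        }
        finally
        {
            long runtimeMillis = (System.nanoTime() - start) / 1_000_000;
            TIMINGS.add(new Timing(name, runtimeMillis, failed));
        }
    }
}
```

Because the timer wraps arbitrary code, it also covers activities that never trigger a page load, such as opening a flyout or updating a cart in place.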

Page Load Timings

What detailed browser timings are collected

  • XLT extends Chrome and Firefox with a plugin
  • Captures Navigation Timing and Performance API data
  • Timings published
    • FirstPaint
    • FirstContentfulPaint
    • DomContentLoaded
    • Load (OnLoad)

P,Homepage [FirstPaint],1563186744174,3705,false
P,Homepage [FirstContentfulPaint],1563186744174,3705,false
P,Homepage [LoadEventStart],1563186744174,9548,false
P,Homepage [DomInteractive],1563186744174,6173,false
P,Homepage [DomComplete],1563186744174,9548,false
P,Homepage [DomContentLoadedEventStart],1563186744174,6258,false
P,Homepage [DomContentLoadedEventEnd],1563186744174,6258,false
P,Homepage [DomLoading],1563186744174,1113,false
P,Homepage [LoadEventEnd],1563186744174,9560,false
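The lines above can be read with a few lines of Java. The slides do not label the columns, so this sketch assumes they are: record type, name with the event in square brackets, timestamp in epoch milliseconds, measured value in milliseconds, and a failed flag. `PageTiming` is a hypothetical type, and names containing commas are not handled.

```java
// Hypothetical parser; the column meaning is an assumption, see the text above.
record PageTiming(String action, String event, long timestamp, long valueMillis, boolean failed)
{
    static PageTiming parse(String line)
    {
        String[] fields = line.split(",");
        if (fields.length != 5 || !fields[0].equals("P"))
        {
            throw new IllegalArgumentException("Not a page timing line: " + line);
        }

        // The name looks like "Homepage [FirstPaint]": action name plus event in brackets.
        String name = fields[1];
        int open = name.lastIndexOf('[');
        int close = name.lastIndexOf(']');
        if (open < 0 || close < open)
        {
            throw new IllegalArgumentException("No event name in: " + name);
        }

        return new PageTiming(
            name.substring(0, open).trim(),
            name.substring(open + 1, close),
            Long.parseLong(fields[2]),
            Long.parseLong(fields[3]),
            Boolean.parseBoolean(fields[4]));
    }
}
```

Under these assumptions, the sample shows first paint and first contentful paint at 3,705 ms and the load event finishing at 9,560 ms for the Homepage action.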

Filtering Metrics

Metrics can be filtered and grouped

  • Most metrics can be filtered and grouped
  • Filter By Scenario: Show only one relevant test
  • Filter By Action: Show only these actions
  • Filter By Location: Show only selected locations
  • Group by Scenario: Show chart lines by scenario
  • Group by Location: Show chart lines by location
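Filtering and grouping boil down to a filter step followed by a group-by. This is a minimal sketch over in-memory data, assuming one sample per measured action. `Sample`, `MetricQuery`, and the location names in the usage note are hypothetical; XM does this in its metrics dashboards.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sample type: one measured action runtime from one location.
record Sample(String scenario, String action, String location, long runtimeMillis) {}

final class MetricQuery
{
    // Filter by scenario and action, then group by location and average the runtime.
    static Map<String, Double> averageByLocation(List<Sample> samples, String scenario, String action)
    {
        return samples.stream()
            .filter(s -> s.scenario().equals(scenario))
            .filter(s -> s.action().equals(action))
            .collect(Collectors.groupingBy(
                Sample::location,
                TreeMap::new, // sorted by location for stable output
                Collectors.averagingLong(Sample::runtimeMillis)));
    }
}
```

Each key of the result corresponds to one chart line, for example one line for "us-east" and one for "eu-west" when grouping by location.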

More Examples

Time Dimensions

View by time and narrow down

  • Select by last X hours or days
    • Last Hour
    • Last Day
    • Today
    • Last Week
    • ...
  • Custom time ranges by date and time
  • Data is preserved for at least 30 days
  • Retention of 365 days is in progress
  • Auto refresh can be set

Example: 3rd Parties

Daily 3rd party influence as well as a major performance problem

Example: 3rd Parties

Daily 3rd party influence as well as a major performance problem by location

Example: Location Views

Provider maintenance effect

Example: Caches

Caches and monitoring traffic

Feature Outlook

What features might be next

In the Pipeline

This is not a guaranteed roadmap

  • Extensive Starter Template
  • Extensive Documentation
  • Selection of Locations per Scenario
  • Google OAuth
  • User-based phone and email for notifications
  • PXX numbers
  • SLA definitions and monitoring
  • Y-Axis limitation
  • UI Reworks
  • Load Testing as new Project Type

Questions

Feel free to ask