Performance Testing

How to Approach It

René Schwietzke, Xceptance

Why?

Performance is a key selling point for almost everything, yet it is rarely stated explicitly.

Nobody would accept this:

  • Google: 1 min per search
  • Amazon: 80 s to accept an order
  • Netflix: 4 h to load a movie
  • Navigation System: 5 min to calculate a route

Everybody expects this:

  • Google: 100 searches per min
  • Amazon: 3 s to accept an order
  • Netflix: 10 s to load a movie
  • Navigation System: 15 s to calculate a route

Motivation

Why do we need to know how to test or test at all?

"About 25 years ago Jakob Nielsen wrote a book called Usability Engineering and he offered some advice about response times which had already been in place for at least 25 years before he wrote the book."

  • 0.1 second is about the limit for having a visitor feel as though the system is reacting instantaneously.
  • 1.0 second is about the limit for a visitor’s flow of thought to stay uninterrupted, even though the visitor will notice the delay.
  • 10 seconds is about the limit for keeping the visitor’s attention focused on the task they want to perform.

Motivation Continued

Business

  • Pinterest rebuilt its pages for performance and achieved a 40% reduction in perceived wait times, which led to a 15% increase in both search-engine traffic and sign-ups
  • BBC found they lost an additional 10% of users for every additional second their site took to load
  • DoubleClick found 53% of mobile site visits were abandoned if a page took longer than 3 seconds to load
  • When AutoAnything reduced page load time by half, they saw a boost of 12-13% in sales

Technology

  • We add features constantly
  • There is new and faster hardware all the time
  • The amount of data keeps increasing dramatically
  • New business models pose new challenges

What is Performance?

Performance is a vague term

The word performance in computer performance means the same thing that performance means in other contexts, that is, it means "How well is the computer doing the work it is supposed to do?"

Arnold Allen

Computer performance is the amount of work accomplished by a computer system.

https://en.wikipedia.org/wiki/Computer_performance

Dimensions

What metrics and topics shape performance

Topics

  • Volume
  • Scaling
  • Growth
  • Concurrency
  • Reliability
  • High-Availability (HA)
  • Disaster Recovery (DR)

Metrics

  • The most important high level metrics
    • Response Time: How fast is a single action
    • Throughput: How many actions are possible per time unit
    • Utilization: How many resources does it need and how are these used
    • Availability: How often is the system usable, aka all the time, sometimes, occasionally
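Response time and throughput can be captured from the same measurement loop; a minimal sketch in Python (the `measure` helper and the toy workload are mine, for illustration only):

```python
import time

def measure(action, iterations=1000):
    """Time a callable: per-call response times plus overall throughput.
    Helper and workload are illustrative, not from the deck."""
    durations = []
    start = time.perf_counter()
    for _ in range(iterations):
        t0 = time.perf_counter()
        action()
        durations.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return {
        "avg_response_time_s": sum(durations) / len(durations),
        "max_response_time_s": max(durations),
        "throughput_per_s": iterations / elapsed,
    }

# A trivial CPU-bound stand-in for a real request
stats = measure(lambda: sum(range(1000)))
print(stats)
```

Utilization and availability need external observation (OS counters, monitoring), which is why load tools report the first two and monitoring covers the rest.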

Typical Problems

Examples of performance that can go wrong

  • Slower with growing data sets
  • Exponential runtime for longer inputs
  • Cart with 10 items is fine, with 50 never returns
  • Hiccups every 30 min
  • Performance drops after a release
  • Outage every 12 h
  • No performance gain with more CPUs
  • Sudden performance change after 20 h
  • Not able to sustain a sudden burst
  • Repeating peak order traffic fails
  • When restarting under load, the system does not come up

Failing Regex

Checking emails and data input

This is a simple matching expression for all strings consisting of a and b, where each letter is optional and the letters can appear in any combination of groups: (a*b*)*

Matching input (a's followed by b):

  String                              Runtime
  aaab                                0 ms
  aaaaaab                             0 ms
  aaaaaaaaab                          0 ms
  aaaaaaaaaaaab                       0 ms
  aaaaaaaaaaaaaaab                    0 ms
  aaaaaaaaaaaaaaaaaab                 0 ms
  aaaaaaaaaaaaaaaaaaaaab              0 ms
  aaaaaaaaaaaaaaaaaaaaaaaab           0 ms
  aaaaaaaaaaaaaaaaaaaaaaaaaaab        0 ms
  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaab     0 ms

Non-matching input (a's followed by x):

  String                              Runtime
  aaax                                0 ms
  aaaaaax                             0 ms
  aaaaaaaaax                          1 ms
  aaaaaaaaaaaax                       2 ms
  aaaaaaaaaaaaaaax                    11 ms
  aaaaaaaaaaaaaaaaaax                 77 ms
  aaaaaaaaaaaaaaaaaaaaax              341 ms
  aaaaaaaaaaaaaaaaaaaaaaaax           643 ms
  aaaaaaaaaaaaaaaaaaaaaaaaaaax        5,145 ms
  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaax     42,008 ms
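The blow-up is easy to reproduce. A sketch in Python (its `re` engine backtracks like the engine presumably used for the measurements above, so absolute times will differ), including the linear-time rewrite `[ab]*` for the same language:

```python
import re
import time

# Nested quantifiers: on a near-miss input the engine tries every way to
# split the a's across (a*b*)* groups -> exponential backtracking.
BAD = re.compile(r'(a*b*)*$')
# Same language, no nested quantifiers -> linear time.
GOOD = re.compile(r'[ab]*$')

for n in (5, 10, 15, 20):
    s = 'a' * n + 'x'                      # trailing 'x' forces a mismatch
    t0 = time.perf_counter()
    assert BAD.match(s) is None
    bad_ms = (time.perf_counter() - t0) * 1000
    t0 = time.perf_counter()
    assert GOOD.match(s) is None
    good_ms = (time.perf_counter() - t0) * 1000
    print(f"n={n:2d}  bad={bad_ms:10.3f} ms  good={good_ms:.3f} ms")
```

This is exactly the "exponential runtime for longer inputs" failure from the previous slide: harmless in every functional test, catastrophic on real-world data.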

Failing Performance

Examples of performance failures.

Performance Testing

How do we test all that

Performance Testing

In software engineering, performance testing is, in general, a testing practice performed to determine how a system performs in terms of responsiveness and stability under a particular workload. It can also serve to investigate, measure, validate or verify other quality attributes of the system, such as scalability, reliability and resource usage.

https://en.wikipedia.org/wiki/Software_performance_testing

WARNING - Performance and Testing

Performance is something you design into your product; it cannot be tested into it. If you haven't paid attention at design time, your testing will only state the obvious. Serious performance problems originate in the design, not in a few red-hot lines of code.

Performance testing is often seen as a lifesaver when things go wrong, but it mostly reveals what is already known.

When you combine performance testing with tuning and celebrate a 25% improvement, you could probably improve by 500% by changing the design or architecture.

Performance Testing

What types of tests can we have?

Not all tests are the same

We will follow a different path than the industry

  • Load Testing
  • Stress Testing
  • Soak Testing
  • Spike Testing
  • Breakpoint Testing
  • Configuration Testing
  • Isolation Testing
  • Internet Testing
  • Seriously, forget all that!
  • Performance testing is about understanding the product, challenges, expectations, and the path to proving it
  • If you wanna fancy that up with some name... fine with me

Area of the Test

What you can test

  • Storefront
  • Backoffice
  • App
  • API
  • REST/Services
  • Processes
  • Isolated components
  • A mix of all that
  • Storefront and similar tests can be sub-divided into
    • Server-side only (no rendering)
    • Client-side, real rendering and user interaction measurement
  • Tests can be manual as well such as storefront performance evaluations

Size of the Test

Size in the sense of what and how long should be tested

  • A single topic, such as storefront search
  • Combination, such as search and reindexing
  • A full simulation of the system
  • Short cycle, 15 min
  • Day simulation, 24h
  • Lifetime simulation

Step #0 - The Product

Understand the product first

  • Understand your product
  • Performance testing is highly technical
  • Understand the technical pieces of the product
  • Understand the interaction with components to isolate the problem later
  • Map your interest to the product
  • Understand the typical states of your system
  • Try to get a real world view
  • Artificial stuff is ok, but might just waste time too
  • Ask yourself: If something never was designed for X, why should we test X, knowing that nobody ever considered that during design?!

Non-Computer Exercise

Test driving a car

Exercise 01

Intel just sent us new CPUs

Step #1 - Why

Know the problem or challenge first to understand it

  • What is the challenge?
  • What is the problem behind it?
  • Do you know already enough?
  • Has this happened before?
  • Did we measure it already?
  • What are the dimensions?
  • Nothing is known
  • There is a claim and we need proof
  • Result is known but the why is open
  • Something has changed and it has to be quantified

Exercise 02

Data Dimensions - Why

Step #2a - Requirements

We need target numbers or other things similar to that

  • What are your goals?
  • Anything you can attach numbers to?
  • Or are you looking only for numbers?
  • Anything you can compare to?
  • Constrain your numbers, aka "you get this when this is given"
  • Traffic
  • Throughput
  • Latency
  • Resource utilization
  • Response times
  • Rendering performance
  • Uptime
  • SKU Count
  • Customers
  • Order Size
  • Order Amount
  • Attribute Count

Some number examples

Just examples of what number-based requirements might look like

Storefront

  • What
    • Response times P99 under 1 s
    • Regardless of cache state
    • No runtimes larger than 5 s
    • No errors
  • When
    • 500k plain products
    • No hardware constraints
    • 5% conversion rate

Import Process

  • Able to import 100k products in 5 min
  • Initial import and update
  • DB contains either 0, 50k, or 100k of the same products
  • Side effect free

Order Export

  • Able to export unexported 1k orders every 5 min
  • Order up to 50 items with 10 custom attributes per item
  • Registered and guest orders
  • No side effects on order intake (rate of 1k/5 min with bursts of 5k/5 min)
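Requirements stated per window translate directly into the sustained rates a load tool is configured with. A quick conversion of the numbers above (the arithmetic is mine; the figures are straight from the examples):

```python
# Convert the windowed requirements into per-second rates.
products, import_window_s = 100_000, 5 * 60
import_rate = products / import_window_s      # products per second

orders, export_window_s = 1_000, 5 * 60
export_rate = orders / export_window_s        # orders per second
burst_rate = 5_000 / export_window_s          # orders per second during a burst

print(f"import: {import_rate:.1f} products/s, "
      f"export: {export_rate:.1f} orders/s (burst {burst_rate:.1f}/s)")
```

So the import has to sustain roughly 333 products/s, and the order pipeline must survive bursts about five times its normal rate.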

Step #2b - Limitations

A test is a test, state the limitations and limit yourself

Implicit

  • Data
  • Time
  • Growth
  • System wear and tear
  • Fragmentation
  • Randomness
  • Unknown behavior
  • External factors

Explicit

  • Data
  • Data Dependencies
  • Ignored Data and Processes
  • Customization
  • User Scenarios
  • Time

Percentiles to the rescue

Wait... 1 s response time... what is that?

  • Average: Ok, but not precise
  • You miss details
  • Percentiles help, combine with average
  • Preferably very high percentiles
  • P99 at least
  • Set an upper limit aka never over 10 s

A percentile is a measure used in statistics indicating the value below which a given percentage of observations in a group of observations fall. For example, the 20th percentile is the value below which 20% of the observations may be found.

https://en.wikipedia.org/wiki/Percentile

  • Imagine 100k visits, 1 million page views and 10k orders per hour
  • P99 = 5 s says: 1k visits, 10k page views, and 100 orders are slower than that... unspecified by how much
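The nearest-rank percentile is simple enough to compute by hand. The sketch below (the helper and the data are mine, for illustration) shows how an average can hide a 10 s tail that the P99 exposes:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample value such that at least
    p percent of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# 95 fast responses and 5 slow ones: the average looks merely mediocre,
# while P99 exposes the 10 s tail.
times_ms = [100] * 95 + [10_000] * 5
avg = sum(times_ms) / len(times_ms)
print(f"avg={avg:.0f} ms  P50={percentile(times_ms, 50)} ms  "
      f"P99={percentile(times_ms, 99)} ms")
```

Here the average is 595 ms and the P50 a healthy 100 ms, but the P99 reports the full 10 s outliers, which is why average and high percentiles belong together.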

Exercise 03

Data Dimensions - Requirements

Step #3 - Hypothesize

Come up with a theory what you expect to see and why

  • What are the expectations?
  • Why this result?
  • What goes on technically?
  • How can you prove the results or know that this is right?
  • What can go wrong?
  • Is there any uncertain thing?
  • Are there any dimensions that influence what we do?

Exercise 04

Data Dimensions - Hypothesis

Step #4 - Plan

What do you need and what has to be done?

  • Any tools available?
  • How do we get the system setup?
  • How can we repeat things?
  • How long will it take?
  • How valid are the results?
  • What other factors can contribute?
  • Test Concept (aka test plan)
  • Test Approach
  • Target numbers
  • Put in all your requirements and limitations

Exercise 05

Data Dimensions - Plan

Step #5 - Select and Build

Build, install, configure, validate, dry-run it.

  • Select a tool: XLT, JMeter, Blazemeter, Gatling...
  • Select an approach: DOM, REST, Request-Response, Microbenchmark
  • Build data tools
  • Train yourself
  • Test small scale
  • Consult

Step #6 - Measure

Get to the numbers using what you have prepared

  • Evaluate small scale
  • Try to find a baseline if possible
  • Measure large scale
  • Verify
  • Repeat
  • Able to stop?
  • Always measure at least 3 times
  • Question your data right away
  • When measuring two configurations: A and B, do A, B, A, B
  • Cannot repeat the test? You didn't test anything
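The A, B, A, B advice can be mechanized so you don't accidentally run all A measurements first; a minimal sketch (function and names are mine, for illustration):

```python
import time

def run_interleaved(config_a, config_b, rounds=3):
    """Run two configurations alternately (A, B, A, B, ...) so that slow
    environmental drift affects both configurations equally."""
    results = {"A": [], "B": []}
    for _ in range(rounds):
        for name, action in (("A", config_a), ("B", config_b)):
            t0 = time.perf_counter()
            action()
            results[name].append(time.perf_counter() - t0)
    return results

# Toy workloads standing in for two system configurations
res = run_interleaved(lambda: sum(range(10_000)),
                      lambda: sum(range(20_000)))
print(res)
```

Interleaving matters because warm-up, caches, and background load drift over time; running A three times and then B three times would attribute that drift to the configuration.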

Important: No detours

Don't get distracted

  • Don't start tuning when this is not your goal
  • Don't tune when you don't need to
  • Resist taking on different numbers or data schemes
  • Don't change anything when not planned
  • Resist shaping the test to achieve certain results... you can do this with a pen and some paper more easily
  • If you have to change things, you might have to restart measurements

Important: Be Flexible

You cannot plan for everything

  • Test fails? Check why
  • Need to abort? Do it
  • Does not make sense what you measure? Back to the drawing board
  • Take a step back if needed
  • Stuck with a failure pattern? Investigate it
  • Can't run the tests as planned? Help to solve that

Step #7 - Evaluate

Generate and check the data. Discuss.

  • Expectations met?
  • Consistent?
  • Do you understand the results?
  • Theory proven? If not, why not?
  • Able to sell the results?
  • Derive new tests and actions

Step #8 - Iterate

Shall we play it again, Sam?

  • Any doubt?
  • Any oddities?
  • Any identified open dimensions?

Server vs. Client-Side

Pick the right approach when measuring storefronts

Server-Side

  • Load testing needs resources
  • A single browser already needs more than one CPU and 500 MB of RAM
  • That is not sustainable when simulating 1 million or more pages a second
  • Server-side tests simplify the client side to focus on the server side
  • No rendering, no JavaScript, no static content (depends on the test)

Client-Side

  • Full user experience is more than server download
  • Rendering is significant overhead
  • JavaScript is an overhead
  • You want to measure the real user experience
  • Use a real browser but don't use it to load the environment
  • Depending on the load, you can use real browsers for load testing

Golden Rules

Some things to keep in mind

  • One change only
  • Keep your goals reachable
  • Don't fool yourself, aka don't test with caches only
  • Don't get distracted
  • Capture enough data
  • Don't jump to conclusions; understand the results first

Warnings

Performance testing is art and science at the same time. Just measuring and claiming victory is not enough: "Wer misst, misst Mist." (German: "He who measures, measures rubbish.") Or just google Heisenberg's uncertainty principle.

  • Always question the results
  • Always have a theory first
  • The test is a simulation, it is not reality
  • If you don't understand the system, you can at least learn from testing, but that does not make what you measure or do correct
  • What is valid today is often obsolete tomorrow
  • If someone touches the code, it might render the results void

Questions and Answers

42 is a good start