Performance Testing
How to Approach It
René Schwietzke, Xceptance
Why?
Performance is a key selling point for everything, yet it is rarely stated explicitly. Imagine:
- Google: 1 min per search
- Amazon: 80 s to accept an order
- Netflix: 4 h to load a movie
- Navigation System: 5 min to calculate a route
Versus what we actually expect:
- Google: 100 searches per min
- Amazon: 3 s to accept an order
- Netflix: 10 s to load a movie
- Navigation System: 15 s to calculate a route
Motivation
Why do we need to know how to test or test at all?
"About 25 years ago Jakob Nielsen wrote a book called Usability Engineering and he offered some advice about response times which had already been in place for at least 25 years before he wrote the book."
- 0.1 second is about the limit for having a visitor feel as though the system is reacting instantaneously.
- 1.0 second is about the limit for a visitor’s flow of thought to stay uninterrupted, even though the visitor will notice the delay.
- 10 seconds is about the limit for keeping the visitor’s attention focused on the task they want to perform.
Motivation Continued
Business
- Pinterest rebuilt its pages for performance, achieving a 40% reduction in perceived wait times and a 15% increase in both search engine traffic and sign-ups
- BBC found they lost an additional 10% of users for every additional second their site took to load
- DoubleClick found 53% of mobile site visits were abandoned if a page took longer than 3 seconds to load
- When AutoAnything reduced page load time by half, they saw a boost of 12-13% in sales
Technology
- We add features constantly
- There is new and faster hardware all the time
- The amount of data keeps increasing dramatically
- New business models pose new challenges
What is Performance?
Performance is a vague term
The word performance in computer performance means the same thing that performance means in other contexts, that is, it means "How well is the computer doing the work it is supposed to do?"
Arnold Allen
Computer performance is the amount of work accomplished by a computer system.
https://en.wikipedia.org/wiki/Computer_performance
Dimensions
What metrics and topics shape performance
Topics
- Volume
- Scaling
- Growth
- Concurrency
- Reliability
- High-Availability (HA)
- Disaster Recovery (DR)
Metrics
- The most important high level metrics
- Response Time: How fast is a single action
- Throughput: How many actions are possible per time unit
- Utilization: How many resources does it need and how are these used
- Availability: How usable is the system, aka all the time, sometimes, occasionally
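The first three of these metrics can be derived from a simple request log. A minimal sketch in Python, with invented numbers; utilization would additionally require resource data (CPU, memory) from the system under test:

```python
from statistics import mean

# Hypothetical request log: (start time in s, duration in s, success) tuples.
# All numbers are invented for illustration.
requests = [
    (0.0, 0.21, True), (0.4, 0.35, True), (1.1, 0.18, True),
    (1.6, 1.90, False), (2.2, 0.27, True), (2.9, 0.33, True),
]

durations = [d for _, d, _ in requests]

# Response time: how fast is a single action (here: the average).
response_time = mean(durations)

# Throughput: how many actions per time unit over the observed window.
window = max(s + d for s, d, _ in requests) - min(s for s, _, _ in requests)
throughput = len(requests) / window

# Availability: the share of actions that actually succeeded.
availability = sum(1 for _, _, ok in requests if ok) / len(requests)

print(f"avg response time: {response_time:.2f} s")
print(f"throughput: {throughput:.2f} req/s")
print(f"availability: {availability:.0%}")
```

In a real test these numbers come from the load tool's result logs, but the definitions stay the same.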
Typical Problems
Examples of performance that can go wrong
- Slower with growing data sets
- Exponential runtime for longer inputs
- Cart with 10 items is fine, with 50 never returns
- Hiccups every 30 min
- Performance drops after a release
- Outage every 12 h
- No performance gain with more CPUs
- Sudden performance change after 20 h
- Not able to sustain a sudden burst
- Repeating peak order traffic fails
- When restarting under load, the system does not come up
Failing Regex
Checking emails and data input
This is a simple matching expression for all strings consisting of a and b, where
each letter is optional and the letters can occur in any combination of groups:
(a*b*)*
String                          | Runtime
aaab                            | 0 ms
aaaaaab                         | 0 ms
aaaaaaaaab                      | 0 ms
aaaaaaaaaaaab                   | 0 ms
aaaaaaaaaaaaaaab                | 0 ms
aaaaaaaaaaaaaaaaaab             | 0 ms
aaaaaaaaaaaaaaaaaaaaab          | 0 ms
aaaaaaaaaaaaaaaaaaaaaaaab       | 0 ms
aaaaaaaaaaaaaaaaaaaaaaaaaaab    | 0 ms
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaab | 0 ms

String                          | Runtime
aaax                            | 0 ms
aaaaaax                         | 0 ms
aaaaaaaaax                      | 1 ms
aaaaaaaaaaaax                   | 2 ms
aaaaaaaaaaaaaaax                | 11 ms
aaaaaaaaaaaaaaaaaax             | 77 ms
aaaaaaaaaaaaaaaaaaaaax          | 341 ms
aaaaaaaaaaaaaaaaaaaaaaaax       | 643 ms
aaaaaaaaaaaaaaaaaaaaaaaaaaax    | 5,145 ms
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaax | 42,008 ms
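The effect in the second table can be reproduced in a few lines. This sketch uses Python's backtracking re engine; the absolute times will differ from the table above, which likely came from a different engine, but the exponential growth is the same:

```python
import re
import time

PATTERN = re.compile(r"(a*b*)*")  # the expression from the tables above

def match_time(s: str) -> float:
    """Full-match the string and return the elapsed wall time in seconds."""
    start = time.perf_counter()
    PATTERN.fullmatch(s)
    return time.perf_counter() - start

# Matching input: found almost instantly, like the first table.
assert PATTERN.fullmatch("aaaaab") is not None

# Non-matching input: the trailing 'x' forces the engine to try every way
# of splitting the a's across the repeated group -> exponential backtracking.
fast = match_time("a" * 12 + "x")
slow = match_time("a" * 20 + "x")   # roughly 2^8 times more work
print(f"12 a's: {fast:.4f} s, 20 a's: {slow:.4f} s")
```

Add a few more a's and the match takes minutes, which is exactly the "exponential runtime for longer inputs" failure mode listed earlier.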
Failing Performance
Examples of performance failures.
Performance Testing
How do we test all that
Performance Testing
In software engineering, performance testing is, in general, a testing practice performed to determine how a system performs in terms of responsiveness and stability under a particular workload. It can also serve to investigate, measure, validate or verify other quality attributes of the system, such as scalability, reliability and resource usage.
https://en.wikipedia.org/wiki/Software_performance_testing
WARNING - Performance and Testing
Performance is something you design into your product; it cannot be tested into it. If you haven't paid attention at design time, your testing will only state the obvious. Serious performance problems originate in the design, not in a few red-hot lines of code.
Performance testing is often seen as a lifesaver when things go wrong, but it mostly reveals what is already known.
When you combine performance testing and tuning and celebrate a 25% improvement, you could probably improve by 500% by changing the design or architecture.
Performance Testing
What types of tests can we have?
Not all tests are the same
We will follow a different path than the industry
- Load Testing
- Stress Testing
- Soak Testing
- Spike Testing
- Breakpoint Testing
- Configuration Testing
- Isolation Testing
- Internet Testing
- Seriously, forget all that!
- Performance testing is about understanding the product, challenges, expectations, and the path to proving it
- If you wanna fancy that up with some name... fine with me
Area of the Test
What you can test
- Storefront
- Backoffice
- App
- API
- REST/Services
- Processes
- Isolated components
- A mix of all that
- Storefront and similar tests can be sub-divided into
- Server-side only (no rendering)
- Client-side, real rendering and user interaction measurement
- Tests can also be manual, such as storefront performance evaluations
Size of the Test
Size in the sense of what and how long should be tested
- A single topic, such as storefront search
- Combination, such as search and reindexing
- A full simulation of the system
- Short cycle, 15 min
- Day simulation, 24h
- Lifetime simulation
Step #0 - The Product
Understand the product first
- Understand your product
- Performance testing is highly technical
- Understand the technical pieces of the product
- Understand the interaction with components to isolate the problem later
- Map your interest to the product
- Understand the typical states of your system
- Try to get a real world view
- Artificial stuff is ok, but might just waste time too
- Ask yourself: If something never was designed for X, why should we test X, knowing that nobody ever considered that during design?!
Non-Computer Exercise
Test driving a car
Exercise 01
Intel just sent us new CPUs
Step #1 - Why
Know the problem or challenge first to understand it
- What is the challenge?
- What is the problem behind it?
- Do you know already enough?
- Has this happened before?
- Did we measure it already?
- What are the dimensions?
- Nothing is known
- There is a claim and we need proof
- Result is known but the why is open
- Something has changed and it has to be quantified
Exercise 02
Data Dimensions - Why
Step #2a - Requirements
We need target numbers or similar concrete goals
- What are your goals?
- Anything you can attach numbers to?
- Or are you looking only for numbers?
- Anything you can compare to?
- Constrain your numbers, aka you get this when this is given
- Traffic
- Throughput
- Latency
- Resource utilization
- Response times
- Rendering performance
- Uptime
- SKU Count
- Customers
- Order Size
- Order Amount
- Attribute Count
Some number examples
Examples of what number-based requirements might look like
Storefront
- What
- Response times P99 under 1 s
- Regardless of cache state
- No runtimes larger than 5 s
- No errors
- When
- 500k plain products
- No hardware constraints
- 5% conversion rate
Import Process
- Able to import 100k products in 5 min
- Initial import and update
- DB contains either 0, 50k, or 100k of the same products
- Side effect free
Order Export
- Able to export unexported 1k orders every 5 min
- Order up to 50 items with 10 custom attributes per item
- Registered and guest orders
- No side effects on order intake (rate 1k/5 min with bursts of 5k/5 min)
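Number-based requirements like the storefront example above can be written down as data and checked mechanically after every run. A sketch using the storefront targets; the measured values are invented for illustration:

```python
# Targets from the storefront example above; the measured numbers are invented.
requirements = {
    "p99_s": 1.0,   # response time P99 under 1 s
    "max_s": 5.0,   # no runtime larger than 5 s
    "errors": 0,    # no errors
}

measured = {"p99_s": 0.87, "max_s": 3.9, "errors": 0}

def check(requirements, measured):
    """Return a list of violated requirements (empty means all targets met)."""
    violations = []
    if measured["p99_s"] >= requirements["p99_s"]:
        violations.append("P99 too high")
    if measured["max_s"] > requirements["max_s"]:
        violations.append("runtime above hard limit")
    if measured["errors"] > requirements["errors"]:
        violations.append("errors occurred")
    return violations

print(check(requirements, measured))  # [] - all storefront targets met
```

Writing the targets down as data keeps the pass/fail decision out of the discussion after the test.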
Step #2b - Limitations
A test is a test, state the limitations and limit yourself
Implicit
- Data
- Time
- Growth
- System wear and tear
- Fragmentation
- Randomness
- Unknown behavior
- External factors
Explicit
- Data
- Data Dependencies
- Ignored Data and Processes
- Customization
- User Scenarios
- Time
Percentiles to the rescue
Wait... 1 s response time... what is that?
- Average: Ok, but not precise
- You miss details
- Percentiles help, combine with average
- Preferably very high percentiles
- P99 at least
- Set an upper limit aka never over 10 s
A percentile is a measure used in statistics indicating the value below which a given percentage of observations in a group of observations fall. For example, the 20th percentile is the value below which 20% of the observations may be found.
https://en.wikipedia.org/wiki/Percentile
- Imagine 100k visits, 1 million page views and 10k orders per hour
- P99 = 5 s says: 1k visits, 10k page views, and 100 orders are slower than that... unspecified by how much
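A nearest-rank percentile (one of several common definitions) is easy to compute. The response times below are invented and deliberately skewed to show why P99 plus an upper limit tells you more than the average alone:

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the value at or below which p% of the
    sorted observations fall."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]

# 1000 hypothetical response times: mostly fast, with a slow tail.
times = [0.2] * 950 + [0.9] * 40 + [5.0] * 10

print(percentile(times, 50))  # 0.2 - the median looks great
print(percentile(times, 99))  # 0.9 - P99 is still under 1 s...
print(max(times))             # 5.0 - ...but 10 requests took 5 s
```

This is why the slide suggests combining percentiles with an average and an upper limit: P99 alone says nothing about how slow the slowest 1% really are.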
Exercise 03
Data Dimensions - Requirements
Step #3 - Hypothesize
Come up with a theory what you expect to see and why
- What are the expectations?
- Why this result?
- What goes on technically?
- How can you prove the results or know that this is right?
- What can go wrong?
- Is there anything uncertain?
- Are there any dimensions that influence our work?
Exercise 04
Data Dimensions - Hypothesis
Step #4 - Plan
What do you need and what has to be done?
- Any tools available?
- How do we get the system set up?
- How can we repeat things?
- How long will it take?
- How valid are the results?
- What other factors can contribute?
- Test Concept (aka test plan)
- Test Approach
- Target numbers
- Put in all your requirements and limitations
Exercise 05
Data Dimensions - Plan
Step #5 - Select and Build
Build, install, configure, validate, dry-run it.
- Select a tool: XLT, JMeter, Blazemeter, Gatling...
- Select an approach: DOM, REST, Request-Response, Microbenchmark
- Build data tools
- Train yourself
- Test small scale
- Consult
Step #6 - Measure
Get to the numbers using what you have prepared
- Evaluate small scale
- Try to find a baseline if possible
- Measure large scale
- Verify
- Repeat
- Able to stop?
- Always measure at least 3 times
- Question your data right away
- When measuring two configurations A and B, do A, B, A, B
- Cannot repeat the test? You didn't test anything
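The A, B, A, B advice can be sketched as a small harness. The two configurations here are stand-in functions; in a real test each call would be a full measurement run:

```python
import time

def measure(name, run):
    """Execute one run and return (configuration name, elapsed seconds)."""
    start = time.perf_counter()
    run()
    return name, time.perf_counter() - start

def config_a():  # stand-in for a run against configuration A
    sum(range(10_000))

def config_b():  # stand-in for a run against configuration B
    sum(range(20_000))

# Interleave A and B instead of running all A's first, so drift over time
# (warm-up, caches, background load) hits both configurations equally.
# Each configuration is measured 3 times, as recommended above.
results = []
for _ in range(3):
    results.append(measure("A", config_a))
    results.append(measure("B", config_b))

for name in ("A", "B"):
    samples = [t for n, t in results if n == name]
    print(name, min(samples))  # the minimum is often the most stable summary
```

If A were measured three times in a row and then B, any change in the environment between the two blocks would be indistinguishable from a real difference.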
Important: No detours
Don't get distracted
- Don't start tuning when this is not your goal
- Don't tune when you don't need to
- Resist taking on different numbers or data schemes
- Don't change anything when not planned
- Resist shaping the test to achieve certain results... you can do this with a pen and some paper more easily
- If you have to change things, you might have to restart measurements
Important: Be Flexible
You cannot plan for everything
- Test fails? Check why
- Need to abort? Do it
- Does not make sense what you measure? Back to the drawing board
- Take a step back if needed
- Stuck with a failure pattern? Investigate it
- Don't get to the tests as planned? Help to solve that
Step #7 - Evaluate
Generate and check the data. Discuss.
- Expectations met?
- Consistent?
- Do you understand the results?
- Theory proven? If not, why not?
- Able to sell the results?
- Derive new tests and actions
Step #8 - Iterate
Shall we play it again, Sam?
- Any doubt?
- Any oddities?
- Any identified open dimensions?
Server vs. Client-Side
Pick the right approach when measuring storefronts
Server-Side
- Load testing needs resources
- A browser already needs more than one CPU and 500 MB of RAM
- That is not sustainable when simulating 1 million or more pages a second
- Server-side tests simplify the client side to focus on the server side
- No rendering, no JavaScript, no static content (depends on the test)
Client-Side
- Full user experience is more than server download
- Rendering is significant overhead
- JavaScript is an overhead
- You want to measure the real user experience
- Use a real browser but don't use it to load the environment
- Depending on the load, you can use real browsers for load testing
Golden Rules
Some things to keep in mind
- One change only
- Keep your goals reachable
- Don't fool yourself, aka by running with caches only
- Don't get distracted
- Capture enough data
- Don't jump to conclusions, understand them
Warnings
Performance testing is art and science at the same time. Just measuring and claiming
victory is not enough: "Wer misst, misst Mist." ("He who measures, measures rubbish.") Or just google Heisenberg's uncertainty principle.
- Always question the results
- Always have a theory first
- The test is a simulation, it is not reality
- If you don't understand the system, you can at least learn, but what you measure or do does not necessarily become right
- What is valid today is often obsolete tomorrow
- If someone touches the code, it might render the results void
Questions and Answers
42 is a good start