Performance Success Criteria

Effectively Communicate Performance Results

René Schwietzke, Xceptance GmbH

About Xceptance

Focused on Software Test and Quality Assurance
Performance Testing, Test Automation, Functional Testing, QA, and Test Process Consulting
Mostly active around e-commerce and web
Founded 2004
Headquarters in Jena, Germany
Subsidiary in Cambridge, MA, USA
Own Performance Test Tool XLT, Java-based, Free, www.xceptance.com/xlt

About René Schwietzke

Co-Founder and Managing Director of Xceptance
Master of Computer Science (in German: Dipl.-Inf.)
Programmer since 1992
In QA and test since 1998
Performance Tester since 1999
@ReneFoobarJ
#java #qa #test #performance #performancetest #quality #automation

What Qualifies us?

Just Some "Proof" That We Know the Topic Well Enough

Performance testing since 2004
Own tooling; deep behind-the-scenes knowledge, such as protocols and networking
Over 150 performance tests every year, not counting repetitions
World-wide customer base including APAC and South America
Web-based, full page load, or API based testing
Traffic ranges from 10 orders and 1,000 page views to 1.2 million orders and 50 million page views per hour

The Basic Challenges

Things that Make Performance Testing Complicated

The Perfect Project

It Could be so Simple

Requirement

42 or smaller

Result

37

Challenge - Audience

Who Needs Performance Testing? Who Consumes the Results?

Engineering (Eng): Eng wants to test their own stuff
Engineering: QA wants to test what Eng delivered
Engineering Management: Prove, validate, and demonstrate
Product Management: Needs proof that the delivered stuff works; hit by customer escalations
Sales: Needs numbers to sell better
Services: Who caused the customer escalation?
Implementation Partners: Do we play by the rules and deliver what is expected?
Merchants: Prove that the chosen platform scales or works similarly to the old one; test a prototype; prepare for sales events; verify new features...

Challenge - Requirements

The Test Reality

Desired Requirements

Visits/h - 100,000
Page Views/h - 1 million
Orders/h - 3,500
Runtime Average - 250 ms
Runtime P99.9 - 500 ms
Runtime Max - 3,000 ms
Errors - None
Customer Growth, Order Item Size, Performance Stable over 12 h

Typically Given

Visits/h - 103,181
Page Views/h - 1.716 million
Orders/h - 3,186
Runtimes - Fast Enough

Sometimes Given

No idea, you are the experts!

Challenge - Results

The Test Reality

Expected Result

Passed

Real Results

Visits/h - 103,000
Page Views/h - 1.12 million
Orders/h - 3,588
Runtime Average - 251 ms
Runtime P99.9 - 900 ms
Runtime Max - 16,870 ms
Errors - 17 order failures, 18 error codes 502, 125 response codes 404
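A minimal sketch of why "Passed" is not a simple answer here: it checks the real results against the desired requirements from the earlier slide. The metric names, the `failed_checks` helper, and the rule that traffic figures are minimums while runtimes and errors are maximums are assumptions made for illustration only.

```python
# (required, measured, direction)
# "min": the measured value must reach the requirement (traffic).
# "max": the measured value must not exceed the requirement (runtimes, errors).
CHECKS = {
    "visits_per_hour": (100_000, 103_000, "min"),
    "page_views_per_hour": (1_000_000, 1_120_000, "min"),
    "orders_per_hour": (3_500, 3_588, "min"),
    "runtime_avg_ms": (250, 251, "max"),
    "runtime_p99_9_ms": (500, 900, "max"),
    "runtime_max_ms": (3_000, 16_870, "max"),
    # 17 order failures + 18 response codes 502 + 125 response codes 404
    "errors": (0, 17 + 18 + 125, "max"),
}


def failed_checks(checks):
    """Return the names of all metrics that miss their requirement."""
    failed = []
    for name, (required, measured, direction) in checks.items():
        if direction == "min":
            ok = measured >= required
        else:
            ok = measured <= required
        if not ok:
            failed.append(name)
    return failed


if __name__ == "__main__":
    for name in failed_checks(CHECKS):
        print("missed:", name)
```

Under these assumptions the traffic targets are met, while every runtime target and the error target are missed, one of them by a single millisecond. A strict pass/fail verdict hides that difference.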

Xceptance's Experience

Xceptance's Impression of and Experiences in the Field

Engineering: Starts precisely planned, loses focus quickly, cannot sell results to higher levels without causing uncertainty or unwanted discussions
Eng/QA: Don't speak the same language when talking about results
Eng Management: Asks technical questions about the testing over and over again because it does not understand the results or doubts them
Product Management: Too much data
Sales: Give us a number, one number only please.
Services: Was it our fault?
Implementation Partners: Does it work and is there anything to do?
Merchants: Will we succeed?

Problem Summary

The Requirement for Better Communication in a Nutshell

Except for some engineers, nobody really wants all details
Expectations range from a small set of numbers, through a single number, to just yes or no
Results have to be explained again and again
A lot of input is needed for a "conclusive" verdict
Missing rules make things subjective
It takes too long to document and communicate
Inter-team communication is probably the hardest

A Rating System

Our Suggestion to Ease the Pain

Concept

The Basic Concept of the Rating System

Map the total success to school grades/marks
US-style grades (our largest market): A to F, plus an A+ for overachieving
B symbolizes our average customer's performance
BUT: This is the total of three other criteria!
Criteria: Response Time, Errors, Predictability
The worst of these three grades determines the total
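The "worst grade wins" rule can be sketched in a few lines. The grade list assumes the usual US scale without an E, and the function and parameter names are made up for this illustration.

```python
# Grades ordered from best to worst; A+ rewards overachieving.
# Assumption: the US scale is used, so there is no "E".
GRADE_ORDER = ["A+", "A", "B", "C", "D", "F"]


def overall_grade(response_time: str, errors: str, predictability: str) -> str:
    """Return the worst of the three criterion grades."""
    grades = (response_time, errors, predictability)
    # The grade with the highest index in GRADE_ORDER is the worst one.
    return max(grades, key=GRADE_ORDER.index)


if __name__ == "__main__":
    print(overall_grade("A+", "B", "D"))
```

A test with excellent response times and few errors still ends up with a D if its predictability is rated D.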

Criteria - Response Times

The Most Interesting Values Mapped to a Grade

Split into groups of common types
B is the average of our e-commerce target group
Worst grade determines the total grade
Data is based on years of measurements
Data is adjustable when customers have other ideas
Similar data can be set up for APIs or page loads
P95 is used (P99 is recommended)
... and P99.9 is our unachievable goal
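For readers less familiar with percentiles, here is a small sketch of how a P95 or P99 can be derived from raw runtimes. It uses the nearest-rank method, which is only one of several common definitions and not necessarily what a given test tool reports. The `percentile` helper and the sample data are made up for illustration.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples given")
    ordered = sorted(samples)
    # Rank is 1-based; clamp to 1 so that very small p still yields a value.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Made-up runtimes in ms: 97 fast requests and three slow outliers.
    runtimes_ms = [200] * 97 + [4_000, 9_000, 16_000]
    for p in (50, 95, 99):
        print(f"P{p}: {percentile(runtimes_ms, p)} ms")
```

With this sample, P50 and P95 do not see the outliers at all, while P99 does. This is why the choice of percentile changes the grade.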

Response Times - An Example

Criteria - Errors

Something Always Goes Wrong

Barely any test run is error free when reaching a certain complexity
Trying to convey the significance of errors with this criterion
Technical Errors: No response or response codes 500 or higher
Functional Errors: Application based errors such as validation failures
Number of visits affected and visible patterns of failure
Feature importance is used as a factor
Manual evaluation hence slightly subjective
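The split between technical and functional errors can be sketched as follows. The `classify` function and its `validation_failed` flag are hypothetical; how a real test script detects an application-level failure depends on the tool and the application.

```python
from typing import Optional


def classify(status: Optional[int], validation_failed: bool = False) -> str:
    """Classify a single request result.

    status: HTTP response code, or None if no response was received.
    validation_failed: hypothetical flag set by the test script when the
        application answered but the content was not what was expected.
    """
    if status is None or status >= 500:
        return "technical"
    if validation_failed:
        return "functional"
    return "ok"


if __name__ == "__main__":
    results = [(200, False), (502, False), (None, False), (200, True)]
    for status, failed in results:
        print(status, classify(status, failed))
```

In this sketch, a response code below 500 counts as an error only when the script's own validation flags it. The weighting by affected visits and feature importance stays a manual step.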

Errors - An Example

Criteria - Predictability

Simplify the View on Long-Term Behavior

Turn it into a business value
Strategic motivation, not scientifically proven

The 10 sec threshold is based on the user perception model published in Google's RAIL model¹
Applies to requests, not page loads, hence it is weaker
Example: 100 * ((0 + 2) / 682) = 0.29 %
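The example calculation can be reproduced with a short sketch. It assumes that the two numbers in the inner sum are the count of errors (0) and the count of requests slower than 10 sec (2), and that 682 is the total number of requests. The slide does not label the operands, so this reading and the function name are assumptions.

```python
def business_impact_value(errors: int, slow_requests: int, total_requests: int) -> float:
    """Percentage of requests that either failed or exceeded the 10 sec limit.

    Assumed reading of the slide's formula: 100 * ((errors + slow) / total).
    """
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return 100 * (errors + slow_requests) / total_requests


if __name__ == "__main__":
    biv = business_impact_value(errors=0, slow_requests=2, total_requests=682)
    print(f"{biv:.2f} %")  # prints "0.29 %"
```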

Goal: P95 of 5,000 ms
Mean: 1,764 ms
P50: 1,735 ms
P95: 2,540 ms
P99: 2,550 ms
P99.9: 29,940 ms

Predictability Mapped

Mapping the Business Impact Value

Use the BIV as input to the rating
Combines errors and response time
Add a view of response time patterns
Pattern view is slightly subjective again
Hopefully demonstrates the end-user's pain

Example of Response Time Patterns

Bonus Motivation

Apply a Monetary Value to Enhance Visibility

Reporting

The Final Summary

A management view
One-line verdict for the lazy plus a grade that pops
A monetary value for the business audience
Traffic and response times for the engineers
A compromise to cater to all groups at once
A more detailed report is attached
Yes, this is oversimplification
Yes, the engineer in me is very sad

Summary

Problems, Challenges, and Results Summarized

Good

Easier result communication
Less time needed to explain things
Results are comparable, important for implementation partners and vendors

Bad

Some customers start to haggle over the grade
The engineer has less intel to work with
Potential error patterns might become invisible
Errors become an accepted fact

This is still not an absolute set of numbers that can be used to automatically evaluate tests and draw conclusions
One tends to work by the book and apply less common sense
The subjective pieces are often a cause for discussion
BUT: Efficiency and communication culture improved. Achieved quicker turnarounds and created a common ground for all parties.

Bonus Outcome

Some more experiences

Some customers are motivated to be better than the average, because the average is clearly stated

Risks

What is left out or unrated

Finally...

Download and Share Freely - https://bit.ly/2kGSEGe

Questions & Answers

Example 1

Mean: 156 ms P95: 730 ms P99: 1,610 ms P99.9: 4,930 ms Max: 6,592 ms Errors: 0 Total: 6,796

Example 2

Mean: 1,790 ms P95: 5,225 ms P99: 10,580 ms P99.9: 11,880 ms Max: 12,602 ms Errors: 0

Example 3

Mean: 467 ms P95: 690 ms P99: 2,140 ms P99.9: 3,730 ms Max: 30,033 ms Errors: 4/33

Example 4

Mean: 504 ms P95: 550 ms P99: 2,660 ms P99.9: 4,280 ms Max: 6,169 ms Errors: 0 Total: 114,386

Example 5

Mean: 2,033 ms P95: 2,330 ms P99: 3,020 ms P99.9: 10,830 ms Max: 21,744 ms Errors: 0 Total: 3,551

Example 6

Mean: 102 ms P95: 160 ms P99: 240 ms P99.9: 600 ms Max: 2,106 ms Errors: 0/114 Total: 44,784

Example 7

Mean: 174 ms P95: 210 ms P99: 250 ms P99.9: 480 ms Max: 1,169 ms Errors: 0 Total: 14,645

Example 8

Mean: 2,342 ms P95: 6,070 ms P99: 8,680 ms P99.9: 11,650 ms Max: 14,423 ms Errors: 1/0 Total: 7,042

Example 9

Mean: 356 ms P95: 490 ms P99: 560 ms P99.9: 3,340 ms Max: 15,604 ms Errors: 0 Total: 868,457

Questions And Answers

Your Questions and Feedback