Performance Success Criteria

Effectively Communicate Performance Results

René Schwietzke, Xceptance GmbH

About Xceptance

  • Focused on Software Test and Quality Assurance
  • Performance Testing, Test Automation, Functional Testing, QA, and Test Process Consulting
  • Mostly active around e-commerce and web
  • Founded 2004
  • Headquarters in Jena, Germany
  • Subsidiary in Cambridge, MA, USA
  • Own Performance Test Tool XLT, Java-based, Free, www.xceptance.com/xlt

About René Schwietzke

  • Co-Founder and Managing Director of Xceptance
  • Master of Computer Science (in German: Dipl.-Inf.)
  • Programmer since 1992
  • In QA and test since 1998
  • Performance Tester since 1999
  • @ReneFoobarJ
  • #java #qa #test #performance #performancetest #quality #automation

What Qualifies Us?

Just Some "Proof" That We Know the Topic Well Enough

  • Performance testing since 2004
  • Own tooling; deep behind-the-scenes knowledge, such as protocols and networking
  • Over 150 performance tests every year, not counting repetitions
  • World-wide customer base including APAC and South America
  • Web-based, full page load, or API based testing
  • Traffic ranges from 10 orders and 1,000 page views to 1.2 million orders and 50 million page views per hour

The Basic Challenges

Things that Make Performance Testing Complicated

The Perfect Project

It Could be so Simple

Requirement

42 or smaller

Result

37

Challenge - Audience

Who Needs Performance Testing? Who Consumes the Results?

  • Engineering (Eng): Eng wants to test their own stuff
  • Engineering: QA wants to test what Eng delivered
  • Engineering Management: Prove, validate, and demonstrate
  • Product Management: Needs proof that the delivered stuff works; hit by customer escalations
  • Sales: Needs numbers to sell better
  • Services: Who caused the customer escalation?
  • Implementation Partners: Do we play by the rules and deliver what is expected?
  • Merchants: Prove that the chosen platform scales or works similarly to the old one; test a prototype; prepare for sales events; verify new features...

Challenge - Requirements

The Test Reality

Desired Requirements

  • Visits/h - 100,000
  • Page Views/h - 1 million
  • Orders/h - 3,500
  • Runtime Average - 250 ms
  • Runtime P99.9 - 500 ms
  • Runtime Max - 3,000 ms
  • Errors - None
  • Customer Growth, Order Item Size, Performance Stable over 12 h

Typically Given

  • Visits/h - 103,181
  • Page Views/h - 1.716 million
  • Orders/h - 3,186
  • Runtimes - Fast Enough

Sometimes Given

  • No idea, you are the experts!

Challenge - Results

The Test Reality

Expected Result

  • Passed

Real Results

  • Visits/h - 103,000
  • Page Views/h - 1.12 million
  • Orders/h - 3,588
  • Runtime Average - 251 ms
  • Runtime P99.9 - 900 ms
  • Runtime Max - 16,870 ms
  • Errors - 17 order failures, 18 responses with code 502, 125 responses with code 404
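
To see why a plain "Passed" rarely falls out of such numbers, the desired requirements can be checked one by one against the real results. This is only a sketch: treating the traffic figures as minimums and the runtimes as maximums is an assumption, the three error counts are simply added up although they may overlap, and `CHECKS` and `failed_checks` are names made up for this illustration.

```python
# (required, measured, higher_is_better), values taken from the slides.
# Assumption: traffic numbers are minimums, runtimes and errors are maximums.
CHECKS = {
    "Visits/h": (100_000, 103_000, True),
    "Page Views/h": (1_000_000, 1_120_000, True),
    "Orders/h": (3_500, 3_588, True),
    "Runtime Average ms": (250, 251, False),
    "Runtime P99.9 ms": (500, 900, False),
    "Runtime Max ms": (3_000, 16_870, False),
    # Assumption: the three reported error counts do not overlap.
    "Errors": (0, 17 + 18 + 125, False),
}


def failed_checks(checks):
    """Return the names of all metrics that miss their requirement."""
    failed = []
    for name, (required, measured, higher_is_better) in checks.items():
        met = measured >= required if higher_is_better else measured <= required
        if not met:
            failed.append(name)
    return failed


print(failed_checks(CHECKS))
```

Read strictly, the run fails even on the average runtime, which misses its target by a single millisecond. A yes/no verdict hides how close or how far off each value is.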

Xceptance's Experience

Xceptance's Impression of and Experiences in the Field

  • Engineering: Starts precisely planned, loses focus quickly, cannot sell results to higher levels without causing uncertainty or unwanted discussions
  • Eng/QA: Don't speak the same language when talking about results
  • Eng Management: Asks technical questions about the testing over and over again because it does not understand the results or doubts them
  • Product Management: Too much data
  • Sales: Give us a number, one number only please.
  • Services: Was it our fault?
  • Implementation Partners: Does it work and is there anything to do?
  • Merchants: Will we succeed?

Problem Summary

The Requirement for Better Communication in a Nutshell

  • Except for some engineers, nobody really wants all details
  • The expectations range from a small set of numbers to a single number to just yes or no
  • Results have to be explained again and again
  • A lot of input is needed for a "conclusive" verdict
  • Missing rules make things subjective
  • It takes too long to document and communicate
  • Inter-team communication is probably the hardest

A Rating System

Our Suggestion to Ease the Pain

Concept

The Basic Concept of the Rating System

  • Map the total success to school grades/marks
  • US-style grades (largest market): A to F, plus an A+ for overachieving
  • B symbolizes our average customer's performance
  • BUT: This is the total of three other criteria!
  • Criteria: Response Time, Errors, Predictability
  • The worst of these three grades determines the total grade
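
The "worst grade wins" rule can be sketched in a few lines. This is a minimal illustration, not the actual tooling; `GRADE_ORDER` and `overall_grade` are hypothetical names, and the scale assumes the common US set of grades without an E.

```python
# Grades ordered from best to worst; A+ is the bonus grade for overachieving.
# Assumption: common US scale, so F directly follows D.
GRADE_ORDER = ["A+", "A", "B", "C", "D", "F"]


def overall_grade(response_time: str, errors: str, predictability: str) -> str:
    """Return the worst of the three criteria grades."""
    return max((response_time, errors, predictability), key=GRADE_ORDER.index)


print(overall_grade("A", "C", "B"))  # C
```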

Criteria - Response Times

The Most Interesting Values Mapped to a Grade

  • Split into groups of common types
  • B is the average of our e-commerce target group
  • Worst grade determines the total grade
  • Data is based on years of measurements
  • Data is adjustable when customers have other ideas
  • Similar data can be set up for APIs or page loads
  • P95 is used (P99 is recommended)
  • ... and P99.9 is our unachievable goal
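
A rough sketch of how a P95 could be computed and mapped to a grade. The limits in `EXAMPLE_LIMITS_MS` are invented placeholders, since the real limits are based on measurement data that is not part of this deck. The nearest-rank percentile method is also an assumption; other tools may interpolate instead.

```python
import math

# HYPOTHETICAL P95 limits in milliseconds per grade, for illustration only.
EXAMPLE_LIMITS_MS = [("A", 250), ("B", 500), ("C", 1000), ("D", 2000)]


def percentile(samples_ms, p):
    """Nearest-rank percentile: smallest sample with at least p % of all samples at or below it."""
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def grade_for(p95_ms, limits=EXAMPLE_LIMITS_MS):
    """Map a P95 runtime to the first grade whose limit it does not exceed."""
    for grade, limit in limits:
        if p95_ms <= limit:
            return grade
    return "F"
```

Each group of request types would get its own limits and its own grade, and the worst group grade would then become the response time grade.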

Response Times - An Example

Criteria - Errors

Something Always Goes Wrong

  • Barely any test run is error free when reaching a certain complexity
  • Trying to convey the significance of errors with this criterion
  • Technical Errors: No response or response codes 500 or higher
  • Functional Errors: Application based errors such as validation failures
  • Number of visits affected and visible patterns of failure
  • Feature importance is used as a factor
  • Manual evaluation, hence slightly subjective
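
The split between technical and functional errors can be expressed as a small classifier. This is a sketch under assumptions: `classify` and its parameters are hypothetical, and `validation_ok` stands for whatever content checks the test script applies to a response.

```python
from typing import Optional


def classify(status_code: Optional[int], validation_ok: bool) -> str:
    """Classify one response as 'technical', 'functional', or 'ok'.

    status_code is None when no response arrived at all.
    validation_ok is the outcome of the test's own content checks.
    """
    # Technical error: no response or response code 500 or higher.
    if status_code is None or status_code >= 500:
        return "technical"
    # Functional error: the application answered, but not as expected.
    if not validation_ok:
        return "functional"
    return "ok"
```

Under this definition a 404 is not a technical error; it only counts as a functional error when the test's validation rejects it. The weighting by affected visits and feature importance remains a manual step.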

Errors - An Example

Criteria - Predictability

Simplify the View on Long-Term Behavior

  • Turn it into a business value
  • Strategic motivation, not scientifically proven

  • 10 sec is chosen based on the user perception model published in the Google RAIL Model
  • Applies to requests, not page loads, hence it is weaker
  • Example: 100 * ((0 + 2) / 682) = 0.29 %
  • Goal: P95 of 5,000 ms
  • Mean: 1,764 ms
  • P50: 1,735 ms
  • P95: 2,540 ms
  • P99: 2,550 ms
  • P99.9: 29,940 ms
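
The example calculation above can be reproduced with a small helper. The reading of the three numbers is an assumption: 0 failed requests, 2 requests slower than the 10 sec limit, and 682 requests in total. The function name `business_impact_value` is hypothetical as well.

```python
def business_impact_value(errors: int, slow_requests: int, total_requests: int) -> float:
    """Percentage of requests that either failed or exceeded the 10 sec limit.

    Assumption: this mirrors the slide's example 100 * ((0 + 2) / 682).
    """
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return 100 * (errors + slow_requests) / total_requests


print(f"{business_impact_value(0, 2, 682):.2f} %")  # 0.29 %
```

In this reading, the two slow requests also explain why the P99.9 of 29,940 ms is far above the P99 of 2,550 ms while the P95 goal is still met comfortably.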

Predictability Mapped

Mapping the Business Impact Value

  • Use the BIV as input to the rating
  • Combines errors and response time
  • Add a view of response time patterns
  • Pattern view is slightly subjective again
  • Hopefully demonstrates the end-user's pain

Example of Response Time Patterns

Bonus Motivation

Apply a Monetary Value to Enhance Visibility

Reporting

The Final Summary

  • A management view
  • One-line verdict for the lazy plus a grade that pops
  • A monetary value for the business audience
  • Traffic and response times for the engineers
  • A compromise to cater to all groups at once
  • A more detailed report is attached
  • Yes, this is oversimplification
  • Yes, the engineer in me is very sad

Summary

Problems, Challenges, and Results Summarized

Good

  • Easier result communication
  • Less time needed to explain things
  • Results are comparable, important for implementation partners and vendors

Bad

  • Some customers start to haggle over the grade
  • The engineer has less intel to work with
  • Potential error patterns might become invisible
  • Errors become an accepted fact
  • This is still not an absolute set of numbers that can be used to automatically evaluate tests and draw conclusions
  • One tends to work by the book and apply less common sense
  • The subjective pieces are often a cause for discussion
  • BUT: Efficiency and communication culture improved. Achieved quicker turnarounds and created a common ground for all parties.

Bonus Outcome

Some more experiences

  • Some customers are motivated to be better than the average, because the average is clearly stated

Risks

What is left out or unrated

Finally...

Download and Share Freely - https://bit.ly/2kGSEGe





Questions & Answers

Example 1

Mean: 156 ms P95: 730 ms P99: 1,610 ms P99.9: 4,930 ms Max: 6,592 ms Errors: 0 Total: 6,796

Example 2

Mean: 1,790 ms P95: 5,225 ms P99: 10,580 ms P99.9: 11,880 ms Max: 12,602 ms Errors: 0

Example 3

Mean: 467 ms P95: 690 ms P99: 2,140 ms P99.9: 3,730 ms Max: 30,033 ms Errors: 4/33

Example 4

Mean: 504 ms P95: 550 ms P99: 2,660 ms P99.9: 4,280 ms Max: 6,169 ms Errors: 0 Total: 114,386

Example 5

Mean: 2,033 ms P95: 2,330 ms P99: 3,020 ms P99.9: 10,830 ms Max: 21,744 ms Errors: 0 Total: 3,551

Example 6

Mean: 102 ms P95: 160 ms P99: 240 ms P99.9: 600 ms Max: 2,106 ms Errors: 0/114 Total: 44,784

Example 7

Mean: 174 ms P95: 210 ms P99: 250 ms P99.9: 480 ms Max: 1,169 ms Errors: 0 Total: 14,645

Example 8

Mean: 2,342 ms P95: 6,070 ms P99: 8,680 ms P99.9: 11,650 ms Max: 14,423 ms Errors: 1/0 Total: 7,042

Example 9

Mean: 356 ms P95: 490 ms P99: 560 ms P99.9: 3,340 ms Max: 15,604 ms Errors: 0 Total: 868,457

Questions and Answers

Your Questions and Feedback