About Xceptance
- Focused on Software Test and Quality Assurance
- Performance Testing, Test Automation, Functional Testing, QA, and Test Process Consulting
- Mostly active around e-commerce and web
- Founded 2004
- Headquarters in Jena, Germany
- Subsidiary in Cambridge, MA, USA
- Own Performance Test Tool XLT, Java-based, Free, www.xceptance.com/xlt
About René Schwietzke
- Co-Founder and Managing Director, Xceptance
- Master of Computer Science (in German: Dipl.-Inf.)
- Programmer since 1992
- In QA and test since 1998
- Performance Tester since 1999
- @ReneFoobarJ
#java #qa #test #performance #performancetest #quality #automation
What Qualifies us?
Just some "Proof" That We Know the Topic Well Enough
- Performance testing since 2004
- Own tooling; deep behind-the-scenes knowledge, such as protocols and networking
- Over 150 performance tests every year, not counting repetitions
- World-wide customer base including APAC and South America
- Web-based, full page load, or API based testing
- Traffic ranging from 10 orders and 1,000 page views to 500,000 orders and 20 million page views per hour
The Basic Challenges
Things that Make Performance Testing Complicated
The Perfect Project
It Could be so Simple
Requirement: 42 (or smaller)
Challenge - Audience
Who Needs Performance Testing? Who Consumes the Results?
- Engineering (Eng): Eng wants to test their own stuff
- QA: Wants to test what Eng delivered
- Engineering Management: Prove, validate, and demonstrate
- Product Management: Needs proof that the delivered features work; hit by customer escalations
- Sales: Needs numbers to sell better
- Services: Who caused the customer escalation?
- Implementation Partners: Do we play by the rules and deliver what is expected?
- Merchants: Prove that the chosen platform scales or works similarly to the old one; test a prototype; prepare for sales events; verify new features...
Challenge - Requirements
The Test Reality
Desired Requirements
- Visits/h - 100,000
- Page Views/h - 1 million
- Orders/h - 3,500
- Runtime Average - 250 ms
- Runtime P99.9 - 500 ms
- Runtime Max - 3,000 ms
- Errors - None
- Customer Growth, Order Item Size, Performance Stable over 12 h
Typically Given
- Visits/h - 103,181
- Page Views/h - 1.716 million
- Orders/h - 3,186
- Runtimes - Fast Enough
Sometimes Given
- No Idea, you are the experts!
Challenge - Results
The Test Reality
Real Results
- Visits/h - 103,000
- Page Views/h - 1.12 million
- Orders/h - 3,588
- Runtime Average - 251 ms
- Runtime P99.9 - 900 ms
- Runtime Max - 16,870 ms
- Errors - 17 order failures, 18 error codes 502, 125 response codes 404
Xceptance's Experience
Xceptance's Impression of and Experiences in the Field
- Engineering: Starts with precise plans, loses focus quickly, and cannot sell results to higher levels without causing uncertainty or unwanted discussions
- Eng/QA: Don't speak the same language when discussing results
- Eng Management: Asks the same technical questions about the testing over and over because it does not understand the results or doubts them
- Product Management: Too much data
- Sales: Give us a number, one number only please.
- Services: Was it our fault?
- Implementation Partners: Does it work and is there anything to do?
- Merchants: Will we succeed?
Problem Summary
The Requirement for Better Communication in a Nutshell
- Except for some engineers, nobody really wants all details
- Expectations range from a small set of numbers, to a single number, to a simple yes or no
- Results have to be explained again and again
- A lot of input is needed for a "conclusive" verdict
- Missing rules make things subjective
- It takes too long to document and communicate
- Inter-team communication is probably the hardest
A Rating System
Our Suggestion to Ease the Pain
Concept
The Basic Concept of the Rating System
- Map the total success to school grades/marks
- US-style grades (largest market): A to F, plus an A+ for overachieving
- B symbolizes our average customer's performance
- BUT: This is the total of three other criteria!
- Criteria: Response Time, Errors, Predictability
- The worst grade of these three determines the total
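As a sketch, the worst-grade-wins rule above could look like the following; the grade scale is from the slides, but the helper itself is illustrative and not Xceptance's implementation:

```python
# Sketch of the worst-grade-wins rule: the total grade is the worst
# of the three criterion grades (Response Time, Errors, Predictability).
# Grade scale A+ (best) to F (worst) as named in the slides.

GRADE_ORDER = ["A+", "A", "B", "C", "D", "F"]  # best to worst

def total_grade(response_time: str, errors: str, predictability: str) -> str:
    """Return the worst of the three criterion grades."""
    grades = [response_time, errors, predictability]
    return max(grades, key=GRADE_ORDER.index)

print(total_grade("A", "B", "D"))  # the D dominates -> "D"
```

The `max` over the scale's index is just one way to express "worst grade rules"; any lookup that picks the lowest mark works the same way.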
Criteria - Response Times
The Most Interesting Values Mapped to a Grade
- Split into groups of common types
- B is the average of our e-commerce target group
- Worst grade determines the total grade
- Data is based on years of measurements
- Data is adjustable when customers have other ideas
- Similar data can be set up for APIs or page loads
- P95 is used (P99 is recommended)
- ... and P99.9 is our unachievable goal
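A minimal sketch of mapping a measured P95 to a grade; the millisecond thresholds below are invented placeholders, not the calibrated data the slides say comes from years of measurements:

```python
# Illustrative mapping of a P95 response time (ms) to a grade.
# Threshold values are made-up placeholders; per the slides, the real
# data is based on years of measurements and is adjustable per customer.
THRESHOLDS = [  # (upper bound in ms, grade)
    (250, "A+"),
    (500, "A"),
    (1000, "B"),   # B marks the e-commerce target group average
    (2000, "C"),
    (4000, "D"),
]

def p95_grade(p95_ms: float) -> str:
    """Return the first grade whose upper bound the P95 stays under."""
    for upper, grade in THRESHOLDS:
        if p95_ms <= upper:
            return grade
    return "F"

print(p95_grade(730))  # -> "B" with these placeholder thresholds
```

In practice each group of common request types (per the slides) would carry its own threshold table.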
Criteria - Errors
Something Always Goes Wrong
- Barely any test run is error free when reaching a certain complexity
- This criterion tries to convey the significance of errors
- Technical Errors: No response or response codes 500 or higher
- Functional Errors: Application-level errors, such as failed validations of expected behavior
- Number of visits affected and visible patterns of failure
- Feature importance is used as a factor
- Manual evaluation, hence slightly subjective
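The classification above can be sketched as follows; the status-code rule (no response, or HTTP 500+) is from the slides, while the severity helper and its importance factor are illustrative assumptions for the "feature importance is used as a factor" step:

```python
# Sketch of the error criterion. Technical errors are defined in the
# slides as no response or a response code of 500 or higher; everything
# else falls into functional checks. The weighting helper below is an
# assumption, standing in for the manual, slightly subjective step.
def is_technical_error(status_code) -> bool:
    # No response at all, or a server-side error code (500+)
    return status_code is None or status_code >= 500

def weighted_error_share(affected_visits: int, total_visits: int,
                         importance: float) -> float:
    # Share of affected visits (%) scaled by a feature-importance
    # factor (e.g. checkout weighs more than a content teaser)
    return 100 * affected_visits / total_visits * importance

print(is_technical_error(502), is_technical_error(404))
```

Under this rule the 502s from the results slide count as technical errors, while the 404s would be judged as functional ones.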
Criteria - Predictability
Simplify the View on Long-Term Behavior
- Turn it into a business value
- Strategic motivation, not scientifically proven
- 10 sec is chosen based on the user perception model published in the Google RAIL model
- Applies to requests, not page loads, hence it is weaker
- Example: 100 * ((0 + 2) / 682) = 0.29 %
- Goal: P95 of 5,000 ms
- Mean: 1,764 ms
- P50: 1,735 ms
- P95: 2,540 ms
- P99: 2,550 ms
- P99.9: 29,940 ms
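Under the reading that the example formula divides errors plus requests over the 10-second threshold by the total request count (0 + 2 out of 682 here), the business impact value can be sketched as:

```python
# Business impact value (BIV), assuming the example formula counts
# errors plus requests slower than the 10 s perception threshold,
# relative to the total number of requests. The function name and
# signature are mine, not from the slides.
def business_impact(errors: int, over_threshold: int, total: int) -> float:
    return 100 * (errors + over_threshold) / total

print(f"{business_impact(0, 2, 682):.2f} %")  # matches the 0.29 % example
```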
Predictability Mapped
Mapping the Business Impact Value
- Use the BIV as input to the rating
- Add a view of response time patterns
- Pattern view is slightly subjective again
- Should draw attention to errors as well as response times
- Hopefully demonstrates the end-user's pain
Bonus Motivation
Apply a Monetary Value to Enhance Visibility
Example
The Final Summary
- A management view
- One-line verdict for the lazy, plus a grade that pops
- A monetary value for the business audience
- Traffic and response times for the engineers
- A compromise to cater to all groups at once
- A more detailed report is attached
- Yes, this is an oversimplification
- Yes, the engineer in me is very sad
Summary
Problems, Challenges, and Results Summarized
Good
- Easier result communication
- Less time needed to explain things
- Results are comparable, important for implementation partners and vendors
Bad
- Some customers start to haggle over the grade
- The engineer has less intel to work with
- Potential error patterns might become invisible
- Errors become an accepted fact
- Still not an absolute set of numbers that can be used to automatically evaluate tests and draw conclusions
- One tends to work by the book and apply less common sense
- The subjective pieces are often a cause for discussion
- BUT: Efficiency and communication culture improved; we achieved quicker turnarounds and created a common ground for all parties
Finally...
Download and Share Freely - https://bit.ly/2kGSEGe
Example 1
Mean: 156 ms
P95: 730 ms
P99: 1,610 ms
P99.9: 4,930 ms
Max: 6,592 ms
Errors: 0
Total: 6,796
Example 2
Mean: 1,790 ms
P95: 5,225 ms
P99: 10,580 ms
P99.9: 11,880 ms
Max: 12,602 ms
Errors: 0
Example 3
Mean: 467 ms
P95: 690 ms
P99: 2,140 ms
P99.9: 3,730 ms
Max: 30,033 ms
Errors: 4/33
Example 4
Mean: 504 ms
P95: 550 ms
P99: 2,660 ms
P99.9: 4,280 ms
Max: 6,169 ms
Errors: 0
Total: 114,386
Example 5
Mean: 2,033 ms
P95: 2,330 ms
P99: 3,020 ms
P99.9: 10,830 ms
Max: 21,744 ms
Errors: 0
Total: 3,551
Example 6
Mean: 102 ms
P95: 160 ms
P99: 240 ms
P99.9: 600 ms
Max: 2,106 ms
Errors: 0/114
Total: 44,784
Example 7
Mean: 174 ms
P95: 210 ms
P99: 250 ms
P99.9: 480 ms
Max: 1,169 ms
Errors: 0
Total: 14,645
Example 8
Mean: 2,342 ms
P95: 6,070 ms
P99: 8,680 ms
P99.9: 11,650 ms
Max: 14,423 ms
Errors: 1/0
Total: 7,042
Example 9
Mean: 356 ms
P95: 490 ms
P99: 560 ms
P99.9: 3,340 ms
Max: 15,604 ms
Errors: 0
Total: 868,457