René Schwietzke, Xceptance GmbH
I care and share
This presentation is licensed under
Creative Commons Attribution 4.0 International License.
Third-party content could use a different license.
@ReneSchwietzke
@reneschwietzke@foojay.social
#qa #test #performancetest #loadtest #quality #java #performance #tuning
Things we won't talk about today
We can discuss these after the talk.
Our Daily Bread
Sliced in 600 Sec or Less
Whom we work for and with
100 to 150 load test projects annually.
How much do they know about performance testing?
Optionally to be garnished with: Houston, we have a problem!
Clarifying direction and basic testing concepts
People say "perceived performance" but mean client-side rendering, while what they are actually looking for is server-side testing.
They: Many opinions, ideas, approaches, and goals.
We: Got too much data.
Make load testing efficient, comparable, and scalable.
Photo by Roger Wollstadt, Wolfsburg - Volkswagen Assembly Line, CC-BY-SA 2.0
Standardize naming and terminology
The measured performance should, under most load conditions, not imply that the system is under stress.
In English: A single user sees the same performance under heavy load as if they were the only user.
What Does Our Daily Work Look Like?
Performance testing is simple, isn't it?
500 ms
or less
321 ms
How to get from A to B? Standardize it!
P.S. SaaS is typically tested for 100%, i.e. the expected load, not for capacity.
What we will ask for
Once again, very much simplified.
What we have to work with at the end
Visits / h | 100,000 |
Page Interactions / h | 1,000,000 |
Orders / h | 3,000 |
Conversion Rate* | 3% |
Bot Visits / h* | 1,000 |
Searches / h* | 7,000 |
Cart Size Distribution* | 1/10% 2/20% 3/50% 4/20% |
*) If we are lucky.
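A small sketch of how the hourly targets above can be turned into the per-second rates and ratios a load profile needs. The dictionary keys and the helper name `per_second` are illustrative choices, not part of any tool mentioned in the talk.

```python
# Hourly targets copied from the table above.
HOURLY_TARGETS = {
    "visits": 100_000,
    "page_interactions": 1_000_000,
    "orders": 3_000,
    "searches": 7_000,
}


def per_second(per_hour: float) -> float:
    """Convert an hourly target into an average arrival rate per second."""
    return per_hour / 3600.0


# Average arrival rates the load generators have to sustain.
rates = {name: per_second(value) for name, value in HOURLY_TARGETS.items()}

# Derived ratios that let us cross-check the customer's numbers.
conversion_rate = HOURLY_TARGETS["orders"] / HOURLY_TARGETS["visits"]
interactions_per_visit = (
    HOURLY_TARGETS["page_interactions"] / HOURLY_TARGETS["visits"]
)

for name, rate in rates.items():
    print(f"{name}: {rate:.2f} / s")
print(f"conversion rate: {conversion_rate:.1%}")
print(f"page interactions per visit: {interactions_per_visit:.0f}")
```

The derived conversion rate matches the 3% in the table, which is a cheap consistency check on the input numbers.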
The Nonsense Metric That Sticks
Concurrent Users is a metric that lacks a time component as well as a definition of what these users do.
Concurrent users is more of a result metric, because when you apply activities and time, you get a number that actually makes sense.
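One way to illustrate "apply activities and time" is Little's Law: average concurrency equals arrival rate times average time in the system. The sketch below assumes that reading; the visit durations of 180 s and 360 s are made-up values, not figures from the talk.

```python
def concurrent_users(visits_per_hour: float, avg_visit_seconds: float) -> float:
    """Little's Law: average concurrency = arrival rate * average duration."""
    return visits_per_hour * avg_visit_seconds / 3600.0


VISITS_PER_HOUR = 100_000  # from the workload table

# Hypothetical visit durations: the same hourly load yields very different
# "concurrent user" numbers depending on how long a visit takes.
short_visits = concurrent_users(VISITS_PER_HOUR, avg_visit_seconds=180)
long_visits = concurrent_users(VISITS_PER_HOUR, avg_visit_seconds=360)

print(f"180 s visits: {short_visits:.0f} concurrent users")
print(f"360 s visits: {long_visits:.0f} concurrent users")
```

The throughput is identical in both cases; only the assumed visit duration changes, which is why the concurrent user count works better as a result than as an input.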
What we will be asked for at the end
*) Sometimes
What Do We Measure?
What data can we and will we capture
R,QuickView.1,1571927593069,112,false,1593,6096,200,https://host/842177173640.html?cgid=sales,text/html,0,0,111,0,111,111,,,,,0,,
R,QuickView.2,1571927593184,79,false,1639,592,200,https://host/Wishlist?productID=842177173640,application/json,0,0,79,0,79,79,,,,,0,,
A,QuickView,1571927593064,199,false
R,AddToCart.1,1571927597981,263,false,1727,3889,200,https://host/Cart-AddProduct?format=ajax,text/html,0,0,260,1,260,261,,,,,0,,
A,AddToCart,1571927597981,264,false
T,TOrder,1571927533453,100982,false,,
Simplified to almost HTTP only. *) Might be XLT only.
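A minimal sketch of reading the common leading fields of such result lines. The interpretation of the columns (record type, name, start time in epoch milliseconds, runtime in milliseconds, failed flag) is an assumption based on the sample lines; the `Record` class and `parse_line` function are hypothetical helpers, and the remaining request columns are deliberately ignored.

```python
import csv
from dataclasses import dataclass


@dataclass
class Record:
    kind: str        # "T" transaction, "A" action, "R" request
    name: str
    start_ms: int    # assumed: start time as epoch milliseconds
    runtime_ms: int  # assumed: runtime in milliseconds
    failed: bool


def parse_line(line: str) -> Record:
    """Parse the first five comma-separated fields of one result line."""
    fields = next(csv.reader([line]))
    return Record(
        kind=fields[0],
        name=fields[1],
        start_ms=int(fields[2]),
        runtime_ms=int(fields[3]),
        failed=fields[4] == "true",
    )


# Sample lines taken from the slide above.
SAMPLE = [
    "A,QuickView,1571927593064,199,false",
    "A,AddToCart,1571927597981,264,false",
    "T,TOrder,1571927533453,100982,false,,",
]

records = [parse_line(line) for line in SAMPLE]
action_runtimes = [r.runtime_ms for r in records if r.kind == "A"]
print(action_runtimes)
```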
Moving average of the last 1%
Test time: 75 min
- Total: 348,798
- Mean: 576 ms
- P50: 576 ms
- P95: 1,420 ms
- P99: 2,850 ms
- P99.9: 6,430 ms
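For readers who want to reproduce this kind of summary from raw runtimes, here is a sketch using the nearest-rank percentile definition. That definition is an assumption, since the talk does not say which percentile method its reports use, and the sample data is synthetic.

```python
import math


def percentile(sorted_values, p):
    """Nearest-rank percentile: the smallest sample that has at least
    p percent of all samples at or below it."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(runtimes_ms):
    """Return count, mean, and the percentiles shown on the slide."""
    ordered = sorted(runtimes_ms)
    return {
        "total": len(ordered),
        "mean": sum(ordered) / len(ordered),
        "p50": percentile(ordered, 50),
        "p95": percentile(ordered, 95),
        "p99": percentile(ordered, 99),
        "p99.9": percentile(ordered, 99.9),
    }


# Synthetic runtimes (1 ms to 1,000 ms), only to exercise the functions.
summary = summarize(range(1, 1001))
print(summary)
```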
Too much was removed before
Test time: 75 min
- Total: 348,798
- Mean: 576 ms
- P50: 576 ms
- P95: 1,420 ms
- P99: 2,850 ms
- P99.9: 6,430 ms
Just collect
A standard load test result of a large US customer
Runtime | 3 hours |
User Scenarios | 17 |
Visits | 5,266,130 |
Page Interactions | 55,462,101 |
Total Requests | 122,185,828 |
Orders | 677,606 |
Errors | 70,491 |
Datacenters | 7 |
Load Generators | 50 / 800 Cores / 1.6 TB RAM |
Test Cases | 17 |
Transactions | 5,266,130 |
Actions | 55,925,554 |
Requests | 122,185,828 |
Events | 124,519 |
Custom | 5,232,721 |
Agent | 53,409 |
Data Lines | 189,751,960 |
How many points of data are captured?
For Transactions | 47,395,170 |
For Actions | 279,627,770 |
For Requests | 2,810,274,044 |
For Custom Data | 622,595 |
For Event Data | 26,163,605 |
For Agent Data | 1,228,407 |
Total | 3,165,311,591 |
Uncompressed Data | 48.72 GB |
Compressed Data | 4.10 GB |
Lines per Second | 17,569 |
Datapoints per Second | 293,084 |
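The per-second figures follow directly from the totals and the 3 hour runtime. The sketch below simply redoes that arithmetic with the numbers from the tables above; the variable names are illustrative.

```python
RUNTIME_SECONDS = 3 * 3600  # 3 hour test

DATA_LINES = 189_751_960

# Data points per record type, copied from the table above.
DATA_POINTS = {
    "transactions": 47_395_170,
    "actions": 279_627_770,
    "requests": 2_810_274_044,
    "custom": 622_595,
    "events": 26_163_605,
    "agent": 1_228_407,
}

total_points = sum(DATA_POINTS.values())
lines_per_second = DATA_LINES // RUNTIME_SECONDS
points_per_second = total_points // RUNTIME_SECONDS
compression_ratio = 48.72 / 4.10  # uncompressed GB / compressed GB

print(f"total data points: {total_points:,}")
print(f"lines per second: {lines_per_second:,}")
print(f"data points per second: {points_per_second:,}")
print(f"compression ratio: {compression_ratio:.1f} : 1")
```

Requests account for the bulk of the data points, so any sampling or storage optimization would have to start there.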
Transaction: Scenario execution
Action: Any kind of user interaction
Request: Obvious, isn't it?
Let's Sell the Results
The Basic Idea
The Most Interesting Data
*) A P99 derails quickly when you don't control the full stack.
Something Always Goes Wrong
Simplify the View on Long-Term Behavior
1 https://developers.google.com/web/fundamentals/performance/rail
ev = 1,522
rv = 739
tv = 1,120,892
Business Impact Value
= 0.20 %
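The slide text does not spell out the formula. One reading that reproduces the 0.20 % from the three values is (ev + rv) / tv, sketched below. Both the formula and the interpretation of the variables (ev as visits hit by errors, rv as visits hit by runtime violations, tv as total visits) are assumptions, not statements from the talk.

```python
def business_impact_value(ev: int, rv: int, tv: int) -> float:
    """Assumed formula: share of affected visits among all visits, in percent."""
    if tv <= 0:
        raise ValueError("tv must be positive")
    return (ev + rv) / tv * 100.0


# Values from the slide.
biv = business_impact_value(ev=1_522, rv=739, tv=1_120_892)
print(f"Business Impact Value = {biv:.2f} %")
```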
Mapping the Business Impact Value
Apply a monetary value to enhance visibility
P.S. You can call it "fear factor".
The Final Summary
The good, the bad, and everything in between
Four lines, not more
Download: https://t.ly/JfFat
Never believe in any benchmark result unless you doctored it yourself!
René Schwietzke
Let's Do Lunch Break Q&A!
What are we looking for?
Once again, very much simplified.
Our Result and Communication Challenge
A data example
Test time: 1 h - Total: 14,645 - Mean: 174 ms - P95: 210 ms - P99: 250 ms - Max: 1,169 ms - P99.9: 480 ms
Is the average good enough?
Test time: 8 h - Total: 6,796 - Mean: 156 ms - P95: 730 ms - P99: 1,610 ms - Max: 6,592 ms - P99.9: 4,930 ms
Which PXX might be a good vehicle for the message?
Test time: 3 h 30 min - Total: 114,386 - Mean: 504 ms - P95: 550 ms - P99: 2,660 ms - Max: 6,169 ms - P99.9: 4,280 ms
What is our final communication data set?
Test time: 1 h - Total: 112,695 - Mean: 501 ms - P50: 250 ms - P95: 1,950 ms - P99: 5,170 ms - Max: 16,689 ms - P99.9: 7,830 ms
Test time: 3 h - Total: 3,145,233 - Mean: 669 ms - P50: 250 ms - P95: 2,760 ms - P99: 8,740 ms - Max: 35,945 ms - P99.9: 30,020 ms
Does it make sense to collect 10x more data than needed?
Some More Technical Stuff
How we crunch the data