Writing another
load test tool

A stupid idea, wasn't it?

René Schwietzke, Xceptance GmbH

About René Schwietzke

  • Master of Computer Science (Dipl.-Inf.)
  • Programmer since 1989
  • Java since Java 1.0, 1996
  • QA and Testing since 1998
  • Performance Tester since 1999
  • Co-Founder of
  •    @ReneSchwietzke
  • @reneschwietzke@foojay.social

About

  • Founded 2004
  • Headquarters in Jena, Germany; Subsidiary in Cambridge, MA, USA
  • Specialized in Software Testing and Quality Assurance
  • Performance testing since 2004
  • Over 150 performance test projects every year
  • World-wide customer base including APAC and South America
  • Performance Test Tool, Java-based, APL 2.0

License

I care and share


This presentation is licensed under
Creative Commons Attribution-ShareAlike 4.0 International License.

A little bit of History

History Before the History

SilkPerformer

2000 to 2004

  • Work before Xceptance
  • Segue SilkPerformer
  • Microsoft Windows*
  • Scripting language Visual Basic-like
  • Needed a DB (SQL Server)
  • No real open data
  • Odd protocol for agents
  • No good default reports
  • 500,000 Euro, 20k(?) users
  • Plus support at 15%-20% annually
  • Needed hardware, 600 blade machines

Early Xceptance Gig

The First Load Test

2006

Back to load testing

  • Our first large customer
  • Required extensive product load testing
  • Later also project load testing
  • Hosting fully on Linux
  • Cloud was not yet a thing
  • They had no money as a startup
  • We had no money as a startup either
  • LoadRunner and SilkPerformer were no option
  • Also no project license concept

Why not JMeter?

Wasn't there something open source?

  • JMeter was already a thing
  • Version 2.X or something
  • Not yet an Apache project
  • Only request level recording
  • No scripting language
  • Cumbersome UI
  • Scaling was difficult
  • No debugging
  • No ready-to-use reports

Our own tool

Why we rolled a first version

A first version

The how and what of the first version

  • Needed a debuggable scripting language: Java
  • Didn't want to fiddle with dynamic forms: HtmlUnit
  • Wanted to query the DOM and not regexp it: HtmlUnit
  • HttpClient in the JDK was horrible: Apache HttpClient
  • Needed some charts: JFreeChart
  • Something familiar for better reuse and IDE support: JUnit
  • Must fit version control systems (SVN was a thing)
  • We hated the naming in most tools (transaction)
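The design above boils down to "a load test case is just a Java test method, executed many times by many threads, with runtimes recorded". A minimal, purely illustrative sketch of that idea (not the actual tool's API; names here are made up):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a "test case" is a plain Java method,
// run repeatedly by N worker threads, with runtimes aggregated.
public class MiniLoadTest
{
    static final AtomicInteger count = new AtomicInteger();
    static final AtomicLong totalRuntime = new AtomicLong();

    // the "scripted" test case - in the real tool this would drive
    // HtmlUnit and run under a JUnit-style lifecycle
    static void testScenario() throws InterruptedException
    {
        Thread.sleep(5); // stand-in for a real page load
    }

    public static void main(String[] args) throws Exception
    {
        Thread[] users = new Thread[4];
        for (int i = 0; i < users.length; i++)
        {
            users[i] = new Thread(() -> {
                for (int it = 0; it < 10; it++)
                {
                    long start = System.currentTimeMillis();
                    try { testScenario(); } catch (InterruptedException ignored) {}
                    totalRuntime.addAndGet(System.currentTimeMillis() - start);
                    count.incrementAndGet();
                }
            });
            users[i].start();
        }
        for (Thread t : users) t.join();

        System.out.println(count.get() + " runs, mean " +
            (totalRuntime.get() / count.get()) + " ms");
    }
}
```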

Drawbacks

Not everything was working yet

  • No HTML reports
  • No scaling
  • Used multi-machine parallel runs for scale

[java] ===================================== Total =======================================
[java] Timer List Size: 16
[java] RegisterUser: 356 (1924msec) within 00:14:58,096
[java] SimpleSearch: 45920 (174msec) within 00:15:45,088
[java] Storefront: 11283 (36msec) within 00:15:43,762
[java] Checkout.OrderSummary: 2642 (891msec) within 00:15:22,356
[java] ViewCart: 2642 (587msec) within 00:15:23,727
[java] ProductDetails: 25625 (59msec) within 00:15:34,437
[java] Checkout.Unregistered.ShippingMethod: 2642 (218msec) within 00:15:22,400
[java] AddToCart: 19378 (585msec) within 00:15:35,220
[java] Checkout: 2642 (211msec) within 00:15:23,579
[java] Checkout.Unregistered.Addresses: 2642 (303msec) within 00:15:22,644
[java] Checkout.Unregistered: 2642 (117msec) within 00:15:22,617
[java] Checkout.Unregistered.Payment: 2642 (682msec) within 00:15:22,545
[java] MyAccountPage.LoggedOff: 356 (46msec) within 00:14:57,560
[java] MyAccountPage.RegisterWithUs: 356 (43msec) within 00:14:56,522
[java] SelectVariationProductDetails: 35306 (226msec) within 00:15:34,566
[java] SimpleBrowsing: 860800 (67msec) within 00:15:33,816

[java] Total Requests: 1017874
        

First Charts

We can do stupid names too!

No abbreviation, no cool tool

YART - Yet Another Regression Test Tool

Charts and Data

A Thing We Wanted To Do Right

Charts from Others

It's a trap, Luke!

Chart Example

Trust me, they are lying!

Test time: 1 h - Total: 112,695
Mean: 501 ms - P50: 250 ms - P95: 1,950 ms - P99: 5,170 ms - P99.9: 7,830 ms - Max: 16,689 ms

The Mean Friend

The mean is not your friend

Test Time: 3 h 30 m - Total: 114,386
Mean: 504 ms - P95: 550 ms - P99: 2,660 ms - P99.9: 4,280 ms - Max: 6,169 ms
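The numbers above show why: two runs can share almost the same mean while their tails differ wildly. A tiny illustration with made-up response times:

```java
import java.util.Arrays;

// Two made-up sets of response times with the same mean (750 ms)
// but very different tails - the mean hides the outliers.
public class MeanVsPercentile
{
    // nearest-rank percentile over a sorted array
    static long percentile(long[] sorted, double p)
    {
        int idx = (int) Math.ceil(p / 100.0 * sorted.length) - 1;
        return sorted[Math.max(0, idx)];
    }

    public static void main(String[] args)
    {
        // nine fast requests plus one 5 s outlier -> mean 750 ms
        long[] spiky = { 100, 200, 250, 250, 300, 300, 300, 300, 500, 5000 };
        // ten evenly mediocre requests -> mean 750 ms as well
        long[] flat  = { 700, 710, 720, 730, 740, 760, 770, 780, 790, 800 };

        for (long[] data : new long[][] { spiky, flat })
        {
            Arrays.sort(data);
            long mean = Arrays.stream(data).sum() / data.length;
            System.out.println("mean=" + mean +
                " p95=" + percentile(data, 95) +
                " max=" + data[data.length - 1]);
        }
    }
}
```

Same mean, but one run has a P95 of 5,000 ms and the other of 800 ms.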

Test Automation

One Stone, Two Birds, A Costly Mistake

One Script, Two Use Cases

The right idea, but the wrong web

  • Why not use the same scripts for test automation and load testing?
  • Build a nice UI to create scripts
  • Execute on the fly, don't transform to Java code

Why it failed

Failed for 2 and a Half Reasons

  • Browsers evolve too quickly
  • JavaScript in HtmlUnit differs from browser JS
  • No UI rendering, different outcome
  • Real browser load testing is expensive
  • XUL was deprecated by Mozilla
  • 7 years of work down the drain
  • Second attempt using Eclipse/SWT failed
  • Gave devs too much room
  • Lost in UI ideas and API issues
  • Eclipse programming model is too complicated
  • Cost about 500,000 EUR

What came out of it

One good thing remained

  • Load testing with real browsers
  • Overcomes the JavaScript state mess
  • Scripting similar to test automation with WebDriver
  • You gotta pay with hardware cost aka scale
  • A single Chrome instance easily eats 500 MB and 2 cores
  • Byproduct: You get Web Vitals metrics!

Data, Lt. Cmdr.

Load Testing is About a Ton of Data

How Much Data

A standard load test result of a large US customer

Business Perspective

Runtime 3 hours
User Scenarios 17
Visits 5,266,130
Page Interactions 55,462,101
Total Requests 122,185,828
Orders 677,606
Errors 70,491
Datacenters 7
Load Generators 50 / 800 Cores / 1.6 TB RAM

Tool Perspective

 
Test Cases 17
Transactions 5,266,130
Actions 55,925,554
Requests 122,185,828
Events 124,519
Custom 5,232,721
Agent 53,409
Data Lines 189,751,960

How many data points?

How many points of data are captured?

For Transactions 47,395,170
For Actions 279,627,770
For Requests 2,810,274,044
For Custom Data 622,595
For Event Data 26,163,605
For Agent Data 1,228,407
Total 3,165,311,591
Uncompressed Data 48.72 GB
Compressed Data 4.10 GB
Lines per Second 17,569
Datapoints per Second 293,084
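The per-second figures are plain arithmetic over the 3-hour runtime:

```java
// Deriving the per-second rates from the totals above:
// 3 hours of test time, data lines and data points as reported.
public class Throughput
{
    public static void main(String[] args)
    {
        long seconds    = 3L * 60 * 60;        // 10,800 s runtime
        long lines      = 189_751_960L;        // data lines written
        long datapoints = 3_165_311_591L;      // individual values captured

        System.out.println("lines/s:      " + lines / seconds);      // 17,569
        System.out.println("datapoints/s: " + datapoints / seconds); // 293,084
    }
}
```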

Open Data

Open data for custom analytics, modification, and reporting

  • CSV holds all measured data
  • XML for intermediate data
  • XSLT for transformation into HTML
  • CSS for styling

R,ProductDetailsPage.1,1666819841884,2759,false,1345,40942,200,
    https://acme.org/p/soap-126303030.html?cgid=foaming-hand-soap,text/html,0,0,2749,10,2749,2759,,GET,,,0,,
R,ProductDetailsPage.2,1666819844769,993,false,1858,429,200,
    https://acme.org/en_US/__Analytics-Start?url=https...,image/gif,0,0,992,0,992,992,,GET,,,0,,
R,ProductDetailsPage.3,1666819845762,940,false,1305,1259,200,
    https://acme.org/authiframe,text/html,0,0,940,0,940,940,,GET,,,0,,
R,ProductDetailsPage.4,1666819846703,1008,false,1350,2050,200,
    https://acme.org/en_US/Cart-MiniCartContent,text/html,0,0,1008,0,1008,1008,,GET,,,0,,
A,ProductDetailsPage,1666819841883,7968,false
T,TAddToCart,1666819778626,72846,false,,,,
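Because it is just CSV, a few lines of code are enough to consume it. The sketch below reads only the leading fields that are evident from the sample above (record type, timer name, start timestamp, runtime in ms, failed flag); the remaining request-specific columns are tool internals and left alone here:

```java
// Minimal reader for the leading fields of an open-data CSV line:
// type, name, start timestamp (epoch ms), runtime (ms), failed flag.
// Request lines carry many more columns, which this sketch ignores.
public class DataLine
{
    final String type;
    final String name;
    final long start;
    final long runtime;
    final boolean failed;

    DataLine(String csv)
    {
        String[] f = csv.split(",", -1);
        type    = f[0];
        name    = f[1];
        start   = Long.parseLong(f[2]);
        runtime = Long.parseLong(f[3]);
        failed  = Boolean.parseBoolean(f[4]);
    }

    public static void main(String[] args)
    {
        DataLine a = new DataLine("A,ProductDetailsPage,1666819841883,7968,false");
        System.out.println(a.name + " took " + a.runtime + " ms, failed=" + a.failed);
    }
}
```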

Lifesaver Features

Things That Make us Stand Out
But Which Are All Based on Learnings

Isolation and Cleanup

What many get wrong in the first place

  • Fully isolated clients with no sharing by default
  • "You are sharing a session!" - Nope, never ever ;)
  • You have to break it intentionally when needed
  • "You are using the same session!" - No, we don't!
  • We are not even sharing the same connection to make it as real as possible.
  • Learning: Make it right in the first place and prevent stupidity.
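The simplest way to get that guarantee is to give every simulated user its own freshly created state and never hand references across threads. A hypothetical sketch of the idea (the `UserState` class is made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// Each virtual user gets its own, freshly created state - nothing
// (session, cookies, connections) is shared across threads by default.
public class IsolatedUsers
{
    // hypothetical per-user state; a real tool would hold the cookie
    // store, connection pool, and random generator here
    static class UserState
    {
        final Map<String, String> cookies = new HashMap<>();
    }

    // one state per thread, created lazily, never handed out elsewhere
    static final ThreadLocal<UserState> STATE =
        ThreadLocal.withInitial(UserState::new);

    public static void main(String[] args) throws Exception
    {
        Runnable scenario = () -> {
            UserState s = STATE.get();
            s.cookies.put("session", Thread.currentThread().getName());
        };

        Thread u1 = new Thread(scenario, "user-1");
        Thread u2 = new Thread(scenario, "user-2");
        u1.start(); u2.start();
        u1.join(); u2.join();

        // the main thread's state was never touched by the two users
        System.out.println("main sees " + STATE.get().cookies.size() + " cookies");
    }
}
```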

Archiving and Sharing

Surprising use cases

  • All data is open
  • Results can be zipped up
  • Reports can be zipped up
  • Hosting everywhere possible
  • Recreate a report at any time anywhere
  • Surprising use case: BaFin
  • Working for a German neobroker
  • Need to archive things for good
  • Simple with open, tool independent files

Misc

A list of small but important things

  • Flight mode: Close the laptop in BOS and capture the result in FRA
  • Emergency brake: Stop an agent if it goes nuts
  • Partial: Download data and get a report at any time
  • Partial: Get data despite dead agents
  • Anywhere: You can download from another machine
  • Fence it: Setup filters to avoid wandering around
  • Custom: Log custom events, timers, and data
  • Java: Use any Java feature you like; just don't mess with threading and keep overhead in mind

Result Browser

Don't give me excuses, give me details!

  • For debugging or on failure
  • Communicates all details
  • Keeps history to see how one got to the failure point
  • Able to restrict itself in volume when the same problem repeats too often
  • Can be triggered "manually" too

Pseudo Randomness

When randomness is predictable

  • Need: Tests are too static, randomize
  • Challenge: Hard to reproduce failures, because you don't know the path
  • Solution: Pseudo-random generators, keep the seed and you can replay it later
  • Limitation: The environment should behave (nearly) the same to make it work.
  • Important: Ensure that the seed is random to avoid the same numbers again.
  • Learning: We got burned badly by incorrect seeds.
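In practice that means: draw every random decision from a generator whose seed is logged with the run, so the same path can be replayed later. A minimal sketch:

```java
import java.util.Arrays;
import java.util.Random;

// Reproducible randomness: log the seed with the test run, feed the
// same seed back in later to replay the exact same decision path.
public class ReplayableRandom
{
    static int[] randomPath(long seed, int steps)
    {
        Random rng = new Random(seed);
        int[] path = new int[steps];
        for (int i = 0; i < steps; i++)
        {
            path[i] = rng.nextInt(10); // e.g. which category to browse next
        }
        return path;
    }

    public static void main(String[] args)
    {
        // the seed itself must be random per run (the lesson above),
        // but once logged it makes the run repeatable
        long seed = new Random().nextLong();
        System.out.println("seed=" + seed);

        int[] first  = randomPath(seed, 5);
        int[] replay = randomPath(seed, 5);
        System.out.println(Arrays.equals(first, replay)); // true
    }
}
```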

Profiles and Arrival Rate

Concurrent users?

  • User rate
  • Arrival rate
  • Flexible over runtime
  • Learned that from a customer
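The difference matters: with a fixed user count, slow responses throttle the load, while with an arrival rate new visits keep starting no matter how the system responds. A sketch of exponentially distributed inter-arrival times (the usual Poisson-arrivals model; this model choice is an assumption for illustration, not necessarily what any particular tool implements):

```java
import java.util.Random;

// Poisson arrivals: for a target rate of "arrivalsPerHour" visits,
// draw exponentially distributed gaps between visit start times.
public class ArrivalRate
{
    static double nextGapMillis(Random rng, double arrivalsPerHour)
    {
        double perMilli = arrivalsPerHour / 3_600_000.0;
        // inverse transform sampling of the exponential distribution
        return -Math.log(1.0 - rng.nextDouble()) / perMilli;
    }

    public static void main(String[] args)
    {
        Random rng = new Random(42);
        double arrivalsPerHour = 3600; // one visit per second on average

        double sum = 0;
        int n = 10_000;
        for (int i = 0; i < n; i++)
        {
            sum += nextGapMillis(rng, arrivalsPerHour);
        }
        // the mean gap approaches 1000 ms for this rate
        System.out.println("mean gap: " + (sum / n) + " ms");
    }
}
```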

Test Data

Avoid Hard Coding Data

  • Take the data from the pages
  • Categories, SKUs, URLs, forms...
  • Customer can update data freely
  • Data can go offline during testing
  • Form changes get noticed
  • Exploratory load testing!
  • Doesn't apply to all testing

public class ViewCart extends PageAction<ViewCart>
{
    @Override
    protected void doExecute() throws Exception
    {
        // Get mini cart link.
        final HtmlElement cartLink =
            GeneralPages.instance.miniCart.
            getViewCartLink().asserted().single();

        // Click it.
        loadPageByClick(cartLink);
    }

    @Override
    protected void postValidate() throws Exception
    {
        // this was a page load, so validate
        // what is important
        Validator.validatePageSource();

        // basic checks for the cart
        CartPage.instance.validate();
    }
}

Merging and Splitting

Testing is Dynamic

Merging, Splitting, Filtering

Infinite possibilities and regexp of course

  • Massage the data later
  • By time
  • By datacenter or agent
  • By test case
  • By response code
  • By URL (parts)
  • By response time
  • By content type
  • By name
  • Out of tool magic
  • Merge load tests
  • Manually split tests
  • Search data
  • Obfuscate

Every feature was built because we needed it at some point; sometimes we were even ahead of the curve.
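Since everything is plain CSV, most of these operations reduce to streaming lines through a filter. A minimal sketch for filtering "by name" with a regexp (the timer names are made up):

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Filtering open CSV data lines by timer name with a regexp -
// the same idea covers filtering by URL, response code, or time.
public class FilterByName
{
    // keep only lines whose timer name (2nd CSV column) matches the regexp
    static List<String> filterByName(List<String> lines, String nameRegex)
    {
        Pattern p = Pattern.compile(nameRegex);
        return lines.stream()
            .filter(l -> p.matcher(l.split(",", -1)[1]).matches())
            .collect(Collectors.toList());
    }

    public static void main(String[] args)
    {
        List<String> lines = List.of(
            "A,ProductDetailsPage,1666819841883,7968,false",
            "A,ViewCart,1666819850000,587,false",
            "A,Checkout.Payment,1666819860000,682,false");

        filterByName(lines, "^Checkout.*").forEach(System.out::println);
    }
}
```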

Java

What? Where is the Java relationship?

Java? Java!

Where is Java?

  • The entire stack is Java
  • We profit from the ecosystem
  • Learned a lot about performance
  • Learned a lot about scalability
  • Learned a lot about memory and GC
  • We went from JDK 1.3 to 17
  • We support ARM and x86
  • Learned a lot about cloud machines
  • Looking forward to Leyden
  • Looking forward to Valhalla

Ecosystem

What we use

Apache-Commons DNSJava Freemarker Hessian WebDriver Selenium JFreeChart Java-HLL HtmlUnit HttpClient OkHttp Jetty Log4J SLF4J Xalan Xerces WebP-ImageIO XStream DSIUtils JSON JUnit Progressbar and more

Wait...

There is more! JVM rocks!

Java Lessons Learned

Things That Were Messy

1.3 to 17

<= JDK 8

  • Tuned the hell out of CMS
  • Ran many smaller VMs per hardware box
  • Compensate for GC pauses and measurement impact

JDK 11

  • Let the VM grab 70% of memory
  • One VM only, G1 (enough reserved space)
  • Compensates for GC pauses and measurement impact
  • Module concept made migration harder

JDK 17

  • Playing with ZGC and Shenandoah
  • Socket impl changes impacted us
  • External libs and deprecations
  • Hunted a C2 compiler issue, never got it resolved, JDK 21 took care of it

Mac ARMs

  • Suddenly realized how much native code some dependencies bring

Best Finds

Success due to Functionality

CDNs made the world turn differently

Errors became a thing

  • When our large customer added a CDN, errors became a thing
  • 500, 502, 503 out of nowhere for normal load
  • Before that, we often went completely error free
  • Call with CDN provider: 1% errors are quite normal
  • Nothing in life is free, CDN caching and protection costs
  • Feature: Full error detail logging

Akamai Customer Setup

Performance was turning south quickly

  • "You don't test right."
  • "No, you don't test right"
  • "You use the same session."
  • "More IPs!"
  • "More locations."
  • "Give us requests and custom headers."
  • "You must do EDNS."
  • "You must not cache the DNS entry."
  • We gave them all of that
  • It was a prefetch setting
  • It killed the origin because of the logic to load more than needed
  • Learned that a lot of load testing out there seems to be garbage, so one is not trusted by default

Cloudflare Kernel Issue

Occasionally requests got lost

  • Request was sent but was never answered
  • "Can't be! You test wrong. Prove it."
  • "Give us details."
  • "We see it in the log, must be origin."
  • Origin never got the request
  • Cloudflare discovered a kernel issue when handling I/O against the disk cache
  • Feature: Insert customer UUIDs into requests and user agents, only agent names are always logged
  • Feature: Logging all details by default, able to log even non-failures fully, run tests for up to 7 days
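Tagging traffic so the other side can verify it is easy with any HTTP client. A sketch using the JDK's own `java.net.http` client (the header layout here is an assumption for illustration, not the tool's actual format):

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.UUID;

// Tag every request with a per-run UUID in the User-Agent so the
// CDN or customer can pick the load test traffic out of their logs.
public class TaggedRequest
{
    static HttpRequest tagged(String url, String runId)
    {
        return HttpRequest.newBuilder()
            .uri(URI.create(url))
            .header("User-Agent", "Mozilla/5.0 (LoadTest; run=" + runId + ")")
            .GET()
            .build();
    }

    public static void main(String[] args)
    {
        String runId = UUID.randomUUID().toString();
        HttpRequest request = tagged("https://acme.org/", runId);
        System.out.println(request.headers().firstValue("User-Agent").orElse(""));
    }
}
```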

Cost, License, Benefits

Some business things

Open Source

Try to increase the reach

  • No sales or marketing department
  • Reach was kind of limited
  • Decided to go open source for reach
  • Picked APL because of the stack
Copyright [yyyy] [name of copyright owner]

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
        
  • Going open source did not change much
  • No external contributions
  • Rather people dropping support requests and screenshots of exceptions
  • But it gave us a good feel and helped in some project sales
  • Still requires marketing!

Cost and Benefits

The final bill

  • Started in 2007 to evolve it into a product
  • 17 years of development
  • Sometimes three devs, sometimes none
  • More than 1.25 Million Euros, for sure
  • No costs for external tools though
  • Able to run unlimited tests at any size concurrently
  • Quick turnaround in projects
  • Debugging help and features done in no time

Any Future?

So Much Competition, so Much More to Do

What is next?

Features, features, features

  • JDK 21 and virtual threads
  • HTTP/3 support
  • HTTP/2 as default
  • OpenTelemetry support
  • Realtime errors and metrics
  • Auto-Rating/Scorecards
  • JMeter replay
  • Even faster report generation
  • Maybe live data querying
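Virtual threads are attractive here because "one thread per virtual user" becomes cheap. A sketch of the idea, assuming a JDK 21 runtime (this is illustrative, not the tool's actual execution model):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Virtual threads (JDK 21): thousands of concurrent virtual users
// no longer need thousands of OS threads.
public class VirtualUsers
{
    static final AtomicInteger visits = new AtomicInteger();

    public static void main(String[] args)
    {
        try (ExecutorService users = Executors.newVirtualThreadPerTaskExecutor())
        {
            for (int i = 0; i < 10_000; i++)
            {
                users.submit(() -> {
                    // stand-in for think time plus a page load
                    try { Thread.sleep(10); } catch (InterruptedException ignored) {}
                    visits.incrementAndGet();
                });
            }
        } // close() waits for all submitted tasks to finish

        System.out.println(visits.get() + " visits simulated");
    }
}
```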

Built it into a SaaS Offer

We could do things we couldn't have done otherwise.

We never broke even.

It was worth the money.

We are damn proud of the tool we have built.

Now, there are too many tools on the market, hence doing the same in 2024 may not make sense anymore. Or does it?

The JVM is still the right choice when you want to do more than just firing requests.

Resources

Just pointers to more information