Workshop - Microbenchmarking

The Art of Realizing One is Wrong - Exercises

Always use StringBuilder

Put the myth to the test

Requirements

  • Compare String operation performance
  • Free to pick data amount and shape
  • Have a good story for your decisions

public class D01
{
	...

	@Benchmark
	public String classic()	{
		// classic string magic here, such as a + b
		return null;
	}

	@Benchmark
	public String concat() {
		// use a.concat(b)
		return null;
	}

	@Benchmark
	public String builder()	{
		// use a stringbuilder a.append(b)
		return null;
	}

	@Benchmark
	public String sizedBuilder() {
		// use a presized stringbuilder
		// new StringBuilder(expectedSize)
		// a.append(b)
		return null;
	}
}

Split a String

Just write a benchmark

Requirements

  • Compare four ways to split a string by a string
  • Free to pick data amount and shape
  • Have a good story for your decisions

public class D05
{
	@Setup
	public void setup()
	{
		// Berlin;12.4
		// Rom;25.9
		// Bergen;-5.5
	}

	@Benchmark
	public void split(final Blackhole b)
	{
		// String::split()
	}

	@Benchmark
	public void indexOf(final Blackhole b)
	{
		// String::indexOf
	}

	@Benchmark
	public void tokenizer(final Blackhole b)
	{
		// use StringTokenizer
	}

	@Benchmark
	public void yourOwn(final Blackhole b)
	{
		// an optional different idea
	}
}
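
To get started on the first three variants, here is a minimal plain-Java sketch. The class name `SplitSketch`, the method names, and the single-character separator are assumptions for illustration, not the workshop's solution. In the actual benchmark the results would go to the `Blackhole` instead of being returned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper class; names are made up for this sketch.
final class SplitSketch {
    // Assumed separator, taken from the sample lines such as "Berlin;12.4".
    static final String SEP = ";";

    /** Variant 1: String.split, which interprets its argument as a regular expression. */
    static String[] viaSplit(String line) {
        return line.split(SEP);
    }

    /** Variant 2: find the first separator with indexOf and cut with substring. */
    static String[] viaIndexOf(String line) {
        int pos = line.indexOf(SEP);
        if (pos < 0) {
            return new String[] { line }; // no separator found
        }
        return new String[] {
            line.substring(0, pos),
            line.substring(pos + SEP.length())
        };
    }

    /** Variant 3: StringTokenizer, which treats each character of SEP as a delimiter. */
    static String[] viaTokenizer(String line) {
        StringTokenizer tokenizer = new StringTokenizer(line, SEP);
        List<String> parts = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            parts.add(tokenizer.nextToken());
        }
        return parts.toArray(new String[0]);
    }
}
```

The three variants agree on the sample lines but not in general: `viaIndexOf` only cuts at the first separator, `split` drops trailing empty strings, and `StringTokenizer` skips empty tokens. That difference is part of the story you need for your data decisions.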

Parse Numbers

A little bit of #1brc fun

Requirements

  • Let's review together
  • These are the simple tuning versions

Benchmark                                         Mode  Cnt    Score   Error      Units
classic                                           avgt    5   29.378 ± 0.282      ns/op
classic:IPC                                       avgt         3.674          insns/clk
classic:L1-dcache-load-misses                     avgt         1.578               #/op
classic:L1-dcache-loads                           avgt        69.274               #/op
classic:branch-misses                             avgt         0.493               #/op
classic:branches                                  avgt        70.527               #/op
classic:cycles                                    avgt        74.957               #/op
classic:instructions                              avgt       275.363               #/op
classic:stalled-cycles-frontend                   avgt         4.690               #/op

ownParseDouble                                    avgt    5    9.260 ± 0.155      ns/op
ownParseDouble:IPC                                avgt         4.174          insns/clk
ownParseDouble:L1-dcache-load-misses              avgt         2.288               #/op
ownParseDouble:L1-dcache-loads                    avgt        12.608               #/op
ownParseDouble:branch-misses                      avgt         0.006               #/op
ownParseDouble:branches                           avgt        18.181               #/op
ownParseDouble:cycles                             avgt        23.588               #/op
ownParseDouble:instructions                       avgt        98.465               #/op
ownParseDouble:stalled-cycles-frontend            avgt         0.328               #/op

parseToIntFromByte                                avgt    5    6.372 ± 0.216      ns/op
parseToIntFromByte:IPC                            avgt         4.805          insns/clk
parseToIntFromByte:L1-dcache-load-misses          avgt         1.018               #/op
parseToIntFromByte:L1-dcache-loads                avgt        10.084               #/op
parseToIntFromByte:branch-misses                  avgt         0.009               #/op
parseToIntFromByte:branches                       avgt        15.778               #/op
parseToIntFromByte:cycles                         avgt        16.394               #/op
parseToIntFromByte:instructions                   avgt        78.778               #/op
parseToIntFromByte:stalled-cycles-frontend        avgt         0.146               #/op

parseToIntFromByteFixed3                          avgt    5    2.767 ± 0.009      ns/op
parseToIntFromByteFixed3:IPC                      avgt         4.074          insns/clk
parseToIntFromByteFixed3:L1-dcache-load-misses    avgt         1.023               #/op
parseToIntFromByteFixed3:L1-dcache-loads          avgt         5.461               #/op
parseToIntFromByteFixed3:branch-misses            avgt         0.001               #/op
parseToIntFromByteFixed3:branches                 avgt         6.447               #/op
parseToIntFromByteFixed3:cycles                   avgt         7.040               #/op
parseToIntFromByteFixed3:instructions             avgt        28.683               #/op
parseToIntFromByteFixed3:stalled-cycles-frontend  avgt         0.050               #/op

parseToIntFromString                              avgt    5    8.439 ± 0.065      ns/op
parseToIntFromString:IPC                          avgt         4.187          insns/clk
parseToIntFromString:L1-dcache-load-misses        avgt         2.297               #/op
parseToIntFromString:L1-dcache-loads              avgt        12.617               #/op
parseToIntFromString:branch-misses                avgt         0.004               #/op
parseToIntFromString:branches                     avgt        17.272               #/op
parseToIntFromString:cycles                       avgt        21.475               #/op
parseToIntFromString:instructions                 avgt        89.904               #/op
parseToIntFromString:stalled-cycles-frontend      avgt         0.257               #/op
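
The implementations behind these numbers are not shown on the slide. As a rough idea of what a byte-based integer parser can look like, here is a minimal sketch under stated assumptions: ASCII input, an optional leading minus sign, exactly one fractional digit, and no validation. The class and method names are hypothetical and this is not the code that was measured.

```java
// Hypothetical sketch; not the workshop's parseToIntFromByte implementation.
final class ParseSketch {
    /**
     * Parses a value such as "12.4" or "-5.5" into tenths (124 or -55).
     * Assumes well-formed ASCII input and performs no validation.
     */
    static int parseTenths(byte[] data, int offset, int length) {
        int pos = offset;
        final int end = offset + length;

        // Optional sign at the first position.
        final boolean negative = data[pos] == '-';
        if (negative) {
            pos++;
        }

        // Accumulate all digits and skip the decimal point.
        int value = 0;
        for (; pos < end; pos++) {
            final byte b = data[pos];
            if (b != '.') {
                value = value * 10 + (b - '0');
            }
        }
        return negative ? -value : value;
    }
}
```

Working on integers in tenths avoids floating-point parsing entirely, which is plausibly where much of the gap to `classic` comes from. Verify that with the profilers rather than taking it for granted.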

Approaching Microbenchmarks

Let's Tackle the Problem

Got a Problem?

What is your motivation?

Real Problem

  • Production problem or oddity
  • You have an idea or suspicion

Made Up Problem

  • Colleague told you
  • Read it somewhere
  • Told to make it scale
  • Told to cut cost in half
  • Production problem has been profiled, hopefully
  • You have to benchmark before you microbenchmark
  • When an arbitrary goal is given, you have to profile first
  • You might have to instrument first

Theory

No idea? No benchmark!

  • Develop a sound theory why something is slow or hungry
  • You might need a benchmark first!
  • Consider these five areas:
    • Code aka algorithms
    • Memory aka usage (allocation, freeing, access)
    • Cache aka locality of memory usage
    • I/O aka external dependencies
    • Scale aka synchronisation and CPU usage
  • Think about interactions of your problem component with the world

Test Case

Build your code

  • Isolate your problem
  • Avoid rewriting the code, try to use the API
  • Don't remove code only because it is not important for the benchmark
  • Isolate data preparation from code execution
  • By default, JMH prepares data only once per trial, which is difficult when the benchmark mutates its data
  • Concurrency tests are hard to write, see how your production runs it
  • Don't apply your ideas or theory to the test case
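
The point about mutating data can be sketched in plain Java. The example assumes an in-place sort as the code under test; all names are hypothetical. The pristine array is prepared once, and every run gets a fresh copy, so the code under test never sees already sorted input.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch: separates data preparation from the code under test.
final class PristineDataSketch {
    private final int[] pristine;

    /** Preparation, done once: deterministic pseudo-random data. */
    PristineDataSketch(int size, long seed) {
        final Random random = new Random(seed);
        pristine = new int[size];
        for (int i = 0; i < size; i++) {
            pristine[i] = random.nextInt();
        }
    }

    /** Preparation, done per run: a fresh copy, kept out of the measured code. */
    int[] freshCopy() {
        return pristine.clone();
    }

    /** Code under test: sorts in place and therefore mutates its input. */
    static void codeUnderTest(int[] data) {
        Arrays.sort(data);
    }

    /** Helper to confirm the effect of codeUnderTest. */
    static boolean isSorted(int[] data) {
        for (int i = 1; i < data.length; i++) {
            if (data[i - 1] > data[i]) {
                return false;
            }
        }
        return true;
    }
}
```

In JMH, the per-run copy would typically live in a `@Setup(Level.Invocation)` method. The JMH documentation warns about the overhead and timing effects of that level for very short benchmarks, so measure whether the copy itself distorts your result.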

Data

Shape the future outcome

  • Shape your test data according to your production data
  • No production data, because the idea is made up? Think in dimensions:
    • Amount - How many strings?
    • Size - How long is a string?
    • Structure - Where is the challenge in the string?
    • Layout - Holders and copies
    • Variance - How many different sizes, structures, and layouts and how? At once? Isolated?
  • Your data might create or be the problem you are looking for
  • Think unit tests and the data you might need to cover your code properly
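
When there is no production data, some of these dimensions can be turned into parameters of a small generator. The following is a minimal sketch that covers only amount, size, and variance of size; the class name, parameters, and the lowercase-letter alphabet are assumptions. A fixed seed keeps the data identical across runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical generator; structure and layout are not covered here.
final class TestDataSketch {
    /**
     * @param amount    how many strings to create
     * @param minLength shortest string length (inclusive)
     * @param maxLength longest string length (inclusive)
     * @param seed      fixed seed so every run sees the same data
     */
    static List<String> generate(int amount, int minLength, int maxLength, long seed) {
        final Random random = new Random(seed);
        final List<String> result = new ArrayList<>(amount);
        for (int i = 0; i < amount; i++) {
            // Variance: pick a length within the requested range.
            final int length = minLength + random.nextInt(maxLength - minLength + 1);
            final StringBuilder sb = new StringBuilder(length);
            for (int j = 0; j < length; j++) {
                sb.append((char) ('a' + random.nextInt(26)));
            }
            result.add(sb.toString());
        }
        return result;
    }
}
```

Setting `minLength` equal to `maxLength` isolates a single size, while a wide range mixes sizes in one run. Both are worth measuring separately.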

Execution Profile

Wrong conclusions can be drawn in seconds with this trick

  • How often is your code executed?
    • Only at startup (how often)
    • Occasionally
    • Always
    • On error or on a rare path
  • Where is your code used?
    • Inlining
    • Branch prediction and out-of-order execution

Unit Testing

Functional correctness before performance

  • Your first duty is to correctness!
  • Try to reuse your regular tests
  • Validate every change to avoid false hopes
  • Don't unit-test memory, cache or other performance things
  • Ensure proper coverage, because performance tuning often removes edge-cases
  • Try to use the same data for testing and benchmarking
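
As an illustration of correctness before performance, the four variants from the StringBuilder exercise can be checked against each other before any number is measured. This is a minimal plain-Java sketch with hypothetical names; in a real project the check would live in your regular unit test framework.

```java
// Hypothetical sketch; the method bodies are one possible reading of the exercise.
final class ConcatVariants {
    static String classic(String a, String b) {
        return a + b;
    }

    static String concat(String a, String b) {
        return a.concat(b);
    }

    static String builder(String a, String b) {
        return new StringBuilder().append(a).append(b).toString();
    }

    static String sizedBuilder(String a, String b) {
        // Presized to the exact expected length.
        return new StringBuilder(a.length() + b.length()).append(a).append(b).toString();
    }

    /** Returns true when all variants produce the same result as the classic one. */
    static boolean allAgree(String a, String b) {
        final String expected = classic(a, b);
        return expected.equals(concat(a, b))
            && expected.equals(builder(a, b))
            && expected.equals(sizedBuilder(a, b));
    }
}
```

Run the check with the same data the benchmark uses, including edge cases such as empty strings.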

Question the Results

Never believe in your first results

  • Is this your expected outcome? Question it.
  • Is this not what you expected? Question it.
  • You had no expectations? Well... we have a problem.
  • Start validating what you got by experimenting
  • Vary the hell out of code and data before you start tuning!
  • Think of edge-cases

Verification by Variation

Vary a little to feel confident

  • Always keep the original state running
  • Vary a copy of original
  • Vary the tuned version
  • Vary the code
    • Just switch lines
    • Reuse variables or don't
    • Flip branches and conditions
    • Flatten out code or compact it
  • Vary the data
    • Sort or unsort it
    • Make it longer, bigger, and smaller
  • Vary the environment
    • Try another JDK
    • Try another machine
    • Play with the GC

Verification by Tooling

Use the power of JMH

  • GC: -prof gc
  • Low-level stats: -prof perf / -prof perfnorm
  • Generated code: -prof perfasm
  • JIT compiler: -prof comp
  • Classloader: -prof cl
  • JFR: -prof jfr
  • Async Profiler: -prof async
  • And more!

Verification by Review

Ask anyone but ChatGPT

  • Ask a colleague
  • Don't just send a PR
  • Explain your thought process
  • No need to ask a benchmarking expert
  • Compare with JDK benchmarks and JMH examples
  • Ask the mighty Internet politely

Verification by Knowledge

Remember our experiments and more

  • Care about the platform and environment
  • Remember our CPU, cache, and memory examples
  • Keep the Java core concepts in mind (JIT, JMM)
  • Be prepared to start from scratch
  • Always benchmark the outcome in reality
  • NEVER apply your benchmark outcome as a general rule in the future!

Microbenchmark when...

A Summary of Things

Conclusion

Microbenchmarking is...

Good

  • For isolating a problem
  • Investigating an isolated problem
  • Tuning an isolated problem
  • Extremely detailed tuning
  • Understanding the inner-workings
  • Squeezing the lemon a little more

Bad

  • Drawing general conclusions
  • Black and white comparisons
  • Expecting large-scale impacts
  • The impatient developer

The Conclusion rephrased

When and when not to do microbenchmarks

  • Yes: You are a JDK contributor
  • Yes: You are an open source contributor
  • Yes: Your problem can be isolated...
  • Yes: ...or you isolate it for easier tuning
  • Yes: You want to debunk your colleague's claims
  • Yes: You are just curious
  • No: You want to write a Medium post about which method is faster
  • No: You want to tune production without a need
  • No: You swiped left on all previous slides

One last thing

The JMH warning

REMEMBER: The numbers are just data. To gain reusable insights, you need to follow up on why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial experiments, perform baseline and negative tests that provide experimental control, make sure the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.

Do not assume the numbers tell you what you want them to tell.

Just because you have seen it on my slides does not make it right!