Java Measurement Harness - JMH

How to do Microbenchmarking Correctly

An atypical Motivation

Programming language power consumption aka efficiency

Goals

What do we want to learn today?

Explicit Goals

Learn JMH basics
Learn the approach to micro benchmarking
Learn careful benchmarking
Learn validation of results
Learn to question and cross-check

Implicit Goals

Understand Java internals
Understand modern memory and cpu concepts
Avoid a mistake or two...

The Code

Less typing and more reading

Code: https://github.com/Xceptance/jmh-training
License: MIT License

A Challenge

No Challenge, no Fun

Investigate a Common Task

Split a String with common tooling

Given strings such as Java,Go,Rust
Split them at a certain character aka ,
Limit to JDK 8 resources for now
Maybe we try a naive implementation as well
Limited to a single separator character for this training
Results should be an List<String>

String.split()
java.util.StringTokenizer
Pattern.split()
Pattern.splitAsStream()

Theory: No idea, no reason to measure

Benchmarking needs theories and expectations

Have a theory about the outcome
Have same expectations
Best case: Houston, we got a problem!
Benchmarking is horribly wrong in most cases
If there is nothing you can validate your results against, it stays wrong
Even a wrong theory can lead to enough probing and research to turn it right

Speed
Change in Speed and Change in Data
Suitability for Short or Long Running Code
Suitability for Tasks during Warmup
Resource Usage
Theory: Tokenizer is fastest, Stream is not suitable for short tasks, and Pattern.split and String.split performing similarly

JMH - Java Measurement Harness

The Basics To Get Started

Why a Framework?

Why do we need a measurement framework?

Java is interpreted and compiled
Hotspot based Java is multi-staged compiled
Java optimizes things out, it doesn't need
System.currentTimeMillis can be cached
Profiler support built-in would be cool

Developed by the OpenJDK team
The recommended way to do microbenchmarks
Platform support and profiles built-in
Only Maven support, no IDE support
Part of the JDK since Java 12
Proven to be almost right

https://github.com/Xceptance/jmh-training

Simplest Benchmark

The minimal benchmark setup

You either use the archetype to create things
or you just go with the provided example
mvn clean compile install
java -jar target/benchmarks.jar Training01

No special settings yet, we discuss that next
But it does not look helpful at all
It runs horribly long


package org.xceptance.training.jmh;

import org.openjdk.jmh.annotations.Benchmark;

/**
 * Simplies benchmark ever. All default and we measure nothing
 * 
 * @author rschwietzke
 */
public class Training01
{
     @Benchmark
     public void bench()
     {
     }
}


# JMH version: 1.21
# VM version: JDK 11.0.4, OpenJDK 64-Bit Server VM, 11.0.4
# VM invoker: /usr/lib/jvm/java-11-openjdk-amd64/bin/java
# VM options: <none>
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: org.xceptance.training.jmh.Training01.bench

# Run progress: 0.00% complete, ETA 00:08:20
# Fork: 1 of 5
# Warmup Iteration   1: 1583355399.039 ops/s
# Warmup Iteration   2: 1539214226.386 ops/s
# Warmup Iteration   3: 1497187122.677 ops/s
# Warmup Iteration   4: 1464143076.223 ops/s
# Warmup Iteration   5: 1408890030.487 ops/s
Iteration   1: 1400594747.986 ops/s
Iteration   2: 1443236681.579 ops/s
Iteration   3: 1394508697.454 ops/s
Iteration   4: 1423377056.796 ops/s
Iteration   5: 1418124423.941 ops/s
...

https://github.com/Xceptance/jmh-training

Test Runtime

How to control the overall measurement

We have warmup and we have measurement
@Warmup: How long to warm the VM
@Measurement: How often to measure
@Fork: How often to repeat in a new VM

The less you warmup or measure, the more inaccurate
The less VM forks the less probing
Most of the time, small works fine but always validate


/**
 * Restrain the execution and explain warmup and measurement
 * Warmup 3x for 1 sec
 * Measure 5x for 1 sec and average the result
 * Fork only once
 * 
 * @author rschwietzke
 */
@Warmup(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
public class Training02
{
     @Benchmark
     public void bench()
     {
     }
}


# Run progress: 0.00% complete, ETA 00:00:08
# Fork: 1 of 1
# Warmup Iteration   1: 1641713453.126 ops/s
# Warmup Iteration   2: 1581322414.154 ops/s
# Warmup Iteration   3: 1549432518.891 ops/s
Iteration   1: 1523201697.274 ops/s
Iteration   2: 1559404970.899 ops/s
Iteration   3: 1543778128.057 ops/s
Iteration   4: 1597859305.474 ops/s
Iteration   5: 1584935253.225 ops/s


Result "org.xceptance.training.jmh.Training02.bench":
  1561835870.986 ±(99.9%) 116412105.696 ops/s [Average]
  (min, avg, max) = (1523201697.274, 1561835870.986, 1597859305.474), 
  stdev = 30231854.167
  CI (99.9%): [1445423765.289, 1678247976.682] (assumes normal distribution)


# Run complete. Total time: 00:00:08

Benchmark          Mode  Cnt           Score           Error  Units
Training02.bench  thrpt    5  1561835870.986 ± 116412105.696  ops/s

https://github.com/Xceptance/jmh-training

Still not easy to use

Change the measurement mode

@BenchmarkMode: How to measure
- AverageTime: Average all executions per iteration
- Throughput: Check how often you can execute it per iteration
- SingleShot: Execute it once per iteration
- SampleTime: Don't average, statistical evaluate each execute per iteration
AverageTime and SingleShot will be used today


@Warmup(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
public class Training03
{
     @Benchmark
     @BenchmarkMode(Mode.AverageTime)
     @OutputTimeUnit(TimeUnit.MICROSECONDS)
     public void benchAvg() throws InterruptedException
     {
         TimeUnit.MILLISECONDS.sleep(100);
     }

     @Benchmark
     @BenchmarkMode(Mode.SampleTime)
     @OutputTimeUnit(TimeUnit.MILLISECONDS)
     public void benchSample() throws InterruptedException
     {
         TimeUnit.MILLISECONDS.sleep(100);
     }

     @Benchmark
     @BenchmarkMode(Mode.Throughput)
     @OutputTimeUnit(TimeUnit.SECONDS)
     public void benchThroughput() throws InterruptedException
     {
         TimeUnit.MILLISECONDS.sleep(100);
     }
     
     @Benchmark
     @BenchmarkMode(Mode.SingleShotTime)
     @OutputTimeUnit(TimeUnit.MILLISECONDS)
     public void benchSingle() throws InterruptedException
     {
         TimeUnit.MILLISECONDS.sleep(100);
     }
}


Benchmark                                     Mode  Cnt       Score    Error  Units
Training03.benchThroughput                   thrpt    5       9.982 ±  0.003  ops/s
Training03.benchAvg                           avgt    5  100182.424 ± 69.858  us/op

Training03.benchSample                      sample   50     100.110 ±  0.027  ms/op
Training03.benchSample:benchSample·p0.00    sample          100.008           ms/op
Training03.benchSample:benchSample·p0.50    sample          100.139           ms/op
Training03.benchSample:benchSample·p0.90    sample          100.139           ms/op
Training03.benchSample:benchSample·p0.95    sample          100.139           ms/op
Training03.benchSample:benchSample·p0.99    sample          100.139           ms/op
Training03.benchSample:benchSample·p0.999   sample          100.139           ms/op
Training03.benchSample:benchSample·p0.9999  sample          100.139           ms/op
Training03.benchSample:benchSample·p1.00    sample          100.139           ms/op

Training03.benchSingle                          ss    5     100.157 ±  0.143  ms/op

https://github.com/Xceptance/jmh-training

Basics Set, Let's Get Testing

Try some theories and start small

Naive First Benchmark

Don't think about it, just do it.

Just split a string the recommended way
Increase iterations so we can see some runtime development
Our measurement is somehow not consistent
Something is wrong, isn't it?


/**
 * Small test first. Totally naive!!
 * 
 * @author rschwietzke
 */
@Warmup(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
public class Training05
{
     @Benchmark
     @BenchmarkMode(Mode.AverageTime)
     @OutputTimeUnit(TimeUnit.NANOSECONDS)
     public void bench()
     {
         "Foo,Bar,Test".split(",");
     }
}


# Run progress: 0.00% complete, ETA 00:00:13
# Fork: 1 of 1
# Warmup Iteration   1: 194.662 ns/op
# Warmup Iteration   2: 166.320 ns/op
# Warmup Iteration   3: 158.765 ns/op
Iteration   1: 155.177 ns/op
Iteration   2: 148.119 ns/op
Iteration   3: 143.208 ns/op
Iteration   4: 144.138 ns/op
Iteration   5: 143.335 ns/op
Iteration   6: 144.108 ns/op
Iteration   7: 146.223 ns/op
Iteration   8: 145.873 ns/op
Iteration   9: 142.524 ns/op
Iteration  10: 144.172 ns/op

Benchmark         Mode  Cnt    Score   Error  Units
Training05.bench  avgt   10  145.688 ± 5.638  ns/op

https://github.com/Xceptance/jmh-training

More Warming Up

Try to correct the unsteady result.

More warmup time
When the warmup already shows the runtime of the measurement, we might have warmed up right


/**
 * Small test first. Totally naive!!
 * Now with more warmup.
 *  
 * @author rschwietzke
 */
@Warmup(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
public class Training06
{
     @Benchmark
     @BenchmarkMode(Mode.AverageTime)
     @OutputTimeUnit(TimeUnit.NANOSECONDS)
     public void bench()
     {
         "Foo,Bar,Test".split(",");
     }
}


# Run progress: 0.00% complete, ETA 00:00:20
# Fork: 1 of 1
# Warmup Iteration   1: 162.427 ns/op
# Warmup Iteration   2: 152.925 ns/op
# Warmup Iteration   3: 139.471 ns/op
# Warmup Iteration   4: 139.766 ns/op
# Warmup Iteration   5: 140.489 ns/op
Iteration   1: 142.501 ns/op
Iteration   2: 140.077 ns/op
Iteration   3: 140.256 ns/op
Iteration   4: 142.049 ns/op
Iteration   5: 140.772 ns/op
Iteration   6: 142.111 ns/op
Iteration   7: 140.427 ns/op
Iteration   8: 142.458 ns/op
Iteration   9: 142.550 ns/op
Iteration  10: 141.791 ns/op

Benchmark         Mode  Cnt    Score   Error  Units
Training06.bench  avgt   10  141.499 ± 1.515  ns/op

https://github.com/Xceptance/jmh-training

Measure more

Measure it five times

We use @Fork(5) to repeat it five times.
We see nicely the fluctuation in measurement
We talk small numbers here!!


# Run progress: 0.00% complete, ETA 00:01:40
# Fork: 1 of 5
# Warmup Iteration   1: 209.659 ns/op
# Warmup Iteration   2: 155.465 ns/op
# Warmup Iteration   3: 139.121 ns/op
# Warmup Iteration   4: 140.133 ns/op
# Warmup Iteration   5: 140.187 ns/op
Iteration   1: 139.212 ns/op
...
Iteration  10: 139.401 ns/op

# Run progress: 20.00% complete, ETA 00:01:22
# Fork: 2 of 5
# Warmup Iteration   1: 159.677 ns/op
...
# Warmup Iteration   5: 138.522 ns/op
Iteration   1: 139.015 ns/op
...
Iteration  10: 145.531 ns/op

# Run progress: 40.00% complete, ETA 00:01:01
# Fork: 3 of 5
# Warmup Iteration   1: 168.662 ns/op
...
# Warmup Iteration   5: 141.841 ns/op
Iteration   1: 142.497 ns/op
...
Iteration  10: 143.128 ns/op

# Run progress: 60.00% complete, ETA 00:00:40
# Fork: 4 of 5
# Warmup Iteration   1: 165.220 ns/op
...
# Warmup Iteration   5: 144.540 ns/op
Iteration   1: 144.370 ns/op
...
Iteration  10: 144.914 ns/op

# Run progress: 80.00% complete, ETA 00:00:20
# Fork: 5 of 5
# Warmup Iteration   1: 175.664 ns/op
...
# Warmup Iteration   5: 134.586 ns/op
Iteration   1: 134.711 ns/op
...
Iteration  10: 136.447 ns/op

Benchmark         Mode  Cnt    Score   Error  Units
Training07.bench  avgt   50  140.803 ± 1.530  ns/op

https://github.com/Xceptance/jmh-training

Warning: Things can get wrong quickly

BAT vs. AC and JDK8 vs. JDK11


# JDK 11, BAT mode

# Warmup Iteration   1: 160.005 ns/op
# Warmup Iteration   2: 158.151 ns/op
# Warmup Iteration   3: 137.997 ns/op
# Warmup Iteration   4: 137.938 ns/op
# Warmup Iteration   5: 137.956 ns/op
Iteration   1: 153.733 ns/op
Iteration   2: 145.046 ns/op
Iteration   3: 137.993 ns/op
...
Iteration   8: 137.727 ns/op
Iteration   9: 138.723 ns/op
Iteration  10: 138.151 ns/op

Benchmark         Mode  Cnt    Score   Error  Units
Training06.bench  avgt   10  140.685 ± 7.650  ns/op


# JDK 8, BAT mode

# Warmup Iteration   1: 175.988 ns/op
# Warmup Iteration   2: 146.560 ns/op
# Warmup Iteration   3: 144.860 ns/op
# Warmup Iteration   4: 145.489 ns/op
# Warmup Iteration   5: 147.307 ns/op
Iteration   1: 147.824 ns/op
Iteration   2: 153.391 ns/op
Iteration   3: 147.060 ns/op
...
Iteration   8: 147.824 ns/op
Iteration   9: 146.193 ns/op
Iteration  10: 147.381 ns/op

Benchmark         Mode  Cnt    Score   Error  Units
Training06.bench  avgt   10  147.658 ± 3.227  ns/op


# JDK 11, AC mode

# Warmup Iteration   1: 124.700 ns/op
# Warmup Iteration   2: 114.472 ns/op
# Warmup Iteration   3: 112.499 ns/op
# Warmup Iteration   4: 109.766 ns/op
# Warmup Iteration   5: 109.921 ns/op
Iteration   1: 110.670 ns/op
Iteration   2: 109.924 ns/op
Iteration   3: 110.574 ns/op
...
Iteration   8: 114.441 ns/op
Iteration   9: 110.204 ns/op
Iteration  10: 109.732 ns/op

Benchmark         Mode  Cnt    Score   Error  Units
Training06.bench  avgt   10  110.667 ± 2.085  ns/op


# JDK 11, AC mode, Benchmark in Background

# Warmup Iteration   1: 152.907 ns/op
# Warmup Iteration   2: 165.058 ns/op
# Warmup Iteration   3: 143.192 ns/op
# Warmup Iteration   4: 151.717 ns/op
# Warmup Iteration   5: 205.670 ns/op
Iteration   1: 265.120 ns/op
Iteration   2: 193.039 ns/op
Iteration   3: 220.406 ns/op
...
Iteration   8: 198.791 ns/op
Iteration   9: 225.030 ns/op
Iteration  10: 179.021 ns/op

Benchmark         Mode  Cnt    Score    Error  Units
Training06.bench  avgt   10  222.567 ± 42.339  ns/op

https://github.com/Xceptance/jmh-training


/**
 * Ok, we still have to learn one thing, one thing only....
 * Return or eat something otherwise this is nonsense.
 * 
 * @author rschwietzke
 */
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
public class Training08
{
     @Benchmark
     public void s1()
     {
         new String();
     }

     @Benchmark
     public void s2()
     {
         String s = "";
     }

     @Benchmark
     public void s3()
     {
         String s = new String("");
     }

     @Benchmark
     public String s1_r()
     {
         return new String();
     }
     
     @Benchmark
     public String s2_r()
     {
         return "";
     }

     @Benchmark
     public String s3_r()
     {
         return new String("");
     }     
     
     @Benchmark
     public void nothing()
     {
     }
}

We gotta talk!

There is still something wrong


Benchmark           Mode  Cnt  Score   Error  Units
Training08.s1       avgt    3  0.503 ± 0.047  ns/op
Training08.s2       avgt    3  0.498 ± 0.019  ns/op
Training08.s3       avgt    3  0.511 ± 0.113  ns/op

Training08.s1_r     avgt    3  6.446 ± 0.566  ns/op
Training08.s2_r     avgt    3  3.259 ± 0.238  ns/op
Training08.s3_r     avgt    3  7.439 ± 1.736  ns/op

Training08.nothing  avgt    3  0.502 ± 0.141  ns/op

Similar code is way faster when nothing is returned?
An empty method is the same speed
Hence: The compiler optimized all code away, because it is not needed
Conclusion: Always return something.


# Execution Interpreter only: -Xint   
    
Benchmark           Mode  Cnt    Score    Error  Units
Training08.s1       avgt    3  289.713 ± 35.695  ns/op
Training08.s2       avgt    3   86.911 ±  4.295  ns/op
Training08.s3       avgt    3  284.768 ± 46.444  ns/op

Training08.s1_r     avgt    3  353.986 ± 81.802  ns/op
Training08.s2_r     avgt    3  167.986 ± 38.805  ns/op
Training08.s3_r     avgt    3  366.652 ± 90.978  ns/op

Training08.nothing  avgt    3   92.487 ±  1.405  ns/op

https://github.com/Xceptance/jmh-training


/**
 * Just returning does not help... the right thing has to be returned
 * 
 * @author rschwietzke
 */
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
public class Training09
{
     @Benchmark
     public long returnLong()
     {
         return 1 + new Random().nextLong();
     }

     @Benchmark
     public long returnIgnoreString()
     {
         final long l = new Random().nextLong();
    
         final String s = String.valueOf(l);
         return 10 + l;
     }

     @Benchmark
     public long returnAndUseString()
     {
         final long l = new Random().nextLong();

         final String s = String.valueOf(l);
         return s.length() + l;
     }
}

Just return something

Well, it is important what to return


Benchmark                      Mode  Cnt    Score    Error  Units
Training09.returnLong          avgt    3   58.611 ±  1.708  ns/op
Training09.returnIgnoreString  avgt    3  101.467 ±  4.931  ns/op
Training09.returnAndUseString  avgt    3  104.862 ± 33.953  ns/op

This does not prove the theory
But still a warning, Java is free to kill code it does not need
As long as the final result is the same, Java can make all kind of stunts

https://github.com/Xceptance/jmh-training


/**
 * Blackholes to the rescue
 * 
 * @author rschwietzke
 */
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
public class Training10
{
     @Benchmark
     public void s1_b(final Blackhole b)
     {
         b.consume(new String());
     }
     
     @Benchmark
     public void s2_b(final Blackhole b)
     {
         b.consume("");
     }

     @Benchmark
     public void s3_b(final Blackhole b)
     {
         b.consume(new String(""));
     }     
}

There is a Thing for It

Just make things disappear


Benchmark           Mode  Cnt  Score   Error  Units
Training10.s1_r     avgt    3  6.409 ± 0.487  ns/op
Training10.s2_r     avgt    3  3.251 ± 0.513  ns/op
Training10.s3_r     avgt    3  7.325 ± 0.890  ns/op

Training10.s1_b     avgt    3  6.349 ± 0.487  ns/op
Training10.s2_b     avgt    3  3.256 ± 0.439  ns/op
Training10.s3_b     avgt    3  7.258 ± 0.253  ns/op

Training10.nothing  avgt    3  0.504 ± 0.184  ns/op

JMH object to consume content
Built in a way to avoid compiler optimization
Declare it as parameter and you get one, don't instantiate

https://github.com/Xceptance/jmh-training

Let's Get Serious

After the intro, let's try really something out


/**
 * Our four methods
 * 
 * @author rschwietzke
 */
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
@State(Scope.Thread) // ignore for now
public class Training20
{
    final String toSplit = "foo,bar,mars,sun";
    
    @Benchmark
    public void split(final Blackhole b)
    {
        final String[] cols = toSplit.split(",");

        for (String s : cols)
        {
            b.consume(s);
        }
    }

    @Benchmark
    public void tokenize(final Blackhole b)
    {
        final StringTokenizer st = new StringTokenizer(toSplit, ",");

        while (st.hasMoreTokens())
        {
            b.consume(st.nextToken());
        }
    }

    @Benchmark
    public void pattern(final Blackhole b)
    {
        final Pattern p = Pattern.compile(",");
        final String[] cols = p.split(toSplit); 

        for (String s : cols)
        {
            b.consume(s);
        }
    }

    @Benchmark
    public void stream(final Blackhole b)
    {
        final Pattern p = Pattern.compile(",");
        p.splitAsStream(toSplit).forEach(s -> b.consume(s));
    }
}

Our First Attempt

Just a naive first benchmark

Our four different split methods
Plain first usage and consumption (!)
Not yet mapped to a common usage pattern
So... what will be the result?


Benchmark            Mode  Cnt    Score    Error  Units
=======================================================
Training20.split     avgt    3  157.705 ± 25.934  ns/op
Training20.tokenize  avgt    3  135.870 ± 13.294  ns/op
Training20.pattern   avgt    3  400.350 ± 14.014  ns/op
Training20.stream    avgt    3  428.894 ± 17.496  ns/op

What's next?
Finished?
Did we get a result?

https://github.com/Xceptance/jmh-training

No conclusive result

We are not there yet


Benchmark            Mode  Cnt    Score    Error  Units
=======================================================
Training20.split     avgt    3  157.705 ± 25.934  ns/op
Training20.tokenize  avgt    3  135.870 ± 13.294  ns/op
Training20.pattern   avgt    3  400.350 ± 14.014  ns/op
Training20.stream    avgt    3  428.894 ± 17.496  ns/op

It is data... kind of a fact
It is nothing conclusive
It gives us a direction for more testing

Theories and Observations

String.split and Pattern.split should behave the same
Memory is the problem
Data size is bad for a Stream

Next Steps

Tune GC
Check the memory consumption
Try a pre-compiled pattern
Vary the data in size and content


$ java -jar target/benchmarks.jar -lprof

Supported profilers:
        cl: Classloader profiling via standard MBeans 
      comp: JIT compiler profiling via standard MBeans 
        gc: GC profiling via standard MBeans 
     hs_cl: HotSpot (tm) classloader profiling via ... MBeans 
   hs_comp: HotSpot (tm) JIT compiler profiling via ... MBeans 
     hs_gc: HotSpot (tm) memory manager (GC) profiling via ... MBeans 
     hs_rt: HotSpot (tm) runtime profiling via implementation-specific MBeans 
    hs_thr: HotSpot (tm) threading subsystem via implementation-specific MBeans 
    pauses: Pauses profiler 
      perf: Linux perf Statistics 
   perfasm: Linux perf + PrintAssembly Profiler 
  perfnorm: Linux perf statistics, normalized by operation count 
safepoints: Safepoints profiler 
     stack: Simple and naive Java stack profiler 

Unsupported profilers:
 dtraceasm: <none> 
  xperfasm: <none>

Profile into a Problem

Use profilers to know more

See available profilers:
java -jar target/benchmark.jar -lprof
Use a profiler:
java -jar <file-here> Training20 -prof gc

https://github.com/Xceptance/jmh-training

Try Poor Man's Tuning

Do some GC research

Ask profiler first
Use G1 (JDK 11 default)
Give it more memory
Take GC out of the equation and run EpsilonGC

java -jar ... Training20 -prof gc


Benchmark                              Mode  Cnt     Score      Error   Units
=============================================================================
pattern                                avgt    3   405.561 ±   43.193   ns/op
pattern:·gc.alloc.rate                 avgt    3  1442.185 ±  167.736  MB/sec
pattern:·gc.alloc.rate.norm            avgt    3   920.000 ±    0.001    B/op
pattern:·gc.churn.G1_Eden_Space        avgt    3  1511.200 ± 3646.752  MB/sec
pattern:·gc.churn.G1_Eden_Space.norm   avgt    3   963.512 ± 2206.159    B/op
pattern:·gc.churn.G1_Old_Gen           avgt    3     0.001 ±    0.024  MB/sec
pattern:·gc.churn.G1_Old_Gen.norm      avgt    3     0.001 ±    0.015    B/op
pattern:·gc.count                      avgt    3    13.000             counts
pattern:·gc.time                       avgt    3    15.000                 ms

split                                  avgt    3   163.800 ±   30.267   ns/op
split:·gc.alloc.rate                   avgt    3  1086.629 ±  202.796  MB/sec
split:·gc.alloc.rate.norm              avgt    3   280.000 ±    0.001    B/op
split:·gc.churn.G1_Eden_Space          avgt    3  1169.404 ± 3357.654  MB/sec
split:·gc.churn.G1_Eden_Space.norm     avgt    3   301.071 ±  821.807    B/op
split:·gc.churn.G1_Old_Gen             avgt    3    ≈ 10⁻³             MB/sec
split:·gc.churn.G1_Old_Gen.norm        avgt    3    ≈ 10⁻⁴               B/op
split:·gc.count                        avgt    3    11.000             counts
split:·gc.time                         avgt    3    11.000                 ms

stream                                 avgt    3   445.319 ±   41.647   ns/op
stream:·gc.alloc.rate                  avgt    3  1472.936 ±  139.776  MB/sec
stream:·gc.alloc.rate.norm             avgt    3  1032.000 ±    0.001    B/op
stream:·gc.churn.G1_Eden_Space         avgt    3  1491.875 ±    8.525  MB/sec
stream:·gc.churn.G1_Eden_Space.norm    avgt    3  1045.289 ±  103.696    B/op
stream:·gc.churn.G1_Old_Gen            avgt    3    ≈ 10⁻⁴             MB/sec
stream:·gc.churn.G1_Old_Gen.norm       avgt    3    ≈ 10⁻⁴               B/op
stream:·gc.count                       avgt    3    12.000             counts
stream:·gc.time                        avgt    3    12.000                 ms

tokenize                               avgt    3   136.363 ±   58.743   ns/op
tokenize:·gc.alloc.rate                avgt    3  1118.825 ±  474.088  MB/sec
tokenize:·gc.alloc.rate.norm           avgt    3   240.000 ±    0.001    B/op
tokenize:·gc.churn.G1_Eden_Space       avgt    3  1126.091 ± 3234.817  MB/sec
tokenize:·gc.churn.G1_Eden_Space.norm  avgt    3   241.927 ±  745.129    B/op
tokenize:·gc.churn.G1_Old_Gen          avgt    3     0.001 ±    0.010  MB/sec
tokenize:·gc.churn.G1_Old_Gen.norm     avgt    3    ≈ 10⁻⁴               B/op
tokenize:·gc.count                     avgt    3    11.000             counts
tokenize:·gc.time                      avgt    3    10.000                 ms

https://github.com/Xceptance/jmh-training

Try Poor Man's Tuning

Do some GC research

Ask profiler first
Use G1 (JDK 11 default)
Give it more memory
Take GC out of the equation and run EpsilonGC

-XX:+UseG1GC
-Xms4g
-Xmx4g
-XX:+AlwaysPreTouch


$ java -jar target/benchmarks.jar Training20
========================================================
Benchmark            Mode  Cnt    Score     Error  Units
Training20.pattern   avgt    3  405.122 ±  70.341  ns/op
Training20.split     avgt    3  156.316 ±  33.948  ns/op
Training20.stream    avgt    3  434.671 ± 105.422  ns/op
Training20.tokenize  avgt    3  133.950 ±  11.445  ns/op

$ java -Xms4g -Xmx4g -XX:+UseG1GC -XX:+AlwaysPreTouch -jar ... Training20
Benchmark            Mode  Cnt    Score     Error  Units
========================================================
Training20.pattern   avgt    3  397.086 ±  27.243  ns/op
Training20.split     avgt    3  156.265 ±  54.684  ns/op
Training20.stream    avgt    3  422.262 ± 146.073  ns/op
Training20.tokenize  avgt    3  131.642 ±  14.806  ns/op

https://github.com/Xceptance/jmh-training

Try Poor Man's Tuning

Do some GC research

Ask profiler first
Use G1 (JDK 11 default)
Give it more memory
Take GC out of the equation and run EpsilonGC

-Xms10g -Xmx10g
-XX:+AlwaysPreTouch
-XX:+UnlockExperimentalVMOptions
-XX:+UseEpsilonGC


$ java -Xms10g -Xmx10g -XX:+AlwaysPreTouch -XX:+UnlockExperimentalVMOptions \
    -XX:+UseEpsilonGC -jar target/benchmarks.jar Training20

# Warmup Iteration   1: 680.389 ns/op
# Warmup Iteration   2: 589.493 ns/op
# Warmup Iteration   3: 637.540 ns/op
# Warmup Iteration   4: Terminating due to java.lang.OutOfMemoryError: Java heap space
<forked VM failed with exit code 3>
<stdout last='20 lines'>
Terminating due to java.lang.OutOfMemoryError: Java heap space
</stdout>
<stderr last='20 lines'>
</stderr>

Ok, that is not helpful
We burn too much memory per iteration
But EpsilonGC is a good idea for less demanding case

https://github.com/Xceptance/jmh-training


    public class Training21
{
    final String toSplit = "foo,bar,mars,sun";
    final Pattern precompiledPattern = Pattern.compile(",");

    @Benchmark
    public void precompiledPattern(final Blackhole b)
    {
        final String[] cols = precompiledPattern.split(toSplit);
        
        for (String s : cols)
        {
            b.consume(s);
        }
    }
    
    @Benchmark
    public void precompiledStream(final Blackhole b)
    {
        precompiledPattern.splitAsStream(toSplit).forEach(s -> b.consume(s));
    }
...
    @Benchmark
    public void pattern(final Blackhole b)
    {
        final Pattern p = Pattern.compile(",");
        final String[] cols = p.split(toSplit);

        for (String s : cols)
        {
            b.consume(s);
        }
    }

    @Benchmark
    public void stream(final Blackhole b)
    {
        final Pattern p = Pattern.compile(",");
        p.splitAsStream(toSplit).forEach(s -> b.consume(s));
    }
}

The Pattern Theory

Compiling Patterns is Key

Precompile patterns or else


Benchmark                      Mode  Cnt    Score     Error  Units
Training21.pattern             avgt    3  475.992 ±  67.709  ns/op
Training21.precompiledPattern  avgt    3  315.103 ±  44.834  ns/op

Training21.stream              avgt    3  571.862 ± 623.681  ns/op
Training21.precompiledStream   avgt    3  349.376 ±  80.088  ns/op

Training21.split               avgt    3  182.281 ±  15.728  ns/op
Training21.tokenize            avgt    3  218.433 ± 807.886  ns/op

It helped a lot
Site by site comparison shows it clearly
Avoids comparing to other runs with maybe wrong assumptions
Always try to compare within a run if possible

https://github.com/Xceptance/jmh-training


public class Training30
{
    final FastRandom r = new FastRandom(7L);

    final String[] data = {
            "foo",
            "foo,bar",
            "foo,bar,buz",
            "foo,bar,buz,top",
            "foo,bar,buz,top,tof",
            "foo,bar,buz,top,tof,rot",
            "foo,bar,buz,top,tof,rot,qwe",
            "foo,bar,buz,top,tof,rot,qwe,pop",
            "foo,bar,buz,top,tof,rot,qwe,pop,lol",
            "foo,bar,buz,top,tof,rot,qwe,pop,lol,aha"
    };
    
    String toSplit;

    @Setup(Level.Invocation)
    public void setup()
    {
        toSplit = data[r.nextInt(data.length)];
    }
    
    ...
}

What about Data?

Not the Android of course

@Setup permits to run any setup before
- Level.Invocation any benchmark method call
- Level.Iteration any measurement iteration
Iteration is default when nothing is specified


Benchmark                      Mode  Cnt    Score    Error  Units
Training30.pattern             avgt    3  587.557 ± 74.707  ns/op
Training30.precompiledPattern  avgt    3  441.839 ± 49.530  ns/op

Training30.stream              avgt    3  648.489 ± 18.975  ns/op
Training30.precompiledStream   avgt    3  494.524 ± 54.549  ns/op

Training30.split               avgt    3  272.838 ± 37.578  ns/op
Training30.tokenize            avgt    3  246.168 ± 51.308  ns/op


Benchmark                      Mode  Cnt    Score     Error  Units
Training21.pattern             avgt    3  475.992 ±  67.709  ns/op
Training21.precompiledPattern  avgt    3  315.103 ±  44.834  ns/op

Training21.stream              avgt    3  571.862 ± 623.681  ns/op
Training21.precompiledStream   avgt    3  349.376 ±  80.088  ns/op

Training21.split               avgt    3  182.281 ±  15.728  ns/op
Training21.tokenize            avgt    3  218.433 ± 807.886  ns/op

https://github.com/Xceptance/jmh-training


public class Training31
{
    @Param({"1", "5", "10", "20"})
    int size; 
    
    final String[] data = {
            "foo",
            "foo,bar,buz,top,tof,rot",
            "foo,bar,buz,top,tof,rot,qwe,pop,lol,aha",
            "foo,bar,buz,top,tof,rot,qwe,pop,lol,aha," + 
                "foo,bar,buz,top,tof,rot,qwe,pop,lol,aha"
    };
    
    String toSplit;

    @Setup
    public void setup()
    {
        switch (size)
        {
            case 1: toSplit = data[0]; break;
            case 5: toSplit = data[1]; break;
            case 10: toSplit = data[2]; break;
            case 20: toSplit = data[3]; break;
        };
    }
    ...
}

Precise Knowledge

Don't go over all data sizes at once

@Param data injected by JMH


Benchmark                      (size)  Mode  Cnt    Score 
Training31.pattern                  1  avgt    3  232.503
Training31.pattern                  5  avgt    3  546.355
Training31.pattern                 10  avgt    3  795.779
Training31.pattern                 20  avgt    3 1354.040

Training31.precompiledPattern       1  avgt    3   75.391
Training31.precompiledPattern       5  avgt    3  433.272
Training31.precompiledPattern      10  avgt    3  625.999
Training31.precompiledPattern      20  avgt    3 1227.985

Training31.stream                   1  avgt    3  275.538
Training31.stream                   5  avgt    3  638.750
Training31.stream                  10  avgt    3  864.800
Training31.stream                  20  avgt    3 1439.868

Training31.precompiledStream        1  avgt    3  121.478
Training31.precompiledStream        5  avgt    3  498.007
Training31.precompiledStream       10  avgt    3  743.830
Training31.precompiledStream       20  avgt    3 1305.923

Training31.split                    1  avgt    3    7.712
Training31.split                    5  avgt    3  245.822
Training31.split                   10  avgt    3  391.144
Training31.split                   20  avgt    3  841.006

Training31.tokenize                 1  avgt    3   12.375
Training31.tokenize                 5  avgt    3  239.217
Training31.tokenize                10  avgt    3  401.357
Training31.tokenize                20  avgt    3  828.597

https://github.com/Xceptance/jmh-training


public class Training32
{
    @Param({"1", "5", "10", "20"})
    int size; 

    @Param({"1", "2", "5", "10"})
    int length; 
    
    final Pattern precompiledPattern = Pattern.compile(",");

    String toSplit;

    @Setup
    public void setup()
    {
        String s = "";
        for (int i = 0; i < length; i++)
        {
            s += "a";
        }
        
        final StringJoiner sj = new StringJoiner(",");
        for (int i = 0; i < size; i++)
        {
            sj.add(s);
        }
        
        toSplit = sj.toString();
    }
...
}

Very Data Even More

Two Dimensional Testing

Very size if the full string to split aka substring count
Very length of each substring
JMH spans all combinations for us
Test runtime explodes to 20 min or so

https://github.com/Xceptance/jmh-training

Very Data Even More


Benchmark                      (length)  (size)  Mode  Cnt   Score  Units
Training32.split                      1       1  avgt    3    16.5  ns/op
Training32.split                      1       5  avgt    3   218.7  ns/op
Training32.split                      1      10  avgt    3   358.0  ns/op
Training32.split                      1      20  avgt    3   745.0  ns/op

Training32.split                      2       1  avgt    3    17.2  ns/op
Training32.split                      2       5  avgt    3   225.3  ns/op
Training32.split                      2      10  avgt    3   402.9  ns/op
Training32.split                      2      20  avgt    3   746.1  ns/op

Training32.split                      5       1  avgt    3    17.2  ns/op
Training32.split                      5       5  avgt    3   227.9  ns/op
Training32.split                      5      10  avgt    3   395.1  ns/op
Training32.split                      5      20  avgt    3   834.4  ns/op

Training32.split                     10       1  avgt    3    19.8  ns/op
Training32.split                     10       5  avgt    3   242.1  ns/op
Training32.split                     10      10  avgt    3   409.8  ns/op
Training32.split                     10      20  avgt    3   849.1  ns/op

Training32.tokenize                   1       1  avgt    3     9.2  ns/op
Training32.tokenize                   1       5  avgt    3   133.2  ns/op
Training32.tokenize                   1      10  avgt    3   277.7  ns/op
Training32.tokenize                   1      20  avgt    3   552.7  ns/op

Training32.tokenize                   2       1  avgt    3     9.3  ns/op
Training32.tokenize                   2       5  avgt    3   143.3  ns/op
Training32.tokenize                   2      10  avgt    3   298.2  ns/op
Training32.tokenize                   2      20  avgt    3   592.4  ns/op

Training32.tokenize                   5       1  avgt    3    11.4  ns/op
Training32.tokenize                   5       5  avgt    3   145.9  ns/op
Training32.tokenize                   5      10  avgt    3   290.9  ns/op
Training32.tokenize                   5      20  avgt    3   682.7  ns/op

Training32.tokenize                  10       1  avgt    3    15.8  ns/op
Training32.tokenize                  10       5  avgt    3   160.2  ns/op
Training32.tokenize                  10      10  avgt    3   329.0  ns/op
Training32.tokenize                  10      20  avgt    3   645.3  ns/op


Benchmark                      (length)  (size)  Mode  Cnt   Score  Units
Training32.split                      1       1  avgt    3    16.5  ns/op
Training32.split                      2       1  avgt    3    17.2  ns/op
Training32.split                      5       1  avgt    3    17.2  ns/op
Training32.split                     10       1  avgt    3    19.8  ns/op

Training32.split                      1       5  avgt    3   218.7  ns/op
Training32.split                      2       5  avgt    3   225.3  ns/op
Training32.split                      5       5  avgt    3   227.9  ns/op
Training32.split                     10       5  avgt    3   242.1  ns/op

Training32.split                      1      10  avgt    3   358.0  ns/op
Training32.split                      2      10  avgt    3   402.9  ns/op
Training32.split                      5      10  avgt    3   395.1  ns/op
Training32.split                     10      10  avgt    3   409.8  ns/op

Training32.split                      1      20  avgt    3   745.0  ns/op
Training32.split                      2      20  avgt    3   746.1  ns/op
Training32.split                      5      20  avgt    3   834.4  ns/op
Training32.split                     10      20  avgt    3   849.1  ns/op

Training32.tokenize                   1       1  avgt    3     9.2  ns/op
Training32.tokenize                   2       1  avgt    3     9.3  ns/op
Training32.tokenize                   5       1  avgt    3    11.4  ns/op
Training32.tokenize                  10       1  avgt    3    15.8  ns/op

Training32.tokenize                   1       5  avgt    3   133.2  ns/op
Training32.tokenize                   2       5  avgt    3   143.3  ns/op
Training32.tokenize                   5       5  avgt    3   145.9  ns/op
Training32.tokenize                  10       5  avgt    3   160.2  ns/op

Training32.tokenize                   1      10  avgt    3   277.7  ns/op
Training32.tokenize                   2      10  avgt    3   298.2  ns/op
Training32.tokenize                   5      10  avgt    3   290.9  ns/op
Training32.tokenize                  10      10  avgt    3   329.0  ns/op

Training32.tokenize                   1      20  avgt    3   552.7  ns/op
Training32.tokenize                   2      20  avgt    3   592.4  ns/op
Training32.tokenize                   5      20  avgt    3   682.7  ns/op
Training32.tokenize                  10      20  avgt    3   645.3  ns/op

https://github.com/Xceptance/jmh-training


public class Training35
{
    final Pattern precompiledPattern = Pattern.compile(",");
    String toSplit = "Go,Java,Rust,C";
    
    @Benchmark
    public List<String> precompiledPattern()
    {   
        final String[] cols = precompiledPattern.split(toSplit);
        return Arrays.asList(cols);
    }

     @Benchmark
     public List<String> pattern()
     {
         final Pattern p = Pattern.compile(",");
         final String[] cols = p.split(toSplit);
         
         return Arrays.asList(cols);
     }

     @Benchmark
     public List<String> tokenize()
     {
         final StringTokenizer st = new StringTokenizer(toSplit, ",");
         final List<String> result = new ArrayList<>();
         
         while (st.hasMoreTokens())
         {
             result.add(st.nextToken());
         }
         
         return result;
     }
...
}

Back to the Goal

We wanted more than this

Results are supposed to be List<String>
Simplify a little (less data) and rewrite


Benchmark                       Mode  Cnt  Score  Units
Training35.tokenize             avgt    3  165.3  ns/op
Training35.split                avgt    3  184.4  ns/op
Training35.precompiledPattern   avgt    3  325.0  ns/op
Training35.precompiledStream    avgt    3  347.9  ns/op
Training35.pattern              avgt    3  432.1  ns/op
Training35.stream               avgt    3  478.5  ns/op


# Original test with the data of Training35
Benchmark                       Mode  Cnt  Score  Units
Training35a.tokenize            avgt    3  113.4  ns/op
Training35a.split               avgt    3  195.1  ns/op
Training35a.precompiledPattern  avgt    3  314.0  ns/op
Training35a.precompiledStream   avgt    3  278.2  ns/op
Training35a.pattern             avgt    3  435.8  ns/op
Training35a.stream              avgt    3  407.0  ns/op

https://github.com/Xceptance/jmh-training


@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.SingleShotTime)
@Fork(1)
@State(Scope.Thread) // ignore for now
public class Training40
{
    ...
    @Benchmark
    @Warmup(iterations = 0)
    @Measurement(iterations = 10)
    public List<String> tokenizeCold()
    {
        return tokenize(toSplit);
    }

    @Benchmark
    @Warmup(iterations = 10)
    @Measurement(iterations = 10)
    public List<String> tokenizeWarm()
    {
        return tokenize(toSplit);
    }
    
    @Benchmark
    @Warmup(iterations = 100)
    @Measurement(iterations = 10)
    public List<String> tokenizeWarmer()
    {
        return tokenize(toSplit);
    }

    @Benchmark
    @Warmup(iterations = 10000)
    @Measurement(iterations = 10)
    public List<String> tokenizeHot()
    {
        return tokenize(toSplit);
    }      
    ...
}

Cold Performance

Checking warmup performance

So far, we checked the hot performance
All warmup pain was ignored
@BenchmarkMode(Mode.SingleShotTime) measures one time execution
@Warmup(iterations = 1) iterations define how often is measured

It gets really tricky


# Fork(1)
Benchmark                   Mode  Cnt       Score        Error  Units
Training40.splitIceCold       ss       300232.000               ns/op
Training40.splitCold          ss   10   52054.900 ± 139920.099  ns/op
Training40.splitHot           ss   10    6136.200 ±   6489.506  ns/op
Training40.splitWarm          ss   10   12353.500 ±   4100.746  ns/op
Training40.splitWarmer        ss   10   15408.600 ±  15743.686  ns/op
Training40.tokenizeIceCold    ss        73831.000               ns/op
Training40.tokenizeCold       ss   10   21710.100 ±  29223.398  ns/op
Training40.tokenizeWarm       ss   10   10834.800 ±   2105.584  ns/op
Training40.tokenizeWarmer     ss   10   12971.800 ±  13742.500  ns/op
Training40.tokenizeHot        ss   10    2547.300 ±   2586.435  ns/op

Benchmark                   Mode  Cnt       Score        Error  Units
Training40.splitIceCold       ss       292800.000               ns/op
Training40.splitCold          ss   10   51979.000 ± 159071.783  ns/op
Training40.splitHot           ss   10    5334.300 ±   4306.467  ns/op
Training40.splitWarm          ss   10   14026.800 ±   3826.923  ns/op
Training40.splitWarmer        ss   10   17236.500 ±  14239.322  ns/op
Training40.tokenizeIceCold    ss        77906.000               ns/op
Training40.tokenizeCold       ss   10   21006.500 ±  32446.097  ns/op
Training40.tokenizeWarm       ss   10    9151.900 ±   2316.037  ns/op
Training40.tokenizeWarmer     ss   10   14808.800 ±  14499.507  ns/op
Training40.tokenizeHot        ss   10    2360.800 ±   3371.624  ns/op

Benchmark                   Mode  Cnt       Score        Error  Units
Training40.splitIceCold       ss       444264.000               ns/op
Training40.splitCold          ss   10   56199.600 ± 132136.303  ns/op
Training40.splitWarm          ss   10   15304.500 ±   4467.518  ns/op
Training40.splitWarmer        ss   10   11234.500 ±   9546.547  ns/op
Training40.splitHot           ss   10    5776.100 ±   6985.279  ns/op
Training40.tokenizeIceCold    ss        74142.000               ns/op
Training40.tokenizeCold       ss   10   19142.300 ±  32839.600  ns/op
Training40.tokenizeWarm       ss   10   13393.500 ±   4882.290  ns/op
Training40.tokenizeWarmer     ss   10    6822.200 ±   3611.820  ns/op
Training40.tokenizeHot        ss   10    1248.800 ±   1709.296  ns/op

Results vary a lot
Trend is visible
Use @Fork(10) to measure more often from scratch
Fork repeats the entire test as it would be started fresh


# Fork(10)
Benchmark                   Mode  Cnt       Score        Error  Units
Training40.splitIceCold       ss   10  380923.600 ±  96774.549  ns/op
Training40.splitCold          ss  100   88485.930 ± 102858.604  ns/op
Training40.splitWarm          ss  100   19252.030 ±   8536.751  ns/op
Training40.splitHot           ss  100    5804.700 ±   2754.319  ns/op
Training40.splitWarmer        ss  100   14554.390 ±   2554.317  ns/op
Training40.tokenizeIceCold    ss   10   78459.300 ±  33153.083  ns/op
Training40.tokenizeCold       ss  100   22712.400 ±   8913.579  ns/op
Training40.tokenizeWarm       ss  100   12422.790 ±   1382.564  ns/op
Training40.tokenizeWarmer     ss  100   13023.690 ±   3353.144  ns/op
Training40.tokenizeHot        ss  100    2047.580 ±   1066.626  ns/op

Wrong assumptions

Are we really the first?

Benchmark assume we are the first to use split() and StringTokenizer
Might not be true, just our code is cold, not everything
Use @Setup again to touch classes before use


public class Training41
{
    ...
    public String hold;
    
    @Setup
    public void setup()
    {
        // use String.spllt, but differently from us
        String[] result = "Test:Test".split(":");
        
        // use Tokenizer
        StringTokenizer st = new StringTokenizer(":");
        
        // ensure we are not removed
        hold = st.toString() + result;
    }
    ...
}


# Don't touch (Training40)
Benchmark                   Mode  Cnt        Score        Error  Units
Training40.splitIceCold       ss   10  380,923.600 ±  96774.549  ns/op
Training40.splitCold          ss  100   88,485.930 ± 102858.604  ns/op
Training40.splitWarm          ss  100   19,252.030 ±   8536.751  ns/op
Training40.splitHot           ss  100    5,804.700 ±   2754.319  ns/op
Training40.splitWarmer        ss  100   14,554.390 ±   2554.317  ns/op
Training40.tokenizeIceCold    ss   10   78,459.300 ±  33153.083  ns/op
Training40.tokenizeCold       ss  100   22,712.400 ±   8913.579  ns/op
Training40.tokenizeWarm       ss  100   12,422.790 ±   1382.564  ns/op
Training40.tokenizeWarmer     ss  100   13,023.690 ±   3353.144  ns/op
Training40.tokenizeHot        ss  100    2,047.580 ±   1066.626  ns/op

# Touch before use (Training41)
Benchmark                   Mode  Cnt        Score       Error  Units
Training41.splitIceCold       ss   10   58,826.800 ± 21882.562  ns/op
Training41.splitCold          ss  100   23,466.180 ±  6082.829  ns/op
Training41.splitWarm          ss  100   14,938.110 ±  1308.992  ns/op
Training41.splitWarmer        ss  100   12,317.460 ±  1884.732  ns/op
Training41.splitHot           ss  100    4,120.080 ±  2151.355  ns/op
Training41.tokenizeIceCold    ss   10   53,629.900 ± 34002.030  ns/op
Training41.tokenizeCold       ss  100   19,350.450 ±  6708.144  ns/op
Training41.tokenizeWarm       ss  100   11,405.080 ±   985.338  ns/op
Training41.tokenizeWarmer     ss  100    9,649.810 ±  2937.053  ns/op
Training41.tokenizeHot        ss  100    2,038.750 ±   692.256  ns/op

Matches expectations
Also shows that all results changed
Not sure if this is true

Help us test the benchmark

Need debugging and testing capabilities

We have not verified our code at all
When debugging is needed we cannot start it
Avoid using IDE based JMH, rather add JUnit to the mix
Needs a POM update too
Make the setup method also a JUnit setup
Either add a test method or use our method to test as well


<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.12</version>
</dependency>


@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
@Fork(1)
@Warmup(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
@State(Scope.Thread) // ignore for now
public class Training50
{
    private String toSplit;

    @Setup
    @Before
    public void setup()
    {
        toSplit = "C,Rust,Java";
    }
    
    @Benchmark
    public List<String> split()
    {
        final String[] cols = toSplit.split(",");
        return Arrays.asList(cols);
    }
    
    @Test
    public void testSplit()
    {
        final List<String> result = split();
        Assert.assertEquals(3, result.size());
        Assert.assertEquals("C", result.get(0));
        Assert.assertEquals("Rust", result.get(1));
        Assert.assertEquals("Java", result.get(2));
    }
}

Java Measurement Harness - JMH

An atypical Motivation

Goals

Explicit Goals

Implicit Goals

The Code

A Challenge

Investigate a Common Task

Theory: No idea, no reason to measure

JMH - Java Measurement Harness

Why a Framework?

Simplest Benchmark

Test Runtime

Still not easy to use

Basics Set, Let's Get Testing

Naive First Benchmark

More Warming Up

Measure more

Warning: Things can get wrong quickly

We gotta talk!

Just return something

There is a Thing for It

Let's Get Serious

Our First Attempt

No conclusive result

Theories and Observations

Next Steps

Profile into a Problem

Try Poor Man's Tuning

Try Poor Man's Tuning

Try Poor Man's Tuning

The Pattern Theory

What about Data?

Precise Knowledge

Very Data Even More

Very Data Even More

Back to the Goal

Cold Performance

It gets really tricky

Wrong assumptions

Help us test the benchmark

Questions and Answers