Java Measurement Harness - JMH

How to do Microbenchmarking Correctly

An atypical Motivation

Programming language power consumption aka efficiency

Goals

What do we want to learn today?

Explicit Goals

  • Learn JMH basics
  • Learn the approach to micro benchmarking
  • Learn careful benchmarking
  • Learn validation of results
  • Learn to question and cross-check

Implicit Goals

  • Understand Java internals
  • Understand modern memory and cpu concepts
  • Avoid a mistake or two...

The Code

Less typing and more reading

  • Code: https://github.com/Xceptance/jmh-training
  • License: MIT License

A Challenge

No Challenge, no Fun

Investigate a Common Task

Split a String with common tooling

  • Given strings such as Java,Go,Rust
  • Split them at a certain character aka ,
  • Limit to JDK 8 resources for now
  • Maybe we try a naive implementation as well
  • Limited to a single separator character for this training
  • Results should be an List<String>
  • String.split()
  • java.util.StringTokenizer
  • Pattern.split()
  • Pattern.splitAsStream()

Theory: No idea, no reason to measure

Benchmarking needs theories and expectations

  • Have a theory about the outcome
  • Have same expectations
  • Best case: Houston, we got a problem!
  • Benchmarking is horribly wrong in most cases
  • If there is nothing you can validate your results against, it stays wrong
  • Even a wrong theory can lead to enough probing and research to turn it right
  • Speed
  • Change in Speed and Change in Data
  • Suitability for Short or Long Running Code
  • Suitability for Tasks during Warmup
  • Resource Usage
  • Theory: Tokenizer is fastest, Stream is not suitable for short tasks, and Pattern.split and String.split performing similarly

JMH - Java Measurement Harness

The Basics To Get Started

Why a Framework?

Why do we need a measurement framework?

  • Java is interpreted and compiled
  • Hotspot based Java is multi-staged compiled
  • Java optimizes things out, it doesn't need
  • System.currentTimeMillis can be cached
  • Profiler support built-in would be cool
  • Developed by the OpenJDK team
  • The recommended way to do microbenchmarks
  • Platform support and profiles built-in
  • Only Maven support, no IDE support
  • Part of the JDK since Java 12
  • Proven to be almost right

Simplest Benchmark

The minimal benchmark setup

  • You either use the archetype to create things
  • or you just go with the provided example
  • mvn clean compile install
  • java -jar target/benchmarks.jar Training01
  • No special settings yet, we discuss that next
  • But it does not look helpful at all
  • It runs horribly long

package org.xceptance.training.jmh;

import org.openjdk.jmh.annotations.Benchmark;

/**
 * Simplies benchmark ever. All default and we measure nothing
 * 
 * @author rschwietzke
 */
public class Training01
{
     @Benchmark
     public void bench()
     {
     }
}    

# JMH version: 1.21
# VM version: JDK 11.0.4, OpenJDK 64-Bit Server VM, 11.0.4
# VM invoker: /usr/lib/jvm/java-11-openjdk-amd64/bin/java
# VM options: <none>
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: org.xceptance.training.jmh.Training01.bench

# Run progress: 0.00% complete, ETA 00:08:20
# Fork: 1 of 5
# Warmup Iteration   1: 1583355399.039 ops/s
# Warmup Iteration   2: 1539214226.386 ops/s
# Warmup Iteration   3: 1497187122.677 ops/s
# Warmup Iteration   4: 1464143076.223 ops/s
# Warmup Iteration   5: 1408890030.487 ops/s
Iteration   1: 1400594747.986 ops/s
Iteration   2: 1443236681.579 ops/s
Iteration   3: 1394508697.454 ops/s
Iteration   4: 1423377056.796 ops/s
Iteration   5: 1418124423.941 ops/s
...

Test Runtime

How to control the overall measurement

  • We have warmup and we have measurement
  • @Warmup: How long to warm the VM
  • @Measurement: How often to measure
  • @Fork: How often to repeat in a new VM
  • The less you warmup or measure, the more inaccurate
  • The less VM forks the less probing
  • Most of the time, small works fine but always validate

/**
 * Restrain the execution and explain warmup and measurement
 * Warmup 3x for 1 sec
 * Measure 5x for 1 sec and average the result
 * Fork only once
 * 
 * @author rschwietzke
 */
@Warmup(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
public class Training02
{
     @Benchmark
     public void bench()
     {
     }
}

# Run progress: 0.00% complete, ETA 00:00:08
# Fork: 1 of 1
# Warmup Iteration   1: 1641713453.126 ops/s
# Warmup Iteration   2: 1581322414.154 ops/s
# Warmup Iteration   3: 1549432518.891 ops/s
Iteration   1: 1523201697.274 ops/s
Iteration   2: 1559404970.899 ops/s
Iteration   3: 1543778128.057 ops/s
Iteration   4: 1597859305.474 ops/s
Iteration   5: 1584935253.225 ops/s


Result "org.xceptance.training.jmh.Training02.bench":
  1561835870.986 ±(99.9%) 116412105.696 ops/s [Average]
  (min, avg, max) = (1523201697.274, 1561835870.986, 1597859305.474), 
  stdev = 30231854.167
  CI (99.9%): [1445423765.289, 1678247976.682] (assumes normal distribution)


# Run complete. Total time: 00:00:08

Benchmark          Mode  Cnt           Score           Error  Units
Training02.bench  thrpt    5  1561835870.986 ± 116412105.696  ops/s

Still not easy to use

Change the measurement mode

  • @BenchmarkMode: How to measure
    • AverageTime: Average all executions per iteration
    • Throughput: Check how often you can execute it per iteration
    • SingleShot: Execute it once per iteration
    • SampleTime: Don't average, statistical evaluate each execute per iteration
  • AverageTime and SingleShot will be used today

@Warmup(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
public class Training03
{
     @Benchmark
     @BenchmarkMode(Mode.AverageTime)
     @OutputTimeUnit(TimeUnit.MICROSECONDS)
     public void benchAvg() throws InterruptedException
     {
         TimeUnit.MILLISECONDS.sleep(100);
     }

     @Benchmark
     @BenchmarkMode(Mode.SampleTime)
     @OutputTimeUnit(TimeUnit.MILLISECONDS)
     public void benchSample() throws InterruptedException
     {
         TimeUnit.MILLISECONDS.sleep(100);
     }

     @Benchmark
     @BenchmarkMode(Mode.Throughput)
     @OutputTimeUnit(TimeUnit.SECONDS)
     public void benchThroughput() throws InterruptedException
     {
         TimeUnit.MILLISECONDS.sleep(100);
     }
     
     @Benchmark
     @BenchmarkMode(Mode.SingleShotTime)
     @OutputTimeUnit(TimeUnit.MILLISECONDS)
     public void benchSingle() throws InterruptedException
     {
         TimeUnit.MILLISECONDS.sleep(100);
     }
}

Benchmark                                     Mode  Cnt       Score    Error  Units
Training03.benchThroughput                   thrpt    5       9.982 ±  0.003  ops/s
Training03.benchAvg                           avgt    5  100182.424 ± 69.858  us/op

Training03.benchSample                      sample   50     100.110 ±  0.027  ms/op
Training03.benchSample:benchSample·p0.00    sample          100.008           ms/op
Training03.benchSample:benchSample·p0.50    sample          100.139           ms/op
Training03.benchSample:benchSample·p0.90    sample          100.139           ms/op
Training03.benchSample:benchSample·p0.95    sample          100.139           ms/op
Training03.benchSample:benchSample·p0.99    sample          100.139           ms/op
Training03.benchSample:benchSample·p0.999   sample          100.139           ms/op
Training03.benchSample:benchSample·p0.9999  sample          100.139           ms/op
Training03.benchSample:benchSample·p1.00    sample          100.139           ms/op

Training03.benchSingle                          ss    5     100.157 ±  0.143  ms/op

Basics Set, Let's Get Testing

Try some theories and start small

Naive First Benchmark

Don't think about it, just do it.

  • Just split a string the recommended way
  • Increase iterations so we can see some runtime development
  • Our measurement is somehow not consistent
  • Something is wrong, isn't it?

/**
 * Small test first. Totally naive!!
 * 
 * @author rschwietzke
 */
@Warmup(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
public class Training05
{
     @Benchmark
     @BenchmarkMode(Mode.AverageTime)
     @OutputTimeUnit(TimeUnit.NANOSECONDS)
     public void bench()
     {
         "Foo,Bar,Test".split(",");
     }
}

# Run progress: 0.00% complete, ETA 00:00:13
# Fork: 1 of 1
# Warmup Iteration   1: 194.662 ns/op
# Warmup Iteration   2: 166.320 ns/op
# Warmup Iteration   3: 158.765 ns/op
Iteration   1: 155.177 ns/op
Iteration   2: 148.119 ns/op
Iteration   3: 143.208 ns/op
Iteration   4: 144.138 ns/op
Iteration   5: 143.335 ns/op
Iteration   6: 144.108 ns/op
Iteration   7: 146.223 ns/op
Iteration   8: 145.873 ns/op
Iteration   9: 142.524 ns/op
Iteration  10: 144.172 ns/op

Benchmark         Mode  Cnt    Score   Error  Units
Training05.bench  avgt   10  145.688 ± 5.638  ns/op

More Warming Up

Try to correct the unsteady result.

  • More warmup time
  • When the warmup already shows the runtime of the measurement, we might have warmed up right

/**
 * Small test first. Totally naive!!
 * Now with more warmup.
 *  
 * @author rschwietzke
 */
@Warmup(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
public class Training06
{
     @Benchmark
     @BenchmarkMode(Mode.AverageTime)
     @OutputTimeUnit(TimeUnit.NANOSECONDS)
     public void bench()
     {
         "Foo,Bar,Test".split(",");
     }
}

# Run progress: 0.00% complete, ETA 00:00:20
# Fork: 1 of 1
# Warmup Iteration   1: 162.427 ns/op
# Warmup Iteration   2: 152.925 ns/op
# Warmup Iteration   3: 139.471 ns/op
# Warmup Iteration   4: 139.766 ns/op
# Warmup Iteration   5: 140.489 ns/op
Iteration   1: 142.501 ns/op
Iteration   2: 140.077 ns/op
Iteration   3: 140.256 ns/op
Iteration   4: 142.049 ns/op
Iteration   5: 140.772 ns/op
Iteration   6: 142.111 ns/op
Iteration   7: 140.427 ns/op
Iteration   8: 142.458 ns/op
Iteration   9: 142.550 ns/op
Iteration  10: 141.791 ns/op

Benchmark         Mode  Cnt    Score   Error  Units
Training06.bench  avgt   10  141.499 ± 1.515  ns/op

Measure more

Measure it five times

  • We use @Fork(5) to repeat it five times.
  • We see nicely the fluctuation in measurement
  • We talk small numbers here!!

# Run progress: 0.00% complete, ETA 00:01:40
# Fork: 1 of 5
# Warmup Iteration   1: 209.659 ns/op
# Warmup Iteration   2: 155.465 ns/op
# Warmup Iteration   3: 139.121 ns/op
# Warmup Iteration   4: 140.133 ns/op
# Warmup Iteration   5: 140.187 ns/op
Iteration   1: 139.212 ns/op
...
Iteration  10: 139.401 ns/op

# Run progress: 20.00% complete, ETA 00:01:22
# Fork: 2 of 5
# Warmup Iteration   1: 159.677 ns/op
...
# Warmup Iteration   5: 138.522 ns/op
Iteration   1: 139.015 ns/op
...
Iteration  10: 145.531 ns/op

# Run progress: 40.00% complete, ETA 00:01:01
# Fork: 3 of 5
# Warmup Iteration   1: 168.662 ns/op
...
# Warmup Iteration   5: 141.841 ns/op
Iteration   1: 142.497 ns/op
...
Iteration  10: 143.128 ns/op

# Run progress: 60.00% complete, ETA 00:00:40
# Fork: 4 of 5
# Warmup Iteration   1: 165.220 ns/op
...
# Warmup Iteration   5: 144.540 ns/op
Iteration   1: 144.370 ns/op
...
Iteration  10: 144.914 ns/op

# Run progress: 80.00% complete, ETA 00:00:20
# Fork: 5 of 5
# Warmup Iteration   1: 175.664 ns/op
...
# Warmup Iteration   5: 134.586 ns/op
Iteration   1: 134.711 ns/op
...
Iteration  10: 136.447 ns/op

Benchmark         Mode  Cnt    Score   Error  Units
Training07.bench  avgt   50  140.803 ± 1.530  ns/op

Warning: Things can get wrong quickly

BAT vs. AC and JDK8 vs. JDK11


# JDK 11, BAT mode

# Warmup Iteration   1: 160.005 ns/op
# Warmup Iteration   2: 158.151 ns/op
# Warmup Iteration   3: 137.997 ns/op
# Warmup Iteration   4: 137.938 ns/op
# Warmup Iteration   5: 137.956 ns/op
Iteration   1: 153.733 ns/op
Iteration   2: 145.046 ns/op
Iteration   3: 137.993 ns/op
...
Iteration   8: 137.727 ns/op
Iteration   9: 138.723 ns/op
Iteration  10: 138.151 ns/op

Benchmark         Mode  Cnt    Score   Error  Units
Training06.bench  avgt   10  140.685 ± 7.650  ns/op

# JDK 8, BAT mode

# Warmup Iteration   1: 175.988 ns/op
# Warmup Iteration   2: 146.560 ns/op
# Warmup Iteration   3: 144.860 ns/op
# Warmup Iteration   4: 145.489 ns/op
# Warmup Iteration   5: 147.307 ns/op
Iteration   1: 147.824 ns/op
Iteration   2: 153.391 ns/op
Iteration   3: 147.060 ns/op
...
Iteration   8: 147.824 ns/op
Iteration   9: 146.193 ns/op
Iteration  10: 147.381 ns/op

Benchmark         Mode  Cnt    Score   Error  Units
Training06.bench  avgt   10  147.658 ± 3.227  ns/op

# JDK 11, AC mode

# Warmup Iteration   1: 124.700 ns/op
# Warmup Iteration   2: 114.472 ns/op
# Warmup Iteration   3: 112.499 ns/op
# Warmup Iteration   4: 109.766 ns/op
# Warmup Iteration   5: 109.921 ns/op
Iteration   1: 110.670 ns/op
Iteration   2: 109.924 ns/op
Iteration   3: 110.574 ns/op
...
Iteration   8: 114.441 ns/op
Iteration   9: 110.204 ns/op
Iteration  10: 109.732 ns/op

Benchmark         Mode  Cnt    Score   Error  Units
Training06.bench  avgt   10  110.667 ± 2.085  ns/op

# JDK 11, AC mode, Benchmark in Background

# Warmup Iteration   1: 152.907 ns/op
# Warmup Iteration   2: 165.058 ns/op
# Warmup Iteration   3: 143.192 ns/op
# Warmup Iteration   4: 151.717 ns/op
# Warmup Iteration   5: 205.670 ns/op
Iteration   1: 265.120 ns/op
Iteration   2: 193.039 ns/op
Iteration   3: 220.406 ns/op
...
Iteration   8: 198.791 ns/op
Iteration   9: 225.030 ns/op
Iteration  10: 179.021 ns/op

Benchmark         Mode  Cnt    Score    Error  Units
Training06.bench  avgt   10  222.567 ± 42.339  ns/op

/**
 * Ok, we still have to learn one thing, one thing only....
 * Return or eat something otherwise this is nonsense.
 * 
 * @author rschwietzke
 */
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
public class Training08
{
     @Benchmark
     public void s1()
     {
         new String();
     }

     @Benchmark
     public void s2()
     {
         String s = "";
     }

     @Benchmark
     public void s3()
     {
         String s = new String("");
     }

     @Benchmark
     public String s1_r()
     {
         return new String();
     }
     
     @Benchmark
     public String s2_r()
     {
         return "";
     }

     @Benchmark
     public String s3_r()
     {
         return new String("");
     }     
     
     @Benchmark
     public void nothing()
     {
     }
}    

We gotta talk!

There is still something wrong


Benchmark           Mode  Cnt  Score   Error  Units
Training08.s1       avgt    3  0.503 ± 0.047  ns/op
Training08.s2       avgt    3  0.498 ± 0.019  ns/op
Training08.s3       avgt    3  0.511 ± 0.113  ns/op

Training08.s1_r     avgt    3  6.446 ± 0.566  ns/op
Training08.s2_r     avgt    3  3.259 ± 0.238  ns/op
Training08.s3_r     avgt    3  7.439 ± 1.736  ns/op

Training08.nothing  avgt    3  0.502 ± 0.141  ns/op
  • Similar code is way faster when nothing is returned?
  • An empty method is the same speed
  • Hence: The compiler optimized all code away, because it is not needed
  • Conclusion: Always return something.

# Execution Interpreter only: -Xint   
    
Benchmark           Mode  Cnt    Score    Error  Units
Training08.s1       avgt    3  289.713 ± 35.695  ns/op
Training08.s2       avgt    3   86.911 ±  4.295  ns/op
Training08.s3       avgt    3  284.768 ± 46.444  ns/op

Training08.s1_r     avgt    3  353.986 ± 81.802  ns/op
Training08.s2_r     avgt    3  167.986 ± 38.805  ns/op
Training08.s3_r     avgt    3  366.652 ± 90.978  ns/op

Training08.nothing  avgt    3   92.487 ±  1.405  ns/op

/**
 * Just returning does not help... the right thing has to be returned
 * 
 * @author rschwietzke
 */
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
public class Training09
{
     @Benchmark
     public long returnLong()
     {
         return 1 + new Random().nextLong();
     }

     @Benchmark
     public long returnIgnoreString()
     {
         final long l = new Random().nextLong();
    
         final String s = String.valueOf(l);
         return 10 + l;
     }

     @Benchmark
     public long returnAndUseString()
     {
         final long l = new Random().nextLong();

         final String s = String.valueOf(l);
         return s.length() + l;
     }
}

Just return something

Well, it is important what to return


Benchmark                      Mode  Cnt    Score    Error  Units
Training09.returnLong          avgt    3   58.611 ±  1.708  ns/op
Training09.returnIgnoreString  avgt    3  101.467 ±  4.931  ns/op
Training09.returnAndUseString  avgt    3  104.862 ± 33.953  ns/op
  • This does not prove the theory
  • But still a warning, Java is free to kill code it does not need
  • As long as the final result is the same, Java can make all kind of stunts

/**
 * Blackholes to the rescue
 * 
 * @author rschwietzke
 */
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
public class Training10
{
     @Benchmark
     public void s1_b(final Blackhole b)
     {
         b.consume(new String());
     }
     
     @Benchmark
     public void s2_b(final Blackhole b)
     {
         b.consume("");
     }

     @Benchmark
     public void s3_b(final Blackhole b)
     {
         b.consume(new String(""));
     }     
}

There is a Thing for It

Just make things disappear


Benchmark           Mode  Cnt  Score   Error  Units
Training10.s1_r     avgt    3  6.409 ± 0.487  ns/op
Training10.s2_r     avgt    3  3.251 ± 0.513  ns/op
Training10.s3_r     avgt    3  7.325 ± 0.890  ns/op

Training10.s1_b     avgt    3  6.349 ± 0.487  ns/op
Training10.s2_b     avgt    3  3.256 ± 0.439  ns/op
Training10.s3_b     avgt    3  7.258 ± 0.253  ns/op

Training10.nothing  avgt    3  0.504 ± 0.184  ns/op
  • JMH object to consume content
  • Built in a way to avoid compiler optimization
  • Declare it as parameter and you get one, don't instantiate

Let's Get Serious

After the intro, let's try really something out


/**
 * Our four methods
 * 
 * @author rschwietzke
 */
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
@State(Scope.Thread) // ignore for now
public class Training20
{
    final String toSplit = "foo,bar,mars,sun";
    
    @Benchmark
    public void split(final Blackhole b)
    {
        final String[] cols = toSplit.split(",");

        for (String s : cols)
        {
            b.consume(s);
        }
    }

    @Benchmark
    public void tokenize(final Blackhole b)
    {
        final StringTokenizer st = new StringTokenizer(toSplit, ",");

        while (st.hasMoreTokens())
        {
            b.consume(st.nextToken());
        }
    }

    @Benchmark
    public void pattern(final Blackhole b)
    {
        final Pattern p = Pattern.compile(",");
        final String[] cols = p.split(toSplit); 

        for (String s : cols)
        {
            b.consume(s);
        }
    }

    @Benchmark
    public void stream(final Blackhole b)
    {
        final Pattern p = Pattern.compile(",");
        p.splitAsStream(toSplit).forEach(s -> b.consume(s));
    }
}     

Our First Attempt

Just a naive first benchmark

  • Our four different split methods
  • Plain first usage and consumption (!)
  • Not yet mapped to a common usage pattern
  • So... what will be the result?

Benchmark            Mode  Cnt    Score    Error  Units
=======================================================
Training20.split     avgt    3  157.705 ± 25.934  ns/op
Training20.tokenize  avgt    3  135.870 ± 13.294  ns/op
Training20.pattern   avgt    3  400.350 ± 14.014  ns/op
Training20.stream    avgt    3  428.894 ± 17.496  ns/op    
  • What's next?
  • Finished?
  • Did we get a result?

No conclusive result

We are not there yet


Benchmark            Mode  Cnt    Score    Error  Units
=======================================================
Training20.split     avgt    3  157.705 ± 25.934  ns/op
Training20.tokenize  avgt    3  135.870 ± 13.294  ns/op
Training20.pattern   avgt    3  400.350 ± 14.014  ns/op
Training20.stream    avgt    3  428.894 ± 17.496  ns/op    
  • It is data... kind of a fact
  • It is nothing conclusive
  • It gives us a direction for more testing

Theories and Observations

  • String.split and Pattern.split should behave the same
  • Memory is the problem
  • Data size is bad for a Stream

Next Steps

  • Tune GC
  • Check the memory consumption
  • Try a pre-compiled pattern
  • Vary the data in size and content

$ java -jar target/benchmarks.jar -lprof

Supported profilers:
        cl: Classloader profiling via standard MBeans 
      comp: JIT compiler profiling via standard MBeans 
        gc: GC profiling via standard MBeans 
     hs_cl: HotSpot (tm) classloader profiling via ... MBeans 
   hs_comp: HotSpot (tm) JIT compiler profiling via ... MBeans 
     hs_gc: HotSpot (tm) memory manager (GC) profiling via ... MBeans 
     hs_rt: HotSpot (tm) runtime profiling via implementation-specific MBeans 
    hs_thr: HotSpot (tm) threading subsystem via implementation-specific MBeans 
    pauses: Pauses profiler 
      perf: Linux perf Statistics 
   perfasm: Linux perf + PrintAssembly Profiler 
  perfnorm: Linux perf statistics, normalized by operation count 
safepoints: Safepoints profiler 
     stack: Simple and naive Java stack profiler 

Unsupported profilers:
 dtraceasm: <none> 
  xperfasm: <none> 

Profile into a Problem

Use profilers to know more

  • See available profilers:
    java -jar target/benchmark.jar -lprof
  • Use a profiler:
    java -jar <file-here> Training20 -prof gc

Try Poor Man's Tuning

Do some GC research

  • Ask profiler first
  • Use G1 (JDK 11 default)
  • Give it more memory
  • Take GC out of the equation and run EpsilonGC

java -jar ... Training20 -prof gc


Benchmark                              Mode  Cnt     Score      Error   Units
=============================================================================
pattern                                avgt    3   405.561 ±   43.193   ns/op
pattern:·gc.alloc.rate                 avgt    3  1442.185 ±  167.736  MB/sec
pattern:·gc.alloc.rate.norm            avgt    3   920.000 ±    0.001    B/op
pattern:·gc.churn.G1_Eden_Space        avgt    3  1511.200 ± 3646.752  MB/sec
pattern:·gc.churn.G1_Eden_Space.norm   avgt    3   963.512 ± 2206.159    B/op
pattern:·gc.churn.G1_Old_Gen           avgt    3     0.001 ±    0.024  MB/sec
pattern:·gc.churn.G1_Old_Gen.norm      avgt    3     0.001 ±    0.015    B/op
pattern:·gc.count                      avgt    3    13.000             counts
pattern:·gc.time                       avgt    3    15.000                 ms

split                                  avgt    3   163.800 ±   30.267   ns/op
split:·gc.alloc.rate                   avgt    3  1086.629 ±  202.796  MB/sec
split:·gc.alloc.rate.norm              avgt    3   280.000 ±    0.001    B/op
split:·gc.churn.G1_Eden_Space          avgt    3  1169.404 ± 3357.654  MB/sec
split:·gc.churn.G1_Eden_Space.norm     avgt    3   301.071 ±  821.807    B/op
split:·gc.churn.G1_Old_Gen             avgt    3    ≈ 10⁻³             MB/sec
split:·gc.churn.G1_Old_Gen.norm        avgt    3    ≈ 10⁻⁴               B/op
split:·gc.count                        avgt    3    11.000             counts
split:·gc.time                         avgt    3    11.000                 ms

stream                                 avgt    3   445.319 ±   41.647   ns/op
stream:·gc.alloc.rate                  avgt    3  1472.936 ±  139.776  MB/sec
stream:·gc.alloc.rate.norm             avgt    3  1032.000 ±    0.001    B/op
stream:·gc.churn.G1_Eden_Space         avgt    3  1491.875 ±    8.525  MB/sec
stream:·gc.churn.G1_Eden_Space.norm    avgt    3  1045.289 ±  103.696    B/op
stream:·gc.churn.G1_Old_Gen            avgt    3    ≈ 10⁻⁴             MB/sec
stream:·gc.churn.G1_Old_Gen.norm       avgt    3    ≈ 10⁻⁴               B/op
stream:·gc.count                       avgt    3    12.000             counts
stream:·gc.time                        avgt    3    12.000                 ms

tokenize                               avgt    3   136.363 ±   58.743   ns/op
tokenize:·gc.alloc.rate                avgt    3  1118.825 ±  474.088  MB/sec
tokenize:·gc.alloc.rate.norm           avgt    3   240.000 ±    0.001    B/op
tokenize:·gc.churn.G1_Eden_Space       avgt    3  1126.091 ± 3234.817  MB/sec
tokenize:·gc.churn.G1_Eden_Space.norm  avgt    3   241.927 ±  745.129    B/op
tokenize:·gc.churn.G1_Old_Gen          avgt    3     0.001 ±    0.010  MB/sec
tokenize:·gc.churn.G1_Old_Gen.norm     avgt    3    ≈ 10⁻⁴               B/op
tokenize:·gc.count                     avgt    3    11.000             counts
tokenize:·gc.time                      avgt    3    10.000                 ms    

Try Poor Man's Tuning

Do some GC research

  • Ask profiler first
  • Use G1 (JDK 11 default)
  • Give it more memory
  • Take GC out of the equation and run EpsilonGC
  • -XX:+UseG1GC
  • -Xms4g
  • -Xmx4g
  • -XX:+AlwaysPreTouch

$ java -jar target/benchmarks.jar Training20
========================================================
Benchmark            Mode  Cnt    Score     Error  Units
Training20.pattern   avgt    3  405.122 ±  70.341  ns/op
Training20.split     avgt    3  156.316 ±  33.948  ns/op
Training20.stream    avgt    3  434.671 ± 105.422  ns/op
Training20.tokenize  avgt    3  133.950 ±  11.445  ns/op

$ java -Xms4g -Xmx4g -XX:+UseG1GC -XX:+AlwaysPreTouch -jar ... Training20
Benchmark            Mode  Cnt    Score     Error  Units
========================================================
Training20.pattern   avgt    3  397.086 ±  27.243  ns/op
Training20.split     avgt    3  156.265 ±  54.684  ns/op
Training20.stream    avgt    3  422.262 ± 146.073  ns/op
Training20.tokenize  avgt    3  131.642 ±  14.806  ns/op  

Try Poor Man's Tuning

Do some GC research

  • Ask profiler first
  • Use G1 (JDK 11 default)
  • Give it more memory
  • Take GC out of the equation and run EpsilonGC
  • -Xms10g -Xmx10g
  • -XX:+AlwaysPreTouch
  • -XX:+UnlockExperimentalVMOptions
  • -XX:+UseEpsilonGC

$ java -Xms10g -Xmx10g -XX:+AlwaysPreTouch -XX:+UnlockExperimentalVMOptions \
    -XX:+UseEpsilonGC -jar target/benchmarks.jar Training20

# Warmup Iteration   1: 680.389 ns/op
# Warmup Iteration   2: 589.493 ns/op
# Warmup Iteration   3: 637.540 ns/op
# Warmup Iteration   4: Terminating due to java.lang.OutOfMemoryError: Java heap space
<forked VM failed with exit code 3>
<stdout last='20 lines'>
Terminating due to java.lang.OutOfMemoryError: Java heap space
</stdout>
<stderr last='20 lines'>
</stderr>
  • Ok, that is not helpful
  • We burn too much memory per iteration
  • But EpsilonGC is a good idea for less demanding case

    public class Training21
{
    final String toSplit = "foo,bar,mars,sun";
    final Pattern precompiledPattern = Pattern.compile(",");

    @Benchmark
    public void precompiledPattern(final Blackhole b)
    {
        final String[] cols = precompiledPattern.split(toSplit);
        
        for (String s : cols)
        {
            b.consume(s);
        }
    }
    
    @Benchmark
    public void precompiledStream(final Blackhole b)
    {
        precompiledPattern.splitAsStream(toSplit).forEach(s -> b.consume(s));
    }
...
    @Benchmark
    public void pattern(final Blackhole b)
    {
        final Pattern p = Pattern.compile(",");
        final String[] cols = p.split(toSplit);

        for (String s : cols)
        {
            b.consume(s);
        }
    }

    @Benchmark
    public void stream(final Blackhole b)
    {
        final Pattern p = Pattern.compile(",");
        p.splitAsStream(toSplit).forEach(s -> b.consume(s));
    }
}

The Pattern Theory

Compiling Patterns is Key

  • Precompile patterns or else

Benchmark                      Mode  Cnt    Score     Error  Units
Training21.pattern             avgt    3  475.992 ±  67.709  ns/op
Training21.precompiledPattern  avgt    3  315.103 ±  44.834  ns/op

Training21.stream              avgt    3  571.862 ± 623.681  ns/op
Training21.precompiledStream   avgt    3  349.376 ±  80.088  ns/op

Training21.split               avgt    3  182.281 ±  15.728  ns/op
Training21.tokenize            avgt    3  218.433 ± 807.886  ns/op
  • It helped a lot
  • Site by site comparison shows it clearly
  • Avoids comparing to other runs with maybe wrong assumptions
  • Always try to compare within a run if possible

public class Training30
{
    final FastRandom r = new FastRandom(7L);

    final String[] data = {
            "foo",
            "foo,bar",
            "foo,bar,buz",
            "foo,bar,buz,top",
            "foo,bar,buz,top,tof",
            "foo,bar,buz,top,tof,rot",
            "foo,bar,buz,top,tof,rot,qwe",
            "foo,bar,buz,top,tof,rot,qwe,pop",
            "foo,bar,buz,top,tof,rot,qwe,pop,lol",
            "foo,bar,buz,top,tof,rot,qwe,pop,lol,aha"
    };
    
    String toSplit;

    @Setup(Level.Invocation)
    public void setup()
    {
        toSplit = data[r.nextInt(data.length)];
    }
    
    ...
}    

What about Data?

Not the Android of course

  • @Setup permits to run any setup before
    • Level.Invocation any benchmark method call
    • Level.Iteration any measurement iteration
  • Iteration is default when nothing is specified

Benchmark                      Mode  Cnt    Score    Error  Units
Training30.pattern             avgt    3  587.557 ± 74.707  ns/op
Training30.precompiledPattern  avgt    3  441.839 ± 49.530  ns/op

Training30.stream              avgt    3  648.489 ± 18.975  ns/op
Training30.precompiledStream   avgt    3  494.524 ± 54.549  ns/op

Training30.split               avgt    3  272.838 ± 37.578  ns/op
Training30.tokenize            avgt    3  246.168 ± 51.308  ns/op

Benchmark                      Mode  Cnt    Score     Error  Units
Training21.pattern             avgt    3  475.992 ±  67.709  ns/op
Training21.precompiledPattern  avgt    3  315.103 ±  44.834  ns/op

Training21.stream              avgt    3  571.862 ± 623.681  ns/op
Training21.precompiledStream   avgt    3  349.376 ±  80.088  ns/op

Training21.split               avgt    3  182.281 ±  15.728  ns/op
Training21.tokenize            avgt    3  218.433 ± 807.886  ns/op

public class Training31
{
    @Param({"1", "5", "10", "20"})
    int size; 
    
    final String[] data = {
            "foo",
            "foo,bar,buz,top,tof,rot",
            "foo,bar,buz,top,tof,rot,qwe,pop,lol,aha",
            "foo,bar,buz,top,tof,rot,qwe,pop,lol,aha," + 
                "foo,bar,buz,top,tof,rot,qwe,pop,lol,aha"
    };
    
    String toSplit;

    @Setup
    public void setup()
    {
        switch (size)
        {
            case 1: toSplit = data[0]; break;
            case 5: toSplit = data[1]; break;
            case 10: toSplit = data[2]; break;
            case 20: toSplit = data[3]; break;
        };
    }
    ...
}   

Precise Knowledge

Don't go over all data sizes at once

  • @Param data injected by JMH

Benchmark                      (size)  Mode  Cnt    Score 
Training31.pattern                  1  avgt    3  232.503
Training31.pattern                  5  avgt    3  546.355
Training31.pattern                 10  avgt    3  795.779
Training31.pattern                 20  avgt    3 1354.040

Training31.precompiledPattern       1  avgt    3   75.391
Training31.precompiledPattern       5  avgt    3  433.272
Training31.precompiledPattern      10  avgt    3  625.999
Training31.precompiledPattern      20  avgt    3 1227.985

Training31.stream                   1  avgt    3  275.538
Training31.stream                   5  avgt    3  638.750
Training31.stream                  10  avgt    3  864.800
Training31.stream                  20  avgt    3 1439.868

Training31.precompiledStream        1  avgt    3  121.478
Training31.precompiledStream        5  avgt    3  498.007
Training31.precompiledStream       10  avgt    3  743.830
Training31.precompiledStream       20  avgt    3 1305.923

Training31.split                    1  avgt    3    7.712
Training31.split                    5  avgt    3  245.822
Training31.split                   10  avgt    3  391.144
Training31.split                   20  avgt    3  841.006

Training31.tokenize                 1  avgt    3   12.375
Training31.tokenize                 5  avgt    3  239.217
Training31.tokenize                10  avgt    3  401.357
Training31.tokenize                20  avgt    3  828.597

public class Training32
{
    @Param({"1", "5", "10", "20"})
    int size; 

    @Param({"1", "2", "5", "10"})
    int length; 
    
    final Pattern precompiledPattern = Pattern.compile(",");

    String toSplit;

    @Setup
    public void setup()
    {
        String s = "";
        for (int i = 0; i < length; i++)
        {
            s += "a";
        }
        
        final StringJoiner sj = new StringJoiner(",");
        for (int i = 0; i < size; i++)
        {
            sj.add(s);
        }
        
        toSplit = sj.toString();
    }
...
}   

Very Data Even More

Two Dimensional Testing

  • Very size if the full string to split aka substring count
  • Very length of each substring
  • JMH spans all combinations for us
  • Test runtime explodes to 20 min or so

Very Data Even More


Benchmark                      (length)  (size)  Mode  Cnt   Score  Units
Training32.split                      1       1  avgt    3    16.5  ns/op
Training32.split                      1       5  avgt    3   218.7  ns/op
Training32.split                      1      10  avgt    3   358.0  ns/op
Training32.split                      1      20  avgt    3   745.0  ns/op

Training32.split                      2       1  avgt    3    17.2  ns/op
Training32.split                      2       5  avgt    3   225.3  ns/op
Training32.split                      2      10  avgt    3   402.9  ns/op
Training32.split                      2      20  avgt    3   746.1  ns/op

Training32.split                      5       1  avgt    3    17.2  ns/op
Training32.split                      5       5  avgt    3   227.9  ns/op
Training32.split                      5      10  avgt    3   395.1  ns/op
Training32.split                      5      20  avgt    3   834.4  ns/op

Training32.split                     10       1  avgt    3    19.8  ns/op
Training32.split                     10       5  avgt    3   242.1  ns/op
Training32.split                     10      10  avgt    3   409.8  ns/op
Training32.split                     10      20  avgt    3   849.1  ns/op

Training32.tokenize                   1       1  avgt    3     9.2  ns/op
Training32.tokenize                   1       5  avgt    3   133.2  ns/op
Training32.tokenize                   1      10  avgt    3   277.7  ns/op
Training32.tokenize                   1      20  avgt    3   552.7  ns/op

Training32.tokenize                   2       1  avgt    3     9.3  ns/op
Training32.tokenize                   2       5  avgt    3   143.3  ns/op
Training32.tokenize                   2      10  avgt    3   298.2  ns/op
Training32.tokenize                   2      20  avgt    3   592.4  ns/op

Training32.tokenize                   5       1  avgt    3    11.4  ns/op
Training32.tokenize                   5       5  avgt    3   145.9  ns/op
Training32.tokenize                   5      10  avgt    3   290.9  ns/op
Training32.tokenize                   5      20  avgt    3   682.7  ns/op

Training32.tokenize                  10       1  avgt    3    15.8  ns/op
Training32.tokenize                  10       5  avgt    3   160.2  ns/op
Training32.tokenize                  10      10  avgt    3   329.0  ns/op
Training32.tokenize                  10      20  avgt    3   645.3  ns/op

Benchmark                      (length)  (size)  Mode  Cnt   Score  Units
Training32.split                      1       1  avgt    3    16.5  ns/op
Training32.split                      2       1  avgt    3    17.2  ns/op
Training32.split                      5       1  avgt    3    17.2  ns/op
Training32.split                     10       1  avgt    3    19.8  ns/op

Training32.split                      1       5  avgt    3   218.7  ns/op
Training32.split                      2       5  avgt    3   225.3  ns/op
Training32.split                      5       5  avgt    3   227.9  ns/op
Training32.split                     10       5  avgt    3   242.1  ns/op

Training32.split                      1      10  avgt    3   358.0  ns/op
Training32.split                      2      10  avgt    3   402.9  ns/op
Training32.split                      5      10  avgt    3   395.1  ns/op
Training32.split                     10      10  avgt    3   409.8  ns/op

Training32.split                      1      20  avgt    3   745.0  ns/op
Training32.split                      2      20  avgt    3   746.1  ns/op
Training32.split                      5      20  avgt    3   834.4  ns/op
Training32.split                     10      20  avgt    3   849.1  ns/op

Training32.tokenize                   1       1  avgt    3     9.2  ns/op
Training32.tokenize                   2       1  avgt    3     9.3  ns/op
Training32.tokenize                   5       1  avgt    3    11.4  ns/op
Training32.tokenize                  10       1  avgt    3    15.8  ns/op

Training32.tokenize                   1       5  avgt    3   133.2  ns/op
Training32.tokenize                   2       5  avgt    3   143.3  ns/op
Training32.tokenize                   5       5  avgt    3   145.9  ns/op
Training32.tokenize                  10       5  avgt    3   160.2  ns/op

Training32.tokenize                   1      10  avgt    3   277.7  ns/op
Training32.tokenize                   2      10  avgt    3   298.2  ns/op
Training32.tokenize                   5      10  avgt    3   290.9  ns/op
Training32.tokenize                  10      10  avgt    3   329.0  ns/op

Training32.tokenize                   1      20  avgt    3   552.7  ns/op
Training32.tokenize                   2      20  avgt    3   592.4  ns/op
Training32.tokenize                   5      20  avgt    3   682.7  ns/op
Training32.tokenize                  10      20  avgt    3   645.3  ns/op

public class Training35
{
    final Pattern precompiledPattern = Pattern.compile(",");
    String toSplit = "Go,Java,Rust,C";
    
    @Benchmark
    public List<String> precompiledPattern()
    {   
        final String[] cols = precompiledPattern.split(toSplit);
        return Arrays.asList(cols);
    }

     @Benchmark
     public List<String> pattern()
     {
         final Pattern p = Pattern.compile(",");
         final String[] cols = p.split(toSplit);
         
         return Arrays.asList(cols);
     }

     @Benchmark
     public List<String> tokenize()
     {
         final StringTokenizer st = new StringTokenizer(toSplit, ",");
         final List<String> result = new ArrayList<>();
         
         while (st.hasMoreTokens())
         {
             result.add(st.nextToken());
         }
         
         return result;
     }
...
}   

Back to the Goal

We wanted more than this

  • Results are supposed to be List<String>
  • Simplify a little (less data) and rewrite

Benchmark                       Mode  Cnt  Score  Units
Training35.tokenize             avgt    3  165.3  ns/op
Training35.split                avgt    3  184.4  ns/op
Training35.precompiledPattern   avgt    3  325.0  ns/op
Training35.precompiledStream    avgt    3  347.9  ns/op
Training35.pattern              avgt    3  432.1  ns/op
Training35.stream               avgt    3  478.5  ns/op

# Original test with the data of Training35
Benchmark                       Mode  Cnt  Score  Units
Training35a.tokenize            avgt    3  113.4  ns/op
Training35a.split               avgt    3  195.1  ns/op
Training35a.precompiledPattern  avgt    3  314.0  ns/op
Training35a.precompiledStream   avgt    3  278.2  ns/op
Training35a.pattern             avgt    3  435.8  ns/op
Training35a.stream              avgt    3  407.0  ns/op

@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.SingleShotTime)
@Fork(1)
@State(Scope.Thread) // ignore for now
public class Training40
{
    ...
    @Benchmark
    @Warmup(iterations = 0)
    @Measurement(iterations = 10)
    public List<String> tokenizeCold()
    {
        return tokenize(toSplit);
    }

    @Benchmark
    @Warmup(iterations = 10)
    @Measurement(iterations = 10)
    public List<String> tokenizeWarm()
    {
        return tokenize(toSplit);
    }
    
    @Benchmark
    @Warmup(iterations = 100)
    @Measurement(iterations = 10)
    public List<String> tokenizeWarmer()
    {
        return tokenize(toSplit);
    }

    @Benchmark
    @Warmup(iterations = 10000)
    @Measurement(iterations = 10)
    public List<String> tokenizeHot()
    {
        return tokenize(toSplit);
    }      
    ...
}

Cold Performance

Checking warmup performance

  • So far, we checked the hot performance
  • All warmup pain was ignored
  • @BenchmarkMode(Mode.SingleShotTime) measures one time execution
  • @Warmup(iterations = 1) iterations define how often is measured

It gets really tricky


# Fork(1)
Benchmark                   Mode  Cnt       Score        Error  Units
Training40.splitIceCold       ss       300232.000               ns/op
Training40.splitCold          ss   10   52054.900 ± 139920.099  ns/op
Training40.splitHot           ss   10    6136.200 ±   6489.506  ns/op
Training40.splitWarm          ss   10   12353.500 ±   4100.746  ns/op
Training40.splitWarmer        ss   10   15408.600 ±  15743.686  ns/op
Training40.tokenizeIceCold    ss        73831.000               ns/op
Training40.tokenizeCold       ss   10   21710.100 ±  29223.398  ns/op
Training40.tokenizeWarm       ss   10   10834.800 ±   2105.584  ns/op
Training40.tokenizeWarmer     ss   10   12971.800 ±  13742.500  ns/op
Training40.tokenizeHot        ss   10    2547.300 ±   2586.435  ns/op

Benchmark                   Mode  Cnt       Score        Error  Units
Training40.splitIceCold       ss       292800.000               ns/op
Training40.splitCold          ss   10   51979.000 ± 159071.783  ns/op
Training40.splitHot           ss   10    5334.300 ±   4306.467  ns/op
Training40.splitWarm          ss   10   14026.800 ±   3826.923  ns/op
Training40.splitWarmer        ss   10   17236.500 ±  14239.322  ns/op
Training40.tokenizeIceCold    ss        77906.000               ns/op
Training40.tokenizeCold       ss   10   21006.500 ±  32446.097  ns/op
Training40.tokenizeWarm       ss   10    9151.900 ±   2316.037  ns/op
Training40.tokenizeWarmer     ss   10   14808.800 ±  14499.507  ns/op
Training40.tokenizeHot        ss   10    2360.800 ±   3371.624  ns/op

Benchmark                   Mode  Cnt       Score        Error  Units
Training40.splitIceCold       ss       444264.000               ns/op
Training40.splitCold          ss   10   56199.600 ± 132136.303  ns/op
Training40.splitWarm          ss   10   15304.500 ±   4467.518  ns/op
Training40.splitWarmer        ss   10   11234.500 ±   9546.547  ns/op
Training40.splitHot           ss   10    5776.100 ±   6985.279  ns/op
Training40.tokenizeIceCold    ss        74142.000               ns/op
Training40.tokenizeCold       ss   10   19142.300 ±  32839.600  ns/op
Training40.tokenizeWarm       ss   10   13393.500 ±   4882.290  ns/op
Training40.tokenizeWarmer     ss   10    6822.200 ±   3611.820  ns/op
Training40.tokenizeHot        ss   10    1248.800 ±   1709.296  ns/op

  • Results vary a lot
  • Trend is visible
  • Use @Fork(10) to measure more often from scratch
  • Fork repeats the entire test as it would be started fresh

# Fork(10)
Benchmark                   Mode  Cnt       Score        Error  Units
Training40.splitIceCold       ss   10  380923.600 ±  96774.549  ns/op
Training40.splitCold          ss  100   88485.930 ± 102858.604  ns/op
Training40.splitWarm          ss  100   19252.030 ±   8536.751  ns/op
Training40.splitHot           ss  100    5804.700 ±   2754.319  ns/op
Training40.splitWarmer        ss  100   14554.390 ±   2554.317  ns/op
Training40.tokenizeIceCold    ss   10   78459.300 ±  33153.083  ns/op
Training40.tokenizeCold       ss  100   22712.400 ±   8913.579  ns/op
Training40.tokenizeWarm       ss  100   12422.790 ±   1382.564  ns/op
Training40.tokenizeWarmer     ss  100   13023.690 ±   3353.144  ns/op
Training40.tokenizeHot        ss  100    2047.580 ±   1066.626  ns/op

Wrong assumptions

Are we really the first?

  • Benchmark assume we are the first to use split() and StringTokenizer
  • Might not be true, just our code is cold, not everything
  • Use @Setup again to touch classes before use

public class Training41
{
    ...
    public String hold;
    
    @Setup
    public void setup()
    {
        // use String.spllt, but differently from us
        String[] result = "Test:Test".split(":");
        
        // use Tokenizer
        StringTokenizer st = new StringTokenizer(":");
        
        // ensure we are not removed
        hold = st.toString() + result;
    }
    ...
}

# Don't touch (Training40)
Benchmark                   Mode  Cnt        Score        Error  Units
Training40.splitIceCold       ss   10  380,923.600 ±  96774.549  ns/op
Training40.splitCold          ss  100   88,485.930 ± 102858.604  ns/op
Training40.splitWarm          ss  100   19,252.030 ±   8536.751  ns/op
Training40.splitHot           ss  100    5,804.700 ±   2754.319  ns/op
Training40.splitWarmer        ss  100   14,554.390 ±   2554.317  ns/op
Training40.tokenizeIceCold    ss   10   78,459.300 ±  33153.083  ns/op
Training40.tokenizeCold       ss  100   22,712.400 ±   8913.579  ns/op
Training40.tokenizeWarm       ss  100   12,422.790 ±   1382.564  ns/op
Training40.tokenizeWarmer     ss  100   13,023.690 ±   3353.144  ns/op
Training40.tokenizeHot        ss  100    2,047.580 ±   1066.626  ns/op

# Touch before use (Training41)
Benchmark                   Mode  Cnt        Score       Error  Units
Training41.splitIceCold       ss   10   58,826.800 ± 21882.562  ns/op
Training41.splitCold          ss  100   23,466.180 ±  6082.829  ns/op
Training41.splitWarm          ss  100   14,938.110 ±  1308.992  ns/op
Training41.splitWarmer        ss  100   12,317.460 ±  1884.732  ns/op
Training41.splitHot           ss  100    4,120.080 ±  2151.355  ns/op
Training41.tokenizeIceCold    ss   10   53,629.900 ± 34002.030  ns/op
Training41.tokenizeCold       ss  100   19,350.450 ±  6708.144  ns/op
Training41.tokenizeWarm       ss  100   11,405.080 ±   985.338  ns/op
Training41.tokenizeWarmer     ss  100    9,649.810 ±  2937.053  ns/op
Training41.tokenizeHot        ss  100    2,038.750 ±   692.256  ns/op
  • Matches expectations
  • Also shows that all results changed
  • Not sure if this is true

Help us test the benchmark

Need debugging and testing capabilities

  • We have not verified our code at all
  • When debugging is needed we cannot start it
  • Avoid using IDE based JMH, rather add JUnit to the mix
  • Needs a POM update too
  • Make the setup method also a JUnit setup
  • Either add a test method or use our method to test as well

<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.12</version>
</dependency>

@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
@Fork(1)
@Warmup(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
@State(Scope.Thread) // ignore for now
public class Training50
{
    private String toSplit;

    @Setup
    @Before
    public void setup()
    {
        toSplit = "C,Rust,Java";
    }
    
    @Benchmark
    public List<String> split()
    {
        final String[] cols = toSplit.split(",");
        return Arrays.asList(cols);
    }
    
    @Test
    public void testSplit()
    {
        final List<String> result = split();
        Assert.assertEquals(3, result.size());
        Assert.assertEquals("C", result.get(0));
        Assert.assertEquals("Rust", result.get(1));
        Assert.assertEquals("Java", result.get(2));
    }
}

Questions and Answers