How to do Microbenchmarking Correctly
Programming language power consumption aka efficiency
Source: https://sites.google.com/view/energy-efficiency-languages/home
What do we want to learn today?
Less typing and more reading
https://github.com/Xceptance/jmh-training
No Challenge, no Fun
Split a String
with common tooling
Split "Java,Go,Rust" at "," into a List<String> using:
String.split()
java.util.StringTokenizer
Pattern.split()
Pattern.splitAsStream()
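Before reaching for JMH, the four candidates can be tried as plain Java. A minimal sketch (the class name SplitWays is ours, not part of the training repo):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SplitWays {
    // String.split: convenient, returns an array we wrap in a list
    public static List<String> viaSplit(String s) {
        return Arrays.asList(s.split(","));
    }

    // StringTokenizer: old-school, no regex machinery involved
    public static List<String> viaTokenizer(String s) {
        final List<String> result = new ArrayList<>();
        final StringTokenizer st = new StringTokenizer(s, ",");
        while (st.hasMoreTokens()) {
            result.add(st.nextToken());
        }
        return result;
    }

    // Pattern.split: the explicit regex route
    public static List<String> viaPattern(String s) {
        return Arrays.asList(Pattern.compile(",").split(s));
    }

    // Pattern.splitAsStream: same pattern, stream-based collection
    public static List<String> viaStream(String s) {
        return Pattern.compile(",").splitAsStream(s).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(viaSplit("Java,Go,Rust")); // [Java, Go, Rust]
    }
}
```

All four return the same tokens; the interesting question is what they cost.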
Benchmarking needs theories and expectations
The Basics To Get Started
Why do we need a measurement framework?
System.currentTimeMillis
can be cached
https://github.com/Xceptance/jmh-training
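A hand-rolled timing loop illustrates the problem. This sketch (class name NaiveTiming is ours) suffers from exactly what JMH guards against: warmup happens inside the measured window, the timer granularity swallows fast operations, and without a sink the JIT may drop the work entirely.

```java
// Naive hand-rolled benchmark: for illustration only, do not trust its numbers.
public class NaiveTiming {
    public static long timeSplit(int rounds) {
        int sink = 0; // crude "blackhole" so the JIT cannot remove the work
        final long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            sink += "Foo,Bar,Test".split(",").length;
        }
        final long elapsed = System.nanoTime() - start;
        if (sink != rounds * 3) { // sanity check: three tokens per round
            throw new IllegalStateException("unexpected result");
        }
        return elapsed / rounds; // average ns/call, heavily distorted by warmup
    }

    public static void main(String[] args) {
        System.out.println(timeSplit(100_000) + " ns/op (do not trust this number)");
    }
}
```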
The minimal benchmark setup
mvn clean compile install
java -jar target/benchmarks.jar Training01
package org.xceptance.training.jmh;
import org.openjdk.jmh.annotations.Benchmark;
/**
* Simplest benchmark ever. All defaults and we measure nothing.
*
* @author rschwietzke
*/
public class Training01
{
@Benchmark
public void bench()
{
}
}
# JMH version: 1.21
# VM version: JDK 11.0.4, OpenJDK 64-Bit Server VM, 11.0.4
# VM invoker: /usr/lib/jvm/java-11-openjdk-amd64/bin/java
# VM options: <none>
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: org.xceptance.training.jmh.Training01.bench
# Run progress: 0.00% complete, ETA 00:08:20
# Fork: 1 of 5
# Warmup Iteration 1: 1583355399.039 ops/s
# Warmup Iteration 2: 1539214226.386 ops/s
# Warmup Iteration 3: 1497187122.677 ops/s
# Warmup Iteration 4: 1464143076.223 ops/s
# Warmup Iteration 5: 1408890030.487 ops/s
Iteration 1: 1400594747.986 ops/s
Iteration 2: 1443236681.579 ops/s
Iteration 3: 1394508697.454 ops/s
Iteration 4: 1423377056.796 ops/s
Iteration 5: 1418124423.941 ops/s
...
https://github.com/Xceptance/jmh-training
How to control the overall measurement
@Warmup: how long to warm up the VM
@Measurement: how often to measure
@Fork: how often to repeat in a fresh VM
/**
* Restrain the execution and explain warmup and measurement
* Warmup 3x for 1 sec
* Measure 5x for 1 sec and average the result
* Fork only once
*
* @author rschwietzke
*/
@Warmup(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
public class Training02
{
@Benchmark
public void bench()
{
}
}
# Run progress: 0.00% complete, ETA 00:00:08
# Fork: 1 of 1
# Warmup Iteration 1: 1641713453.126 ops/s
# Warmup Iteration 2: 1581322414.154 ops/s
# Warmup Iteration 3: 1549432518.891 ops/s
Iteration 1: 1523201697.274 ops/s
Iteration 2: 1559404970.899 ops/s
Iteration 3: 1543778128.057 ops/s
Iteration 4: 1597859305.474 ops/s
Iteration 5: 1584935253.225 ops/s
Result "org.xceptance.training.jmh.Training02.bench":
1561835870.986 ±(99.9%) 116412105.696 ops/s [Average]
(min, avg, max) = (1523201697.274, 1561835870.986, 1597859305.474),
stdev = 30231854.167
CI (99.9%): [1445423765.289, 1678247976.682] (assumes normal distribution)
# Run complete. Total time: 00:00:08
Benchmark Mode Cnt Score Error Units
Training02.bench thrpt 5 1561835870.986 ± 116412105.696 ops/s
https://github.com/Xceptance/jmh-training
Change the measurement mode
@BenchmarkMode: how to measure
@Warmup(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
public class Training03
{
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public void benchAvg() throws InterruptedException
{
TimeUnit.MILLISECONDS.sleep(100);
}
@Benchmark
@BenchmarkMode(Mode.SampleTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public void benchSample() throws InterruptedException
{
TimeUnit.MILLISECONDS.sleep(100);
}
@Benchmark
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
public void benchThroughput() throws InterruptedException
{
TimeUnit.MILLISECONDS.sleep(100);
}
@Benchmark
@BenchmarkMode(Mode.SingleShotTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public void benchSingle() throws InterruptedException
{
TimeUnit.MILLISECONDS.sleep(100);
}
}
Benchmark Mode Cnt Score Error Units
Training03.benchThroughput thrpt 5 9.982 ± 0.003 ops/s
Training03.benchAvg avgt 5 100182.424 ± 69.858 us/op
Training03.benchSample sample 50 100.110 ± 0.027 ms/op
Training03.benchSample:benchSample·p0.00 sample 100.008 ms/op
Training03.benchSample:benchSample·p0.50 sample 100.139 ms/op
Training03.benchSample:benchSample·p0.90 sample 100.139 ms/op
Training03.benchSample:benchSample·p0.95 sample 100.139 ms/op
Training03.benchSample:benchSample·p0.99 sample 100.139 ms/op
Training03.benchSample:benchSample·p0.999 sample 100.139 ms/op
Training03.benchSample:benchSample·p0.9999 sample 100.139 ms/op
Training03.benchSample:benchSample·p1.00 sample 100.139 ms/op
Training03.benchSingle ss 5 100.157 ± 0.143 ms/op
https://github.com/Xceptance/jmh-training
Try some theories and start small
Don't think about it, just do it.
/**
* Small test first. Totally naive!!
*
* @author rschwietzke
*/
@Warmup(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
public class Training05
{
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public void bench()
{
"Foo,Bar,Test".split(",");
}
}
# Run progress: 0.00% complete, ETA 00:00:13
# Fork: 1 of 1
# Warmup Iteration 1: 194.662 ns/op
# Warmup Iteration 2: 166.320 ns/op
# Warmup Iteration 3: 158.765 ns/op
Iteration 1: 155.177 ns/op
Iteration 2: 148.119 ns/op
Iteration 3: 143.208 ns/op
Iteration 4: 144.138 ns/op
Iteration 5: 143.335 ns/op
Iteration 6: 144.108 ns/op
Iteration 7: 146.223 ns/op
Iteration 8: 145.873 ns/op
Iteration 9: 142.524 ns/op
Iteration 10: 144.172 ns/op
Benchmark Mode Cnt Score Error Units
Training05.bench avgt 10 145.688 ± 5.638 ns/op
https://github.com/Xceptance/jmh-training
Try to correct the unsteady result.
/**
* Small test first. Totally naive!!
* Now with more warmup.
*
* @author rschwietzke
*/
@Warmup(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
public class Training06
{
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public void bench()
{
"Foo,Bar,Test".split(",");
}
}
# Run progress: 0.00% complete, ETA 00:00:20
# Fork: 1 of 1
# Warmup Iteration 1: 162.427 ns/op
# Warmup Iteration 2: 152.925 ns/op
# Warmup Iteration 3: 139.471 ns/op
# Warmup Iteration 4: 139.766 ns/op
# Warmup Iteration 5: 140.489 ns/op
Iteration 1: 142.501 ns/op
Iteration 2: 140.077 ns/op
Iteration 3: 140.256 ns/op
Iteration 4: 142.049 ns/op
Iteration 5: 140.772 ns/op
Iteration 6: 142.111 ns/op
Iteration 7: 140.427 ns/op
Iteration 8: 142.458 ns/op
Iteration 9: 142.550 ns/op
Iteration 10: 141.791 ns/op
Benchmark Mode Cnt Score Error Units
Training06.bench avgt 10 141.499 ± 1.515 ns/op
https://github.com/Xceptance/jmh-training
Measure it five times
@Fork(5)
to repeat it five times.
# Run progress: 0.00% complete, ETA 00:01:40
# Fork: 1 of 5
# Warmup Iteration 1: 209.659 ns/op
# Warmup Iteration 2: 155.465 ns/op
# Warmup Iteration 3: 139.121 ns/op
# Warmup Iteration 4: 140.133 ns/op
# Warmup Iteration 5: 140.187 ns/op
Iteration 1: 139.212 ns/op
...
Iteration 10: 139.401 ns/op
# Run progress: 20.00% complete, ETA 00:01:22
# Fork: 2 of 5
# Warmup Iteration 1: 159.677 ns/op
...
# Warmup Iteration 5: 138.522 ns/op
Iteration 1: 139.015 ns/op
...
Iteration 10: 145.531 ns/op
# Run progress: 40.00% complete, ETA 00:01:01
# Fork: 3 of 5
# Warmup Iteration 1: 168.662 ns/op
...
# Warmup Iteration 5: 141.841 ns/op
Iteration 1: 142.497 ns/op
...
Iteration 10: 143.128 ns/op
# Run progress: 60.00% complete, ETA 00:00:40
# Fork: 4 of 5
# Warmup Iteration 1: 165.220 ns/op
...
# Warmup Iteration 5: 144.540 ns/op
Iteration 1: 144.370 ns/op
...
Iteration 10: 144.914 ns/op
# Run progress: 80.00% complete, ETA 00:00:20
# Fork: 5 of 5
# Warmup Iteration 1: 175.664 ns/op
...
# Warmup Iteration 5: 134.586 ns/op
Iteration 1: 134.711 ns/op
...
Iteration 10: 136.447 ns/op
Benchmark Mode Cnt Score Error Units
Training07.bench avgt 50 140.803 ± 1.530 ns/op
https://github.com/Xceptance/jmh-training
BAT vs. AC and JDK8 vs. JDK11
# JDK 11, BAT mode
# Warmup Iteration 1: 160.005 ns/op
# Warmup Iteration 2: 158.151 ns/op
# Warmup Iteration 3: 137.997 ns/op
# Warmup Iteration 4: 137.938 ns/op
# Warmup Iteration 5: 137.956 ns/op
Iteration 1: 153.733 ns/op
Iteration 2: 145.046 ns/op
Iteration 3: 137.993 ns/op
...
Iteration 8: 137.727 ns/op
Iteration 9: 138.723 ns/op
Iteration 10: 138.151 ns/op
Benchmark Mode Cnt Score Error Units
Training06.bench avgt 10 140.685 ± 7.650 ns/op
# JDK 8, BAT mode
# Warmup Iteration 1: 175.988 ns/op
# Warmup Iteration 2: 146.560 ns/op
# Warmup Iteration 3: 144.860 ns/op
# Warmup Iteration 4: 145.489 ns/op
# Warmup Iteration 5: 147.307 ns/op
Iteration 1: 147.824 ns/op
Iteration 2: 153.391 ns/op
Iteration 3: 147.060 ns/op
...
Iteration 8: 147.824 ns/op
Iteration 9: 146.193 ns/op
Iteration 10: 147.381 ns/op
Benchmark Mode Cnt Score Error Units
Training06.bench avgt 10 147.658 ± 3.227 ns/op
# JDK 11, AC mode
# Warmup Iteration 1: 124.700 ns/op
# Warmup Iteration 2: 114.472 ns/op
# Warmup Iteration 3: 112.499 ns/op
# Warmup Iteration 4: 109.766 ns/op
# Warmup Iteration 5: 109.921 ns/op
Iteration 1: 110.670 ns/op
Iteration 2: 109.924 ns/op
Iteration 3: 110.574 ns/op
...
Iteration 8: 114.441 ns/op
Iteration 9: 110.204 ns/op
Iteration 10: 109.732 ns/op
Benchmark Mode Cnt Score Error Units
Training06.bench avgt 10 110.667 ± 2.085 ns/op
# JDK 11, AC mode, Benchmark in Background
# Warmup Iteration 1: 152.907 ns/op
# Warmup Iteration 2: 165.058 ns/op
# Warmup Iteration 3: 143.192 ns/op
# Warmup Iteration 4: 151.717 ns/op
# Warmup Iteration 5: 205.670 ns/op
Iteration 1: 265.120 ns/op
Iteration 2: 193.039 ns/op
Iteration 3: 220.406 ns/op
...
Iteration 8: 198.791 ns/op
Iteration 9: 225.030 ns/op
Iteration 10: 179.021 ns/op
Benchmark Mode Cnt Score Error Units
Training06.bench avgt 10 222.567 ± 42.339 ns/op
https://github.com/Xceptance/jmh-training
/**
* Ok, we still have to learn one thing, one thing only....
* Return or eat something, otherwise this is nonsense.
*
* @author rschwietzke
*/
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
public class Training08
{
@Benchmark
public void s1()
{
new String();
}
@Benchmark
public void s2()
{
String s = "";
}
@Benchmark
public void s3()
{
String s = new String("");
}
@Benchmark
public String s1_r()
{
return new String();
}
@Benchmark
public String s2_r()
{
return "";
}
@Benchmark
public String s3_r()
{
return new String("");
}
@Benchmark
public void nothing()
{
}
}
There is still something wrong
Benchmark Mode Cnt Score Error Units
Training08.s1 avgt 3 0.503 ± 0.047 ns/op
Training08.s2 avgt 3 0.498 ± 0.019 ns/op
Training08.s3 avgt 3 0.511 ± 0.113 ns/op
Training08.s1_r avgt 3 6.446 ± 0.566 ns/op
Training08.s2_r avgt 3 3.259 ± 0.238 ns/op
Training08.s3_r avgt 3 7.439 ± 1.736 ns/op
Training08.nothing avgt 3 0.502 ± 0.141 ns/op
# Execution Interpreter only: -Xint
Benchmark Mode Cnt Score Error Units
Training08.s1 avgt 3 289.713 ± 35.695 ns/op
Training08.s2 avgt 3 86.911 ± 4.295 ns/op
Training08.s3 avgt 3 284.768 ± 46.444 ns/op
Training08.s1_r avgt 3 353.986 ± 81.802 ns/op
Training08.s2_r avgt 3 167.986 ± 38.805 ns/op
Training08.s3_r avgt 3 366.652 ± 90.978 ns/op
Training08.nothing avgt 3 92.487 ± 1.405 ns/op
https://github.com/Xceptance/jmh-training
/**
* Just returning does not help... the right thing has to be returned
*
* @author rschwietzke
*/
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
public class Training09
{
@Benchmark
public long returnLong()
{
return 1 + new Random().nextLong();
}
@Benchmark
public long returnIgnoreString()
{
final long l = new Random().nextLong();
final String s = String.valueOf(l);
return 10 + l;
}
@Benchmark
public long returnAndUseString()
{
final long l = new Random().nextLong();
final String s = String.valueOf(l);
return s.length() + l;
}
}
Well, it is important what to return
Benchmark Mode Cnt Score Error Units
Training09.returnLong avgt 3 58.611 ± 1.708 ns/op
Training09.returnIgnoreString avgt 3 101.467 ± 4.931 ns/op
Training09.returnAndUseString avgt 3 104.862 ± 33.953 ns/op
https://github.com/Xceptance/jmh-training
/**
* Blackholes to the rescue
*
* @author rschwietzke
*/
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
public class Training10
{
@Benchmark
public void s1_b(final Blackhole b)
{
b.consume(new String());
}
@Benchmark
public void s2_b(final Blackhole b)
{
b.consume("");
}
@Benchmark
public void s3_b(final Blackhole b)
{
b.consume(new String(""));
}
}
Just make things disappear
Benchmark Mode Cnt Score Error Units
Training10.s1_r avgt 3 6.409 ± 0.487 ns/op
Training10.s2_r avgt 3 3.251 ± 0.513 ns/op
Training10.s3_r avgt 3 7.325 ± 0.890 ns/op
Training10.s1_b avgt 3 6.349 ± 0.487 ns/op
Training10.s2_b avgt 3 3.256 ± 0.439 ns/op
Training10.s3_b avgt 3 7.258 ± 0.253 ns/op
Training10.nothing avgt 3 0.504 ± 0.184 ns/op
https://github.com/Xceptance/jmh-training
After the intro, let's try really something out
/**
* Our four methods
*
* @author rschwietzke
*/
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
@State(Scope.Thread) // ignore for now
public class Training20
{
final String toSplit = "foo,bar,mars,sun";
@Benchmark
public void split(final Blackhole b)
{
final String[] cols = toSplit.split(",");
for (String s : cols)
{
b.consume(s);
}
}
@Benchmark
public void tokenize(final Blackhole b)
{
final StringTokenizer st = new StringTokenizer(toSplit, ",");
while (st.hasMoreTokens())
{
b.consume(st.nextToken());
}
}
@Benchmark
public void pattern(final Blackhole b)
{
final Pattern p = Pattern.compile(",");
final String[] cols = p.split(toSplit);
for (String s : cols)
{
b.consume(s);
}
}
@Benchmark
public void stream(final Blackhole b)
{
final Pattern p = Pattern.compile(",");
p.splitAsStream(toSplit).forEach(s -> b.consume(s));
}
}
Just a naive first benchmark
Benchmark Mode Cnt Score Error Units
=======================================================
Training20.split avgt 3 157.705 ± 25.934 ns/op
Training20.tokenize avgt 3 135.870 ± 13.294 ns/op
Training20.pattern avgt 3 400.350 ± 14.014 ns/op
Training20.stream avgt 3 428.894 ± 17.496 ns/op
https://github.com/Xceptance/jmh-training
We are not there yet
Benchmark Mode Cnt Score Error Units
=======================================================
Training20.split avgt 3 157.705 ± 25.934 ns/op
Training20.tokenize avgt 3 135.870 ± 13.294 ns/op
Training20.pattern avgt 3 400.350 ± 14.014 ns/op
Training20.stream avgt 3 428.894 ± 17.496 ns/op
String.split and Pattern.split should behave the same
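Functionally the two are equivalent: String.split(regex) is documented to work like Pattern.compile(regex).split(input). So the timing gap needs another explanation (the JDK's String.split takes a fast path for single, non-metacharacter delimiters and skips the regex engine). A quick plain-Java check (class name ours):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitEquivalence {
    public static boolean sameResult(String input, String delim) {
        // String.split(delim) is documented to behave like
        // Pattern.compile(delim).split(input); verify that on our data
        final String[] viaString = input.split(delim);
        final String[] viaPattern = Pattern.compile(delim).split(input);
        return Arrays.equals(viaString, viaPattern);
    }

    public static void main(String[] args) {
        System.out.println(sameResult("foo,bar,mars,sun", ",")); // true
    }
}
```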
$ java -jar target/benchmarks.jar -lprof
Supported profilers:
cl: Classloader profiling via standard MBeans
comp: JIT compiler profiling via standard MBeans
gc: GC profiling via standard MBeans
hs_cl: HotSpot (tm) classloader profiling via ... MBeans
hs_comp: HotSpot (tm) JIT compiler profiling via ... MBeans
hs_gc: HotSpot (tm) memory manager (GC) profiling via ... MBeans
hs_rt: HotSpot (tm) runtime profiling via implementation-specific MBeans
hs_thr: HotSpot (tm) threading subsystem via implementation-specific MBeans
pauses: Pauses profiler
perf: Linux perf Statistics
perfasm: Linux perf + PrintAssembly Profiler
perfnorm: Linux perf statistics, normalized by operation count
safepoints: Safepoints profiler
stack: Simple and naive Java stack profiler
Unsupported profilers:
dtraceasm: <none>
xperfasm: <none>
Use profilers to know more
java -jar target/benchmarks.jar -lprof
java -jar <file-here> Training20 -prof gc
https://github.com/Xceptance/jmh-training
Do some GC research
java -jar ... Training20 -prof gc
Benchmark Mode Cnt Score Error Units
=============================================================================
pattern avgt 3 405.561 ± 43.193 ns/op
pattern:·gc.alloc.rate avgt 3 1442.185 ± 167.736 MB/sec
pattern:·gc.alloc.rate.norm avgt 3 920.000 ± 0.001 B/op
pattern:·gc.churn.G1_Eden_Space avgt 3 1511.200 ± 3646.752 MB/sec
pattern:·gc.churn.G1_Eden_Space.norm avgt 3 963.512 ± 2206.159 B/op
pattern:·gc.churn.G1_Old_Gen avgt 3 0.001 ± 0.024 MB/sec
pattern:·gc.churn.G1_Old_Gen.norm avgt 3 0.001 ± 0.015 B/op
pattern:·gc.count avgt 3 13.000 counts
pattern:·gc.time avgt 3 15.000 ms
split avgt 3 163.800 ± 30.267 ns/op
split:·gc.alloc.rate avgt 3 1086.629 ± 202.796 MB/sec
split:·gc.alloc.rate.norm avgt 3 280.000 ± 0.001 B/op
split:·gc.churn.G1_Eden_Space avgt 3 1169.404 ± 3357.654 MB/sec
split:·gc.churn.G1_Eden_Space.norm avgt 3 301.071 ± 821.807 B/op
split:·gc.churn.G1_Old_Gen avgt 3 ≈ 10⁻³ MB/sec
split:·gc.churn.G1_Old_Gen.norm avgt 3 ≈ 10⁻⁴ B/op
split:·gc.count avgt 3 11.000 counts
split:·gc.time avgt 3 11.000 ms
stream avgt 3 445.319 ± 41.647 ns/op
stream:·gc.alloc.rate avgt 3 1472.936 ± 139.776 MB/sec
stream:·gc.alloc.rate.norm avgt 3 1032.000 ± 0.001 B/op
stream:·gc.churn.G1_Eden_Space avgt 3 1491.875 ± 8.525 MB/sec
stream:·gc.churn.G1_Eden_Space.norm avgt 3 1045.289 ± 103.696 B/op
stream:·gc.churn.G1_Old_Gen avgt 3 ≈ 10⁻⁴ MB/sec
stream:·gc.churn.G1_Old_Gen.norm avgt 3 ≈ 10⁻⁴ B/op
stream:·gc.count avgt 3 12.000 counts
stream:·gc.time avgt 3 12.000 ms
tokenize avgt 3 136.363 ± 58.743 ns/op
tokenize:·gc.alloc.rate avgt 3 1118.825 ± 474.088 MB/sec
tokenize:·gc.alloc.rate.norm avgt 3 240.000 ± 0.001 B/op
tokenize:·gc.churn.G1_Eden_Space avgt 3 1126.091 ± 3234.817 MB/sec
tokenize:·gc.churn.G1_Eden_Space.norm avgt 3 241.927 ± 745.129 B/op
tokenize:·gc.churn.G1_Old_Gen avgt 3 0.001 ± 0.010 MB/sec
tokenize:·gc.churn.G1_Old_Gen.norm avgt 3 ≈ 10⁻⁴ B/op
tokenize:·gc.count avgt 3 11.000 counts
tokenize:·gc.time avgt 3 10.000 ms
https://github.com/Xceptance/jmh-training
Do some GC research
-XX:+UseG1GC
-Xms4g
-Xmx4g
-XX:+AlwaysPreTouch
$ java -jar target/benchmarks.jar Training20
========================================================
Benchmark Mode Cnt Score Error Units
Training20.pattern avgt 3 405.122 ± 70.341 ns/op
Training20.split avgt 3 156.316 ± 33.948 ns/op
Training20.stream avgt 3 434.671 ± 105.422 ns/op
Training20.tokenize avgt 3 133.950 ± 11.445 ns/op
$ java -Xms4g -Xmx4g -XX:+UseG1GC -XX:+AlwaysPreTouch -jar ... Training20
Benchmark Mode Cnt Score Error Units
========================================================
Training20.pattern avgt 3 397.086 ± 27.243 ns/op
Training20.split avgt 3 156.265 ± 54.684 ns/op
Training20.stream avgt 3 422.262 ± 146.073 ns/op
Training20.tokenize avgt 3 131.642 ± 14.806 ns/op
https://github.com/Xceptance/jmh-training
Do some GC research
-Xms10g -Xmx10g
-XX:+AlwaysPreTouch
-XX:+UnlockExperimentalVMOptions
-XX:+UseEpsilonGC
$ java -Xms10g -Xmx10g -XX:+AlwaysPreTouch -XX:+UnlockExperimentalVMOptions \
-XX:+UseEpsilonGC -jar target/benchmarks.jar Training20
# Warmup Iteration 1: 680.389 ns/op
# Warmup Iteration 2: 589.493 ns/op
# Warmup Iteration 3: 637.540 ns/op
# Warmup Iteration 4: Terminating due to java.lang.OutOfMemoryError: Java heap space
<forked VM failed with exit code 3>
<stdout last='20 lines'>
Terminating due to java.lang.OutOfMemoryError: Java heap space
</stdout>
<stderr last='20 lines'>
</stderr>
https://github.com/Xceptance/jmh-training
public class Training21
{
final String toSplit = "foo,bar,mars,sun";
final Pattern precompiledPattern = Pattern.compile(",");
@Benchmark
public void precompiledPattern(final Blackhole b)
{
final String[] cols = precompiledPattern.split(toSplit);
for (String s : cols)
{
b.consume(s);
}
}
@Benchmark
public void precompiledStream(final Blackhole b)
{
precompiledPattern.splitAsStream(toSplit).forEach(s -> b.consume(s));
}
...
@Benchmark
public void pattern(final Blackhole b)
{
final Pattern p = Pattern.compile(",");
final String[] cols = p.split(toSplit);
for (String s : cols)
{
b.consume(s);
}
}
@Benchmark
public void stream(final Blackhole b)
{
final Pattern p = Pattern.compile(",");
p.splitAsStream(toSplit).forEach(s -> b.consume(s));
}
}
Compiling Patterns is Key
Benchmark Mode Cnt Score Error Units
Training21.pattern avgt 3 475.992 ± 67.709 ns/op
Training21.precompiledPattern avgt 3 315.103 ± 44.834 ns/op
Training21.stream avgt 3 571.862 ± 623.681 ns/op
Training21.precompiledStream avgt 3 349.376 ± 80.088 ns/op
Training21.split avgt 3 182.281 ± 15.728 ns/op
Training21.tokenize avgt 3 218.433 ± 807.886 ns/op
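The takeaway from the table, sketched outside JMH (class name PrecompiledSplit is ours): Pattern instances are immutable and thread-safe, so one compiled instance can be stored in a field and reused instead of recompiling per call.

```java
import java.util.regex.Pattern;

public class PrecompiledSplit {
    // Pattern instances are immutable and thread-safe, so compiling once
    // and reusing the instance is safe and skips recompilation per call
    private static final Pattern COMMA = Pattern.compile(",");

    public static String[] split(String input) {
        return COMMA.split(input);
    }

    public static void main(String[] args) {
        System.out.println(split("foo,bar,mars,sun").length); // 4
    }
}
```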
https://github.com/Xceptance/jmh-training
public class Training30
{
final FastRandom r = new FastRandom(7L);
final String[] data = {
"foo",
"foo,bar",
"foo,bar,buz",
"foo,bar,buz,top",
"foo,bar,buz,top,tof",
"foo,bar,buz,top,tof,rot",
"foo,bar,buz,top,tof,rot,qwe",
"foo,bar,buz,top,tof,rot,qwe,pop",
"foo,bar,buz,top,tof,rot,qwe,pop,lol",
"foo,bar,buz,top,tof,rot,qwe,pop,lol,aha"
};
String toSplit;
@Setup(Level.Invocation)
public void setup()
{
toSplit = data[r.nextInt(data.length)];
}
...
}
Not the Android of course
@Setup permits running any setup code before:
Level.Invocation: every benchmark method call
Level.Iteration: every measurement iteration
Benchmark Mode Cnt Score Error Units
Training30.pattern avgt 3 587.557 ± 74.707 ns/op
Training30.precompiledPattern avgt 3 441.839 ± 49.530 ns/op
Training30.stream avgt 3 648.489 ± 18.975 ns/op
Training30.precompiledStream avgt 3 494.524 ± 54.549 ns/op
Training30.split avgt 3 272.838 ± 37.578 ns/op
Training30.tokenize avgt 3 246.168 ± 51.308 ns/op
Benchmark Mode Cnt Score Error Units
Training21.pattern avgt 3 475.992 ± 67.709 ns/op
Training21.precompiledPattern avgt 3 315.103 ± 44.834 ns/op
Training21.stream avgt 3 571.862 ± 623.681 ns/op
Training21.precompiledStream avgt 3 349.376 ± 80.088 ns/op
Training21.split avgt 3 182.281 ± 15.728 ns/op
Training21.tokenize avgt 3 218.433 ± 807.886 ns/op
https://github.com/Xceptance/jmh-training
public class Training31
{
@Param({"1", "5", "10", "20"})
int size;
final String[] data = {
"foo",
"foo,bar,buz,top,tof,rot",
"foo,bar,buz,top,tof,rot,qwe,pop,lol,aha",
"foo,bar,buz,top,tof,rot,qwe,pop,lol,aha," +
"foo,bar,buz,top,tof,rot,qwe,pop,lol,aha"
};
String toSplit;
@Setup
public void setup()
{
switch (size)
{
case 1: toSplit = data[0]; break;
case 5: toSplit = data[1]; break;
case 10: toSplit = data[2]; break;
case 20: toSplit = data[3]; break;
};
}
...
}
Don't go over all data sizes at once
@Param: data injected by JMH
Benchmark (size) Mode Cnt Score
Training31.pattern 1 avgt 3 232.503
Training31.pattern 5 avgt 3 546.355
Training31.pattern 10 avgt 3 795.779
Training31.pattern 20 avgt 3 1354.040
Training31.precompiledPattern 1 avgt 3 75.391
Training31.precompiledPattern 5 avgt 3 433.272
Training31.precompiledPattern 10 avgt 3 625.999
Training31.precompiledPattern 20 avgt 3 1227.985
Training31.stream 1 avgt 3 275.538
Training31.stream 5 avgt 3 638.750
Training31.stream 10 avgt 3 864.800
Training31.stream 20 avgt 3 1439.868
Training31.precompiledStream 1 avgt 3 121.478
Training31.precompiledStream 5 avgt 3 498.007
Training31.precompiledStream 10 avgt 3 743.830
Training31.precompiledStream 20 avgt 3 1305.923
Training31.split 1 avgt 3 7.712
Training31.split 5 avgt 3 245.822
Training31.split 10 avgt 3 391.144
Training31.split 20 avgt 3 841.006
Training31.tokenize 1 avgt 3 12.375
Training31.tokenize 5 avgt 3 239.217
Training31.tokenize 10 avgt 3 401.357
Training31.tokenize 20 avgt 3 828.597
https://github.com/Xceptance/jmh-training
public class Training32
{
@Param({"1", "5", "10", "20"})
int size;
@Param({"1", "2", "5", "10"})
int length;
final Pattern precompiledPattern = Pattern.compile(",");
String toSplit;
@Setup
public void setup()
{
String s = "";
for (int i = 0; i < length; i++)
{
s += "a";
}
final StringJoiner sj = new StringJoiner(",");
for (int i = 0; i < size; i++)
{
sj.add(s);
}
toSplit = sj.toString();
}
...
}
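The setup's data generation is easy to sanity-check outside JMH. This sketch (class name TestData is ours) rebuilds the same logic, using a StringBuilder instead of String concatenation in the inner loop:

```java
import java.util.StringJoiner;

public class TestData {
    // rebuild Training32's setup: 'size' tokens of 'length' characters each
    public static String build(int size, int length) {
        final StringBuilder token = new StringBuilder();
        for (int i = 0; i < length; i++) {
            token.append('a');
        }
        final StringJoiner sj = new StringJoiner(",");
        for (int i = 0; i < size; i++) {
            sj.add(token.toString());
        }
        return sj.toString();
    }

    public static void main(String[] args) {
        System.out.println(build(3, 2)); // aa,aa,aa
    }
}
```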
Two Dimensional Testing
https://github.com/Xceptance/jmh-training
Benchmark (length) (size) Mode Cnt Score Units
Training32.split 1 1 avgt 3 16.5 ns/op
Training32.split 1 5 avgt 3 218.7 ns/op
Training32.split 1 10 avgt 3 358.0 ns/op
Training32.split 1 20 avgt 3 745.0 ns/op
Training32.split 2 1 avgt 3 17.2 ns/op
Training32.split 2 5 avgt 3 225.3 ns/op
Training32.split 2 10 avgt 3 402.9 ns/op
Training32.split 2 20 avgt 3 746.1 ns/op
Training32.split 5 1 avgt 3 17.2 ns/op
Training32.split 5 5 avgt 3 227.9 ns/op
Training32.split 5 10 avgt 3 395.1 ns/op
Training32.split 5 20 avgt 3 834.4 ns/op
Training32.split 10 1 avgt 3 19.8 ns/op
Training32.split 10 5 avgt 3 242.1 ns/op
Training32.split 10 10 avgt 3 409.8 ns/op
Training32.split 10 20 avgt 3 849.1 ns/op
Training32.tokenize 1 1 avgt 3 9.2 ns/op
Training32.tokenize 1 5 avgt 3 133.2 ns/op
Training32.tokenize 1 10 avgt 3 277.7 ns/op
Training32.tokenize 1 20 avgt 3 552.7 ns/op
Training32.tokenize 2 1 avgt 3 9.3 ns/op
Training32.tokenize 2 5 avgt 3 143.3 ns/op
Training32.tokenize 2 10 avgt 3 298.2 ns/op
Training32.tokenize 2 20 avgt 3 592.4 ns/op
Training32.tokenize 5 1 avgt 3 11.4 ns/op
Training32.tokenize 5 5 avgt 3 145.9 ns/op
Training32.tokenize 5 10 avgt 3 290.9 ns/op
Training32.tokenize 5 20 avgt 3 682.7 ns/op
Training32.tokenize 10 1 avgt 3 15.8 ns/op
Training32.tokenize 10 5 avgt 3 160.2 ns/op
Training32.tokenize 10 10 avgt 3 329.0 ns/op
Training32.tokenize 10 20 avgt 3 645.3 ns/op
Benchmark (length) (size) Mode Cnt Score Units
Training32.split 1 1 avgt 3 16.5 ns/op
Training32.split 2 1 avgt 3 17.2 ns/op
Training32.split 5 1 avgt 3 17.2 ns/op
Training32.split 10 1 avgt 3 19.8 ns/op
Training32.split 1 5 avgt 3 218.7 ns/op
Training32.split 2 5 avgt 3 225.3 ns/op
Training32.split 5 5 avgt 3 227.9 ns/op
Training32.split 10 5 avgt 3 242.1 ns/op
Training32.split 1 10 avgt 3 358.0 ns/op
Training32.split 2 10 avgt 3 402.9 ns/op
Training32.split 5 10 avgt 3 395.1 ns/op
Training32.split 10 10 avgt 3 409.8 ns/op
Training32.split 1 20 avgt 3 745.0 ns/op
Training32.split 2 20 avgt 3 746.1 ns/op
Training32.split 5 20 avgt 3 834.4 ns/op
Training32.split 10 20 avgt 3 849.1 ns/op
Training32.tokenize 1 1 avgt 3 9.2 ns/op
Training32.tokenize 2 1 avgt 3 9.3 ns/op
Training32.tokenize 5 1 avgt 3 11.4 ns/op
Training32.tokenize 10 1 avgt 3 15.8 ns/op
Training32.tokenize 1 5 avgt 3 133.2 ns/op
Training32.tokenize 2 5 avgt 3 143.3 ns/op
Training32.tokenize 5 5 avgt 3 145.9 ns/op
Training32.tokenize 10 5 avgt 3 160.2 ns/op
Training32.tokenize 1 10 avgt 3 277.7 ns/op
Training32.tokenize 2 10 avgt 3 298.2 ns/op
Training32.tokenize 5 10 avgt 3 290.9 ns/op
Training32.tokenize 10 10 avgt 3 329.0 ns/op
Training32.tokenize 1 20 avgt 3 552.7 ns/op
Training32.tokenize 2 20 avgt 3 592.4 ns/op
Training32.tokenize 5 20 avgt 3 682.7 ns/op
Training32.tokenize 10 20 avgt 3 645.3 ns/op
https://github.com/Xceptance/jmh-training
public class Training35
{
final Pattern precompiledPattern = Pattern.compile(",");
String toSplit = "Go,Java,Rust,C";
@Benchmark
public List<String> precompiledPattern()
{
final String[] cols = precompiledPattern.split(toSplit);
return Arrays.asList(cols);
}
@Benchmark
public List<String> pattern()
{
final Pattern p = Pattern.compile(",");
final String[] cols = p.split(toSplit);
return Arrays.asList(cols);
}
@Benchmark
public List<String> tokenize()
{
final StringTokenizer st = new StringTokenizer(toSplit, ",");
final List<String> result = new ArrayList<>();
while (st.hasMoreTokens())
{
result.add(st.nextToken());
}
return result;
}
...
}
We wanted more than this
Benchmark Mode Cnt Score Units
Training35.tokenize avgt 3 165.3 ns/op
Training35.split avgt 3 184.4 ns/op
Training35.precompiledPattern avgt 3 325.0 ns/op
Training35.precompiledStream avgt 3 347.9 ns/op
Training35.pattern avgt 3 432.1 ns/op
Training35.stream avgt 3 478.5 ns/op
# Original test with the data of Training35
Benchmark Mode Cnt Score Units
Training35a.tokenize avgt 3 113.4 ns/op
Training35a.split avgt 3 195.1 ns/op
Training35a.precompiledPattern avgt 3 314.0 ns/op
Training35a.precompiledStream avgt 3 278.2 ns/op
Training35a.pattern avgt 3 435.8 ns/op
Training35a.stream avgt 3 407.0 ns/op
https://github.com/Xceptance/jmh-training
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.SingleShotTime)
@Fork(1)
@State(Scope.Thread) // ignore for now
public class Training40
{
...
@Benchmark
@Warmup(iterations = 0)
@Measurement(iterations = 10)
public List<String> tokenizeCold()
{
return tokenize(toSplit);
}
@Benchmark
@Warmup(iterations = 10)
@Measurement(iterations = 10)
public List<String> tokenizeWarm()
{
return tokenize(toSplit);
}
@Benchmark
@Warmup(iterations = 100)
@Measurement(iterations = 10)
public List<String> tokenizeWarmer()
{
return tokenize(toSplit);
}
@Benchmark
@Warmup(iterations = 10000)
@Measurement(iterations = 10)
public List<String> tokenizeHot()
{
return tokenize(toSplit);
}
...
}
Checking warmup performance
@BenchmarkMode(Mode.SingleShotTime) measures a single execution
@Warmup(iterations = 1): the iteration count defines how much warmup runs before the shot
# Fork(1)
Benchmark Mode Cnt Score Error Units
Training40.splitIceCold ss 300232.000 ns/op
Training40.splitCold ss 10 52054.900 ± 139920.099 ns/op
Training40.splitHot ss 10 6136.200 ± 6489.506 ns/op
Training40.splitWarm ss 10 12353.500 ± 4100.746 ns/op
Training40.splitWarmer ss 10 15408.600 ± 15743.686 ns/op
Training40.tokenizeIceCold ss 73831.000 ns/op
Training40.tokenizeCold ss 10 21710.100 ± 29223.398 ns/op
Training40.tokenizeWarm ss 10 10834.800 ± 2105.584 ns/op
Training40.tokenizeWarmer ss 10 12971.800 ± 13742.500 ns/op
Training40.tokenizeHot ss 10 2547.300 ± 2586.435 ns/op
Benchmark Mode Cnt Score Error Units
Training40.splitIceCold ss 292800.000 ns/op
Training40.splitCold ss 10 51979.000 ± 159071.783 ns/op
Training40.splitHot ss 10 5334.300 ± 4306.467 ns/op
Training40.splitWarm ss 10 14026.800 ± 3826.923 ns/op
Training40.splitWarmer ss 10 17236.500 ± 14239.322 ns/op
Training40.tokenizeIceCold ss 77906.000 ns/op
Training40.tokenizeCold ss 10 21006.500 ± 32446.097 ns/op
Training40.tokenizeWarm ss 10 9151.900 ± 2316.037 ns/op
Training40.tokenizeWarmer ss 10 14808.800 ± 14499.507 ns/op
Training40.tokenizeHot ss 10 2360.800 ± 3371.624 ns/op
Benchmark Mode Cnt Score Error Units
Training40.splitIceCold ss 444264.000 ns/op
Training40.splitCold ss 10 56199.600 ± 132136.303 ns/op
Training40.splitWarm ss 10 15304.500 ± 4467.518 ns/op
Training40.splitWarmer ss 10 11234.500 ± 9546.547 ns/op
Training40.splitHot ss 10 5776.100 ± 6985.279 ns/op
Training40.tokenizeIceCold ss 74142.000 ns/op
Training40.tokenizeCold ss 10 19142.300 ± 32839.600 ns/op
Training40.tokenizeWarm ss 10 13393.500 ± 4882.290 ns/op
Training40.tokenizeWarmer ss 10 6822.200 ± 3611.820 ns/op
Training40.tokenizeHot ss 10 1248.800 ± 1709.296 ns/op
@Fork(10) to measure more often from scratch
# Fork(10)
Benchmark Mode Cnt Score Error Units
Training40.splitIceCold ss 10 380923.600 ± 96774.549 ns/op
Training40.splitCold ss 100 88485.930 ± 102858.604 ns/op
Training40.splitWarm ss 100 19252.030 ± 8536.751 ns/op
Training40.splitHot ss 100 5804.700 ± 2754.319 ns/op
Training40.splitWarmer ss 100 14554.390 ± 2554.317 ns/op
Training40.tokenizeIceCold ss 10 78459.300 ± 33153.083 ns/op
Training40.tokenizeCold ss 100 22712.400 ± 8913.579 ns/op
Training40.tokenizeWarm ss 100 12422.790 ± 1382.564 ns/op
Training40.tokenizeWarmer ss 100 13023.690 ± 3353.144 ns/op
Training40.tokenizeHot ss 100 2047.580 ± 1066.626 ns/op
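The cold-versus-hot effect can be felt even without JMH. A rough sketch (class name ColdVsWarm is ours; the numbers vary wildly per run and are nowhere near JMH quality):

```java
public class ColdVsWarm {
    // single-shot flavor of the experiment above: time one call, nothing else
    public static long timeOneCall() {
        final long start = System.nanoTime();
        "foo,bar,mars,sun".split(",");
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        final long cold = timeOneCall(); // first call: classes load, interpreter runs
        for (int i = 0; i < 100_000; i++) {
            "foo,bar,mars,sun".split(","); // let the JIT compile the path
        }
        final long hot = timeOneCall(); // usually far cheaper, but noisy
        System.out.println("cold=" + cold + "ns hot=" + hot + "ns");
    }
}
```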
Are we really the first to use split() and StringTokenizer?
@Setup again, to touch the classes before use
public class Training41
{
...
public String hold;
@Setup
public void setup()
{
// use String.split, but differently from our benchmark
String[] result = "Test:Test".split(":");
// use Tokenizer
StringTokenizer st = new StringTokenizer(":");
// ensure we are not removed
hold = st.toString() + result;
}
...
}
# Don't touch (Training40)
Benchmark Mode Cnt Score Error Units
Training40.splitIceCold ss 10 380,923.600 ± 96774.549 ns/op
Training40.splitCold ss 100 88,485.930 ± 102858.604 ns/op
Training40.splitWarm ss 100 19,252.030 ± 8536.751 ns/op
Training40.splitHot ss 100 5,804.700 ± 2754.319 ns/op
Training40.splitWarmer ss 100 14,554.390 ± 2554.317 ns/op
Training40.tokenizeIceCold ss 10 78,459.300 ± 33153.083 ns/op
Training40.tokenizeCold ss 100 22,712.400 ± 8913.579 ns/op
Training40.tokenizeWarm ss 100 12,422.790 ± 1382.564 ns/op
Training40.tokenizeWarmer ss 100 13,023.690 ± 3353.144 ns/op
Training40.tokenizeHot ss 100 2,047.580 ± 1066.626 ns/op
# Touch before use (Training41)
Benchmark Mode Cnt Score Error Units
Training41.splitIceCold ss 10 58,826.800 ± 21882.562 ns/op
Training41.splitCold ss 100 23,466.180 ± 6082.829 ns/op
Training41.splitWarm ss 100 14,938.110 ± 1308.992 ns/op
Training41.splitWarmer ss 100 12,317.460 ± 1884.732 ns/op
Training41.splitHot ss 100 4,120.080 ± 2151.355 ns/op
Training41.tokenizeIceCold ss 10 53,629.900 ± 34002.030 ns/op
Training41.tokenizeCold ss 100 19,350.450 ± 6708.144 ns/op
Training41.tokenizeWarm ss 100 11,405.080 ± 985.338 ns/op
Training41.tokenizeWarmer ss 100 9,649.810 ± 2937.053 ns/op
Training41.tokenizeHot ss 100 2,038.750 ± 692.256 ns/op
Need debugging and testing capabilities
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.12</version>
</dependency>
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
@Fork(1)
@Warmup(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
@State(Scope.Thread) // ignore for now
public class Training50
{
private String toSplit;
@Setup
@Before
public void setup()
{
toSplit = "C,Rust,Java";
}
@Benchmark
public List<String> split()
{
final String[] cols = toSplit.split(",");
return Arrays.asList(cols);
}
@Test
public void testSplit()
{
final List<String> result = split();
Assert.assertEquals(3, result.size());
Assert.assertEquals("C", result.get(0));
Assert.assertEquals("Rust", result.get(1));
Assert.assertEquals("Java", result.get(2));
}
}