How to Read
One Billion Rows
Fast Enough
René Schwietzke, Xceptance GmbH
@ReneSchwietzke
@reneschwietzke@foojay.social
#java #performance #tuning #qa #test #performancetest #loadtest
I care and share
c b
This presentation is licensed under
Creative Commons Attribution 4.0 International License
All source code is APL 2.0.
Third-party content could use a different license.
What is 1BRC and Why the Fuzz?
A busy January 2024 for Java Devs
... write a Java program for retrieving temperature measurement values from a text file and calculate the min, mean, and max temperatures per weather station.
... one caveat, the file has 1,000,000,000 rows!
Zagreb;19.7
Havana;36.2
Mexicali;27.3
Marseille;15.4
Banjul;24.2
Canberra;0.8
Mandalay;19.3
Bridgetown;29.6
Chihuahua;10.9
Honiara;35.3
Gangtok;-1.7
Houston;32.4
Zürich;13.8
Kolkata;20.6
Tripoli;8.9
Karachi;28.3
Zagreb;13.2
Palm Springs;20.3
Lhasa;6.2
Sacramento;28.3
Edmonton;5.6
Toluca;14.0
El Paso;24.0
Antsiranana;24.9
Tromsø;-3.9
# This is TreeMap::toString
# Data shown here is formatted for screen
{
Abha=5.0/18.0/27.4, Abidjan=15.7/26.0/34.1,
Abéché=12.1/29.4/35.6, Accra=14.7/26.4/33.1,
Addis Ababa=2.1/16.0/24.3, Adelaide=4.1/17.3/29.7,
...
Willemstad=-7.9,28.2,57.7, Winnipeg=-28.1,3.0,39.4,
Wrocław=-27.5,9.6,50.2, Xi'an=-22.5,14.0,49.1,
Yakutsk=-42.2,-8.8,26.4, Yangon=-2.6,27.6,59.0,
Yaoundé=-10.1,23.6,58.3, Yellowknife=-38.4,-4.0,40.0,
Yerevan=-18.4,12.4,49.6, Yinchuan=-25.4,9.5,42.0,
Zagreb=-25.3,10.6,48.4, Zanzibar City=-5.0,26.1,61.2,
Zürich=-20.9,9.4,43.5, Ürümqi=-26.8,7.4,43.7,
İzmir=-15.8,18.0,50.1
}
\n
What do you think? Guess some numbers.
Test | Cores | Time Digital Ocean Intel | Time Hetzner AMD |
---|---|---|---|
Baseline | 1 | 200.7 s | 150.4 s |
Baseline | 8 | 99.8 s | 93.1 s |
Thomas Wuerthinger | 1 | 8.9 s | 7.3 s |
Thomas Wuerthinger | 8 | 1.2 s | 1.6 s |
Today's 20% Goal | 1 | 40.0 s | 30.1 s |
Digital Ocean, Amsterdam, Dedicated Machine, Intel CPU-optimized, 16 core, 32 GB, May 2024
Hetzner, Finland, Dedicated Machine, AMD, 8 core, 32 GB, September 2024
What We Will Do and What We Won't
Just to Make Things Easier
Some limitations apply today
Goal: All code is still easily understandable.
The Original Code
public String run(final String fileName) throws IOException {
Collector<Measurement, MeasurementAggregator, ResultRow> collector = Collector.of(MeasurementAggregator::new,
(agg, m) -> {
agg.min = Math.min(agg.min, m.value);
agg.max = Math.max(agg.max, m.value);
agg.sum += m.value;
agg.count++;
}, (agg1, agg2) -> {
var res = new MeasurementAggregator();
res.min = Math.min(agg1.min, agg2.min);
res.max = Math.max(agg1.max, agg2.max);
res.sum = agg1.sum + agg2.sum;
res.count = agg1.count + agg2.count;
return res;
}, agg -> {
return new ResultRow(agg.min, (Math.round(agg.sum * 10.0) / 10.0) / agg.count, agg.max);
});
// Measurement is a record
var result = Files.lines(Paths.get(fileName)).parallel() // <-- optional
.map(l -> l.split(";"))
.map(l -> new Measurement(l))
.collect(groupingBy(m -> m.station(), collector));
return new TreeMap<>(result).toString();
}
org.rschwietzke.devoxxpl24.BRC01_BaselineST
github.com/rschwietzke/1brc-the-first-80-meters.git
Test | Cores | Time DO | Time HE |
---|---|---|---|
Baseline | 1 | 205.0 s | 146.8 s |
Baseline | 8 | 96.8 s | 84.1 s |
Numbers between slides might vary.
github.com/rschwietzke/1brc-the-first-80-meters.git
run | 78.4 % |
readLine | 17.7 % |
split | 31.7 % |
parseDouble | 11.7 % |
store/reduce | 14.4 % |
java \
-agentpath:async-profiler/lib/libasyncProfiler.so=start,event=cpu,flamegraph,file=$1-cpu.html \
-cp target/classes/ \
org.rschwietzke.devoxxpl24.BRC01_BaselineST measurements.txt 2 2 \
Stream Tuning is Too Complicated
/**
* A more classic version
*/
public String run(final String fileName) throws IOException
{
final Map<String, Temperatures> cities = new HashMap<>();
try (var reader = Files.newBufferedReader(Paths.get(fileName)))
{
String line;
while ((line = reader.readLine()) != null)
{
// split the line
final String[] cols = line.split(";");
// get us the data to store
final String city = cols[0];
final double temperature = Double.parseDouble(cols[1]);
// store and sum up the data
// Map::merge(key, value, remappingFunction)
cities.merge(city, new Temperatures(temperature), (t1, t2) -> t1.merge(t2));
}
}
// ok, we got everything, now we need to order it
return new TreeMap<String, Temperatures>(cities).toString();
}
org.rschwietzke.devoxxpl24.BRC03_NoStreamST
github.com/rschwietzke/1brc-the-first-80-meters.git
Test | Time DO | % | Time HE | % |
---|---|---|---|---|
BRC01_BaselineST | 219.1 s | 100.0 % | 146.8 s | 100.0 % |
BRC03_NoStreamST | 197.9 s | 90.3 % | 140.9 s | 96.0 % |
Area |
01 | 03 |
---|---|---|
run |
78.4 % | 80.4 % |
readLine |
17.7 % | 17.9 % |
split |
31.7 % | 26.8 % |
parseDouble |
11.7 % | 12.2 % |
merge |
14.4 % | 21.0 % |
java \
-agentpath:async-profiler/lib/libasyncProfiler.so=start,event=cpu,flamegraph,file=$1-cpu.html \
-cp target/classes/ \
org.rschwietzke.devoxxpl24.BRC03_NoStreamST measurements.txt 2 2 \
Get Rid Off split
public String run(final String fileName) throws IOException
{
// our cities with temperatures
final Map<String, Temperatures> cities = new HashMap<>();
try (var reader = Files.newBufferedReader(Paths.get(fileName)))
{
String line;
while ((line = reader.readLine()) != null)
{
// split the line
final int pos = line.indexOf(';');
// get us the city
final String city = line.substring(0, pos);
final String temperatureAsString = line.substring(pos + 1);
// parse our temperature
final double temperature = Double.parseDouble(temperatureAsString);
// merge the data into the captured data
cities.merge(city, new Temperatures(temperature), (t1, t2) -> t1.merge(t2));
}
}
// ok, we got everything, now we need to order it and print it
return new TreeMap<String, Temperatures>(cities).toString();
}
org.rschwietzke.devoxxpl24.BRC05_ReplaceSplitST
Test | Time DO | % | Time HE | % |
---|---|---|---|---|
BRC01_BaselineST | 219.1 s | 100.0 % | 146.9 s | 100.0 % |
BRC03_NoStreamST | 197.9 s | 90.3 % | 140.9 s | 96.0 % |
BRC05_ReplaceSplitST | 151.1 s | 69.0 % | 112.3 s | 76.5 % |
split
shortcut?
private String[] split(String regex, int limit, boolean withDelimiters) {
/* fastpath if the regex is a
* (1) one-char String and this character is not one of the
* RegEx's meta characters ".$|()[{^?*+\\", or
* (2) two-char String and the first char is the backslash and
* the second is not the ascii digit or ascii letter.
*/
char ch = 0;
if (((regex.length() == 1 &&
".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
(regex.length() == 2 &&
regex.charAt(0) == '\\' &&
(((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
((ch-'a')|('z'-ch)) < 0 &&
((ch-'A')|('Z'-ch)) < 0)) &&
(ch < Character.MIN_HIGH_SURROGATE ||
ch > Character.MAX_LOW_SURROGATE))
{
// All the checks above can potentially be constant folded by
// a JIT/AOT compiler when the regex is a constant string.
// ...
// This split now creates objects
return split(ch, limit, withDelimiters);
}
...
}
java.util.String::split
Area |
01 | 03 | 05 |
---|---|---|---|
run |
78.4 % | 80.4 % | 81.3 % |
readLine |
17.7 % | 17.9 % | 31.6 % |
split |
31.7 % | 26.8 % | n/a |
indexOf |
n/a | n/a | 2.2 % |
subString |
n/a | 9.2 % | 7.6 % |
parseDouble |
11.7 % | 12.2 % | 14.8 % |
merge |
14.4 % | 21.0 % | 23.9 % |
Make double
Parsing Less Generic
static ASCIIToBinaryConverter readJavaFormatString( String in ) throws NumberFormatException {
boolean isNegative = false;
boolean signSeen = false;
int decExp;
char c;
parseNumber:
try{
in = in.trim(); // don't fool around with white space.
// throws NullPointerException if null
int len = in.length();
if ( len == 0 ) {
throw new NumberFormatException("empty String");
}
int i = 0;
switch (in.charAt(i)){
case '-':
isNegative = true;
//FALLTHROUGH
case '+':
i++;
signSeen = true;
}
c = in.charAt(i);
if(c == 'N') { // Check for NaN
if((len-i)==NAN_LENGTH && in.indexOf(NAN_REP,i)==i) {
return A2BC_NOT_A_NUMBER;
}
// something went wrong, throw exception
break parseNumber;
} else if(c == 'I') { // Check for Infinity strings
if((len-i)==INFINITY_LENGTH && in.indexOf(INFINITY_REP,i)==i) {
return isNegative? A2BC_NEGATIVE_INFINITY : A2BC_POSITIVE_INFINITY;
}
// something went wrong, throw exception
break parseNumber;
} else if (c == '0') { // check for hexadecimal floating-point number
if (len > i+1 ) {
char ch = in.charAt(i+1);
if (ch == 'x' || ch == 'X' ) { // possible hex string
return parseHexString(in);
}
}
} // look for and process decimal floating-point string
char[] digits = new char[ len ];
boolean decSeen = false;
int nDigits = 0;
int decPt = 0;
int nLeadZero = 0;
int nTrailZero = 0;
skipLeadingZerosLoop:
while (i < len) {
c = in.charAt(i);
if (c == '0') {
nLeadZero++;
} else if (c == '.') {
if (decSeen) {
// already saw one ., this is the 2nd.
throw new NumberFormatException("multiple points");
}
decPt = i;
if (signSeen) {
decPt -= 1;
}
decSeen = true;
} else {
break skipLeadingZerosLoop;
}
i++;
}
digitLoop:
while (i < len) {
c = in.charAt(i);
if (c >= '1' && c <= '9') {
digits[nDigits++] = c;
nTrailZero = 0;
} else if (c == '0') {
digits[nDigits++] = c;
nTrailZero++;
} else if (c == '.') {
if (decSeen) {
// already saw one ., this is the 2nd.
throw new NumberFormatException("multiple points");
}
decPt = i;
if (signSeen) {
decPt -= 1;
}
decSeen = true;
} else {
break digitLoop;
}
i++;
}
nDigits -=nTrailZero;
//
// At this point, we've scanned all the digits and decimal
// point we're going to see. Trim off leading and trailing
// zeros, which will just confuse us later, and adjust
// our initial decimal exponent accordingly.
// To review:
// we have seen i total characters.
// nLeadZero of them were zeros before any other digits.
// nTrailZero of them were zeros after any other digits.
// if ( decSeen ), then a . was seen after decPt characters
// ( including leading zeros which have been discarded )
// nDigits characters were neither lead nor trailing
// zeros, nor point
//
//
// special hack: if we saw no non-zero digits, then the
// answer is zero!
// Unfortunately, we feel honor-bound to keep parsing!
//
boolean isZero = (nDigits == 0);
if ( isZero && nLeadZero == 0 ){
// we saw NO DIGITS AT ALL,
// not even a crummy 0!
// this is not allowed.
break parseNumber; // go throw exception
}
//
// Our initial exponent is decPt, adjusted by the number of
// discarded zeros. Or, if there was no decPt,
// then its just nDigits adjusted by discarded trailing zeros.
//
if ( decSeen ){
decExp = decPt - nLeadZero;
} else {
decExp = nDigits + nTrailZero;
}
//
// Look for 'e' or 'E' and an optionally signed integer.
//
if ( (i < len) && (((c = in.charAt(i) )=='e') || (c == 'E') ) ){
int expSign = 1;
int expVal = 0;
int reallyBig = Integer.MAX_VALUE / 10;
boolean expOverflow = false;
switch( in.charAt(++i) ){
case '-':
expSign = -1;
//FALLTHROUGH
case '+':
i++;
}
int expAt = i;
expLoop:
while ( i < len ){
if ( expVal >= reallyBig ){
// the next character will cause integer
// overflow.
expOverflow = true;
}
c = in.charAt(i++);
if(c>='0' && c<='9') {
expVal = expVal*10 + ( (int)c - (int)'0' );
} else {
i--; // back up.
break expLoop; // stop parsing exponent.
}
}
int expLimit = BIG_DECIMAL_EXPONENT + nDigits + nTrailZero;
if (expOverflow || (expVal > expLimit)) {
// There is still a chance that the exponent will be safe to
// use: if it would eventually decrease due to a negative
// decExp, and that number is below the limit. We check for
// that here.
if (!expOverflow && (expSign == 1 && decExp < 0)
&& (expVal + decExp) < expLimit) {
// Cannot overflow: adding a positive and negative number.
decExp += expVal;
} else {
//
// The intent here is to end up with
// infinity or zero, as appropriate.
// The reason for yielding such a small decExponent,
// rather than something intuitive such as
// expSign*Integer.MAX_VALUE, is that this value
// is subject to further manipulation in
// doubleValue() and floatValue(), and I don't want
// it to be able to cause overflow there!
// (The only way we can get into trouble here is for
// really outrageous nDigits+nTrailZero, such as 2
// billion.)
//
decExp = expSign * expLimit;
}
} else {
// this should not overflow, since we tested
// for expVal > (MAX+N), where N >= abs(decExp)
decExp = decExp + expSign*expVal;
}
// if we saw something not a digit ( or end of string )
// after the [Ee][+-], without seeing any digits at all
// this is certainly an error. If we saw some digits,
// but then some trailing garbage, that might be ok.
// so we just fall through in that case.
// HUMBUG
if ( i == expAt ) {
break parseNumber; // certainly bad
}
}
//
// We parsed everything we could.
// If there are leftovers, then this is not good input!
//
if ( i < len &&
((i != len - 1) ||
(in.charAt(i) != 'f' &&
in.charAt(i) != 'F' &&
in.charAt(i) != 'd' &&
in.charAt(i) != 'D'))) {
break parseNumber; // go throw exception
}
if(isZero) {
return isNegative ? A2BC_NEGATIVE_ZERO : A2BC_POSITIVE_ZERO;
}
return new ASCIIToBinaryBuffer(isNegative, decExp, digits, nDigits);
} catch ( StringIndexOutOfBoundsException e ){ }
throw new NumberFormatException("For input string: \"" + in + "\"");
}
jdk.internal.math.FloatingDecimal::readJavaFormatString/p>
public static double parseDouble(final String s) {
return parseDouble(s, 0, s.length() - 1);
}
private static final double[] multipliers = {
1, 1, 0.1, 0.01, 0.001, 0.000_1, 0.000_01, 0.000_001, 0.000_000_1, 0.000_000_01,
0.000_000_001, 0.000_000_000_1, 0.000_000_000_01, 0.000_000_000_001, 0.000_000_000_000_1,
0.000_000_000_000_01, 0.000_000_000_000_001, 0.000_000_000_000_000_1, 0.000_000_000_000_000_01};
public static double parseDouble(final String s, final int offset, final int end) {
final int negative = s.charAt(offset) == '-' ? offset + 1 : offset;
long value = 0;
int decimalPos = end;
for (int i = negative; i <= end; i++) {
final int d = s.charAt(i);
if (d == '.') {
decimalPos = i;
continue;
}
final int v = d - DIGITOFFSET;
value = ((value << 3) + (value << 1));
value += v;
}
// adjust the decimal places
value = negative != offset ? -value : value;
return value * multipliers[end - decimalPos + 1];
}
org.rschwietzke.devoxxpl24.BRC06_NewDoubleParsingST
github.com/rschwietzke/1brc-the-first-80-meters.git
Test | Time DO | % | Time HE | % |
---|---|---|---|---|
BRC01_BaselineST | 219.1 s | 100.0 % | 146.9 s | 100.0 % |
BRC03_NoStreamST | 197.9 s | 90.3 % | 140.9 s | 96.0 % |
BRC05_ReplaceSplitST | 151.1 s | 69.0 % | 112.3 s | 76.5 % |
BRC06_NewDoubleParsingST | 127.9 s | 58.4 % | 99.7 s | 67.9 % |
Area |
05 | 06 |
---|---|---|
run |
81.3 % | 76.2 % |
readLine |
31.6 % | 32.8 % |
indexOf |
2.2 % | 2.2 % |
subString |
7.6 % | 9.9 % |
parseDouble |
14.8 % | 4.4 % |
merge |
23.9 % | 25.8 % |
java \
-agentpath:async-profiler/lib/libasyncProfiler.so=start,event=cpu,sig,flamegraph,file=$1-cpu.html \
-cp target/classes/ \
org.rschwietzke.devoxxpl24.BRC06_NewDoubleParsingST measurements.txt 2 2 \
One Less String
while ((line = reader.readLine()) != null)
{
// split the line
final int pos = line.indexOf(';');
// get us the city
final String city = line.substring(0, pos);
// parse our temperature inline without an instance of a string for temperature
final double temperature = ParseDouble.parseDouble(line, pos + 1, line.length() - 1);
// merge the data into the captured data
cities.merge(city, new Temperatures(temperature), (t1, t2) -> t1.merge(t2));
}
org.rschwietzke.devoxxpl24.BRC07_NoCopyForDoubleST
Test | Time DO | % | Time HE | % |
---|---|---|---|---|
BRC01_BaselineST | 219.1 s | 100.0 % | 146.9 s | 100.0 % |
BRC03_NoStreamST | 197.9 s | 90.3 % | 140.9 s | 96.0 % |
BRC05_ReplaceSplitST | 151.1 s | 69.0 % | 112.3 s | 76.5 % |
BRC06_NewDoubleParsingST | 127.9 s | 58.4 % | 99.7 s | 67.9 % |
BRC07_NoCopyForDoubleST | 120.4 s | 55.0 % | 89.5 s | 61.0 % |
Area |
05 | 06 | 07 |
---|---|---|---|
run |
81.3 % | 76.2 % | 76.4 % |
readLine |
31.6 % | 32.8 % | 29.6 % |
indexOf |
2.2 % | 2.2 % | 2.6 % |
subString |
7.6 % | 9.9 % | 5.8 % |
parseDouble |
14.8 % | 4.4 % | 6.0 % |
merge |
23.9 % | 25.8 % | 28.7 % |
java \
-agentpath:async-profiler/lib/libasyncProfiler.so=start,event=cpu,sig,flamegraph,file=$1-cpu.html \
-cp target/classes/ \
org.rschwietzke.devoxxpl24.BRC07_NoCopyForDoubleST measurements.txt 2 2 \
Ignore double
as longint
as Possible
/**
* Parses a double but ends up with an int, only because we know
* the format of the results -99.9 to 99.9
*/
public static int parseInteger(final String s, final int offset, final int end)
{
final int negative = s.charAt(offset) == '-' ? offset + 1 : offset;
int value = 0;
for (int i = negative; i <= end; i++)
{
final int d = s.charAt(i);
if (d == '.')
{
continue;
}
final int v = d - DIGITOFFSET;
// multiply by 10 = multiply by 8 + multiply by 2
value = ((value << 3) + (value << 1));
value += v;
}
value = negative != offset ? -value : value;
return value;
}
org.rschwietzke.devoxxpl24.BRC08_GoIntST
Measured on Digital Ocean
Test | Time DO | % | Time HE | % |
---|---|---|---|---|
BRC01_BaselineST | 219.1 s | 100.0 % | 146.9 s | 100.0 % |
BRC03_NoStreamST | 197.9 s | 90.3 % | 140.9 s | 96.0 % |
BRC05_ReplaceSplitST | 151.1 s | 69.0 % | 112.3 s | 76.5 % |
BRC06_NewDoubleParsingST | 127.9 s | 58.4 % | 99.7 s | 67.9 % |
BRC07_NoCopyForDoubleST | 120.4 s | 55.0 % | 89.5 s | 61.0 % |
BRC08_GoIntST | 117.4 s | 53.6 % | 88.3 s | 60.1 % |
Area |
05 | 06 | 07 | 08 |
---|---|---|---|---|
run |
81.3 % | 76.2 % | 76.4 % | 79.3 % |
readLine |
31.6 % | 32.8 % | 29.6 % | 34.4 % |
indexOf |
2.2 % | 2.2 % | 2.6 % | 2.5 % |
subString |
7.6 % | 9.9 % | 5.8 % | 6.0 % |
parseDouble |
14.8 % | 4.4 % | 6.0 % | 8.2 % |
merge |
23.9 % | 25.8 % | 28.7 % | 24.2 % |
java \
-agentpath:async-profiler/lib/libasyncProfiler.so=start,event=cpu,sig,flamegraph,file=$1-cpu.html \
-cp target/classes/ \
org.rschwietzke.devoxxpl24.BRC08_GoIntST measurements.txt 2 2 \
Explicit Merging
while ((line = reader.readLine()) != null)
{
// split the line
final int pos = line.indexOf(';');
// get us the city
final String city = line.substring(0, pos);
// parse our temperature inline without an instance of a string for temperature
final int temperature = ParseDouble.parseInteger(line, pos + 1, line.length() - 1);
// get city and update
final var v = cities.get(city);
final var t = new Temperatures(temperature);
cities.put(city, v != null ? v.merge(t) : t);
}
org.rschwietzke.devoxxpl24.BRC09_NoMergeST
Test | Time DO | % | Time HE | % |
---|---|---|---|---|
BRC01_BaselineST | 219.1 s | 100.0 % | 146.9 s | 100.0 % |
BRC03_NoStreamST | 197.9 s | 90.3 % | 140.9 s | 96.0 % |
BRC05_ReplaceSplitST | 151.1 s | 69.0 % | 112.3 s | 76.5 % |
BRC06_NewDoubleParsingST | 127.9 s | 58.4 % | 99.7 s | 67.9 % |
BRC07_NoCopyForDoubleST | 120.4 s | 55.0 % | 89.5 s | 61.0 % |
BRC08_GoIntST | 117.4 s | 53.6 % | 88.3 s | 60.1 % |
BRC09_NoMergeST | 121.3 s | 55.4 % | 95.6 s | 65.0 % |
run | 80.1 % |
readLine | 39.1 % |
indexOf | 2.8 % |
substring | 7.9 % |
parseInteger | 6.3 % |
get | 10.9 % |
merge | 2.3 % |
put | 5.2 % |
java \
-agentpath:async-profiler/lib/libasyncProfiler.so=start,event=cpu,sig,flamegraph,file=$1-cpu.html \
-cp target/classes/ \
org.rschwietzke.devoxxpl24.BRC09_NoMergeST measurements.txt 2 2 \
Less Memory Churn
while ((line = reader.readLine()) != null)
{
...
// get city
final Temperatures v = cities.get(city);
if (v != null)
{
// know it, put both together
v.add(temperature);
}
else
{
// we have not seen that city yet, create a container and store it
cities.put(city, new Temperatures(temperature));
}
}
org.rschwietzke.devoxxpl24.BRC10_MutateST
Test | Time DO | % | Time HE | % |
---|---|---|---|---|
BRC01_BaselineST | 219.1 s | 100.0 % | 146.9 s | 100.0 % |
BRC03_NoStreamST | 197.9 s | 90.3 % | 140.9 s | 96.0 % |
BRC05_ReplaceSplitST | 151.1 s | 69.0 % | 112.3 s | 76.5 % |
BRC06_NewDoubleParsingST | 127.9 s | 58.4 % | 99.7 s | 67.9 % |
BRC07_NoCopyForDoubleST | 120.4 s | 55.0 % | 89.5 s | 61.0 % |
BRC08_GoIntST | 117.4 s | 53.6 % | 88.3 s | 60.1 % |
BRC09_NoMergeST | 121.3 s | 55.4 % | 95.6 s | 65.0 % |
BRC10_MutateST | 100.6 s | 45.9 % | 85.6 s | 58.3 % |
Area |
09 | 10 |
---|---|---|
run |
80.3 % | 71.3 % |
readLine |
39.1 % | 38.1 % |
indexOf |
2.8 % | 2.4 % |
substring |
7.9 % | 6.2 % |
parseInteger |
6.3 % | 6.6 % |
get |
10.9 % | 15.7 % |
merge |
2.3 % | 0.0 % |
put |
5.2 % | 0.0 % |
java \
-agentpath:async-profiler/lib/libasyncProfiler.so=start,event=cpu,sig,flamegraph,file=$1-cpu.html \
-cp target/classes/ \
org.rschwietzke.devoxxpl24.BRC10_MutateST measurements.txt 2 2 \
Simplify the Map
// https://github.com/mikvor/hashmapTest/blob/master/src/main/java/map/objobj/ObjObjMap.java
public class FastHashMap<K, V> {
private static final Object FREE_KEY = new Object();
private Object[] m_data;
...
public V get( final K key ) {
int ptr = (key.hashCode() & m_mask) << 1;
Object k = m_data[ ptr ];
if ( k == FREE_KEY ) {
return null; //end of chain already
}
if ( k.hashCode() == key.hashCode() && k.equals( key ) ) {
return (V) m_data[ ptr + 1 ];
}
while ( true ) {
ptr = (ptr + 2) & m_mask2; //that's next index
k = m_data[ ptr ];
if ( k == FREE_KEY ) {
return null;
}
if (k.equals( key )) {
return (V) m_data[ ptr + 1 ];
}
}
}
...
}
org.rschwietzke.devoxxpl24.BRC12_NewMapST
Test | Time DO | % | Time HE | % |
---|---|---|---|---|
BRC01_BaselineST | 219.1 s | 100.0 % | 146.9 s | 100.0 % |
BRC03_NoStreamST | 197.9 s | 90.3 % | 140.9 s | 96.0 % |
BRC05_ReplaceSplitST | 151.1 s | 69.0 % | 112.3 s | 76.5 % |
BRC06_NewDoubleParsingST | 127.9 s | 58.4 % | 99.7 s | 67.9 % |
BRC07_NoCopyForDoubleST | 120.4 s | 55.0 % | 89.5 s | 61.0 % |
BRC08_GoIntST | 117.4 s | 53.6 % | 88.3 s | 60.1 % |
BRC09_NoMergeST | 121.3 s | 55.4 % | 95.6 s | 65.0 % |
BRC10_MutateST | 100.6 s | 45.9 % | 85.6 s | 58.3 % |
BRC12_NewMapST | 105.0 s | 47.9 % | 79.3 s | 54.0 % |
Area |
12 |
---|---|
run | 74.1 % |
readLine | 37.9 % |
indexOf | 2.0 % |
substring | 7.7 % |
parseInteger | 8.3 % |
get | 15.4 % |
merge | 0.0 % |
put | 0.0 % |
java \
-agentpath:async-profiler/lib/libasyncProfiler.so=start,event=cpu,sig,flamegraph,file=$1-cpu.html \
-cp target/classes/ \
org.rschwietzke.devoxxpl24.BRC12_NewMapST measurements.txt 2 2 \
Eliminate Slots
public void getPutOrUpdate( final String city, int value ) {
final int hash = city.hashCode();
int ptr = hash & m_mask;
Temperatures k = m_data[ ptr ];
if ( k == FREE_KEY ) {
put(new Temperatures(city, value));
return;
}
if ( k.equals( city ) ) {
k.add(value);
return;
}
while ( true ) {
ptr = (ptr + 1) & m_mask; //that's next index
k = m_data[ ptr ];
if ( k == FREE_KEY ) {
put(new Temperatures(city, value));
return;
}
if ( k.equals( city ) ) {
k.add(value);
return;
}
}
}
org.rschwietzke.devoxxpl24.BRC13_HardcodedSetST
Test | Time DO | % | Time HE | % |
---|---|---|---|---|
BRC01_BaselineST | 219.1 s | 100.0 % | 146.9 s | 100.0 % |
BRC03_NoStreamST | 197.9 s | 90.3 % | 140.9 s | 96.0 % |
BRC05_ReplaceSplitST | 151.1 s | 69.0 % | 112.3 s | 76.5 % |
BRC06_NewDoubleParsingST | 127.9 s | 58.4 % | 99.7 s | 67.9 % |
BRC07_NoCopyForDoubleST | 120.4 s | 55.0 % | 89.5 s | 61.0 % |
BRC08_GoIntST | 117.4 s | 53.6 % | 88.3 s | 60.1 % |
BRC09_NoMergeST | 121.3 s | 55.4 % | 95.6 s | 65.0 % |
BRC10_MutateST | 100.6 s | 45.9 % | 85.6 s | 58.3 % |
BRC12_NewMapST | 105.0 s | 47.9 % | 79.3 s | 54.0 % |
BRC13_HardcodedSetST | 102.5 s | 46.8 % | 77.7 s | 52.9 % |
run | 72.1 % |
readLine | 38.2 % |
indexOf | 3.2 % |
substring | 4.5 % |
parseInteger | 6.7 % |
putOrUpdate | 16.1 % |
java \
-agentpath:async-profiler/lib/libasyncProfiler.so=start,event=cpu,sig,flamegraph,file=$1-cpu.html \
-cp target/classes/ \
org.rschwietzke.devoxxpl24.BRC13_HardcodedSetST measurements.txt 2 2 \
Approach IO Differently
Sit back, relax, and enjoy some music
String
char
If you are in control of the data, you can skip error handling.
try (var raf = new RandomAccessFile(fileName, "r");
var channel = raf.getChannel();)
{
final Line line = new Line(channel);
while (true)
{
line.readFromChannel();
if (line.hasNewLine)
{
// parse our temperature inline without an instance of a string for temperature
final int temperature = ParseDouble.parseInteger(line.data,
line.semicolonPos + 1, line.newlinePos - 1);
// find and update
cities.getPutOrUpdate(line, temperature);
}
if (line.EOF)
{
break;
}
}
}
org.rschwietzke.devoxxpl24.BRC14_ReadBytesST
private ByteBuffer buffer = ByteBuffer.allocate(MIN_BUFFERSIZE);
private byte[] data = buffer.array();
private void readFromChannel()
{
hashCode = -1;
hasNewLine = false;
try {
// do we near the end of the buffer?
if (end - pos < REMAINING_MIN_BUFFERSIZE)
{
// we move the buffer indirectly, because
// the ByteBuffer just wraps our array,
// nothing for the tenderhearted
System.arraycopy(data, pos,
data, 0, data.length - pos);
end = end - pos;
pos = 0;
buffer.position(end);
// fill the buffer up
final int readBytes = channel.read(buffer);
if (readBytes == -1) {
EOF = true;
}
end = buffer.position();
}
}
catch (IOException e) {
...
}
lineStartPos = pos;
// look for semicolon and new line
int i = pos;
for (; i < end; i++)
{
final byte b = data[i];
if (b == ';')
{
semicolonPos = i;
break;
}
}
i++;
for (; i < end; i++)
{
final byte b = data[i];
if (b == '\n')
{
newlinePos = i;
pos = i + 1;
hasNewLine = true;
return;
}
}
}
org.rschwietzke.devoxxpl24.BRC14_ReadBytesST
Test | Time DO | % | Time HE | % |
---|---|---|---|---|
BRC01_BaselineST | 219.1 s | 100.0 % | 146.9 s | 100.0 % |
BRC03_NoStreamST | 197.9 s | 90.3 % | 140.9 s | 96.0 % |
BRC05_ReplaceSplitST | 151.1 s | 69.0 % | 112.3 s | 76.5 % |
BRC06_NewDoubleParsingST | 127.9 s | 58.4 % | 99.7 s | 67.9 % |
BRC07_NoCopyForDoubleST | 120.4 s | 55.0 % | 89.5 s | 61.0 % |
BRC08_GoIntST | 117.4 s | 53.6 % | 88.3 s | 60.1 % |
BRC09_NoMergeST | 121.3 s | 55.4 % | 95.6 s | 65.0 % |
BRC10_MutateST | 100.6 s | 45.9 % | 85.6 s | 58.3 % |
BRC12_NewMapST | 105.0 s | 47.9 % | 79.3 s | 54.0 % |
BRC13_HardcodedSetST | 102.5 s | 46.8 % | 77.7 s | 52.9 % |
BRC14_ReadBytesST | 53.7 s | 24.5 % | 45.6 s | 31.1 % |
run | 80.2 % |
read | 2.1 % |
parse line | 18.4 % |
parseInteger | 8.3 % |
putOrUpdate | 48.6 % |
hashCode | 26.6 % |
equals | 14.4 % |
put/update | 7.6 % |
java \
-agentpath:async-profiler/lib/libasyncProfiler.so=start,event=cpu,sig,flamegraph,file=$1-cpu.html \
-cp target/classes/ \
org.rschwietzke.devoxxpl24.BRC14_ReadBytesST measurements.txt 2 2 \
Enjoy a Treat
A benefit we have not asked for
# Run with max heap 4 GB for 10 M lines
java -Xlog:gc*=debug \
-Xms4m -Xmx4g \
-XX:+UnlockExperimentalVMOptions \
-XX:+UseEpsilonGC \
-cp target/classes/ \
org.rschwietzke.devoxxpl24.BRC01_BaselineST \
data-10m.txt 0 1
[3.726s][info][gc] Heap: 4096M reserved,
3204M (78.22%) committed,
3199M (78.11%) used
1000 M lines = 100x 3.2 GB = 320 GB
java ... \
org.rschwietzke.devoxxpl24.BRC13_HardcodedSetST \
data-10m.txt 0 1
[2,035s][info][gc] Heap: 4096M reserved,
1129M (27,57%) committed,
1037M (25,33%) used
1000 M lines = 100x 1.2 GB = 120 GB
# Run with max heap 4 MB for 1000 M lines
java -Xlog:gc*=debug \
-Xms4m -Xmx4m \
-XX:+UnlockExperimentalVMOptions \
-XX:+UseEpsilonGC \
-cp target/classes/ \
org.rschwietzke.devoxxpl24.BRC14_ReadBytesST \
data-1000m.txt 0 1
[55.401s][info][gc] Heap: 4096K reserved,
4096K (100.00%) committed,
3634K (88.73%) used
1000 M lines = 3.7 MB
Attempting to Save on Number Parsing
/**
* Parses a double but ends up with an int, only because we know
* the format of the results -99.9 to 99.9
*/
public static int parseIntegerFixed(final byte[] b, final int offset, final int end) {
final int length = end - offset; // one is missing, we care for that later
// we know the first three pieces already 9.9
int p0 = b[end];
int p1 = b[end - 2] * 10;
int value = p0 + p1 - (DIGITOFFSET + DIGITOFFSET * 10);
// we are 9.9
if (length == 2) {
return value;
}
// ok, we are either -9.9 or 99.9 or -99.9
if (b[offset] != (byte)'-') {
// we are 99.9
value += b[end - 3] * 100 - DIGITOFFSET * 100;
return value;
}
// we are either -99.9 or -9.9
if (length == 3) {
// -9.9
return -value;
}
// -99.9
value += b[end - 3] * 100 - DIGITOFFSET * 100;
return -value;
}
org.rschwietzke.devoxxpl24.BRC15_ParseDoubleFixedST
Test | Time DO | % | Time HE | % |
---|---|---|---|---|
BRC01_BaselineST | 219.1 s | 100.0 % | 146.9 s | 100.0 % |
BRC03_NoStreamST | 197.9 s | 90.3 % | 140.9 s | 96.0 % |
BRC05_ReplaceSplitST | 151.1 s | 69.0 % | 112.3 s | 76.5 % |
BRC06_NewDoubleParsingST | 127.9 s | 58.4 % | 99.7 s | 67.9 % |
BRC07_NoCopyForDoubleST | 120.4 s | 55.0 % | 89.5 s | 61.0 % |
BRC08_GoIntST | 117.4 s | 53.6 % | 88.3 s | 60.1 % |
BRC09_NoMergeST | 121.3 s | 55.4 % | 95.6 s | 65.0 % |
BRC10_MutateST | 100.6 s | 45.9 % | 85.6 s | 58.3 % |
BRC12_NewMapST | 105.0 s | 47.9 % | 79.3 s | 54.0 % |
BRC13_HardcodedSetST | 102.5 s | 46.8 % | 77.7 s | 52.9 % |
BRC14_ReadBytesST | 53.7 s | 24.5 % | 45.6 s | 31.1 % |
BRC15_ParseDoubleFixedST | 51.1 s | 23.3 % | 42.8 s | 29.2 % |
Area |
14 | 15 |
---|---|---|
run |
80.2 % | 78.0 % |
read |
2.1 % | 3.9 % |
parse line |
18.4 % | 18.3 % |
parseInteger |
8.3 % | 9.0 % |
putOrUpdate |
48.6 % | 44.3 % |
hashCode |
26.6 % | 26.6 % |
equals |
14.4 % | 14.4 % |
put/update |
7.6 % | 7.6 % |
java \
-agentpath:async-profiler/lib/libasyncProfiler.so=start,event=cpu,sig,flamegraph,file=$1-cpu.html \
-cp target/classes/ \
org.rschwietzke.devoxxpl24.BRC15_ParseDoubleFixedST measurements.txt 2 2 \
Tuning is Not That Simple Anymore
Hot but not really hot, because we need that code
hashCode
still on topequals
for the byte arraySimpler Min/Max
public void add(final int value)
{
this.min = Math.min(this.min, value);
this.max = Math.max(this.max, value);
this.total += value;
this.count++;
}
public void add(final int value)
{
if (value < this.min)
{
this.min = value;
}
else if (value > this.max)
{
this.max = value;
}
this.total += value;
this.count++;
}
org.rschwietzke.devoxxpl24.BRC21_ManualMinMaxST
HashCode
// OLD: look for semicolon and new line
int i = pos;
for (; i < end; i++)
{
final byte b = data[i];
if (b == ';')
{
semicolonPos = i;
break;
}
}
// Later
public int ByteBuffer::hashCode() {
int h = 1;
int p = position();
for (int i = limit() - 1; i >= p; i--)
h = 31 * h + (int)get(i);
return h;
}
// We loop twice, when searching the ; and when
// calculating the hashcode later.
// Do it in one go, will safe also a comparison
// NEW: look for semicolon and new line
// when checking for semicolon, we do the
// hashcode right away
int h = 1;
int i = pos;
for (; i < end; i++)
{
final byte b = data[i];
if (b == ';')
{
semicolonPos = i;
break;
}
h = 31 * h + b;
}
this.hashCode = h;
org.rschwietzke.devoxxpl24.BRC22_EarlyHashCodeST
HashCode
// we loop twice, when searching the ; and when calculating the hashcode later
// do it in one go, will safe also a comparison
// look for semicolon and new line
// when checking for semicolon, we do the hashcode right away
int h = 1;
int i = pos;
for (; i < end; i++)
{
final byte b = data[i];
if (b == ';')
{
semicolonPos = i;
break;
}
h = 31 * h + b;
}
this.hashCode = h;
// remove the multiplication and
// use bit shift and subtraction
// look for semicolon and new line
// when checking for semicolon, we do the hashcode right away
int h = 1;
int i = pos;
for (; i < end; i++)
{
final byte b = data[i];
if (b == ';')
{
semicolonPos = i;
break;
}
h = (h << 5) - h + b;
}
this.hashCode = h;
org.rschwietzke.devoxxpl24.BRC23_NoMulST
Test | Time DO | % | Time HE | % |
---|---|---|---|---|
BRC01_BaselineST | 219.1 s | 100.0 % | 146.9 s | 100.0 % |
BRC03_NoStreamST | 197.9 s | 90.3 % | 140.9 s | 96.0 % |
BRC05_ReplaceSplitST | 151.1 s | 69.0 % | 112.3 s | 76.5 % |
BRC06_NewDoubleParsingST | 127.9 s | 58.4 % | 99.7 s | 67.9 % |
BRC07_NoCopyForDoubleST | 120.4 s | 55.0 % | 89.5 s | 61.0 % |
BRC08_GoIntST | 117.4 s | 53.6 % | 88.3 s | 60.1 % |
BRC09_NoMergeST | 121.3 s | 55.4 % | 95.6 s | 65.0 % |
BRC10_MutateST | 100.6 s | 45.9 % | 85.6 s | 58.3 % |
BRC12_NewMapST | 105.0 s | 47.9 % | 79.3 s | 54.0 % |
BRC13_HardcodedSetST | 102.5 s | 46.8 % | 77.7 s | 52.9 % |
BRC14_ReadBytesST | 53.7 s | 24.5 % | 45.6 s | 31.1 % |
BRC15_ParseDoubleFixedST | 51.1 s | 23.3 % | 42.8 s | 29.2 % |
BRC21_ManualMinMaxST | 52.0 s | 23.7 % | 42.2 s | 28.8 % |
BRC22_EarlyHashCodeST | 40.0 s | 18.3 % | 35.5 s | 24.2 % |
Test | Time DO | % | Time HE | % |
---|---|---|---|---|
BRC01_BaselineST | 219.1 s | 100.0 % | 146.9 s | 100.0 % |
BRC03_NoStreamST | 197.9 s | 90.3 % | 140.9 s | 96.0 % |
BRC05_ReplaceSplitST | 151.1 s | 69.0 % | 112.3 s | 76.5 % |
BRC06_NewDoubleParsingST | 127.9 s | 58.4 % | 99.7 s | 67.9 % |
BRC07_NoCopyForDoubleST | 120.4 s | 55.0 % | 89.5 s | 61.0 % |
BRC08_GoIntST | 117.4 s | 53.6 % | 88.3 s | 60.1 % |
BRC09_NoMergeST | 121.3 s | 55.4 % | 95.6 s | 65.0 % |
BRC10_MutateST | 100.6 s | 45.9 % | 85.6 s | 58.3 % |
BRC12_NewMapST | 105.0 s | 47.9 % | 79.3 s | 54.0 % |
BRC13_HardcodedSetST | 102.5 s | 46.8 % | 77.7 s | 52.9 % |
BRC14_ReadBytesST | 53.7 s | 24.5 % | 45.6 s | 31.1 % |
BRC15_ParseDoubleFixedST | 51.1 s | 23.3 % | 42.8 s | 29.2 % |
BRC21_ManualMinMaxST | 52.0 s | 23.7 % | 42.2 s | 28.8 % |
BRC22_EarlyHashCodeST | 40.0 s | 18.3 % | 35.5 s | 24.2 % |
BRC23_NoMulST | 40.3 s | 18.4 % | 32.6 s | 22.2 % |
BRC24_DifferentBranchInHashCodeST | 40.4 s | 18.5 % | 34.0 s | 23.1 % |
BRC25_SmallAddReordingST | 39.9 s | 18.2 % | 32.2 s | 21.9 % |
BRC26_MoreMapSpaceST | 39.0 s | 17.8 % | 33.5 s | 22.8 % |
BRC27_SmallPutST | 37.9 s | 17.3 % | 33.6 s | 22.8 % |
BRC30_DangerNoEqualsST | 28.4 s | 13.0 % | 22.9 s | 15.6 % |
Test | Time DO | % | Time HE | % |
---|---|---|---|---|
BRC01_BaselineST | 219.1 s | 100.0 % | 146.8 s | 100.0 % |
BRC03_NoStreamST | 197.9 s | 90.3 % | 140.9 s | 96.0 % |
BRC05_ReplaceSplitST | 151.1 s | 69.0 % | 112.3 s | 76.5 % |
BRC14_ReadBytesST | 53.7 s | 24.5 % | 45.6 s | 31.1 % |
BRC26_MoreMapSpaceST | 39.0 s | 17.8 % | 33.4 s | 22.8 % |
BRC27_SmallPutST | 37.9 s | 17.3 % | 33.4 s | 22.9 % |
BRC28_FineTuningST | 38.7 s | 17.0 % | 32.2 s | 22.0 % |
BRC29a_ParseDoubleTuningST | 38.8 s | 17.0 % | 31.1 s | 21.2 % |
BRC29g_FixedIntParsing | 38.7 s | 17.0 % | 31.0 s | 21.1 % |
BRC45_DoubleTheSetSize | 27.0 s | 18.4 % | ||
BRC49_OffsetSubtraction | 26.8 s | 18.3 % |
A Few Last Data Tables
# org.rschwietzke.devoxxpl24.BRC01_BaselineST
# Hetzner, AMD, 21.0.4-tem
# Measurement Runtime: 150,932 ms 100%
160,915.21 ms task-clock # 1.066 CPUs utilized
37,195 context-switches # 231.147 /sec
6,232 cpu-migrations # 38.728 /sec
236,617 page-faults # 1.470 K/sec
539,415,718,644 cycles # 3.352 GHz
41,233,849,888 stalled-cycles-frontend # 7.64% frontend cycles idle
2,258,010,166,520 instructions # 4.19 insn per cycle
# 0.02 stalled cycles per insn
522,584,792,160 branches # 3.248 G/sec
2,479,441,210 branch-misses # 0.47% of all branches
150.932228741 seconds time elapsed
145.057695000 seconds user
11.626114000 seconds sys
# org.rschwietzke.devoxxpl24.BRC45_DoubleTheSetSize
# Hetzner, AMD, 21.0.4-tem
# Mean Measurement Runtime: 27,061 ms 17.9% (to baseline)
27,538.78 ms task-clock # 1.018 CPUs utilized
1,029 context-switches # 37.365 /sec
29 cpu-migrations # 1.053 /sec
7,885 page-faults # 286.324 /sec
98,276,507,946 cycles # 3.569 GHz
10,431,399,857 stalled-cycles-frontend # 10.61% frontend cycles idle
303,659,521,159 instructions # 3.09 insn per cycle
# 0.03 stalled cycles per insn
55,868,360,010 branches # 2.029 G/sec
1,540,980,291 branch-misses # 2.76% of all branches
27.061235813 seconds time elapsed
26.366348000 seconds user
1.062075000 seconds sys
There is still room
byte
limits, reads only 8 bit at a time but could do 64 bit instead (factor 8!)int
or long
would be better (need for UNSAFE or different ByteBuffer
usage)mmap
a file instead of filling a buffer
# org.rschwietzke.devoxxpl24.BRC45_DoubleTheSetSize
# Hetzner, AMD, 21.0.4-tem
# Mean Measurement Runtime: 27,061 ms 17.9% (to baseline)
27,538.78 msec task-clock # 1.018 CPUs utilized
1,029 context-switches # 37.365 /sec
29 cpu-migrations # 1.053 /sec
7,885 page-faults # 286.324 /sec
98,276,507,946 cycles # 3.569 GHz
10,431,399,857 stalled-cycles-frontend # 10.61% frontend cycles idle
303,659,521,159 instructions # 3.09 insn per cycle
# 0.03 stalled cycles per insn
55,868,360,010 branches # 2.029 G/sec
1,540,980,291 branch-misses # 2.76% of all branches
27.061235813 seconds time elapsed
26.366348000 seconds user
1.062075000 seconds sys
# Thomas Wuerthinger, 1 core
# Hetzner, AMD, 21.0.4-tem
# Mean Measurement Runtime: 11,281 ms 7.4% (to baseline), 41.7%
11,269.99 msec task-clock # 0.999 CPUs utilized
1,748 context-switches # 155.102 /sec
1 cpu-migrations # 0.089 /sec
223,183 page-faults # 19.803 K/sec
39,555,927,795 cycles # 3.510 GHz
2,945,609,903 stalled-cycles-frontend # 7.45% frontend cycles idle
164,487,950,474 instructions # 4.16 insn per cycle
# 0.02 stalled cycles per insn
15,668,783,505 branches # 1.390 G/sec
149,371,238 branch-misses # 0.95% of all branches
11.281867347 seconds time elapsed
0.076590000 seconds user
0.130600000 seconds sys
Your are about to enter an unstable area!
Test | What | Machine | % |
---|---|---|---|
BRC01_BaselineST | Thinkpad T14s | 218.8 s | 100.0 % |
BRC01_BaselineST | GCP-C2-4c-32g | 278.4 s | 100.0 % |
BRC01_BaselineST | GCP-C2D-4c-32g | 172.4 s | 100.0 % |
BRC01_BaselineST | Hetzner AMD 8c-32g | 146.7 s | 100.0 % |
BRC01_BaselineST | DO Intel Premium 16c-32g | 154.2 s | 100.0 % |
BRC26_MoreMapSpaceST | Thinkpad T14s | 35.3 s | 16.1 % |
BRC26_MoreMapSpaceST | GCP-C2-4c-32g | 43.4 s | 15.6 % |
BRC26_MoreMapSpaceST | GCP-C2D-4c-32g | 34.8 s | 20.2 % |
BRC26_MoreMapSpaceST | Hetzner AMD 8c-32g | 33.5 s | 22.8 % |
BRC26_MoreMapSpaceS | DO Intel Premium 16c-32g | 31.1 s | 19.8 % |
Your milage will vary.
Test | What | Time DO | % |
---|---|---|---|
BRC01_BaselineST | 21.0.3-tem | 215.5 s | 100.0 % |
BRC26_MoreMapSpaceST | 21.0.3-tem | 38.4 s | 17.8 % |
BRC01_BaselineST | 21.0.2-graal | 212.7 s | 100.0 % |
BRC26_MoreMapSpaceST | 21.0.2-graal | 39.8 s | 17.6 % |
BRC26_MoreMapSpaceST | graal-native-o3 | 41.4 s | |
BRC26_MoreMapSpaceST | graal-native-pgo | 44.7 s |   |
Test | What | Time HE |
---|---|---|
BRC26_MoreMapSpaceST | 21.0.1-tem | 32.7 s |
BRC26_MoreMapSpaceST | 21.0.4-tem | 32.8 s |
BRC26_MoreMapSpaceST | 21.0.4-graal | 29.7 s |
BRC26_MoreMapSpaceST | 17.0.12-tem | 32.6 s |
BRC26_MoreMapSpaceST | 23.ea.29-open | 33.2 s |
P.S: Alina, I have no clue why the native images perform so badly.
The cloud sucks when benchmarking
perf
# BRC01_BaselineST - DO CPU 16c
Warmup Runtime: 219,665 ms
Warmup Runtime: 220,341 ms
Warmup Runtime: 219,312 ms
Measurement Runtime: 219,020 ms
Measurement Runtime: 223,180 ms
Measurement Runtime: 226,418 ms
# BRC27_SmallPutST - DO CPU 16c
Warmup Runtime: 38,792 ms
Warmup Runtime: 37,929 ms
Warmup Runtime: 39,136 ms
Measurement Runtime: 39,197 ms
Measurement Runtime: 39,103 ms
Measurement Runtime: 39,106 ms
The cloud sucks when benchmarking
# Google Cloud C2 - 4c - 32 GB
# BRC01_BaselineST - DO CPU 16c
Measurement Runtime: 281,448 ms
Measurement Runtime: 278,622 ms
Measurement Runtime: 281,130 ms
Measurement Runtime: 280,942 ms
Measurement Runtime: 281,287 ms
# BRC26_MoreMapSpaceST - DO CPU 16c
Measurement Runtime: 43,390 ms
Measurement Runtime: 44,003 ms
Measurement Runtime: 43,634 ms
Measurement Runtime: 43,682 ms
Measurement Runtime: 44,305 ms
# Google Cloud C2D - 4c - 32 GB
# BRC01_BaselineST - DO CPU 16c
Measurement Runtime: 172,498 ms
Measurement Runtime: 197,795 ms
Measurement Runtime: 195,888 ms
Measurement Runtime: 195,546 ms
Measurement Runtime: 195,896 ms
# BRC26_MoreMapSpaceST - DO CPU 16c
Measurement Runtime: 38,891 ms
Measurement Runtime: 34,864 ms
Measurement Runtime: 35,682 ms
Measurement Runtime: 35,691 ms
Measurement Runtime: 35,712 ms
The first 80% are not too hard.
Most of the concepts seen here are applicable to other problems.
Change one thing at a time and measure.
Most expensive first, unless something is simpler to fix.