#1BRC - The First 80%

How to Read
One Billion Rows
Fast Enough

René Schwietzke, Xceptance GmbH

About René Schwietzke

  • Master of Computer Science (Dipl.-Inf.)
  • Programmer since 1989
  • Java since Java 1.0, 1996
  • QA and Testing since 1998
  • Performance Tester since 1999
  • Co-Founder of
  •    @ReneSchwietzke
  • @reneschwietzke@foojay.social

About

  • Founded 2004
  • Headquarters in Jena, Germany; Subsidiary in Cambridge, MA, USA
  • Specialized in Software Testing and Quality Assurance
  • Performance testing since 2004
  • Over 150 performance test projects every year
  • World-wide customer base including APAC and South America
  • Performance Test Tool , Java-based, APL 2.0

License

I care and share

c b

This presentation is licensed under
Creative Commons Attribution 4.0 International License

#1BRC - Introduction

What is 1BRC and Why the Fuzz?

The Origin

A busy January 2024 for Java Devs

  • Gunnar Morling got bored during the holidays
  • His wife suggested to do something nice and fun
  • So he asked the community the #1brc question

... write a Java program for retrieving temperature measurement values from a text file and calculate the min, mean, and max temperatures per weather station.
... one caveat, the file has 1,000,000,000 rows!

Data

Input


Zagreb;19.7
Havana;36.2
Mexicali;27.3
Marseille;15.4
Banjul;24.2
Canberra;0.8
Mandalay;19.3
Bridgetown;29.6
Chihuahua;10.9
Honiara;35.3
Gangtok;-1.7
Houston;32.4
Zürich;13.8
Kolkata;20.6
Tripoli;8.9
Karachi;28.3
Zagreb;13.2
Palm Springs;20.3
Lhasa;6.2
Sacramento;28.3
Edmonton;5.6
Toluca;14.0
El Paso;24.0
Antsiranana;24.9
Tromsø;-3.9
        

Output


# This is TreeMap::toString
# Data shown here is formatted for screen
{
   Abha=5.0/18.0/27.4, Abidjan=15.7/26.0/34.1,
   Abéché=12.1/29.4/35.6, Accra=14.7/26.4/33.1,
   Addis Ababa=2.1/16.0/24.3, Adelaide=4.1/17.3/29.7,
   ...
   Willemstad=-7.9,28.2,57.7, Winnipeg=-28.1,3.0,39.4,
   Wrocław=-27.5,9.6,50.2, Xi'an=-22.5,14.0,49.1,
   Yakutsk=-42.2,-8.8,26.4, Yangon=-2.6,27.6,59.0,
   Yaoundé=-10.1,23.6,58.3, Yellowknife=-38.4,-4.0,40.0,
   Yerevan=-18.4,12.4,49.6, Yinchuan=-25.4,9.5,42.0,
   Zagreb=-25.3,10.6,48.4, Zanzibar City=-5.0,26.1,61.2,
   Zürich=-20.9,9.4,43.5, Ürümqi=-26.8,7.4,43.7,
   İzmir=-15.8,18.0,50.1
}
    

Rules

  • Java Only
  • JDKs via SDKMAN only, any version!
  • Any CLI parameters permitted
  • No external dependencies
  • Results measured on bare metal
  • File in RAM (RAMdisk or file cache)
  • No preprocessing of the data permitted
  • Five measurements, middle three averaged
  • Max 10k stations - initial version had 413
  • Only valid data in the file, UTF-8
  • Line ending is \n
  • City names max 100 bytes (unicode) long
  • Temperatures -99.9 to 99.9 (e.g. 5.0, 0.0)
  • IEEE 754 rounding-direction "roundTowardPositive"

The Results

What do you think? Guess some numbers.

TestCoresTime
Digital Ocean Intel
Time
Hetzner AMD
Baseline1200.7 s150.4 s
Baseline899.8 s93.1 s
Thomas Wuerthinger18.9 s7.3 s
Thomas Wuerthinger81.2 s1.6 s
Today's 20% Goal140.0 s30.1 s

Let's Challenge Ourselves

What We Will Do and What We Won't

Some Updated Rules

Just to Make Things Easier

  • Permit 1M, 10M, 100M and 1000M files
  • 10M is 1M plus new 9M and so on
  • Measure without process startup
  • Isolate test code for reuse
  • Allow to learn more about the JVM
  • One Simple Benchmark Framework - 1SBF
  • Supports warmup and measurements rounds
  • Calculates mean of measurements
  • New instance of the test class per round
  • SHA-512 of result to discover differences
  • Supports batchmode

Limitations

Some limitations apply today

  • Talk time is not enough for all details
  • 80% doesn't require all tricks
  • No Über-Code today!
  • Über-code becomes hard to understand
  • Real world gains vary a lot

Goal: All code is still easily understandable.

  • No Unsafe
  • No Vector API
  • No Graal
  • Almost No Bit-Magic
  • No Branchless Code
  • No Memory-Mapped Files
  • No JNI/Panama

The Baseline

The Original Code


public String run(final String fileName) throws IOException {
    Collector<Measurement, MeasurementAggregator, ResultRow> collector = Collector.of(MeasurementAggregator::new,
            (agg, m) -> {
                agg.min = Math.min(agg.min, m.value);
                agg.max = Math.max(agg.max, m.value);
                agg.sum += m.value;
                agg.count++;
            }, (agg1, agg2) -> {
                var res = new MeasurementAggregator();
                res.min = Math.min(agg1.min, agg2.min);
                res.max = Math.max(agg1.max, agg2.max);
                res.sum = agg1.sum + agg2.sum;
                res.count = agg1.count + agg2.count;

                return res;
            }, agg -> {
                return new ResultRow(agg.min, (Math.round(agg.sum * 10.0) / 10.0) / agg.count, agg.max);
            });

    // Measurement is a record
    var result = Files.lines(Paths.get(fileName)).parallel() // <-- optional
            .map(l -> l.split(";"))
            .map(l -> new Measurement(l))
            .collect(groupingBy(m -> m.station(), collector));

    return new TreeMap<>(result).toString();
}
    

The Results

TestCoresTime DOTime HE
Baseline 1 205.0 s 146.8 s
Baseline 8 96.8 s 84.1 s

Flamegraph

run78.4 %
readLine17.7 %
split31.7 %
parseDouble11.7 %
store/reduce14.4 %

java \
    -agentpath:async-profiler/lib/libasyncProfiler.so=start,event=cpu,flamegraph,file=$1-cpu.html \
    -cp target/classes/ \
    org.rschwietzke.devoxxpl24.BRC01_BaselineST measurements.txt 2 2 \

The Second Baseline

Stream Tuning is Too Complicated


/**
 * A more classic version
 */
public String run(final String fileName) throws IOException
{
    final Map<String, Temperatures> cities = new HashMap<>();

    try (var reader = Files.newBufferedReader(Paths.get(fileName)))
    {
        String line;
        while ((line = reader.readLine()) != null)
        {
            // split the line
            final String[] cols = line.split(";");

            // get us the data to store
            final String city = cols[0];
            final double temperature = Double.parseDouble(cols[1]);

            // store and sum up the data
            // Map::merge(key, value, remappingFunction)
            cities.merge(city, new Temperatures(temperature), (t1, t2) -> t1.merge(t2));
        }
    }

    // ok, we got everything, now we need to order it
    return new TreeMap<String, Temperatures>(cities).toString();
}
    

The Results

TestTime DO % Time HE %
BRC01_BaselineST 219.1 s 100.0 % 146.8 s 100.0 %
BRC03_NoStreamST 197.9 s 90.3 % 140.9 s 96.0 %

Flamegraph

Area 01 03
run 78.4 % 80.4 %
readLine 17.7 % 17.9 %
split 31.7 % 26.8 %
parseDouble 11.7 % 12.2 %
merge 14.4 % 21.0 %

java \
    -agentpath:async-profiler/lib/libasyncProfiler.so=start,event=cpu,flamegraph,file=$1-cpu.html \
    -cp target/classes/ \
    org.rschwietzke.devoxxpl24.BRC03_NoStreamST measurements.txt 2 2 \

Split is Slow

Get Rid Off split


public String run(final String fileName) throws IOException
{
    // our cities with temperatures
    final Map<String, Temperatures> cities = new HashMap<>();

    try (var reader = Files.newBufferedReader(Paths.get(fileName)))
    {
        String line;
        while ((line = reader.readLine()) != null)
        {
            // split the line
            final int pos = line.indexOf(';');

            // get us the city
            final String city = line.substring(0, pos);
            final String temperatureAsString = line.substring(pos + 1);

            // parse our temperature
            final double temperature = Double.parseDouble(temperatureAsString);

            // merge the data into the captured data
            cities.merge(city, new Temperatures(temperature), (t1, t2) -> t1.merge(t2));
        }
    }

    // ok, we got everything, now we need to order it and print it
    return new TreeMap<String, Temperatures>(cities).toString();
}
    

The Results

Test Time DO % Time HE %
BRC01_BaselineST 219.1 s 100.0 % 146.9 s 100.0 %
BRC03_NoStreamST 197.9 s 90.3 % 140.9 s 96.0 %
BRC05_ReplaceSplitST 151.1 s 69.0 % 112.3 s 76.5 %

But doesn't split shortcut?


private String[] split(String regex, int limit, boolean withDelimiters) {
    /* fastpath if the regex is a
     * (1) one-char String and this character is not one of the
     *     RegEx's meta characters ".$|()[{^?*+\\", or
     * (2) two-char String and the first char is the backslash and
     *     the second is not the ascii digit or ascii letter.
     */
    char ch = 0;
    if (((regex.length() == 1 &&
            ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
            (regex.length() == 2 &&
                    regex.charAt(0) == '\\' &&
                    (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
                    ((ch-'a')|('z'-ch)) < 0 &&
                    ((ch-'A')|('Z'-ch)) < 0)) &&
            (ch < Character.MIN_HIGH_SURROGATE ||
                    ch > Character.MAX_LOW_SURROGATE))
    {
        // All the checks above can potentially be constant folded by
        // a JIT/AOT compiler when the regex is a constant string.
        // ...
        // This split now creates objects
        return split(ch, limit, withDelimiters);
    }
    ...
}
    

Flamegraph

Area 01 03 05
run 78.4 % 80.4 % 81.3 %
readLine 17.7 % 17.9 % 31.6 %
split 31.7 % 26.8 % n/a
indexOf n/a n/a 2.2 %
subString n/a 9.2 % 7.6 %
parseDouble 11.7 % 12.2 % 14.8 %
merge 14.4 % 21.0 % 23.9 %

Double Parsing

Make double Parsing Less Generic

The Problem in 224 Lines


static ASCIIToBinaryConverter readJavaFormatString( String in ) throws NumberFormatException {
    boolean isNegative = false;
    boolean signSeen   = false;
    int     decExp;
    char    c;

parseNumber:
    try{
        in = in.trim(); // don't fool around with white space.
                        // throws NullPointerException if null
        int len = in.length();
        if ( len == 0 ) {
            throw new NumberFormatException("empty String");
        }
        int i = 0;
        switch (in.charAt(i)){
        case '-':
            isNegative = true;
            //FALLTHROUGH
        case '+':
            i++;
            signSeen = true;
        }
        c = in.charAt(i);
        if(c == 'N') { // Check for NaN
            if((len-i)==NAN_LENGTH && in.indexOf(NAN_REP,i)==i) {
                return A2BC_NOT_A_NUMBER;
            }
            // something went wrong, throw exception
            break parseNumber;
        } else if(c == 'I') { // Check for Infinity strings
            if((len-i)==INFINITY_LENGTH && in.indexOf(INFINITY_REP,i)==i) {
                return isNegative? A2BC_NEGATIVE_INFINITY : A2BC_POSITIVE_INFINITY;
            }
            // something went wrong, throw exception
            break parseNumber;
        } else if (c == '0')  { // check for hexadecimal floating-point number
            if (len > i+1 ) {
                char ch = in.charAt(i+1);
                if (ch == 'x' || ch == 'X' ) { // possible hex string
                    return parseHexString(in);
                }
            }
        }  // look for and process decimal floating-point string

        char[] digits = new char[ len ];
        boolean decSeen = false;
        int nDigits = 0;
        int decPt = 0;
        int nLeadZero = 0;
        int nTrailZero = 0;

    skipLeadingZerosLoop:
        while (i < len) {
            c = in.charAt(i);
            if (c == '0') {
                nLeadZero++;
            } else if (c == '.') {
                if (decSeen) {
                    // already saw one ., this is the 2nd.
                    throw new NumberFormatException("multiple points");
                }
                decPt = i;
                if (signSeen) {
                    decPt -= 1;
                }
                decSeen = true;
            } else {
                break skipLeadingZerosLoop;
            }
            i++;
        }
    digitLoop:
        while (i < len) {
            c = in.charAt(i);
            if (c >= '1' && c <= '9') {
                digits[nDigits++] = c;
                nTrailZero = 0;
            } else if (c == '0') {
                digits[nDigits++] = c;
                nTrailZero++;
            } else if (c == '.') {
                if (decSeen) {
                    // already saw one ., this is the 2nd.
                    throw new NumberFormatException("multiple points");
                }
                decPt = i;
                if (signSeen) {
                    decPt -= 1;
                }
                decSeen = true;
            } else {
                break digitLoop;
            }
            i++;
        }
        nDigits -=nTrailZero;
        //
        // At this point, we've scanned all the digits and decimal
        // point we're going to see. Trim off leading and trailing
        // zeros, which will just confuse us later, and adjust
        // our initial decimal exponent accordingly.
        // To review:
        // we have seen i total characters.
        // nLeadZero of them were zeros before any other digits.
        // nTrailZero of them were zeros after any other digits.
        // if ( decSeen ), then a . was seen after decPt characters
        // ( including leading zeros which have been discarded )
        // nDigits characters were neither lead nor trailing
        // zeros, nor point
        //
        //
        // special hack: if we saw no non-zero digits, then the
        // answer is zero!
        // Unfortunately, we feel honor-bound to keep parsing!
        //
        boolean isZero = (nDigits == 0);
        if ( isZero &&  nLeadZero == 0 ){
            // we saw NO DIGITS AT ALL,
            // not even a crummy 0!
            // this is not allowed.
            break parseNumber; // go throw exception
        }
        //
        // Our initial exponent is decPt, adjusted by the number of
        // discarded zeros. Or, if there was no decPt,
        // then its just nDigits adjusted by discarded trailing zeros.
        //
        if ( decSeen ){
            decExp = decPt - nLeadZero;
        } else {
            decExp = nDigits + nTrailZero;
        }

        //
        // Look for 'e' or 'E' and an optionally signed integer.
        //
        if ( (i < len) &&  (((c = in.charAt(i) )=='e') || (c == 'E') ) ){
            int expSign = 1;
            int expVal  = 0;
            int reallyBig = Integer.MAX_VALUE / 10;
            boolean expOverflow = false;
            switch( in.charAt(++i) ){
            case '-':
                expSign = -1;
                //FALLTHROUGH
            case '+':
                i++;
            }
            int expAt = i;
        expLoop:
            while ( i < len  ){
                if ( expVal >= reallyBig ){
                    // the next character will cause integer
                    // overflow.
                    expOverflow = true;
                }
                c = in.charAt(i++);
                if(c>='0' && c<='9') {
                    expVal = expVal*10 + ( (int)c - (int)'0' );
                } else {
                    i--;           // back up.
                    break expLoop; // stop parsing exponent.
                }
            }
            int expLimit = BIG_DECIMAL_EXPONENT + nDigits + nTrailZero;
            if (expOverflow || (expVal > expLimit)) {
                // There is still a chance that the exponent will be safe to
                // use: if it would eventually decrease due to a negative
                // decExp, and that number is below the limit.  We check for
                // that here.
                if (!expOverflow && (expSign == 1 && decExp < 0)
                        && (expVal + decExp) < expLimit) {
                    // Cannot overflow: adding a positive and negative number.
                    decExp += expVal;
                } else {
                    //
                    // The intent here is to end up with
                    // infinity or zero, as appropriate.
                    // The reason for yielding such a small decExponent,
                    // rather than something intuitive such as
                    // expSign*Integer.MAX_VALUE, is that this value
                    // is subject to further manipulation in
                    // doubleValue() and floatValue(), and I don't want
                    // it to be able to cause overflow there!
                    // (The only way we can get into trouble here is for
                    // really outrageous nDigits+nTrailZero, such as 2
                    // billion.)
                    //
                    decExp = expSign * expLimit;
                }
            } else {
                // this should not overflow, since we tested
                // for expVal > (MAX+N), where N >= abs(decExp)
                decExp = decExp + expSign*expVal;
            }

            // if we saw something not a digit ( or end of string )
            // after the [Ee][+-], without seeing any digits at all
            // this is certainly an error. If we saw some digits,
            // but then some trailing garbage, that might be ok.
            // so we just fall through in that case.
            // HUMBUG
            if ( i == expAt ) {
                break parseNumber; // certainly bad
            }
        }
        //
        // We parsed everything we could.
        // If there are leftovers, then this is not good input!
        //
        if ( i < len &&
            ((i != len - 1) ||
            (in.charAt(i) != 'f' &&
             in.charAt(i) != 'F' &&
             in.charAt(i) != 'd' &&
             in.charAt(i) != 'D'))) {
            break parseNumber; // go throw exception
        }
        if(isZero) {
            return isNegative ? A2BC_NEGATIVE_ZERO : A2BC_POSITIVE_ZERO;
        }
        return new ASCIIToBinaryBuffer(isNegative, decExp, digits, nDigits);
    } catch ( StringIndexOutOfBoundsException e ){ }
    throw new NumberFormatException("For input string: \"" + in + "\"");
}

    

public static double parseDouble(final String s) {
    return parseDouble(s, 0, s.length() - 1);
}

private static final double[] multipliers = {
    1, 1, 0.1, 0.01, 0.001, 0.000_1, 0.000_01, 0.000_001, 0.000_000_1, 0.000_000_01,
    0.000_000_001, 0.000_000_000_1, 0.000_000_000_01, 0.000_000_000_001, 0.000_000_000_000_1,
    0.000_000_000_000_01, 0.000_000_000_000_001, 0.000_000_000_000_000_1, 0.000_000_000_000_000_01};

public static double parseDouble(final String s, final int offset, final int end) {
    final int negative = s.charAt(offset) == '-' ? offset + 1 : offset;

    long value = 0;
    int decimalPos = end;

    for (int i = negative; i <= end; i++) {
        final int d = s.charAt(i);
        if (d == '.') {
            decimalPos = i;
            continue;
        }
        final int v = d - DIGITOFFSET;
        value = ((value << 3) + (value << 1));
        value += v;
    }

    // adjust the decimal places
    value = negative != offset ? -value : value;
    return value * multipliers[end - decimalPos + 1];
}
    

The Results

Test Time DO % Time HE %
BRC01_BaselineST 219.1 s 100.0 % 146.9 s 100.0 %
BRC03_NoStreamST 197.9 s 90.3 % 140.9 s 96.0 %
BRC05_ReplaceSplitST 151.1 s 69.0 % 112.3 s 76.5 %
BRC06_NewDoubleParsingST 127.9 s 58.4 % 99.7 s 67.9 %

Flamegraph

Area 05 06
run 81.3 % 76.2 %
readLine 31.6 % 32.8 %
indexOf 2.2 % 2.2 %
subString 7.6 % 9.9 %
parseDouble 14.8 % 4.4 %
merge 23.9 % 25.8 %

java \
    -agentpath:async-profiler/lib/libasyncProfiler.so=start,event=cpu,sig,flamegraph,file=$1-cpu.html \
    -cp target/classes/ \
    org.rschwietzke.devoxxpl24.BRC06_NewDoubleParsingST measurements.txt 2 2 \

Substring

One Less String


while ((line = reader.readLine()) != null)
{
    // split the line
    final int pos = line.indexOf(';');

    // get us the city
    final String city = line.substring(0, pos);

    // parse our temperature inline without an instance of a string for temperature
    final double temperature = ParseDouble.parseDouble(line, pos + 1, line.length() - 1);

    // merge the data into the captured data
    cities.merge(city, new Temperatures(temperature), (t1, t2) -> t1.merge(t2));
}
    

The Results

Test Time DO % Time HE %
BRC01_BaselineST 219.1 s 100.0 % 146.9 s 100.0 %
BRC03_NoStreamST 197.9 s 90.3 % 140.9 s 96.0 %
BRC05_ReplaceSplitST 151.1 s 69.0 % 112.3 s 76.5 %
BRC06_NewDoubleParsingST 127.9 s 58.4 % 99.7 s 67.9 %
BRC07_NoCopyForDoubleST 120.4 s 55.0 % 89.5 s 61.0 %

Flamegraph

Area 05 06 07
run 81.3 % 76.2 % 76.4 %
readLine 31.6 % 32.8 % 29.6 %
indexOf 2.2 % 2.2 % 2.6 %
subString 7.6 % 9.9 % 5.8 %
parseDouble 14.8 % 4.4 % 6.0 %
merge 23.9 % 25.8 % 28.7 %

java \
    -agentpath:async-profiler/lib/libasyncProfiler.so=start,event=cpu,sig,flamegraph,file=$1-cpu.html \
    -cp target/classes/ \
    org.rschwietzke.devoxxpl24.BRC07_NoCopyForDoubleST measurements.txt 2 2 \

Double is Double the Work

Ignore double as longint as Possible


/**
 * Parses a double but ends up with an int, only because we know
 * the format of the results -99.9 to 99.9
 */
public static int parseInteger(final String s, final int offset, final int end)
{
    final int negative = s.charAt(offset) == '-' ? offset + 1 : offset;

    int value = 0;

    for (int i = negative; i <= end; i++)
    {
        final int d = s.charAt(i);
        if (d == '.')
        {
            continue;
        }
        final int v = d - DIGITOFFSET;
        // multiply by 10 = multiply by 8 + multiply by 2
        value = ((value << 3) + (value << 1));
        value += v;
    }

    value = negative != offset ? -value : value;
    return value;
}
    

The Results

Measured on Digital Ocean

Test Time DO % Time HE %
BRC01_BaselineST 219.1 s 100.0 % 146.9 s 100.0 %
BRC03_NoStreamST 197.9 s 90.3 % 140.9 s 96.0 %
BRC05_ReplaceSplitST 151.1 s 69.0 % 112.3 s 76.5 %
BRC06_NewDoubleParsingST 127.9 s 58.4 % 99.7 s 67.9 %
BRC07_NoCopyForDoubleST 120.4 s 55.0 % 89.5 s 61.0 %
BRC08_GoIntST 117.4 s 53.6 % 88.3 s 60.1 %

Flamegraph

Area 05 06 07 08
run 81.3 % 76.2 % 76.4 % 79.3 %
readLine 31.6 % 32.8 % 29.6 % 34.4 %
indexOf 2.2 % 2.2 % 2.6 % 2.5 %
subString 7.6 % 9.9 % 5.8 % 6.0 %
parseDouble 14.8 % 4.4 % 6.0 % 8.2 %
merge 23.9 % 25.8 % 28.7 % 24.2 %

java \
    -agentpath:async-profiler/lib/libasyncProfiler.so=start,event=cpu,sig,flamegraph,file=$1-cpu.html \
    -cp target/classes/ \
    org.rschwietzke.devoxxpl24.BRC08_GoIntST measurements.txt 2 2 \

Remove Lambda

Explicit Merging


while ((line = reader.readLine()) != null)
{
    // split the line
    final int pos = line.indexOf(';');

    // get us the city
    final String city = line.substring(0, pos);

    // parse our temperature inline without an instance of a string for temperature
    final int temperature = ParseDouble.parseInteger(line, pos + 1, line.length() - 1);

    // get city and update
    final var v = cities.get(city);
    final var t = new Temperatures(temperature);
    cities.put(city, v != null ? v.merge(t) : t);
}
    

The Results

Test Time DO % Time HE %
BRC01_BaselineST 219.1 s 100.0 % 146.9 s 100.0 %
BRC03_NoStreamST 197.9 s 90.3 % 140.9 s 96.0 %
BRC05_ReplaceSplitST 151.1 s 69.0 % 112.3 s 76.5 %
BRC06_NewDoubleParsingST 127.9 s 58.4 % 99.7 s 67.9 %
BRC07_NoCopyForDoubleST 120.4 s 55.0 % 89.5 s 61.0 %
BRC08_GoIntST 117.4 s 53.6 % 88.3 s 60.1 %
BRC09_NoMergeST 121.3 s 55.4 % 95.6 s 65.0 %

Flamegraph

run 80.1 %
readLine 39.1 %
indexOf 2.8 %
substring 7.9 %
parseInteger6.3 %
get 10.9 %
merge 2.3 %
put 5.2 %

java \
    -agentpath:async-profiler/lib/libasyncProfiler.so=start,event=cpu,sig,flamegraph,file=$1-cpu.html \
    -cp target/classes/ \
    org.rschwietzke.devoxxpl24.BRC09_NoMergeST measurements.txt 2 2 \

Mutablility to the Rescue

Less Memory Churn


while ((line = reader.readLine()) != null)
{
    ...
    // get city
    final Temperatures v = cities.get(city);
    if (v != null)
    {
        // know it, put both together
        v.add(temperature);
    }
    else
    {
        // we have not seen that city yet, create a container and store it
        cities.put(city, new Temperatures(temperature));
    }
}
    

The Results

Test Time DO % Time HE %
BRC01_BaselineST 219.1 s 100.0 % 146.9 s 100.0 %
BRC03_NoStreamST 197.9 s 90.3 % 140.9 s 96.0 %
BRC05_ReplaceSplitST 151.1 s 69.0 % 112.3 s 76.5 %
BRC06_NewDoubleParsingST 127.9 s 58.4 % 99.7 s 67.9 %
BRC07_NoCopyForDoubleST 120.4 s 55.0 % 89.5 s 61.0 %
BRC08_GoIntST 117.4 s 53.6 % 88.3 s 60.1 %
BRC09_NoMergeST 121.3 s 55.4 % 95.6 s 65.0 %
BRC10_MutateST 100.6 s 45.9 % 85.6 s 58.3 %

Flamegraph

Area 09 10
run 80.3 % 71.3 %
readLine 39.1 % 38.1 %
indexOf 2.8 % 2.4 %
substring 7.9 % 6.2 %
parseInteger 6.3 % 6.6 %
get 10.9 % 15.7 %
merge 2.3 % 0.0 %
put 5.2 % 0.0 %

java \
    -agentpath:async-profiler/lib/libasyncProfiler.so=start,event=cpu,sig,flamegraph,file=$1-cpu.html \
    -cp target/classes/ \
    org.rschwietzke.devoxxpl24.BRC10_MutateST measurements.txt 2 2 \

Open Hashing

Simplify the Map


// https://github.com/mikvor/hashmapTest/blob/master/src/main/java/map/objobj/ObjObjMap.java
public class FastHashMap<K, V> {
    private static final Object FREE_KEY = new Object();
    private Object[] m_data;
...
    public V get( final K key ) {
        int ptr = (key.hashCode() & m_mask) << 1;
        Object k = m_data[ ptr ];

        if ( k == FREE_KEY ) {
            return null;  //end of chain already
        }

        if ( k.hashCode() == key.hashCode() && k.equals( key ) ) {
            return (V) m_data[ ptr + 1 ];
        }

        while ( true ) {
            ptr = (ptr + 2) & m_mask2; //that's next index
            k = m_data[ ptr ];
            if ( k == FREE_KEY ) {
                return null;
            }
            if (k.equals( key )) {
                return (V) m_data[ ptr + 1 ];
            }
        }
    }
...
}
    

The Results

Test Time DO % Time HE %
BRC01_BaselineST 219.1 s 100.0 % 146.9 s 100.0 %
BRC03_NoStreamST 197.9 s 90.3 % 140.9 s 96.0 %
BRC05_ReplaceSplitST 151.1 s 69.0 % 112.3 s 76.5 %
BRC06_NewDoubleParsingST 127.9 s 58.4 % 99.7 s 67.9 %
BRC07_NoCopyForDoubleST 120.4 s 55.0 % 89.5 s 61.0 %
BRC08_GoIntST 117.4 s 53.6 % 88.3 s 60.1 %
BRC09_NoMergeST 121.3 s 55.4 % 95.6 s 65.0 %
BRC10_MutateST 100.6 s 45.9 % 85.6 s 58.3 %
BRC12_NewMapST 105.0 s 47.9 % 79.3 s 54.0 %

Flamegraph

Area 12
run74.1 %
readLine 37.9 %
indexOf 2.0 %
substring 7.7 %
parseInteger8.3 %
get 15.4 %
merge 0.0 %
put 0.0 %

java \
    -agentpath:async-profiler/lib/libasyncProfiler.so=start,event=cpu,sig,flamegraph,file=$1-cpu.html \
    -cp target/classes/ \
    org.rschwietzke.devoxxpl24.BRC12_NewMapST measurements.txt 2 2 \

Set instead of Map

Eliminate Slots


public void getPutOrUpdate( final String city, int value ) {
    final int hash = city.hashCode();
    int ptr = hash & m_mask;
    Temperatures k = m_data[ ptr ];

    if ( k == FREE_KEY ) {
        put(new Temperatures(city, value));
        return;
    }

    if ( k.equals( city ) ) {
        k.add(value);
        return;
    }

    while ( true ) {
        ptr = (ptr + 1) & m_mask; //that's next index
        k = m_data[ ptr ];
        if ( k == FREE_KEY ) {
            put(new Temperatures(city, value));
            return;
        }
        if ( k.equals( city ) ) {
            k.add(value);
            return;
        }
    }
}
    

The Results

Test Time DO % Time HE %
BRC01_BaselineST 219.1 s 100.0 % 146.9 s 100.0 %
BRC03_NoStreamST 197.9 s 90.3 % 140.9 s 96.0 %
BRC05_ReplaceSplitST 151.1 s 69.0 % 112.3 s 76.5 %
BRC06_NewDoubleParsingST 127.9 s 58.4 % 99.7 s 67.9 %
BRC07_NoCopyForDoubleST 120.4 s 55.0 % 89.5 s 61.0 %
BRC08_GoIntST 117.4 s 53.6 % 88.3 s 60.1 %
BRC09_NoMergeST 121.3 s 55.4 % 95.6 s 65.0 %
BRC10_MutateST 100.6 s 45.9 % 85.6 s 58.3 %
BRC12_NewMapST 105.0 s 47.9 % 79.3 s 54.0 %
BRC13_HardcodedSetST 102.5 s 46.8 % 77.7 s 52.9 %

Flamegraph

run 72.1 %
readLine 38.2 %
indexOf 3.2 %
substring 4.5 %
parseInteger6.7 %
putOrUpdate16.1 %

java \
    -agentpath:async-profiler/lib/libasyncProfiler.so=start,event=cpu,sig,flamegraph,file=$1-cpu.html \
    -cp target/classes/ \
    org.rschwietzke.devoxxpl24.BRC13_HardcodedSetST measurements.txt 2 2 \

Heavy Lifting

Approach IO Differently

Think Harder

Sit back, relax, and enjoy some music

  • Think about the data
  • String;Double\n
  • Bytes
  • Bytes;Bytes\n
  • BytesByteBytesByte
  • Bytes{1-100};-?[0-9]{1-2}.[0-9]\n
  • Forget String
  • Forget char
  • Delay Unicode processing
  • Forget error handling
  • Read and process just bytes

try (var raf = new RandomAccessFile(fileName, "r");
     var channel = raf.getChannel();)
{
    final Line line = new Line(channel);

    while (true)
    {
        line.readFromChannel();

        if (line.hasNewLine)
        {
            // parse our temperature inline without an instance of a string for temperature
            final int temperature = ParseDouble.parseInteger(line.data,
                line.semicolonPos + 1, line.newlinePos - 1);

            // find and update
            cities.getPutOrUpdate(line, temperature);
        }
        if (line.EOF)
        {
            break;
        }
    }
}
    

private ByteBuffer buffer = ByteBuffer.allocate(MIN_BUFFERSIZE);
private byte[] data = buffer.array();

private void readFromChannel()
{
    hashCode = -1;
    hasNewLine = false;

    try {
        // do we near the end of the buffer?
        if (end - pos < REMAINING_MIN_BUFFERSIZE)
        {
            // we move the buffer indirectly, because
            // the ByteBuffer just wraps our array,
            // nothing for the tenderhearted
            System.arraycopy(data, pos,
                data, 0, data.length - pos);
            end = end - pos;
            pos = 0;
            buffer.position(end);

            // fill the buffer up
            final int readBytes = channel.read(buffer);
            if (readBytes == -1) {
                EOF = true;
            }

            end = buffer.position();
        }
    }
    catch (IOException e) {
        ...
    }
    
    lineStartPos = pos;

    // look for semicolon and new line
    int i = pos;
    for (; i < end; i++)
    {
    	final byte b = data[i];
    	if (b == ';')
    	{
    		semicolonPos = i;
    		break;
    	}
    }

    i++;
    for (; i < end; i++)
    {
        final byte b = data[i];
    	if (b == '\n')
    	{
    		newlinePos = i;
    		pos = i + 1;
    		hasNewLine = true;
    		return;
    	}
    }
}





    

The Results

Test Time DO % Time HE %
BRC01_BaselineST 219.1 s 100.0 % 146.9 s 100.0 %
BRC03_NoStreamST 197.9 s 90.3 % 140.9 s 96.0 %
BRC05_ReplaceSplitST 151.1 s 69.0 % 112.3 s 76.5 %
BRC06_NewDoubleParsingST 127.9 s 58.4 % 99.7 s 67.9 %
BRC07_NoCopyForDoubleST 120.4 s 55.0 % 89.5 s 61.0 %
BRC08_GoIntST 117.4 s 53.6 % 88.3 s 60.1 %
BRC09_NoMergeST 121.3 s 55.4 % 95.6 s 65.0 %
BRC10_MutateST 100.6 s 45.9 % 85.6 s 58.3 %
BRC12_NewMapST 105.0 s 47.9 % 79.3 s 54.0 %
BRC13_HardcodedSetST 102.5 s 46.8 % 77.7 s 52.9 %
BRC14_ReadBytesST 53.7 s 24.5 % 45.6 s 31.1 %

Flamegraph

run 80.2 %
read 2.1 %
parse line 18.4 %
parseInteger8.3 %
putOrUpdate48.6 %
hashCode26.6 %
equals14.4 %
put/update7.6 %

java \
    -agentpath:async-profiler/lib/libasyncProfiler.so=start,event=cpu,sig,flamegraph,file=$1-cpu.html \
    -cp target/classes/ \
    org.rschwietzke.devoxxpl24.BRC14_ReadBytesST measurements.txt 2 2 \

A Surprise

Enjoy a Treat

Some things in life come free

A benefit we have not asked for


# Run with max heap 4 GB for 10 M lines
java -Xlog:gc*=debug \
     -Xms4m -Xmx4g \
     -XX:+UnlockExperimentalVMOptions \
     -XX:+UseEpsilonGC \
     -cp target/classes/ \
     org.rschwietzke.devoxxpl24.BRC01_BaselineST \
     data-10m.txt 0 1

[3.726s][info][gc] Heap: 4096M reserved,
                   3204M (78.22%) committed,
                   3199M (78.11%) used
        

1000 M lines = 100x 3.2 GB = 320 GB


java ... \
     org.rschwietzke.devoxxpl24.BRC13_HardcodedSetST \
     data-10m.txt 0 1

[2,035s][info][gc] Heap: 4096M reserved,
                   1129M (27,57%) committed,
                   1037M (25,33%) used
        

1000 M lines = 100x 1.2 GB = 120 GB


# Run with max heap 4 MB for 1000 M lines
java -Xlog:gc*=debug \
     -Xms4m -Xmx4m \
     -XX:+UnlockExperimentalVMOptions \
     -XX:+UseEpsilonGC \
     -cp target/classes/ \
     org.rschwietzke.devoxxpl24.BRC14_ReadBytesST \
     data-1000m.txt 0 1

[55.401s][info][gc] Heap: 4096K reserved,
                    4096K (100.00%) committed,
                    3634K (88.73%) used
        

1000 M lines = 3.7 MB

Some Cheapish Tuning

Attempting to Save on Number Parsing


/**
 * Parses a double but ends up with an int, only because we know
 * the format of the results -99.9 to 99.9
 */
public static int parseIntegerFixed(final byte[] b, final int offset, final int end) {
    final int length = end - offset; // one is missing, we care for that later

    // we know the first three pieces already 9.9
    int p0 = b[end];
    int p1 = b[end - 2] * 10;
    int value = p0 + p1 - (DIGITOFFSET + DIGITOFFSET * 10);

    // we are 9.9
    if (length == 2) {
    	return value;
    }

    // ok, we are either -9.9 or 99.9 or -99.9
    if (b[offset] != (byte)'-') {
    	// we are 99.9
    	value += b[end - 3] * 100 - DIGITOFFSET * 100;
    	return value;
    }

    // we are either -99.9 or -9.9
    if (length == 3) {
    	// -9.9
    	return -value;
    }

    // -99.9
    value += b[end - 3] * 100 - DIGITOFFSET * 100;
    return -value;
}

The Results

Test Time DO % Time HE %
BRC01_BaselineST 219.1 s 100.0 % 146.9 s 100.0 %
BRC03_NoStreamST 197.9 s 90.3 % 140.9 s 96.0 %
BRC05_ReplaceSplitST 151.1 s 69.0 % 112.3 s 76.5 %
BRC06_NewDoubleParsingST 127.9 s 58.4 % 99.7 s 67.9 %
BRC07_NoCopyForDoubleST 120.4 s 55.0 % 89.5 s 61.0 %
BRC08_GoIntST 117.4 s 53.6 % 88.3 s 60.1 %
BRC09_NoMergeST 121.3 s 55.4 % 95.6 s 65.0 %
BRC10_MutateST 100.6 s 45.9 % 85.6 s 58.3 %
BRC12_NewMapST 105.0 s 47.9 % 79.3 s 54.0 %
BRC13_HardcodedSetST 102.5 s 46.8 % 77.7 s 52.9 %
BRC14_ReadBytesST 53.7 s 24.5 % 45.6 s 31.1 %
BRC15_ParseDoubleFixedST 51.1 s 23.3 % 42.8 s 29.2 %

Flamegraph

Area 14 15
run 80.2 % 78.0 %
read 2.1 % 3.9 %
parse line 18.4 % 18.3 %
parseInteger 8.3 % 9.0 %
putOrUpdate 48.6 % 44.3 %
hashCode 26.6 % 26.6 %
equals 14.4 % 14.4 %
put/update 7.6 % 7.6 %

java \
    -agentpath:async-profiler/lib/libasyncProfiler.so=start,event=cpu,sig,flamegraph,file=$1-cpu.html \
    -cp target/classes/ \
    org.rschwietzke.devoxxpl24.BRC15_ParseDoubleFixedST measurements.txt 2 2 \

Changing Gears

Tuning is Not That Simple Anymore

More Ideas

Hot but not really hot, because we need that code

  • hashCode still on top
  • equals for the byte array
  • parsing the line is still expensive
  • Think harder first
  • Resort to other metrics for further tuning

Simpler Min/Max


public void add(final int value)
{
    this.min = Math.min(this.min, value);
    this.max = Math.max(this.max, value);
    this.total += value;
    this.count++;
}





 
    

public void add(final int value)
{
    if (value < this.min)
    {
        this.min = value;
    }
    else if (value > this.max)
    {
        this.max = value;
    }
    this.total += value;
    this.count++;
}
    

HashCode


// OLD: look for semicolon and new line
int i = pos;
for (; i < end; i++)
{
    final byte b = data[i];
    if (b == ';')
    {
        semicolonPos = i;
        break;
    }
}

// Later
public int ByteBuffer::hashCode() {
    int h = 1;
    int p = position();
    for (int i = limit() - 1; i >= p; i--)
        h = 31 * h + (int)get(i);

    return h;
}

// We loop twice, when searching the ; and when
// calculating the hashcode later.
// Do it in one go, will safe also a comparison

// NEW: look for semicolon and new line
// when checking for semicolon, we do the
// hashcode right away
int h = 1;
int i = pos;
for (; i < end; i++)
{
    final byte b = data[i];
    if (b == ';')
    {
        semicolonPos = i;
        break;
    }
    h = 31 * h + b;
}
this.hashCode = h;
 
        

HashCode


// we loop twice, when searching the ; and when calculating the hashcode later
// do it in one go, will safe also a comparison

// look for semicolon and new line
// when checking for semicolon, we do the hashcode right away
int h = 1;
int i = pos;
for (; i < end; i++)
{
    final byte b = data[i];
    if (b == ';')
    {
        semicolonPos = i;
        break;
    }
    h = 31 * h + b;
}
this.hashCode = h;
    

// remove the multiplication and
// use bit shift and subtraction

// look for semicolon and new line
// when checking for semicolon, we do the hashcode right away
int h = 1;
int i = pos;
for (; i < end; i++)
{
    final byte b = data[i];
    if (b == ';')
    {
        semicolonPos = i;
        break;
    }
    h = (h << 5) - h + b;
}
this.hashCode = h;
    

The Results

Test Time DO % Time HE %
BRC01_BaselineST 219.1 s 100.0 % 146.9 s 100.0 %
BRC03_NoStreamST 197.9 s 90.3 % 140.9 s 96.0 %
BRC05_ReplaceSplitST 151.1 s 69.0 % 112.3 s 76.5 %
BRC06_NewDoubleParsingST 127.9 s 58.4 % 99.7 s 67.9 %
BRC07_NoCopyForDoubleST 120.4 s 55.0 % 89.5 s 61.0 %
BRC08_GoIntST 117.4 s 53.6 % 88.3 s 60.1 %
BRC09_NoMergeST 121.3 s 55.4 % 95.6 s 65.0 %
BRC10_MutateST 100.6 s 45.9 % 85.6 s 58.3 %
BRC12_NewMapST 105.0 s 47.9 % 79.3 s 54.0 %
BRC13_HardcodedSetST 102.5 s 46.8 % 77.7 s 52.9 %
BRC14_ReadBytesST 53.7 s 24.5 % 45.6 s 31.1 %
BRC15_ParseDoubleFixedST 51.1 s 23.3 % 42.8 s 29.2 %
BRC21_ManualMinMaxST 52.0 s 23.7 % 42.2 s 28.8 %
BRC22_EarlyHashCodeST 40.0 s 18.3 % 35.5 s 24.2 %

More Playing Around and Results

Test Time DO % Time HE %
BRC01_BaselineST 219.1 s 100.0 % 146.9 s 100.0 %
BRC03_NoStreamST 197.9 s 90.3 % 140.9 s 96.0 %
BRC05_ReplaceSplitST 151.1 s 69.0 % 112.3 s 76.5 %
BRC06_NewDoubleParsingST 127.9 s 58.4 % 99.7 s 67.9 %
BRC07_NoCopyForDoubleST 120.4 s 55.0 % 89.5 s 61.0 %
BRC08_GoIntST 117.4 s 53.6 % 88.3 s 60.1 %
BRC09_NoMergeST 121.3 s 55.4 % 95.6 s 65.0 %
BRC10_MutateST 100.6 s 45.9 % 85.6 s 58.3 %
BRC12_NewMapST 105.0 s 47.9 % 79.3 s 54.0 %
BRC13_HardcodedSetST 102.5 s 46.8 % 77.7 s 52.9 %
BRC14_ReadBytesST 53.7 s 24.5 % 45.6 s 31.1 %
BRC15_ParseDoubleFixedST 51.1 s 23.3 % 42.8 s 29.2 %
BRC21_ManualMinMaxST 52.0 s 23.7 % 42.2 s 28.8 %
BRC22_EarlyHashCodeST 40.0 s 18.3 % 35.5 s 24.2 %
BRC23_NoMulST 40.3 s 18.4 % 32.6 s 22.2 %
BRC24_DifferentBranchInHashCodeST 40.4 s 18.5 % 34.0 s 23.1 %
BRC25_SmallAddReordingST 39.9 s 18.2 % 32.2 s 21.9 %
BRC26_MoreMapSpaceST 39.0 s 17.8 % 33.5 s 22.8 %
BRC27_SmallPutST 37.9 s 17.3 % 33.6 s 22.8 %
BRC30_DangerNoEqualsST 28.4 s 13.0 % 22.9 s 15.6 %

The Results - More Fiddling

Test Time DO % Time HE %
BRC01_BaselineST 219.1 s 100.0 % 146.8 s 100.0 %
BRC03_NoStreamST 197.9 s 90.3 % 140.9 s 96.0 %
BRC05_ReplaceSplitST 151.1 s 69.0 % 112.3 s 76.5 %
BRC14_ReadBytesST 53.7 s 24.5 % 45.6 s 31.1 %
BRC26_MoreMapSpaceST 39.0 s 17.8 % 33.4 s 22.8 %
BRC27_SmallPutST 37.9 s 17.3 % 33.4 s 22.9 %
BRC28_FineTuningST 38.7 s 17.0 % 32.2 s 22.0 %
BRC29a_ParseDoubleTuningST 38.8 s 17.0 % 31.1 s 21.2 %
BRC29g_FixedIntParsing 38.7 s 17.0 % 31.0 s 21.1 %
BRC45_DoubleTheSetSize 27.0 s 18.4 %
BRC49_OffsetSubtraction 26.8 s 18.3 %

Final Thoughts and Numbers

A Few Last Data Tables

Ideas we utilized

  • Use the set boundaries to your advantage (data format)
  • Replace standard JDK functionality
  • Low or no allocation of memory
  • Utilize memory caching and keeping data together
  • Wrappers are bad
  • Immutability is bad
  • Less branches (if, loops) and no branch misses
  • CPU pipelining, do more than one thing at a time
  • Utilize inlining

Behind the Scenes


# org.rschwietzke.devoxxpl24.BRC01_BaselineST
# Hetzner, AMD, 21.0.4-tem
# Measurement Runtime: 150,932 ms 100%

       160,915.21 ms task-clock              #   1.066 CPUs utilized
           37,195    context-switches        # 231.147 /sec
            6,232    cpu-migrations          #  38.728 /sec
          236,617    page-faults             #   1.470 K/sec
  539,415,718,644    cycles                  #   3.352 GHz
   41,233,849,888    stalled-cycles-frontend #   7.64% frontend cycles idle
2,258,010,166,520    instructions            #    4.19 insn per cycle
                                             #    0.02 stalled cycles per insn
  522,584,792,160    branches                #   3.248 G/sec
    2,479,441,210    branch-misses           #   0.47% of all branches

    150.932228741 seconds time elapsed

    145.057695000 seconds user
     11.626114000 seconds sys


# org.rschwietzke.devoxxpl24.BRC45_DoubleTheSetSize
# Hetzner, AMD, 21.0.4-tem
# Mean Measurement Runtime: 27,061 ms 17.9% (to baseline)

       27,538.78 ms task-clock              #   1.018 CPUs utilized
           1,029    context-switches        #  37.365 /sec
              29    cpu-migrations          #   1.053 /sec
           7,885    page-faults             # 286.324 /sec
  98,276,507,946    cycles                  #   3.569 GHz
  10,431,399,857    stalled-cycles-frontend #  10.61% frontend cycles idle
 303,659,521,159    instructions            #    3.09 insn per cycle
                                            #    0.03 stalled cycles per insn
  55,868,360,010    branches                #   2.029 G/sec
   1,540,980,291    branch-misses           #   2.76% of all branches

    27.061235813 seconds time elapsed

    26.366348000 seconds user
     1.062075000 seconds sys

What is left?

There is still room

  • byte limits, reads only 8 bit at a time but could do 64 bit instead (factor 8!)
  • int or long would be better (need for UNSAFE or different ByteBuffer usage)
  • mmap a file instead of filling a buffer
  • Multi-threading
  • Tuning for inlining and CPU pipelining still possible
  • Reduce branching further, especially misses

Behind the Scenes


# org.rschwietzke.devoxxpl24.BRC45_DoubleTheSetSize
# Hetzner, AMD, 21.0.4-tem
# Mean Measurement Runtime: 27,061 ms 17.9% (to baseline)

      27,538.78 msec task-clock              #   1.018 CPUs utilized
          1,029      context-switches        #  37.365 /sec
             29      cpu-migrations          #   1.053 /sec
          7,885      page-faults             # 286.324 /sec
 98,276,507,946      cycles                  #   3.569 GHz
 10,431,399,857      stalled-cycles-frontend #  10.61% frontend cycles idle
303,659,521,159      instructions            #    3.09 insn per cycle
                                             #    0.03 stalled cycles per insn
 55,868,360,010      branches                #   2.029 G/sec
  1,540,980,291      branch-misses           #   2.76% of all branches

   27.061235813 seconds time elapsed

   26.366348000 seconds user
    1.062075000 seconds sys

# Thomas Wuerthinger, 1 core
# Hetzner, AMD, 21.0.4-tem
# Mean Measurement Runtime: 11,281 ms 7.4% (to baseline), 41.7%

      11,269.99 msec task-clock              #   0.999 CPUs utilized
          1,748      context-switches        # 155.102 /sec
              1      cpu-migrations          #   0.089 /sec
        223,183      page-faults             #  19.803 K/sec
 39,555,927,795      cycles                  #   3.510 GHz
  2,945,609,903      stalled-cycles-frontend #   7.45% frontend cycles idle
164,487,950,474      instructions            #    4.16 insn per cycle
                                             #    0.02 stalled cycles per insn
 15,668,783,505      branches                #   1.390 G/sec
    149,371,238      branch-misses           #   0.95% of all branches

   11.281867347 seconds time elapsed

    0.076590000 seconds user
    0.130600000 seconds sys

Warning, Warning

Your are about to enter an unstable area!

Different Machines

TestWhatMachine%
BRC01_BaselineST Thinkpad T14s218.8 s100.0 %
BRC01_BaselineST GCP-C2-4c-32g278.4 s100.0 %
BRC01_BaselineST GCP-C2D-4c-32g172.4 s100.0 %
BRC01_BaselineST Hetzner AMD 8c-32g 146.7 s 100.0 %
BRC01_BaselineST DO Intel Premium 16c-32g 154.2 s 100.0 %
BRC26_MoreMapSpaceST Thinkpad T14s35.3 s 16.1 %
BRC26_MoreMapSpaceST GCP-C2-4c-32g43.4 s 15.6 %
BRC26_MoreMapSpaceST GCP-C2D-4c-32g34.8 s 20.2 %
BRC26_MoreMapSpaceST Hetzner AMD 8c-32g 33.5 s 22.8 %
BRC26_MoreMapSpaceS DO Intel Premium 16c-32g 31.1 s 19.8 %

Different JDKs

TestWhatTime DO%
BRC01_BaselineST 21.0.3-tem215.5 s100.0 %
BRC26_MoreMapSpaceST 21.0.3-tem38.4 s 17.8 %
BRC01_BaselineST 21.0.2-graal212.7 s100.0 %
BRC26_MoreMapSpaceST 21.0.2-graal39.8 s 17.6 %
BRC26_MoreMapSpaceST graal-native-o341.4 s  
BRC26_MoreMapSpaceST graal-native-pgo44.7 s  
TestWhatTime HE
BRC26_MoreMapSpaceST 21.0.1-tem 32.7 s
BRC26_MoreMapSpaceST 21.0.4-tem 32.8 s
BRC26_MoreMapSpaceST 21.0.4-graal 29.7 s
BRC26_MoreMapSpaceST 17.0.12-tem 32.6 s
BRC26_MoreMapSpaceST 23.ea.29-open 33.2 s

Disclaimer and Warning

The cloud sucks when benchmarking

  • Shared hardware is bad for small numbers
  • Burst capacity might be a problem
  • Other users running stuff on that host
  • Some tools fail aka perf

# BRC01_BaselineST - DO CPU 16c
Warmup Runtime:      219,665 ms
Warmup Runtime:      220,341 ms
Warmup Runtime:      219,312 ms
Measurement Runtime: 219,020 ms
Measurement Runtime: 223,180 ms
Measurement Runtime: 226,418 ms

# BRC27_SmallPutST - DO CPU 16c
Warmup Runtime:      38,792 ms
Warmup Runtime:      37,929 ms
Warmup Runtime:      39,136 ms
Measurement Runtime: 39,197 ms
Measurement Runtime: 39,103 ms
Measurement Runtime: 39,106 ms

Disclaimer and Warning

The cloud sucks when benchmarking


# Google Cloud C2 - 4c - 32 GB
# BRC01_BaselineST - DO CPU 16c
Measurement Runtime: 281,448 ms
Measurement Runtime: 278,622 ms
Measurement Runtime: 281,130 ms
Measurement Runtime: 280,942 ms
Measurement Runtime: 281,287 ms

# BRC26_MoreMapSpaceST - DO CPU 16c
Measurement Runtime:  43,390 ms
Measurement Runtime:  44,003 ms
Measurement Runtime:  43,634 ms
Measurement Runtime:  43,682 ms
Measurement Runtime:  44,305 ms

# Google Cloud C2D - 4c - 32 GB
# BRC01_BaselineST - DO CPU 16c
Measurement Runtime: 172,498 ms
Measurement Runtime: 197,795 ms
Measurement Runtime: 195,888 ms
Measurement Runtime: 195,546 ms
Measurement Runtime: 195,896 ms

# BRC26_MoreMapSpaceST - DO CPU 16c
Measurement Runtime:  38,891 ms
Measurement Runtime:  34,864 ms
Measurement Runtime:  35,682 ms
Measurement Runtime:  35,691 ms
Measurement Runtime:  35,712 ms

The first 80% are not too hard.

Most of the concepts seen here are applicable to other problems.

Change one thing at a time and measure.

Most expensive first, unless something is simpler to fix.