What you need to know to build fast and reliable software
@ReneSchwietzke
www.xceptance.com
Always attribute other people's work
More information is taken from JDK sources, tweets, and related talks and videos
Java is quite efficient despite what many tell you
Source: https://sites.google.com/view/energy-efficiency-languages/home
A few words first
Java memory management is an ongoing challenge and a skill that must be mastered to have properly tuned applications that function in a scalable manner. Fundamentally, it is the process of allocating new objects and properly removing unused objects.
betsol.com
https://betsol.com/2017/06/java-memory-management-for-java-virtual-machine-jvm/
Just a quick view on the agenda
We won't cover native memory access.
What is this Memory Thing?
The very much unknown of modern languages
We ignore persistent memory for the moment.
Oversimplified Memory Areas of Modern Hardware
Simplified!
Oversimplified OS View
Simplified!
A rough view on the Java process memory areas (simplified)
https://www.slideshare.net/KubaKubryski/jvm-dive-for-mere-mortals
The basic memory sections of the JVM
Heap: allocations via new are done here, except when Escape Analysis kicks in; the layout depends on the GC algorithm
-XX:+PrintCodeCache
java -XX:+PrintFlagsFinal -version
When, Where, and Cost
Where does the JVM allocate?
final static FastRandom r = new FastRandom(7L);
int a; int b;
@Setup
public void setup() {
a = r.nextInt(1000) + 128;
b = r.nextInt(1000) + 128;
}
@Benchmark
public int newInteger() {
Integer A = new Integer(a);
Integer B = new Integer(b);
return A + B;
}
@Benchmark
public int integerValueOf() {
Integer A = Integer.valueOf(a);
Integer B = Integer.valueOf(b);
return A + B;
}
@Benchmark
public int ints() {
int A = a;
int B = b;
return A + B;
}
Benchmark Mode Cnt Score Error Units
newInteger avgt 3 24.187 ± 29.639 ns/op
newInteger:·gc.alloc.rate.norm avgt 3 32.000 ± 0.001 B/op
integerValueOf avgt 3 5.778 ± 10.026 ns/op
integerValueOf:·gc.alloc.rate.norm avgt 3 ≈ 10⁻⁶ B/op
ints avgt 3 5.145 ± 2.536 ns/op
ints:·gc.alloc.rate.norm avgt 3 ≈ 10⁻⁶ B/op
ints()
0x...fbcc: mov 0x10(%rsi),%eax
0x...fbcf: add 0xc(%rsi),%eax ; - ints@12 (line 55)
integerValueOf()
0x...264c: mov 0x10(%rsi),%eax ; - integerValueOf@-1 (line 44)
0x...264f: add 0xc(%rsi),%eax ; - integerValueOf@24 (line 47)
JMH: -prof perfasm
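The benchmark adds 128 to both operands, presumably to keep them outside the range that Integer.valueOf is guaranteed to cache (-128 to 127), so that the zero allocation result cannot be explained by the cache alone. A minimal sketch of that guarantee; the class and method names are illustrative assumptions and not part of the talk:

```java
// Illustrative sketch; class and method names are assumptions, not from the talk.
final class IntegerCacheSketch {

    // True when two valueOf calls for the same value return the same instance.
    static boolean sameInstance(int value) {
        Integer first = Integer.valueOf(value);
        Integer second = Integer.valueOf(value);
        return first == second; // identity comparison on purpose
    }

    public static void main(String[] args) {
        // The Integer.valueOf contract guarantees caching for -128..127.
        System.out.println("127 shared: " + sameInstance(127));
        // Outside that range the answer depends on the JVM and its settings.
        System.out.println("128 shared: " + sameInstance(128));
    }
}
```

For values above 127 the result is deliberately not asserted here, because it depends on the JVM and its flags.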
Where most of the garbage ends
https://umumble.com/blogs/java/how-does-jvm-allocate-objects%3F/
final static FastRandom r = new FastRandom(7L);
int a; int b;
@Setup
public void setup() {
a = r.nextInt(1000) + 128;
b = r.nextInt(1000) + 128;
}
@Benchmark
public int newInteger() {
Integer A = new Integer(a);
Integer B = new Integer(b);
return A + B;
}
https://shipilev.net/jvm/anatomy-quarks/4-tlab-allocation/
0.50% 0x...3f0: mov $0x13a68,%r11d ; {metadata('java/lang/Integer')}
0.73% 0x...3f6: movabs $0x800000000,%rbp
0.03% 0x...400: add %r11,%rbp
0.26% 0x...403: mov 0x118(%r15),%rax ; TLAB "current"
0.30% 0x...40a: mov %rax,%r10 ; tmp = current
0.86% 0x...40d: add $0x10,%r10 ; tmp += 16 (object size)
0.30% 0x...411: cmp 0x128(%r15),%r10 ; tmp > tlab_size?
╭ 0x...418: jae 0x...4cc ; TLAB full, jump and request another one
0.50% │ 0x...41e: mov %r10,0x118(%r15) ; current = tmp (TLAB is fine, alloc!)
0.26% │ 0x...425: prefetchw 0xc0(%r10)
3.67% │ 0x...42d: mov 0xb8(%rbp),%r10
0.40% │ 0x...434: mov %r10,(%rax) ; store header to (obj+0)
4.14% │ 0x...437: movl $0x13a68,0x8(%rax) ; store klass {metadata(java/lang/Integer;)}
1.03% │ 0x...43e: movl $0x0,0xc(%rax) ; zero out the rest of the object
1.16% │ ↗ 0x...445: mov %rax,0x8(%rsp) ;*new
│ │ ; - org.sample.Allocation::newInteger@0 (line 36)
0.03% │ │ 0x...44a: mov (%rsp),%r10
0.43% │ │ 0x...44e: mov 0xc(%r10),%edx ;*getfield a
│ │ ; - org.sample.Allocation::newInteger@5 (line 36)
0.60% │ │ 0x...452: mov %rax,%rsi
0.66% │ │ 0x...455: xchg %ax,%ax
│ │ 0x...457: callq 0x00007fcf90401dc0 ; ImmutableOopMap{[0]=Oop [8]=Oop }
│ │ ;*invokespecial <init>
│ │ ; - org.sample.Allocation::newInteger@8 (line 36)
│ │ ; {optimized virtual_call}
1.19% │ │ 0x...45c: mov 0x118(%r15),%rax
1.59% │ │ 0x...463: mov %rax,%r10
│ │ 0x...466: add $0x10,%r10
0.93% │ │ 0x...46a: cmp 0x128(%r15),%r10
│╭│ 0x...471: jae 0x...4e1
0.66% │││ 0x...473: mov %r10,0x118(%r15)
0.46% │││ 0x...47a: prefetchw 0xc0(%r10)
4.70% │││ 0x...482: mov 0xb8(%rbp),%r10
0.03% │││ 0x...489: mov %r10,(%rax)
2.09% │││ 0x...48c: movl $0x13a68,0x8(%rax) ; {metadata('java/lang/Integer')}
1.06% │││ 0x...493: movl $0x0,0xc(%rax)
0.96% │││ 0x...49a: mov %rax,%rbp ;*new
│││ ; - org.sample.Allocation::newInteger@12 (line 37)
0.03% │││ 0x...49d: mov (%rsp),%r10
0.73% │││ 0x...4a1: mov 0x10(%r10),%edx ;*getfield b
│││ ; - org.sample.Allocation::newInteger@17 (line 37)
0.63% │││ 0x...4a5: mov %rbp,%rsi
│││ 0x...4a8: data16 xchg %ax,%ax
0.33% │││ 0x...4ab: callq 0x00007fcf90401dc0 ; ImmutableOopMap{rbp=Oop [8]=Oop }
│││ ;*invokespecial <init>
│││ ; - org.sample.Allocation::newInteger@20 (line 37)
│││ ; {optimized virtual_call}
1.16% │││ 0x...4b0: mov 0xc(%rbp),%eax
0.96% │││ 0x...4b3: mov 0x8(%rsp),%r10
0.50% │││ 0x...4b8: add 0xc(%r10),%eax ;*iadd
│││ ; - org.sample.Allocation::newInteger@32 (line 39)
Stack vs. Heap is no longer really true
@Benchmark
public long array64()
{
int[] a = new int[64];
a[0] = r.nextInt();
a[1] = r.nextInt();
return a[0] + a[1];
}
@Benchmark
public long array65()
{
int[] a = new int[65];
a[0] = r.nextInt();
a[1] = r.nextInt();
return a[0] + a[1];
}
@Benchmark
public int[] array64Returned()
{
int[] a = new int[64];
a[0] = r.nextInt();
a[1] = r.nextInt();
a[3] = a[0] + a[1];
return a;
}
# -prof gc
Benchmark Mode Cnt Score Units
EscapeAnalysis.array64 avgt 2 30.378 ns/op
EscapeAnalysis.array64:·gc.alloc.rate.norm avgt 2 ≈ 10⁻⁵ B/op
EscapeAnalysis.array64Returned avgt 2 63.799 ns/op
EscapeAnalysis.array64Returned:·gc.alloc.rate.norm avgt 2 272.000 B/op
EscapeAnalysis.array65 avgt 2 65.819 ns/op
EscapeAnalysis.array65:·gc.alloc.rate.norm avgt 2 280.000 B/op
# -XX:EliminateAllocationArraySizeLimit=70
Benchmark Mode Cnt Score Units
EscapeAnalysis.array64Returned avgt 2 65.898 ns/op
EscapeAnalysis.array64Returned:·gc.alloc.rate.norm avgt 2 272.000 B/op
EscapeAnalysis.array65 avgt 2 30.855 ns/op
EscapeAnalysis.array65:·gc.alloc.rate.norm avgt 2 ≈ 10⁻⁶ B/op
org/sample/EscapeAnalysis.java
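The 272 and 280 B/op in the -prof gc output can be reproduced with simple arithmetic. A small sketch, assuming a 64-bit HotSpot JVM with compressed class pointers (16 byte array header) and 8 byte object alignment; the class and method names are made up for illustration:

```java
// Sketch under assumptions: 16 byte array header, 8 byte object alignment.
final class ArrayFootprintSketch {
    static final int HEADER_BYTES = 16; // mark (8) + class (4) + length (4), assumed
    static final int ALIGNMENT = 8;     // HotSpot default object alignment, assumed

    // Rounds up to the next multiple of the object alignment.
    static long alignUp(long bytes) {
        return (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // Estimated heap footprint of an array with the given length and element size.
    static long arrayBytes(int length, int elementBytes) {
        return alignUp(HEADER_BYTES + (long) length * elementBytes);
    }

    public static void main(String[] args) {
        System.out.println("int[64]: " + arrayBytes(64, 4)); // 272
        System.out.println("int[65]: " + arrayBytes(65, 4)); // 280
    }
}
```

The estimate only says how much is allocated when the array does reach the heap; whether the allocation is eliminated is decided by Escape Analysis.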
What the spec guarantees
No out-of-thin-air reads: Each read of a variable must see a value written by a write to that variable.
Java Language Spec 17.4.8-1, https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html
We have to pay for well defined initial states
static Unsafe U;
int[] a;
@Param({ "1", "100", "1000", "10000", "1000000"})
int size;
long bytes;
long address;
@Setup(Level.Invocation)
public void setup() {
bytes = 4 * size + 4 + 12;
address = 0;
}
@Benchmark
public long unsafe() {
address = U.allocateMemory(bytes);
return address;
}
@Benchmark
public long unsafeInitialized() {
address = U.allocateMemory(bytes);
U.setMemory(address, bytes, (byte) 0);
return address;
}
@Benchmark
public long safe() {
a = new int[size];
address = 0;
return address;
}
Benchmark (size) Mode Cnt Score Units
safe 1 avgt 5 38.608 ns/op
safe 100 avgt 5 71.374 ns/op
safe 1,000 avgt 5 552.221 ns/op
safe 10,000 avgt 5 4,571.595 ns/op
safe 1,000,000 avgt 5 853,938.677 ns/op
unsafe 1 avgt 5 98.436 ns/op
unsafe 100 avgt 5 91.668 ns/op
unsafe 1,000 avgt 5 148.716 ns/op
unsafe 10,000 avgt 5 127.362 ns/op
unsafe 1,000,000 avgt 5 156.854 ns/op
unsafeInitialized 1 avgt 5 158.442 ns/op
unsafeInitialized 100 avgt 5 173.470 ns/op
unsafeInitialized 1,000 avgt 5 437.876 ns/op
unsafeInitialized 10,000 avgt 5 2,570.801 ns/op
unsafeInitialized 1,000,000 avgt 5 387,303.491 ns/op
Benchmark (size) Mode Cnt Score Units
safe:·gc.alloc.rate 1 avgt 5 144.246 MB/sec
safe:·gc.alloc.rate 100 avgt 5 1,550.342 MB/sec
safe:·gc.alloc.rate 1,000 avgt 5 3,854.734 MB/sec
safe:·gc.alloc.rate 10,000 avgt 5 5,459.355 MB/sec
safe:·gc.alloc.rate 1,000,000 avgt 5 2,995.923 MB/sec
https://shipilev.net/jvm/anatomy-quarks/7-initialization-costs/
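The cost measured by safe() buys a guarantee that plain malloc-style memory does not give: every array element starts with its default value. A minimal sketch that just checks this; names are illustrative assumptions:

```java
// Illustrative sketch; class and method names are assumptions.
final class ZeroInitSketch {

    // True if every element still holds the default value 0.
    static boolean isAllZero(int[] values) {
        for (int value : values) {
            if (value != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] fresh = new int[1_000];
        // The language guarantees default values, so this prints true.
        System.out.println("fresh array is zeroed: " + isAllZero(fresh));
    }
}
```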
Java is hardware neutral!?
final int SIZE = 100_000;
final int[] src = new int[SIZE];
@Benchmark
public int step1() {
int sum = 0;
for (int i = 0; i < SIZE; i++)
{
sum += src[i];
}
return sum;
}
@Benchmark
public int step16() {
int sum = 0;
for (int i = 0; i < SIZE; i = i + 16)
{
sum += src[i];
}
return sum;
}
@Benchmark
public int step32() {
int sum = 0;
for (int i = 0; i < SIZE; i = i + 32)
{
sum += src[i];
}
return sum;
}
step16 must be faster, about 16 times
Benchmark Mode Cnt Score Units Relative to Expected Actual
step1 avgt 2 41,528 ns/op - - 100%
step1Reverse avgt 2 41,657 ns/op step1 100.0% 100%
step16 avgt 2 12,636 ns/op step1 6.3% 30.4%
step32 avgt 2 6,993 ns/op step16 50.0% 55.3%
org.sample.ArraysAndHardware.java
Benchmark Mode Cnt Score Units
step1 avgt 2 41,528 ns/op
step1:CPI avgt 0.866 #/op
step1:IPC avgt 1.154 #/op
step1:L1-dcache-load-misses avgt 6,062 #/op
step1:L1-dcache-loads avgt 96,620 #/op
step1:LLC-load-misses avgt 2 #/op
step1:LLC-loads avgt 32 #/op
step1:cycles avgt 100,055 #/op
step1:instructions avgt 115,507 #/op
step16 avgt 2 12,636 ns/op
step16:CPI avgt 1.244 #/op
step16:IPC avgt 0.804 #/op
step16:L1-dcache-load-misses avgt 6,132 #/op
step16:L1-dcache-loads avgt 6,181 #/op
step16:LLC-load-misses avgt 0 #/op
step16:LLC-loads avgt 4,111 #/op
step16:cycles avgt 30,922 #/op
step16:instructions avgt 24,855 #/op
step32 avgt 2 6,993 ns/op
step32:CPI avgt 1.375 #/op
step32:IPC avgt 0.727 #/op
step32:L1-dcache-load-misses avgt 3,047 #/op
step32:L1-dcache-loads avgt 3,122 #/op
step32:LLC-load-misses avgt 0 #/op
step32:LLC-loads avgt 2,808 #/op
step32:cycles avgt 17,244 #/op
step32:instructions avgt 12,542 #/op
step1Reverse avgt 2 41,657 ns/op
step1Reverse:CPI avgt 0.760 #/op
step1Reverse:IPC avgt 1.316 #/op
step1Reverse:L1-dcache-load-misses avgt 6,153 #/op
step1Reverse:L1-dcache-loads avgt 97,628 #/op
step1Reverse:LLC-load-misses avgt 1 #/op
step1Reverse:LLC-loads avgt 42 #/op
step1Reverse:cycles avgt 102,766 #/op
step1Reverse:instructions avgt 135,300 #/op
And now, we are very close to the hardware
JMH: -prof perfnorm, kernel tooling required
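The L1-dcache-load-misses can be estimated upfront by counting the cache lines each loop touches. A rough sketch, assuming 64 byte cache lines and array data that starts on a line boundary (both are assumptions, as are the names):

```java
// Sketch under assumptions: 64 byte cache lines, data aligned to a line boundary.
final class CacheLineSketch {
    static final int LINE_BYTES = 64; // typical x86 cache line size, assumed

    // Counts the distinct cache lines touched when reading every step-th element.
    static int linesTouched(int elements, int step, int elementBytes) {
        int count = 0;
        long lastLine = -1;
        for (int i = 0; i < elements; i += step) {
            long line = (long) i * elementBytes / LINE_BYTES;
            if (line != lastLine) { // indexes only grow, so a change means a new line
                count++;
                lastLine = line;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("step1:  " + linesTouched(100_000, 1, 4));
        System.out.println("step16: " + linesTouched(100_000, 16, 4));
        System.out.println("step32: " + linesTouched(100_000, 32, 4));
    }
}
```

step1 and step16 touch the same number of lines, so both have to move the same amount of data from memory, which fits the nearly identical miss counts in the perfnorm output.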
Overview metrics provided by the OS
# step1
------------------------------------------------------------------------------
3801.896800 task-clock (msec) # 0,257 CPUs utilized
9,437,992,729 cycles # 2,482 GHz
10,850,555,712 instructions # 1,15 insn per cycle
582,384,283 branches # 153,183 M/sec
1,372,540 branch-misses # 0,24% of all branches
9,094,752,245 L1-dcache-loads # 2392,162 M/sec
569,785,405 L1-dcache-load-misses # 6,26% of all L1-dcache hits
2,360,265 LLC-loads # 0,621 M/sec
302,910 LLC-load-misses # 12,83% of all LL-cache hits
# step16
------------------------------------------------------------------------------
3841.234628 task-clock (msec) # 0,259 CPUs utilized
9,534,459,587 cycles # 2,482 GHz
7,401,297,634 instructions # 0,78 insn per cycle
1,847,661,610 branches # 481,007 M/sec
2,556,479 branch-misses # 0,14% of all branches
1,871,960,523 L1-dcache-loads # 487,333 M/sec
1,848,316,476 L1-dcache-load-misses # 98,74% of all L1-dcache hits
1,240,647,139 LLC-loads # 322,981 M/sec
167,504 LLC-load-misses # 0,01% of all LL-cache hits
JMH: -prof perf, kernel tooling required
What it costs to leave the CPU
Operation | Real | Humanized |
---|---|---|
CPU Cycle | 0.4 ns | 1 s |
L1 Cache | 0.9 ns | 2 s |
L2 Cache | 2.8 ns | 7 s |
L3 Cache | 28 ns | 1 min |
Memory Access | 100 ns | 4 min |
NVM SSD | 25 μs | 17 h |
SSD | 50–150 μs | 1.5-4 days |
HDD | 1–10 ms | 1-9 months |
TCP SF to NY | 65 ms | 5 years |
TCP SF to Hong Kong | 141 ms | 11 years |
Data: She brought me closer to humanity than I ever thought possible, and for a time...I was tempted by her offer.
Picard: How long a time?
Data: 0.68 seconds, sir. For an android, that is nearly an eternity.
http://www.prowesscorp.com/computer-latency-at-a-human-scale/
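The humanized column simply stretches one CPU cycle (0.4 ns) to one second. A small sketch of that scaling; it works in picoseconds to stay in exact arithmetic, and all names are illustrative assumptions:

```java
// Illustrative sketch; names are assumptions. 0.4 ns cycle taken from the table.
final class HumanLatencySketch {
    static final long CYCLE_PICOS = 400; // 0.4 ns expressed in picoseconds

    // Scales a real latency so that one CPU cycle corresponds to one second.
    static double humanSeconds(long realPicos) {
        return (double) realPicos / CYCLE_PICOS;
    }

    public static void main(String[] args) {
        double memory = humanSeconds(100_000L);           // 100 ns
        double sfToNy = humanSeconds(65_000_000_000L);    // 65 ms
        System.out.println("Memory access: " + memory / 60 + " min");
        System.out.println("TCP SF to NY: " + sfToNy / (365.0 * 24 * 3600) + " years");
    }
}
```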
The Hidden Secrets of Memory Consumption
Just an Object, JVM Linux <32GB
=== Instance Layout ====================================
java.lang.Object object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 05 00 00 00 (00000101 00000000 00000000 00000000) (5)
4 4 (object header) 00 00 00 00 (00000000 00000000 00000000 00000000) (0)
8 4 (object header) 00 10 00 00 (00000000 00010000 00000000 00000000) (4096)
12 4 (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
==== Instance Graphed Layout=========================
java.lang.Object@4efac082d footprint:
COUNT AVG SUM DESCRIPTION
1 16 16 java.lang.Object
1 16 (total)
JOL - http://openjdk.java.net/projects/code-tools/jol/
Just an Object, JVM Linux >32GB or -XX:-UseCompressedOops
=== Instance Layout ====================================
java.lang.Object object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 05 00 00 00 (00000101 00000000 00000000 00000000) (5)
4 4 (object header) 00 00 00 00 (00000000 00000000 00000000 00000000) (0)
8 4 (object header) 50 8f 87 80 (01010000 10001111 10000111 10000000) (-2138599600)
12 4 (object header) da 7f 00 00 (11011010 01111111 00000000 00000000) (32730)
Instance size: 16 bytes
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total
==== Instance Graphed Layout=========================
java.lang.Object@59505b48d footprint:
COUNT AVG SUM DESCRIPTION
1 16 16 java.lang.Object
1 16 (total)
Compressed oops (ordinary object pointer) represent managed pointers (in many but not all places in the JVM) as 32-bit values which must be scaled by a factor of 8 and added to a 64-bit base address to find the object they refer to. This allows applications to address up to four billion objects (not bytes), or a heap size of up to about 32Gb. At the same time, data structure compactness is competitive with ILP32 mode.
https://wiki.openjdk.java.net/display/HotSpot/CompressedOops
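The quoted scaling can be written down as arithmetic. A simplified sketch, assuming the default shift of 3 (factor 8) and ignoring HotSpot details such as null handling and zero-based modes; all names are hypothetical:

```java
// Simplified sketch of the arithmetic only; not HotSpot code, names are hypothetical.
final class CompressedOopsSketch {
    static final int SHIFT = 3; // scale by 8, matches 8 byte object alignment

    // Turns a 32 bit compressed reference into a 64 bit address.
    static long decode(long heapBase, int narrowOop) {
        long unsigned = narrowOop & 0xFFFF_FFFFL; // treat the 32 bits as unsigned
        return heapBase + (unsigned << SHIFT);
    }

    // Number of bytes reachable with 32 bit references and the given scaling.
    static long maxAddressableBytes() {
        return (1L << 32) << SHIFT;
    }

    public static void main(String[] args) {
        long gib = maxAddressableBytes() / (1024L * 1024L * 1024L);
        System.out.println("addressable heap: " + gib + " GiB"); // 32
    }
}
```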
The basic set of information of every object
64 bits:
--------
unused:25 hash:31 -->| unused:1 age:4 biased_lock:1 lock:2 (normal object)
JavaThread*:54 epoch:2 unused:1 age:4 biased_lock:1 lock:2 (biased object)
PromotedObject*:61 --------------------->| promo_bits:3 ----->| (CMS promoted object)
size:64 ----------------------------------------------------->| (CMS free block)
unused:25 hash:31 -->| cms_free:1 age:4 biased_lock:1 lock:2 (COOPs && normal object)
JavaThread*:54 epoch:2 cms_free:1 age:4 biased_lock:1 lock:2 (COOPs && biased object)
narrowOop:32 unused:24 cms_free:1 unused:4 promo_bits:3 ----->| (COOPs && CMS promoted object)
unused:21 size:35 -->| cms_free:1 unused:7 ------------------>| (COOPs && CMS free block)
http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/tip/src/share/vm/oops/markOop.hpp
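Reading the layout above from right to left gives the bit positions: lock in bits 0-1, biased_lock in bit 2, age in bits 3-6, and the hash starting at bit 8. A minimal decoder sketch for the normal-object case, assuming exactly this JDK 8 layout; the names are illustrative:

```java
// Sketch for the "normal object" row of the JDK 8 layout; names are assumptions.
final class MarkWordSketch {

    // Lowest two bits: lock state.
    static int lockBits(long mark) {
        return (int) (mark & 0b11);
    }

    // Bit 2: biased locking flag.
    static boolean biasedLockBit(long mark) {
        return ((mark >>> 2) & 1) == 1;
    }

    // Bits 3..6: number of young collections the object survived.
    static int age(long mark) {
        return (int) ((mark >>> 3) & 0xF);
    }

    // Bits 8..38: identity hash code, 0 if not computed yet.
    static int identityHash(long mark) {
        return (int) ((mark >>> 8) & 0x7FFF_FFFFL);
    }

    public static void main(String[] args) {
        long mark = 0x5L; // first header word as printed by JOL for a new Object
        System.out.println("lock=" + lockBits(mark)
                + " biased=" + biasedLockBit(mark)
                + " age=" + age(mark));
    }
}
```

Only four age bits are available, which is why an object can survive at most 15 young collections before it has to be promoted.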
An empty String
=== Instance Layout JDK11 - 32bit ====================================
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 09 00 00 00 (00001001 00000000 00000000 00000000) (9)
4 4 (object header) 00 00 00 00 (00000000 00000000 00000000 00000000) (0)
8 4 (object header) 08 18 00 00 (00001000 00011000 00000000 00000000) (6152)
12 4 byte[] String.value []
16 4 int String.hash 0
20 1 byte String.coder 0
21 3 (loss due to the next object alignment)
Instance size: 24 bytes
==== Instance Graphed Layout=========================
java.lang.String@d86a6fd footprint:
COUNT AVG SUM DESCRIPTION
1 16 16 [B
1 24 24 java.lang.String
2 40 (total)
=== Instance Layout JDK 11 - 64bit ===================================
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 11 00 00 00 (00010001 00000000 00000000 00000000) (17)
4 4 (object header) 00 00 00 00 (00000000 00000000 00000000 00000000) (0)
8 4 (object header) d8 c1 11 04 (11011000 11000001 00010001 00000100) (68272600)
12 4 (object header) 90 7f 00 00 (10010000 01111111 00000000 00000000) (32656)
16 8 byte[] String.value []
24 4 int String.hash 0
28 1 byte String.coder 0
29 3 (loss due to the next object alignment)
Instance size: 32 bytes
==== Instance Graphed Layout=========================
java.lang.String@2892d68d footprint:
COUNT AVG SUM DESCRIPTION
1 24 24 [B
1 32 32 java.lang.String
2 56 (total)
JOL - http://openjdk.java.net/projects/code-tools/jol/
An empty byte[]
=== Instance Layout Compressed ====================================
OFF SZ TYPE DESCRIPTION VALUE
0 8 (object header: mark) 0x0000000000000001 (non-biasable; age: 0)
8 4 (object header: class) 0x00000820
12 4 (array length) 0
12 4 (alignment/padding gap)
16 0 byte [B. N/A
Instance size: 16 bytes
Space losses: 4 bytes internal + 0 bytes external = 4 bytes total
==== Instance Graphed Layout=========================
[B@59717824d footprint:
COUNT AVG SUM DESCRIPTION
1 16 16 [B
1 16 (total)
=== Instance Layout 64 bit ====================================
[B object internals:
OFF SZ TYPE DESCRIPTION VALUE
0 8 (object header: mark) 0x0000000000000001 (non-biasable; age: 0)
8 8 (object header: class) 0x00007fdbc4ca8820
16 4 (array length) 0
16 8 (alignment/padding gap)
24 0 byte [B. N/A
Instance size: 24 bytes
Space losses: 8 bytes internal + 0 bytes external = 8 bytes total
==== Instance Graphed Layout=========================
[B@59717824d footprint:
COUNT AVG SUM DESCRIPTION
1 24 24 [B
1 24 (total)
String - Foobar - A JDK 11 Motivation too!
=== Instance Layout JDK8 - 32bit====================================
java.lang.String object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00 (00000001 00000000 00000000 00000000) (1)
4 4 (object header) 00 00 00 00 (00000000 00000000 00000000 00000000) (0)
8 4 (object header) da 02 00 f8 (11011010 00000010 00000000 11111000) (-134216998)
12 4 char[] String.value [F, o, o, b, a, r]
16 4 int String.hash 0
20 4 (loss due to the next object alignment)
Instance size: 24 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
==== Instance Graphed Layout=========================
java.lang.String@3f8f9dd6d footprint:
COUNT AVG SUM DESCRIPTION
1 32 32 [C
1 24 24 java.lang.String
2 56 (total)
=== Instance Layout JDK11 - 32bit====================================
java.lang.String object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 05 00 00 00 (00000101 00000000 00000000 00000000) (5)
4 4 (object header) 00 00 00 00 (00000000 00000000 00000000 00000000) (0)
8 4 (object header) 08 18 00 00 (00001000 00011000 00000000 00000000) (6152)
12 4 byte[] String.value [70, 111, 111, 98, 97, 114]
16 4 int String.hash 0
20 1 byte String.coder 0
21 3 (loss due to the next object alignment)
Instance size: 24 bytes
Space losses: 0 bytes internal + 3 bytes external = 3 bytes total
==== Instance Graphed Layout=========================
java.lang.String@3e27aa33d footprint:
COUNT AVG SUM DESCRIPTION
1 24 24 [B
1 24 24 java.lang.String
2 48 (total)
char[6] - Foobar
=== Instance Layout ====================================
[C object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00 (00000001 00000000 00000000 00000000) (1)
4 4 (object header) 00 00 00 00 (00000000 00000000 00000000 00000000) (0)
8 4 (object header) 38 02 00 00 (00111000 00000010 00000000 00000000) (568)
12 4 (object header) 06 00 00 00 (00000110 00000000 00000000 00000000) (6)
16 12 char [C.<elements> 70, 0, 111, 0, 111, 0, 98, 0, 97, 0, 114, 0
28 4 (loss due to the next object alignment)
Instance size: 32 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
==== Instance Graphed Layout=========================
[C@13fd2ccdd footprint:
COUNT AVG SUM DESCRIPTION
1 32 32 [C
1 32 (total)
JDK 9 and later (JDK 11 included) use a byte array instead to save memory when no UTF-16 is needed, i.e. when all characters fit into Latin-1.
String - Foobar vs. Foob😁r
=== Instance Layout JDK11 - 32bit ====================================
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 05 00 00 00 (00000101 00000000 00000000 00000000) (5)
4 4 (object header) 00 00 00 00 (00000000 00000000 00000000 00000000) (0)
8 4 (object header) 08 18 00 00 (00001000 00011000 00000000 00000000) (6152)
12 4 byte[] String.value [70, 111, 111, 98, 97, 114]
16 4 int String.hash 0
20 1 byte String.coder 0
21 3 (loss due to the next object alignment)
Instance size: 24 bytes
Space losses: 0 bytes internal + 3 bytes external = 3 bytes total
==== Instance Graphed Layout=========================
java.lang.String@3e27aa33d footprint:
COUNT AVG SUM DESCRIPTION
1 24 24 [B
1 24 24 java.lang.String
2 48 (total)
=== Instance Layout JDK11 - 32bit ====================================
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 05 00 00 00 (00000101 00000000 00000000 00000000) (5)
4 4 (object header) 00 00 00 00 (00000000 00000000 00000000 00000000) (0)
8 4 (object header) 08 18 00 00 (00001000 00011000 00000000 00000000) (6152)
12 4 byte[] String.value [70, 0, 111, 0, 111, 0, 98, 0, 61, -40, 1, -34, 114, 0]
16 4 int String.hash 0
20 1 byte String.coder 1
21 3 (loss due to the next object alignment)
Instance size: 24 bytes
Space losses: 0 bytes internal + 3 bytes external = 3 bytes total
==== Instance Graphed Layout=========================
java.lang.String@3e27aa33d footprint:
COUNT AVG SUM DESCRIPTION
1 32 32 [B
1 24 24 java.lang.String
2 56 (total)
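The coder decision can be imitated with public APIs. A sketch, not the JDK implementation: it assumes that coder 0 means one byte per char (Latin-1) and coder 1 means two bytes per char (UTF-16, little endian here to match the dump above); the names are made up:

```java
import java.nio.charset.StandardCharsets;

// Imitation with public APIs; the JDK uses internal classes. Names are assumptions.
final class CompactStringSketch {

    // True if every char fits into one byte, which allows coder 0 (Latin-1).
    static boolean fitsLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Bytes as they would be stored: 1 byte per char if possible, otherwise 2.
    static byte[] encode(String s) {
        return fitsLatin1(s)
                ? s.getBytes(StandardCharsets.ISO_8859_1)
                : s.getBytes(StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        String emoji = "Foob\uD83D\uDE01r"; // the emoji is a surrogate pair, 2 chars
        System.out.println("Foobar: " + encode("Foobar").length + " bytes");
        System.out.println("with emoji: " + encode(emoji).length + " bytes");
    }
}
```

A single character outside Latin-1 is enough to double the storage of the whole string.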
A class with a boolean and its loss
class A
{
boolean bar = true;
}
=== Instance Layout ============ 32 bit =======================
org.jol.A object internals:
OFFSET SIZE TYPE DESCRIPTION
0 4 (object header)
4 4 (object header)
8 4 (object header)
12 1 boolean A.bar
13 3 (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 3 bytes external = 3 bytes total
=== Instance Layout ============ 64 bit =======================
org.jol.A object internals:
OFFSET SIZE TYPE DESCRIPTION
0 4 (object header)
4 4 (object header)
8 4 (object header)
12 4 (object header)
16 1 boolean A.bar
17 7 (loss due to the next object alignment)
Instance size: 24 bytes
Space losses: 0 bytes internal + 7 bytes external = 7 bytes total
You gotta give
class A
{
String s;
boolean b1;
}
=== Instance Layout == 32 bit =================================
org.jol.A object internals:
OFFSET SIZE TYPE DESCRIPTION
0 4 (object header)
4 4 (object header)
8 4 (object header)
12 1 boolean A.b1
13 3 (alignment/padding gap)
16 4 java.lang.String A.s
20 4 (loss due to the next object alignment)
Instance size: 24 bytes
Space losses: 3 bytes internal + 4 bytes external = 7 bytes total
=== Instance Layout == 64 bit =================================
org.jol.A object internals:
OFFSET SIZE TYPE DESCRIPTION
0 4 (object header)
4 4 (object header)
8 4 (object header)
12 4 (object header)
16 1 boolean A.b1
17 7 (alignment/padding gap)
24 8 java.lang.String A.s
Instance size: 32 bytes
Space losses: 7 bytes internal + 0 bytes external = 7 bytes total
JVM changes order to optimize layout
class A
{
String s1;
boolean b1;
int i1;
String s2;
boolean b2;
long i2;
}
=== Instance Layout ====================================
org.jol.A object internals:
OFFSET SIZE TYPE DESCRIPTION
0 4 (object header)
4 4 (object header)
8 4 (object header)
12 4 int A.i1
16 8 long A.i2
24 1 boolean A.b1
25 1 boolean A.b2
26 2 (alignment/padding gap)
28 4 java.lang.String A.s1
32 4 java.lang.String A.s2
36 4 (loss due to the next object alignment)
Instance size: 40 bytes
Space losses: 2 bytes internal + 4 bytes external = 6 bytes total
class A
{
String s1;
boolean b1;
// int i1;
String s2;
boolean b2;
long i2;
}
=== Instance Layout ====================================
org.jol.A object internals:
OFFSET SIZE TYPE DESCRIPTION
0 4 (object header)
4 4 (object header)
8 4 (object header)
12 1 boolean A.b1
13 1 boolean A.b2
14 2 (alignment/padding gap)
16 8 long A.i2
24 4 java.lang.String A.s1
28 4 java.lang.String A.s2
Instance size: 32 bytes
Space losses: 2 bytes internal + 0 bytes external = 2 bytes total
To Raise Awareness
An empty HashMap
=== Instance Layout ====================================
java.util.HashMap object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00 (00000001 00000000 00000000 00000000)
4 4 (object header) 00 00 00 00 (00000000 00000000 00000000 00000000)
8 4 (object header) a5 37 00 f8 (10100101 00110111 00000000 11111000)
12 4 java.util.Set AbstractMap.keySet null
16 4 java.util.Collection AbstractMap.values null
20 4 int HashMap.size 0
24 4 int HashMap.modCount 0
28 4 int HashMap.threshold 0
32 4 float HashMap.loadFactor 0.75
36 4 java.util.HashMap.Node[] HashMap.table null
40 4 java.util.Set HashMap.entrySet null
44 4 (loss due to the next object alignment)
Instance size: 48 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
==== Instance Graphed Layout=========================
java.util.HashMap@1c655221d footprint:
COUNT AVG SUM DESCRIPTION
1 48 48 java.util.HashMap
1 48 (total)
And yes... null is not free at all.
A HashMap with foo=bar?
=== Instance Layout ====================================
java.util.HashMap object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00 (00000001 00000000 00000000 00000000)
4 4 (object header) 00 00 00 00 (00000000 00000000 00000000 00000000)
8 4 (object header) a5 37 00 f8 (10100101 00110111 00000000 11111000)
12 4 java.util.Set AbstractMap.keySet null
16 4 java.util.Collection AbstractMap.values null
20 4 int HashMap.size 1
24 4 int HashMap.modCount 1
28 4 int HashMap.threshold 12
32 4 float HashMap.loadFactor 0.75
36 4 java.util.HashMap.Node[] HashMap.table [null, null, null, null, null, null, null, (object), null, null, null, null, null, null, null, null]
40 4 java.util.Set HashMap.entrySet null
44 4 (loss due to the next object alignment)
Instance size: 48 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
==== Instance Graphed Layout=========================
java.util.HashMap@10b48321d footprint:
COUNT AVG SUM DESCRIPTION
2 24 48 [B
1 80 80 [Ljava.util.HashMap$Node;
2 24 48 java.lang.String
1 48 48 java.util.HashMap
1 32 32 java.util.HashMap$Node
7 256 (total)
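The 256 bytes for a single foo=bar entry can be added up by hand. A sketch that takes the sizes from the JOL output above (compressed references, so 4 bytes per table slot) and the default capacity of 16 with load factor 0.75; the names are illustrative assumptions:

```java
// Back-of-the-envelope sketch; sizes are taken from the JOL output, names are assumptions.
final class HashMapFootprintSketch {
    static final int HASH_MAP_BYTES = 48;  // java.util.HashMap instance
    static final int NODE_BYTES = 32;      // java.util.HashMap$Node
    static final int STRING_BYTES = 24;    // java.lang.String instance
    static final int BYTE_ARRAY_BYTES = 24; // byte[3] for "foo" or "bar"

    // Resize threshold: capacity times load factor.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Node[] footprint: 16 byte array header plus 4 bytes per compressed reference.
    static int tableBytes(int capacity) {
        return 16 + capacity * 4;
    }

    // Map, table, one node, and two strings with their backing arrays.
    static int totalBytes() {
        return HASH_MAP_BYTES + tableBytes(16) + NODE_BYTES
                + 2 * STRING_BYTES + 2 * BYTE_ARRAY_BYTES;
    }

    public static void main(String[] args) {
        System.out.println("threshold: " + threshold(16, 0.75f)); // 12
        System.out.println("total: " + totalBytes() + " bytes");  // 256
    }
}
```

Only 6 of the 256 bytes are the actual characters of foo and bar.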
A HashMap with foo=bar and some usage?
=== Instance Layout ====================================
java.util.HashMap object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00 (00000001 00000000 00000000 00000000) (1)
4 4 (object header) 00 00 00 00 (00000000 00000000 00000000 00000000) (0)
8 4 (object header) a5 37 00 f8 (10100101 00110111 00000000 11111000) (-134203483)
12 4 java.util.Set AbstractMap.keySet (object)
16 4 java.util.Collection AbstractMap.values (object)
20 4 int HashMap.size 1
24 4 int HashMap.modCount 1
28 4 int HashMap.threshold 12
32 4 float HashMap.loadFactor 0.75
36 4 java.util.HashMap.Node[] HashMap.table [null, null, null, null, null, null, null, (object), null, null, null, null, null, null, null, null]
40 4 java.util.Set HashMap.entrySet (object)
44 4 (loss due to the next object alignment)
Instance size: 48 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
==== Instance Graphed Layout=========================
java.util.HashMap@10b48321d footprint:
COUNT AVG SUM DESCRIPTION
2 24 48 [B
1 80 80 [Ljava.util.HashMap$Node;
2 24 48 java.lang.String
1 48 48 java.util.HashMap
1 16 16 java.util.HashMap$EntrySet
1 16 16 java.util.HashMap$KeySet
1 32 32 java.util.HashMap$Node
1 16 16 java.util.HashMap$Values
10 304 (total)
Just some of the previous facts summarized
GC can be your friend to achieve higher speed
Simplified memory layout for our test
public class GCAndAccessSpeed
{
private final static int SIZE = 100_000;
private final List<String> STRINGS = new ArrayList<>();
private final List<String> ordered = new ArrayList<>();
private final List<String> nonOrdered = new ArrayList<>();
@Param({"false", "true"}) boolean gc;
@Param({"1", "10"}) int COUNT;
@Param({"false", "true"}) boolean drop;
@Setup
public void setup() throws InterruptedException
{
final FastRandom r = new FastRandom(7);
for (int i = 0; i < COUNT * SIZE; i++)
{
STRINGS.add(RandomUtils.randomString(r, 1, 20));
}
for (int i = 0; i < SIZE; i++)
{
ordered.add(STRINGS.get(i * COUNT));
}
nonOrdered.addAll(ordered);
Collections.shuffle(nonOrdered, new Random(r.nextInt()));
if (drop)
{
STRINGS.clear();
}
if (gc)
{
for (int c = 0; c < 5; c++)
{
System.gc();
TimeUnit.SECONDS.sleep(2);
}
}
}
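@Benchmark
public int walkOrdered()
{
int sum = 0;
for (int i = 0; i < ordered.size(); i++)
{
sum += ordered.get(i).length();
}
return sum;
}
@Benchmark
public int walkNonOrdered()
{
int sum = 0;
for (int i = 0; i < nonOrdered.size(); i++)
{
sum += nonOrdered.get(i).length();
}
return sum;
}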
}
Applied Memory Knowledge
# G1 GC
Benchmark (COUNT) (drop) (gc) Mode Cnt Score Units
walkNonOrdered 1 false false avgt 5 1,596,315.2 ns/op
walkOrdered 1 false false avgt 5 611,137.7 ns/op
walkNonOrdered 1 false true avgt 5 1,172,951.3 ns/op
walkOrdered 1 false true avgt 5 431,143.5 ns/op
walkNonOrdered 1 true false avgt 5 1,562,844.1 ns/op
walkOrdered 1 true false avgt 5 605,119.4 ns/op
walkNonOrdered 1 true true avgt 5 1,243,973.9 ns/op
walkOrdered 1 true true avgt 5 400,721.9 ns/op
walkNonOrdered 10 false false avgt 5 1,903,731.9 ns/op
walkOrdered 10 false false avgt 5 1,229,945.1 ns/op
walkNonOrdered 10 false true avgt 5 2,026,861.7 ns/op
walkOrdered 10 false true avgt 5 1,809,961.9 ns/op
walkNonOrdered 10 true false avgt 5 1,920,658.4 ns/op
walkOrdered 10 true false avgt 5 1,239,658.5 ns/op
walkNonOrdered 10 true true avgt 5 1,160,229.2 ns/op
walkOrdered 10 true true avgt 5 403,949.5 ns/op
# -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC
Benchmark (COUNT) (drop) (gc) Mode Cnt Score Units
walkNonOrdered 1 false false avgt 5 1,667,611.7 ns/op
walkNonOrdered 1 false true avgt 5 1,820,968.2 ns/op
walkNonOrdered 1 true false avgt 5 1,928,822.7 ns/op
walkNonOrdered 1 true true avgt 5 1,777,251.4 ns/op
walkOrdered 1 false false avgt 5 931,728.5 ns/op
walkOrdered 1 false true avgt 5 902,433.7 ns/op
walkOrdered 1 true false avgt 5 930,294.3 ns/op
walkOrdered 1 true true avgt 5 907,886.5 ns/op
org.sample.GCAndAccessSpeed
Inspired by: Aleksey Shipilёv, Moving GC and Locality
The most distinct Java feature
What you have to know in general
GC is not a bad thing, you just have to understand it
What is the base motivation behind our current GCs
new
http://www.angelikalanger.com/Articles/EffectiveJava/49.GC.GenerationalGC/49.GC.GenerationalGC.html
What you and the GC have to know
What we have at our disposal today
[1] https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/parallel.html
[2] https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/cms.html#concurrent_mark_sweep_cms_collector
[3] https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/g1_gc.html#garbage_first_garbage_collection
[4] https://wiki.openjdk.java.net/display/zgc/Main
[5] https://wiki.openjdk.java.net/display/shenandoah/Main
Playing GC
What ParallelGC and CMS do
Let's use our memory
Contention free allocation
TLAB is a JVM default
Eden becomes full
Rinse and repeat
Do another round
Assume we Eden GCed for a while...
Fragmentation is the main reason for long GC pauses with Parallel and CMS.
Disadvantages of the classic GCs
Fix the Problems
Keep things running in parallel to avoid stopping
https://www.redhat.com/en/blog/part-1-introduction-g1-garbage-collector
What makes the G1 the G1
https://www.redhat.com/en/blog/part-1-introduction-g1-garbage-collector
The part that is very important for G1
Xms | Xmx | Region Size | Regions (initial/max) |
---|---|---|---|
1G | 1G | 1,024K | 1,024/1,024 |
1G | 2G | 1,024K | 1,024/2,048 |
1G | 10G | 2,048K | 512/5,120 |
2G | 5G | 1,024K | 2,048/5,120 |
5G | 5G | 2,048K | 2,560/2,560 |
10G | 10G | 4,096K | 2,560/2,560 |
24G | 24G | 8,192K | 3,072/3,072 |
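A rough sketch of how G1 appears to pick the region size, assuming the JDK 8 logic: average of Xms and Xmx divided by a target of 2048 regions, rounded down to a power of two, and kept between 1 MB and 32 MB. The constants and names below are assumptions for illustration and this is not the HotSpot code:

```java
// Sketch under assumptions: target of 2048 regions, bounds of 1 MB and 32 MB.
final class G1RegionSizeSketch {
    static final long MB = 1024L * 1024L;
    static final long GB = 1024L * MB;
    static final long MIN_REGION = 1 * MB;   // assumed lower bound
    static final long MAX_REGION = 32 * MB;  // assumed upper bound
    static final long TARGET_REGIONS = 2048; // assumed target region count

    // Estimates the region size in bytes for the given initial and max heap.
    static long regionSize(long initialHeap, long maxHeap) {
        long average = (initialHeap + maxHeap) / 2;
        long candidate = Math.max(average / TARGET_REGIONS, MIN_REGION);
        long powerOfTwo = Long.highestOneBit(candidate); // round down to 2^n
        return Math.min(Math.max(powerOfTwo, MIN_REGION), MAX_REGION);
    }

    public static void main(String[] args) {
        long size = regionSize(10 * GB, 10 * GB);
        System.out.println("region size: " + size / 1024 + "K");
        System.out.println("regions: " + (10 * GB) / size);
    }
}
```

-XX:G1HeapRegionSize overrides the computed value when the default does not fit the object sizes of the application.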
openjdk/hotspot/src/share/vm/gc_implementation/g1/heapRegion.cpp::setup_heap_region_size
A region has a purpose
Large can be too large
-XX:+G1ReclaimDeadHumongousObjectsAtYoungGC (default)
https://www.redhat.com/en/blog/collecting-and-reading-g1-garbage-collector-logs-part-2
When Eden is full
Simplified!
When Eden is full again
Simplified!
When we run out of space
Simplified!
Reflects previous concept of Humongous during old collection
-XX:MaxGCPauseMillis=200
Yes, G1 has full GC, but tries to avoid it
[G1Ergonomics (Concurrent Cycles) request concurrent cycle initiation, reason: requested by GC cause, GC cause: Metadata GC Threshold]
-XX:MetaspaceSize=100M, default is about 20M on 64bit
62.578: [GC pause (young) (to-space exhausted), 0.0406 secs]
62.691: [Full GC 10G->5813M(12G), 15.7221 secs]
-XX:G1ReservePercent=20, 10% is default
[1] https://stackoverflow.com/questions/25251388/what-is-the-metadata-gc-threshold-and-how-do-i-tune-it
[2] https://www.redhat.com/en/blog/part-1-introduction-g1-garbage-collector
Still G1 is not perfect
http://www.oracle.com/technetwork/articles/java/g1gc-1984535.html
Just an important internal fact
-XX:+PrintSafepointStatistics
-XX:+PrintGCApplicationStoppedTime
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
18,373: ThreadDump [ 19 6 6 ] [ 3 0 3 0 0 ] 5
18,378: ThreadDump [ 19 6 6 ] [ 0 0 0 0 0 ] 5
18,462: ParallelGCFailedAllocation [ 19 8 9 ] [ 1 0 1 0 0 ] 8
18,617: ParallelGCFailedAllocation [ 19 4 6 ] [ 0 0 0 0 0 ] 4
18,768: ParallelGCFailedAllocation [ 19 9 10 ] [ 0 0 0 0 0 ] 9
18,918: ParallelGCFailedAllocation [ 19 4 5 ] [ 2 0 2 0 0 ] 4
How do the JVM and the OS interact?
Xmx plus overhead when it starts
Xms is virtual as well, aka not turned into RSS right away
Xms is the first limit, JVM tries to live with that
Xmx, trying to get back to Xms if state permits
-XX:+AlwaysPreTouch to avoid that, but you need the physical memory or the OS will start paging when the JVM comes up
RSS - Resident Set Size is the portion of memory occupied by a process that is held in main memory.
Some things to know when dealing with memory
new int[1024]: 4112 bytes of memory are touched instantly by writing 0 to it
RSS - Resident Set Size