Java Memory Management

What you need to know to build fast and reliable software

Disclaimer

A few words first

Java memory management is an ongoing challenge and a skill that must be mastered to have properly tuned applications that function in a scalable manner. Fundamentally, it is the process of allocating new objects and properly removing unused objects.

betsol.com

  • There is no universal truth
  • These problems are universal and apply to all languages to a certain extent
  • Java makes things comfortable but comfort must be paid for
  • This course is about understanding the cost structure

How big are things?

An empty String on a JVM Linux <32GB

=== Instance Layout ====================================
java.lang.String object internals:
 OFFSET  SIZE     TYPE DESCRIPTION                               VALUE
      0     4          (object header)                           01 00 00 00 (00000001 00000000 00000000 00000000) (1)
      4     4          (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4          (object header)                           da 02 00 f8 (11011010 00000010 00000000 11111000) (-134216998)
     12     4   char[] String.value                              []
     16     4      int String.hash                               0
     20     4          (loss due to the next object alignment)
Instance size: 24 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

==== Instance Graphed Layout=========================
java.lang.String@3f8f9dd6d footprint:
     COUNT       AVG       SUM   DESCRIPTION
         1        16        16   [C
         1        24        24   java.lang.String
         2                  40   (total)

Object Header

The basic information for every object

  • Store information about class in the class
  • Can come as 8, 12, or 16 bytes
  • 16 bytes on 64bit without compresssed class pointers, 12 bytes otherwise
  • Under 32 GB, Java stores references in 4 bytes
  • >= 32 GB, Java stores references in 8 bytes
  • Object header: markword, classword
  • markword: metainfo about the object
  • classword: the reference to the class
//  32 bits:
//  --------
//  hash:25 ------------>| age:4    biased_lock:1 lock:2 (normal object)
//  JavaThread*:23 epoch:2 age:4    biased_lock:1 lock:2 (biased object)
//  size:32 ------------------------------------------>| (CMS free block)
//  PromotedObject*:29 ---------->| promo_bits:3 ----->| (CMS promoted object)
//
//  64 bits:
//  --------
//  unused:25 hash:31 -->| unused:1   age:4    biased_lock:1 lock:2 (normal object)
//  JavaThread*:54 epoch:2 unused:1   age:4    biased_lock:1 lock:2 (biased object)
//  PromotedObject*:61 --------------------->| promo_bits:3 ----->| (CMS promoted object)
//  size:64 ----------------------------------------------------->| (CMS free block)
//
//  unused:25 hash:31 -->| cms_free:1 age:4    biased_lock:1 lock:2 (COOPs && normal object)
//  JavaThread*:54 epoch:2 cms_free:1 age:4    biased_lock:1 lock:2 (COOPs && biased object)
//  narrowOop:32 unused:24 cms_free:1 unused:4 promo_bits:3 ----->| (COOPs && CMS promoted object)
//  unused:21 size:35 -->| cms_free:1 unused:7 ------------------>| (COOPs && CMS free block)

How big are things?

An empty String on a JVM Linux >32GB, -XX:-UseCompressedOops

=== Instance Layout ====================================
java.lang.String object internals:
 OFFSET  SIZE     TYPE DESCRIPTION                               VALUE
      0     4          (object header)                           01 00 00 00 (00000001 00000000 00000000 00000000) (1)
      4     4          (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4          (object header)                           c0 5f c5 95 (11000000 01011111 11000101 10010101) (-1782227008)
     12     4          (object header)                           c1 7f 00 00 (11000001 01111111 00000000 00000000) (32705)
     16     8   char[] String.value                              []
     24     4      int String.hash                               0
     28     4          (loss due to the next object alignment)
Instance size: 32 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

==== Instance Graphed Layout=========================
java.lang.String@aec6354d footprint:
     COUNT       AVG       SUM   DESCRIPTION
         1        24        24   [C
         1        32        32   java.lang.String
         2                  56   (total)

Size of String

String - Foobar

=== Instance Layout ====================================
java.lang.String object internals:
 OFFSET  SIZE     TYPE DESCRIPTION                               VALUE
      0     4          (object header)                           01 00 00 00 (00000001 00000000 00000000 00000000) (1)
      4     4          (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4          (object header)                           da 02 00 f8 (11011010 00000010 00000000 11111000) (-134216998)
     12     4   char[] String.value                              [F, o, o, b, a, r]
     16     4      int String.hash                               0
     20     4          (loss due to the next object alignment)
Instance size: 24 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

==== Instance Graphed Layout=========================
java.lang.String@3f8f9dd6d footprint:
     COUNT       AVG       SUM   DESCRIPTION
         1        32        32   [C
         1        24        24   java.lang.String
         2                  56   (total)

Size of char[6]

Our char[6] with Foobar

=== Instance Layout ====================================
[C object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
      0     4        (object header)                           01 00 00 00 (00000001 00000000 00000000 00000000) (1)
      4     4        (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4        (object header)                           41 00 00 f8 (01000001 00000000 00000000 11111000) (-134217663)
     12     4        (object header)                           06 00 00 00 (00000110 00000000 00000000 00000000) (6)
     16    12   char [C.<elements>                             N/A
     28     4        (loss due to the next object alignment)
Instance size: 32 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

Size of a wrapped primitive boolean

A class with a boolean and its loss

class A 
{
    boolean bar = true;
}
=== Instance Layout ============  32 bit =======================
org.jol.A object internals:
 OFFSET  SIZE      TYPE DESCRIPTION                           
      0     4           (object header)                       
      4     4           (object header)                       
      8     4           (object header)                       
     12     1   boolean A.bar                                 
     13     3           (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 3 bytes external = 3 bytes total
=== Instance Layout ============  64 bit =======================
org.jol.A object internals:
 OFFSET  SIZE      TYPE DESCRIPTION                               
      0     4           (object header)                           
      4     4           (object header)                           
      8     4           (object header)                           
     12     4           (object header)                           
     16     1   boolean A.bar                                     
     17     7           (loss due to the next object alignment)
Instance size: 24 bytes
Space losses: 0 bytes internal + 7 bytes external = 7 bytes total

More Alignment

You gotta give

class A 
{
    String s;
    boolean b1;
}
=== Instance Layout == 32 bit =================================
org.jol.A object internals:
 OFFSET  SIZE               TYPE DESCRIPTION               
      0     4                    (object header)           
      4     4                    (object header)           
      8     4                    (object header)           
     12     1            boolean A.b1                     
     13     3                    (alignment/padding gap)                  
     16     4   java.lang.String A.s                       
     20     4                    (loss due to the next object alignment)
Instance size: 24 bytes
Space losses: 3 bytes internal + 4 bytes external = 7 bytes total
=== Instance Layout == 64 bit =================================
org.jol.A object internals:
 OFFSET  SIZE               TYPE DESCRIPTION                              
      0     4                    (object header)                          
      4     4                    (object header)                          
      8     4                    (object header)                          
     12     4                    (object header)                          
     16     1            boolean A.b                                      
     17     7                    (alignment/padding gap)                  
     24     8   java.lang.String A.s                                      
Instance size: 32 bytes
Space losses: 7 bytes internal + 0 bytes external = 7 bytes total
  • References can only be stored with 4 byte alignment (32 bit) or 8 byte (64 bit)
  • Full objects can only be stored with 8 byte (64 bit) alignment to each other
  • Full 64 bit values can also only be stored fully aligned (double, long)

Reduce waste

JVM does not preserve order to optimize layout

class A 
{
    String s1;
    boolean b1;
    int i1;

    String s2;
    boolean b2;
    long i2;
}
=== Instance Layout ====================================
org.jol.A object internals:
OFFSET  SIZE               TYPE DESCRIPTION                              
  0     4                    (object header
  4     4                    (object header)    
  8     4                    (object header)            
 12     4                int A.i1                                      
 16     8               long A.i2                                      
 24     1            boolean A.b1                                      
 25     1            boolean A.b2                                      
 26     2                    (alignment/padding gap)                  
 28     4   java.lang.String A.s1                                      
 32     4   java.lang.String A.s2                                      
 36     4                    (loss due to the next object alignment)
Instance size: 40 bytes
Space losses: 2 bytes internal + 4 bytes external = 6 bytes total
class A 
{
    String s1;
    boolean b1;

    String s2;
    boolean b2;
    long i2;
}

=== Instance Layout ====================================
org.jol.A object internals:
 OFFSET  SIZE               TYPE DESCRIPTION                               
      0     4                    (object header)                           
      4     4                    (object header)                   
      8     4                    (object header)                       
     12     1            boolean A.b1                                      
     13     1            boolean A.b2                                      
     14     2                    (alignment/padding gap)                  
     16     8               long A.i2                                      
     24     4   java.lang.String A.s1                                      
     28     4   java.lang.String A.s2                                      
Instance size: 32 bytes
Space losses: 2 bytes internal + 0 bytes external = 2 bytes total


Different memory size, different memory use

CompressedOops vs. Non-CompressedOops of new int[15]

32 bit

=== Instance Layout ====================================
[I object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                               
      0     4        (object header)          // markword low
      4     4        (object header)          // markword high 
      8     4        (object header)          // classword compressed
     12     4        (object header)          // array size (int)
     16    60    int [I.<elements>                             
     76     4        (loss due to the next object alignment)
Instance size: 80 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total


64 bit

=== Instance Layout ====================================
[I object internals:
 OFFSET  SIZE   TYPE DESCRIPTION              
      0     4        (object header)          // markword low
      4     4        (object header)          // markword high
      8     4        (object header)          // classword low
     12     4        (object header)          // classword high
     16     4        (object header)          // array size (int)
     20     4        (alignment/padding gap)  
     24    60    int [I.<elements>
     84     4        (loss due to the next object alignment)
Instance size: 88 bytes
Space losses: 4 bytes internal + 4 bytes external = 8 bytes total

How big is...?

A empty HashMap

=== Instance Layout ====================================
java.util.HashMap object internals:
 OFFSET  SIZE                       TYPE DESCRIPTION                               VALUE
      0     4                            (object header)                           01 00 00 00 (00000001 00000000 00000000 00000000) (1)
      4     4                            (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4                            (object header)                           a5 37 00 f8 (10100101 00110111 00000000 11111000) (-134203483)
     12     4              java.util.Set AbstractMap.keySet                        null
     16     4       java.util.Collection AbstractMap.values                        null
     20     4                        int HashMap.size                              0
     24     4                        int HashMap.modCount                          0
     28     4                        int HashMap.threshold                         0
     32     4                      float HashMap.loadFactor                        0.75
     36     4   java.util.HashMap.Node[] HashMap.table                             null
     40     4              java.util.Set HashMap.entrySet                          null
     44     4                            (loss due to the next object alignment)
Instance size: 48 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

==== Instance Graphed Layout=========================
java.util.HashMap@1c655221d footprint:
     COUNT       AVG       SUM   DESCRIPTION
         1        48        48   java.util.HashMap
         1                  48   (total)

How big is...?

A HashMap with foo=bar?

=== Instance Layout ====================================
java.util.HashMap object internals:
 OFFSET  SIZE                       TYPE DESCRIPTION                               VALUE
      0     4                            (object header)                           01 00 00 00 (00000001 00000000 00000000 00000000) (1)
      4     4                            (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4                            (object header)                           a5 37 00 f8 (10100101 00110111 00000000 11111000) (-134203483)
     12     4              java.util.Set AbstractMap.keySet                        null
     16     4       java.util.Collection AbstractMap.values                        null
     20     4                        int HashMap.size                              1
     24     4                        int HashMap.modCount                          1
     28     4                        int HashMap.threshold                         12
     32     4                      float HashMap.loadFactor                        0.75
     36     4   java.util.HashMap.Node[] HashMap.table                             [null, null, null, null, null, null, null, (object), null, null, null, null, null, null, null, null]
     40     4              java.util.Set HashMap.entrySet                          null
     44     4                            (loss due to the next object alignment)
Instance size: 48 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

==== Instance Graphed Layout=========================
java.util.HashMap@1c655221d footprint:
     COUNT       AVG       SUM   DESCRIPTION
         2        24        48   [C
         1        80        80   [Ljava.util.HashMap$Node;
         2        24        48   java.lang.String
         1        48        48   java.util.HashMap
         1        32        32   java.util.HashMap$Node
         7                 256   (total)

How big is...?

A HashMap with foo=bar and some usage?

=== Instance Layout ====================================
java.util.HashMap object internals:
 OFFSET  SIZE                       TYPE DESCRIPTION                               VALUE
      0     4                            (object header)                           01 00 00 00 (00000001 00000000 00000000 00000000) (1)
      4     4                            (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4                            (object header)                           a5 37 00 f8 (10100101 00110111 00000000 11111000) (-134203483)
     12     4              java.util.Set AbstractMap.keySet                        (object)
     16     4       java.util.Collection AbstractMap.values                        (object)
     20     4                        int HashMap.size                              1
     24     4                        int HashMap.modCount                          1
     28     4                        int HashMap.threshold                         12
     32     4                      float HashMap.loadFactor                        0.75
     36     4   java.util.HashMap.Node[] HashMap.table                             [null, null, null, null, null, null, null, (object), null, null, null, null, null, null, null, null]
     40     4              java.util.Set HashMap.entrySet                          (object)
     44     4                            (loss due to the next object alignment)
Instance size: 48 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

==== Instance Graphed Layout=========================
java.util.HashMap@1c655221d footprint:
     COUNT       AVG       SUM   DESCRIPTION
         2        24        48   [C
         1        80        80   [Ljava.util.HashMap$Node;
         2        24        48   java.lang.String
         1        48        48   java.util.HashMap
         1        16        16   java.util.HashMap$EntrySet
         1        16        16   java.util.HashMap$KeySet
         1        32        32   java.util.HashMap$Node
         1        16        16   java.util.HashMap$Values
        10                 304   (total)

Stack, Heap, Non-Heap

The basic memory sections of the JVM

  • Stack: Local variables. references and method frames (call stack) are stored
  • Heap: All new are done here, except when Escape Analysis kicks in, layout depends on GC algorithm
  • Non-Heap: Java stores classes and other meta-data here (constant pool, field and method data...)
  • Other: JVM code, internal runtime structures
  • Code Cache. Might belong to others, compiled code is stored here, if exhausted, Java stops compiling
  • Default stack on 64 bit Linux is 1024KB
  • Heap can be sized with initial and max size, Java will determine the use of this large area mostly on its own
  • Meta space (non-heap) can be sized as well, especially when a lot of classes are loaded, by default unlimited and growth as needed
  • Initial code cache is 245,760 Kb, max is 251,658,240 Kb (PrintFlagsFinal does not match -XX:+PrintCodeCache)

Object Types

Basic data groups

  • Java has primitive and object types
  • Object types are mostly allocated on heap
  • Primitive types in objects live on the heap
  • Method local primitive types are kept on stack or in registers
  • Method local object types live mostly on heap BUT the reference is stored on stack or kept in registers
  • Primitive Types: int, double, long...
  • Object Types: Object, String, Integer, arrays(!)
  • Objects have overhead, 12 to 24 bytes, does not include the stored data
  • Everything is aligned to 8 bytes (aka 64 bit)
  • So a boolean is one byte but when stored alone, you waste 7 bytes

Escape Analysis

public class StackVsHeapInteger
{
    Integer x;
    static final int SIZE = 10000;
    
    @Benchmark
    public int stack()
    {
        int sum = 0;
        for (int i = 0; i < SIZE; i++) {
            Integer x = new Integer(i);
            sum = sum + x.intValue();
        }
        return sum;
    }
    
    @Benchmark
    public int heap()
    {
        int sum = 0;
        for (int i = 0; i < SIZE; i++) {
            x = new Integer(i);
            sum = sum + x.intValue();
        }
        return sum;
    }
    
    @Benchmark
    public Integer stackAndEscape()
    {
        Integer x = null;
        
        int sum = 0;
        for (int i = 0; i < SIZE; i++) {
            x = new Integer(i);
            sum = sum + x.intValue();
        }
        return x;
    }
}
-XX:-EscapeAnalysis
Benchmark                          Mode  Cnt      Score   Error  Units
StackVsHeapInteger.heap            avgt    2  26316.832          ns/op
StackVsHeapInteger.stack           avgt    2  26677.675          ns/op
StackVsHeapInteger.stackAndEscape  avgt    2  25172.548          ns/op

-XX:+EscapeAnalysis (default)
Benchmark                          Mode  Cnt      Score   Error  Units
StackVsHeapInteger.heap            avgt    2  25585.433          ns/op
StackVsHeapInteger.stack           avgt    2   3420.975          ns/op
StackVsHeapInteger.stackAndEscape  avgt    2  24158.490          ns/op
Benchmark                                                        Mode  Cnt       Score   Error   Units
StackVsHeapInteger.heap:·gc.alloc.rate                           avgt    2    5300.472          MB/sec
StackVsHeapInteger.heap:·gc.alloc.rate.norm                      avgt    2  160000.002            B/op
StackVsHeapInteger.heap:·gc.count                                avgt    2     108.000          counts

StackVsHeapInteger.stack:·gc.alloc.rate                          avgt    2      ≈ 10⁻⁴          MB/sec
StackVsHeapInteger.stack:·gc.alloc.rate.norm                     avgt    2      ≈ 10⁻⁴            B/op
StackVsHeapInteger.stack:·gc.count                               avgt    2         ≈ 0          counts

StackVsHeapInteger.stackAndEscape:·gc.alloc.rate                 avgt    2    5426.663          MB/sec
StackVsHeapInteger.stackAndEscape:·gc.alloc.rate.norm            avgt    2  160000.002            B/op
StackVsHeapInteger.stackAndEscape:·gc.count                      avgt    2     110.000          counts
  • JVM determines if something can escape
  • If it does not escape, it will be placed on the stack
  • JVM can re-capture return elements after inlining

Garbage Collections

The most distinct Java feature

GC in General

What you have to know in general

  • Garbage Collection (GC) is nothing specific to Java
  • Java was just making the topic popular
  • Less trouble than manual managing the memory aka leaks
  • Less chances of memory corruption
  • Automatic freeing of not longer needed memory
  • The luxury is paid for in parts by CPU and time
  • Go has garbage collection
  • JavaScript has garbage collection
  • Python has garbage collection
  • C# has garbage collection
  • Rust has smart pointers and compiler managed memory
  • Swift does ARC (Automatic Reference Counting)

A Theory First

What is the base motivation behind our current GCs

  • Objects have different lifetime
  • Imagine a cache or a method local new
  • Objects have different size aka an array vs. an Integer
  • Using one memory section will lead to heavy fragmentation aka contiguous memory section are getting smaller and later allocations will fail
  • Empirical studies showed that most objects don't live that long
  • Storing long and short-lived objects next to each other is a bad idea
  • Generational theory and garbage collection was developed

References

What you and the GC have to know

  • A reference is the information where object data is located
  • In C terms, a pointer to memory
  • In Java terms... have not found that much about keeping this data
  • Because memory location can change and not all references can be updated during GC, there must be a translation table
  • Strongly reachable: An object is directly reachable from GC root without any reference types, no GC
  • Softly reachable: An object is reachable by traversing a soft reference, GC possible when memory pressure or timer is up
  • Weakly reachable: An object is reachable by traversing a weak reference, this object will be GCed when only weakly reachable (good for canonicalizing mappings)
  • Phantom reachable: Not reachable otherwise, was finalized and a phantom reference refers to it
  • Unreachable: When none of the above applies, qualifies for GC

Our Garbage Collectors today

What we have at our disposal today

  • ParallelGC: Generational Collector, runs parallel for young and old[1]
  • CMS: Generational Collector, parallel old, auto-combined with ParNew for parallel new [2]
  • G1: Generational Collector, parallel, evacuating and compacting, made for large heaps [3]
  • Ergonomics: All GC will automatically adjust the heap to adjust to the workload and find the best balance

Let's be a Collector

Playing GC

Generational GC Lifecycle - Step 0

What ParallelGC and CMS do

  • Heap is split into four resizable areas
  • Eden: All allocations happen here, also known as "New"
  • Survivor S0: Space for objects to live a little longer
  • Survivor S1: Similar to S0, but only one is in active use at the time
  • Tenured: Where surviving objects continue to live and where large allocations are made, also known as "Old"

Generational GC Lifecycle - Step 1

Let's use our memory

  • Let's allocate some objects
  • To simplify things, we ignore the size for now
  • Allocation is TLAB (Thread Local Allocation Buffer) based
  • Important: If the object is too big, it will be allocated in Tenured directly

TLAB

Contention free allocation

  • TLAB - Thread Local Allocation Buffer
  • To avoid Eden contention, we split up Eden
  • Area per thread exclusively assigned
  • No synchronization needed
  • Thread just bumps the pointer
  • TLABs are resized dynamically
  • Allocation is almost O(1) now

Generational GC Lifecycle - Step 2

Eden becomes full

  • When the next allocation fails, Eden will be gc-ed
  • Either all memory is allocated...
  • ... or the next allocation request is bigger than free space
  • Depending on algorithm, we either already marked objects or we stop and mark
  • Unreferenced objects are discarded
  • Referenced objects are hibernated in S0
  • If the object does not fit, it goes to Tenured directly
  • Eden has 0' now, the object that caused the GC

Generational GC Lifecycle - Step 3

Rinse and repeat

  • Let's fill up Eden again
  • When we run into GC, besides what we did before, we now also GC S0 and what survives there goes to S1
  • S0/S1 object swapping happens until the survivor counter is up
  • Object goes into Tenured afterwards
  • When an object is in Tenured, the generation does not matter anymore

Generational GC Lifecycle - Step 4

Do another round

  • Let's fill up Eden again and GC
  • Assume that we have only two generations in S0/S1 so we move some to Tenured

Generational GC Lifecycle - Step 5

Assume we Eden GCed for a while...

  • Now Tenured becomes full
  • Either we reach a set limit or we cannot accommodate any new direct allocations or S0/S1 survivors
  • CMS and ParallelGC start to mark live objects in parallel to running threads
  • If your Tenured cannot accept survivors anymore, you have a Stop-the-World (STW) aka you run behind
  • If you allocate large objects directly, your Tenured might be too fragmented, hence you STW to get compaction (CMS does not compact, ParallelGC does, it is STW all the time)

Issues and Challenges

Disadvantages of the classic GCs

  • Either Eden or Tenured are cleaned
  • All of that space is cleaned at once
  • Limited capabilities to deal with large heaps aka 4GB or more
  • Survivor spaces are limited in size (ratio to Eden)
  • CMS trades pause time for fragmentation
  • ParallelGC compacts but has issues with larger heaps
  • Objects that don't fit Eden or TLAB section of Eden go directly to Tenured
  • Many surviving objects might skip Survivor space because the Survivor size is too small
  • Tenured has to finish quickly to avoid being the bottleneck when Eden has to copy through or allocate directly

G1 to the rescue

Fix the Problems

Addressing Known Issues

Keep things running in parallel to avoid stopping

  • Garbage First (G1) was created to achieve these goals
    • Easier tuning (CMS has 60 switches!)
    • No premature promotion aka no S0/S1 notion
    • Supporting big heaps better aka scale better
    • Avoid fall back to STW collections (which happens with CMS when it cannot keep up or fragmentation is too big)

G1 - The Differences

What makes the G1 the G1

  • Divide the heap in many regions of smaller size
  • Every region serves a certain purpose (Eden, Survivor, Tenured, Humongous)
  • Assignments are changed on the fly
  • Survivor spaces aren't limited anymore
  • Huge objects will be directly allocated in Humongous
  • Object is humongous when larger than half a region
  • Always fully evacuate a region (the compaction part) during GC
  • Clean Tenured with a lot of garbage first (hence the name)

Regions

The part that is very important for G1

  • 2k regions preferred, more possible, less might impact performance
  • Area size 1MB, 2MB, 4MB, 8MB, 16MB, or 32MB
  • Determined on startup based on Xms and Xmx
  • Cannot be changed on the fly!!!
  • Type of the region can change when region is empty
  • Larger region size means more copy work during evacuation

 

Xms Xmx Size Regions
1G1G1,024K1,024/1,024
1G2G1,024K1,024/2,048
1G10G1,024K512/5,120
2G5G1,024K2,048/5,120
5G5G2,048K2,560/2,560
10G10G4,048K2,560/2,560
24G24G8,192K3,720/3,720

Region Types

A region has a purpose

  • Eden: Where new objects are created
  • One region is selected and TLAB is used to allocate efficiently
  • Eden region count is adjusted when needed (5% at start)
  • Survivor: Where object live after an Eden collection (Young collection)
  • Objects might survive several GC Young cycles
  • At a certain threshold, survivor become Tenured
  • Cycle count is dynamically adjusted, region count as well
  • Tenured: Where the old objects live
  • Long living objects, several seconds to minutes
  • Humongous: Where the large objects live
  • They get their own(!) regions when they are larger than 50% of region size
  • Can span several regions
  • Unused space is wasted
  • Collected during Young
  • Allocation of a humongous object triggers a mixed GC cycle

G1 - Humongous Challenges

Large can be too large

  • Humongous objects waste space
    • Humongous > 50% of a region
    • Imagine 2.2 MB object with 1 MB region size
    • Need three regions for it, wasting 0.8 MB
    • Mostly arrays (also indirectly such as Strings)
  • Humongous requires contiguous space
  • Cause of fragmentation
  • Humongous objects are kind of generation-less
  • Used to be old, but nowadays collected during young
  • Every humongous allocation trigger marking cycle
  • -XX:+G1ReclaimDeadHumongousObjectsAtYoungGC (default)
  • Instead of reclaiming humongous late we try to do it during Young collection now
  • Still waste and fragmentation are a burden (first tuning target)

G1 - Young Collection

When Eden is full

  • Eden is full (5% of heap initially)
  • Scan all Eden regions
  • Copy live objects to a survivor region
  • Copy live objects from existing survivor regions (increment counter)
  • No Eden is used anymore
  • Calculate new statistics to tune Eden count

G1 - Young Collection II

When Eden is full again

  • Survivors are older aka count is up
  • Some Survivors become Tenured

G1 - Mixed Collection I

When we run out of space

  • When we are at 45% total usage, we start marking in parallel
  • Snapshot-at-the-beginning (SATB) and marking if objects are live or dead
  • Objects created after SATB are not considered, aka always live
  • We are calculating the liveness per region continuously
  • Start a young cycle first after marking
  • After that we do a old collection
  • Collect most "dead" regions first hence the name Garbage First Collector
  • Not all Tenured will be collected, count is based on target pause time and adjusted frequently

G1 - Mixed Collection II

 

Young Collection

G1 - Mixed Collection III

 

Tenured Collection

G1 - Mixed Collection

  • Will continue mixed collection (Eden and Tenured) until below threshold
  • Going Young Collection only when "clean enough"
  • G1 adjusts waste threshold and cycle count/frequency based on pause target goal
  • -XX:MaxGCPauseMillis=200
  • Desired pause goal in milliseconds
  • Not guaranteed to be achieved

G1 -Full GC

Yes, G1 has full GC, but tries to avoid it

Metaspace

  • Class unloading is not a FullGC anymore but the first reach of the metaspace will trigger a GC
  • [G1Ergonomics (Concurrent Cycles) request concurrent cycle initiation, reason: requested by GC cause, GC cause: Metadata GC Threshold]
  • Size the first occurrence properly, after first GC, JVM will manage the mark
  • -XX:MetaspaceSize=100M, default is about 20M on 64bit

to-Space exhausted

  • We need space to copy to (evacuation collector!)
  • If this space is too small, we fall back to FullGC
  • 62.578: [GC pause (young) (to-space exhausted), 0.0406 secs] 62.691: [Full GC 10G->5813M(12G), 15.7221 secs]
  • -XX:G1ReservePercent=20, 10% is default

G1 - Remaining Problems

Still G1 is not perfect

  • Humongous objects are a burden
  • Can cause fragmentation, never moved aka compacted
  • Humongous can start an Mixed GC when the threshold is reached
  • Humongous can trigger FullGC
  • Humongous objects waste memory because they own their own regions and don't share that with others
  • When we evacuate too much, we can fail and FullGC
  • Tuning would fill another training
  • Main goals: Avoid humongous objects and running into to-space exhaustion
  • Besides that: Only no garbage is good garbage.

Some more Memory Stuff

Safepoints

Just an important internal fact

  • JVM cannot halt threads, they have to cooperate
  • For GC, all have to give up to get a consistent heap state
  • Safepoint flag
  • Each thread checks it frequently
  • Can take some time, in theory also never happen
  • See overhead with -XX:+PrintSafepointStatistics -XX:+PrintGCApplicationStoppedTime
  • In between two bytecodes or automatically inserted into compiled code
    • Return from a method
    • Each loop operation
  • Automatically when blocked on monitor
  • Automatically when in native code
         vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
18,373: ThreadDump                       [      19          6              6    ]      [     3     0     3     0     0    ]  5   
18,378: ThreadDump                       [      19          6              6    ]      [     0     0     0     0     0    ]  5   
18,462: ParallelGCFailedAllocation       [      19          8              9    ]      [     1     0     1     0     0    ]  8   
18,617: ParallelGCFailedAllocation       [      19          4              6    ]      [     0     0     0     0     0    ]  4   
18,768: ParallelGCFailedAllocation       [      19          9             10    ]      [     0     0     0     0     0    ]  9   
18,918: ParallelGCFailedAllocation       [      19          4              5    ]      [     2     0     2     0     0    ]  4   

JVM Memory and the OS

How does the JVM and the OS interact?

  • JVM asks for all of Xmx plus overhead when it starts
  • OS assigns virtual memory to it, when available or let's the process die
  • Xms is virtual as well aka not turned into RSS right away
  • Xms is the first limit, JVM tries to live with that
  • If not possible we go to Xmx, trying to get back to Xms if state permits
  • OS memory can be fragmented due to the slow transition from virtual to RSS
  • Use -XX:+AlwaysPreTouch to avoid that, but you need the physical memory or OS will start paging when the JVM comes up
  • With a lot of rarely accessed Tenured objects a no GC, a GC on Tenured can cause the OS to page
  • Pre-G1 GC often touched all of in use memory and caused heavy paging when memory is scarce
  • Theoretically G1 touches more memory more often

Upcoming Collectors

Some new kids will be around soon

  • EpsilonGC: A GC that handles memory allocation but does not implement any actual memory reclamation mechanism. Designed for testing or short-lived operations.
  • ZGC: The Z Garbage Collector is a scalable low latency garbage collector. Main goal: Pause times do not exceed 10 ms when using heaps from some hundred megabytes to terabytes.
  • Shenandoah: Shenandoah is an ultra-low pause time garbage collector that reduces GC pause times by performing more garbage collection work concurrently with the running Java program. Project by RedHat.
  • You can try all three on Linux/x84
  • Epsilon is fully done and part of JDK 11, testable starting JDK 9

Important to Know

Some things to know when dealing with memory

  • Java gives out only clean memory
  • new int[1024]: 4112 bytes of memory are touched instantly by writing 0 to it
  • This turns virtual memory into RSS if not already RSS
  • If you only use the first 100 bytes, you wasted a lot of cycles
  • Off-Heap memory is allocated directly from the OS, but controlled by GC aka a reference to it is on the heap
  • G1 is NUMA-aware aka memory closer to a CPU is used
  • Might deliver up to 30% more performance
  • Java fully manages the memory in use, not the OS (off-heap allocations excluded)

Questions and Answers