Java Memory Model - JMM

Know the Internals, so You Don't Have to Care

Motivation I

Java is highly efficient, you want to know why and how to keep it that way

Motivation II

Write correct code, make it scale, keep it efficient

Facts

  • Multi-Cores are standard
  • Hardware has evolved and works differently than most people expect
  • Efficiency is a must nowadays
  • Java is an older language, hence it gives you low-level syntax
  • The compiler doesn't see everything (unfortunately)
  • Correctness cannot be tested for (fully)

Goals

  • Learn to pay attention to shared states
  • Learn to review concurrent code
  • Learn what is more efficient
  • See what is wrong and why
  • We got only 60 min!

State Your Sources

Always attribute other people's work

A lot of the content has been borrowed, validated, and modified from Concurrency Concepts in Java by Douglas Hawkins [1]

The Two-Slide Presentation

All You Have to Know in Two Slides or Less

Why Java works the way it works

There is only one truth

  • Did anyone read the JLS - Java Language Spec?
  • Explains what compiler and JVM builders have to obey
  • Java was the first language that tried such a model
  • Chapter 17 - Threads and Locks
  • Know what is legal and what is illegal
  • Know why you might not want to program on that level

The Java Memory Model specifies how the Java virtual machine works with the computer's memory (RAM)...

It is very important to understand the Java memory model if you want to design correctly behaving concurrent programs. ...specifies how and when threads can see values written to shared variables by other threads, and how to synchronize access to shared variables...

Jakob Jenkov

TL;DR

The one rule that tells Java to optimize the hell out of your code without breaking it

As if there is one thread, unless you say otherwise.

  • In English: I see a single threaded program, unless you tell me otherwise. I will optimize as much as possible at my own discretion, as long as the outcome stays the same.
  • Everything that fits this statement is legal.
  • Sadly, the compiler does not prevent us from writing bad code.

And Now... The Conclusion

Everything Before in a Few More Words

Memory

Memory is Everywhere

JMM vs the Computer

Our World (JVM) vs. Real World (Hardware and OS)

Where do we live?


public class Test {
    // we all always live on the heap
    private Map<String, String> aMap;
    private int counter;
    private static final Any<Integer> ANY = new Any<>();
    
    /**
     * @param a - stack only (call-by-value)
     * @param b - the reference is copied onto the stack
     *            (still call-by-value), but the object
     *            it points to? You don't know.
     */
    public void foo(int a, Any<?> b) {
        int c = a++; // register or stack or both
        
        var d = b; // ref d on stack or in register, 
                   // but b might be anywhere
        
        // reference on stack/in register, but instance?
        // when small and it does not escape on stack
        // when larger always on heap
        int[] e = new int[a];
                            
        // reference stack, instance on heap
        List<Foo> l = new ArrayList<>();  
    }
}
  • The type does not indicate storage or sharing
  • Local variables are always safe unless you leak the reference (see the sketch after this list)
  • Local does not mean local storage, just local scope
  • Simple rule: If you can synchronize on it, it can and could be shared!
  • Valhalla will bring something new: Value Objects (codes like a class, works like an int) and you cannot synchronize on it, see the simple rule :-)
  • Programming hint: Immutability rulez!
  • If you cannot change it, you cannot screw it up!
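
A minimal, hypothetical sketch of such a leak (class and field names are made up): the list is safe while only the local variable references it, and it is shared the moment the reference escapes into a field.

import java.util.ArrayList;
import java.util.List;

public class LeakExample {
    // lives on the heap and is reachable by any thread holding this instance
    private List<String> shared;

    public void work() {
        // reference is local, nobody else can see the list yet
        List<String> local = new ArrayList<>();
        local.add("still private to this thread");

        // the reference escapes: the very same list is now shared state
        // and all visibility rules apply to it
        shared = local;
    }
}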

Atomicity

What operations are indivisible?

Primitive vs Atomic

Is this 2 or -1 all the time?

long shared = 0;

Thread 1
=================================================
shared = 2L;
set_hi shared, 0000 0000


set_lo shared, 0000 0002


Thread 2
=================================================
shared = -1L;

set_hi shared, ffff ffff
set_lo shared, ffff ffff

This is a 32-bit problem, but as a Java programmer you don't know where your code will run, hence don't rely on it! 64-bit JVMs on x86-64 are fine.
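
A minimal sketch of the safe variant, in case you really must share a long without further synchronization: declaring the field volatile makes reads and writes of long and double atomic (JLS 17.7), even on 32-bit VMs.

class SharedLong {
    // volatile longs/doubles are read and written as one unit (JLS 17.7)
    private volatile long shared = 0L;

    void writerA() { shared = 2L; }
    void writerB() { shared = -1L; }

    // sees 0, 2 or -1 - never a torn mix of hi and lo words
    long read() { return shared; }
}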

Atomic?

int shared = 0;
=================================================
shared++;
local = shared;
local = local + 1;
shared = local;
             
getfield        #2   // Field shared:I
iconst_1        
iadd            
putfield        #2   // Field shared:I
0x00007f2af910f20c: mov 0xc(%rsi),%edi  ;*getfield shared
0x00007f2af910f20f: inc %edi
0x00007f2af910f211: mov %edi,0xc(%rsi)  ;*putfield shared
int local = 0;
=================================================
local += 2;
IINC 1 2
0x00007ff49cfde350: movabs $0x2,%r10
0x00007ff49cfde35a: add    %r10,%rax
  • Called Read - Modify - Write operation
  • One Java command != one CPU command
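
A small, hypothetical demo of what that costs (names and counts are made up): two threads each increment the field a million times, and because the read-modify-write is not atomic, updates get lost. The printed result is almost always below two million.

public class LostUpdates {
    private static int shared = 0;

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 1_000_000; i++) {
                shared++; // getfield, iadd, putfield - three separate steps
            }
        };

        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        // almost always prints less than 2_000_000
        System.out.println(shared);
    }
}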

Is new atomic?

Is a new atomic?

class Point
{
    private int x;
    private int y;
    
    public Point(int x, int y)
    {
        this.x = x;
        this.y = y;
    }
}
    
Point shared = new Point(x, y);
// pseudo-code
local = calloc(sizeof(Point));
local.<init>(x, y);
    Object.<init>();
    this.x = x;
    this.y = y;
shared = local;
  • It is not atomic and we will learn later why

Visibility

What Do Other Threads See?

No garbage

What the spec guarantees

No out-of-thin-air reads: Each read of a variable must see a value written by a write to that variable.

This hurts cutting-edge performance optimizations big time, because every single bit is written to (initialized) before it is made available. That's why object reuse might still be a good thing in Java... but then it is not immutable... darn...

  • Java will never present us garbage
  • Every primitive is initialized
  • Every array
  • Every reference
  • That differs from the underlying memory management of the OS, which does not write to freshly allocated memory and might not even back it with physical pages in the first place.
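
A tiny sketch to make the guarantee concrete: fields and array elements always come up with their default values, never with whatever happened to be in that memory before.

public class Defaults {
    private int number;       // 0
    private boolean flag;     // false
    private Object reference; // null

    public static void main(String[] args) {
        Defaults d = new Defaults();
        int[] array = new int[4]; // {0, 0, 0, 0}

        System.out.println(d.number + " " + d.flag + " " + d.reference);
        System.out.println(java.util.Arrays.toString(array));
    }
}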

Immutability

Is this an immutable object?

class Point {
    private int x;
    private int y;
    
    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
    
    public int getX() {
        return x;
    }

    public int getY() {
        return y;
    }
}

Nope, it isn't! This is one.

class Point {
    private final int x;
    private final int y;
    
    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
    
    public int getX() {
        return x;
    }

    public int getY() {
        return y;
    }
}
class Point {
    public final int x;
    public final int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

What is this final?

Final guarantees one possible outcome

class Point
{
    private final int x;
    private final int y;
    
    public Point(int x, int y)
    {
        this.x = x;
        this.y = y;
    }
}
    
Point shared = new Point(x, y);
  • Final means, there is only one possible value, ever!
  • Hence we should never see the initialized value (0) (see the sketch after this list)
  • ...only the assigned values of x and y
  • Hence immutable objects are really immutable aka complete
  • Final adds a fence (more later)
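
A hypothetical sketch of what that buys us, using the Point variants from above (the Publisher class itself is made up): if the fields are not final and the Point is published through a plain field, a reader may legally observe x or y as 0; with final fields and no escape of this, it may not.

class Publisher {
    static Point shared; // plain field, deliberately no synchronization

    static void producer() {
        // without final fields the store of the reference may become
        // visible before the stores to x and y (no freeze)
        shared = new Point(1, 2);
    }

    static void consumer() {
        Point p = shared;
        if (p != null) {
            // non-final fields: 0/0, 1/0, 0/2 and 1/2 are all legal outcomes
            // final fields:     only 1/2 is possible
            System.out.println(p.getX() + "/" + p.getY());
        }
    }
}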

Not everything will be visible

shared = 20;
shared = 40;

print(shared);
  • Java builds a data dependency graph
  • print(shared) depends on shared = 40, but nothing depends on shared = 20
  • Safe to drop the first write
  • Single thread consistency
// remove dead store
shared = 40;

print(shared);
// even fancier
print(40);
shared = 40;
  • This is totally legal!

Not everything will be stored

shared = 0;
for (int x : array)
{
    shared += x;
}
  • Inefficient code, because we write to memory all the time
local = 0; // likely a register
for (int x : array)
{
    local += x;
}

shared = local;
  • Single thread consistency
  • Speed is king and a write to memory is slow

More reading

// Can X be 30?
x = shared * shared;

// Compiler might do that
local1 = shared;
local2 = shared;
x = local1 * local2;

// In reality, this happens...
local = shared;
x = local * local;

0x00007f7740ab2ccc: mov  0xc(%rsi),%eax ;*getfield shared
0x00007f7740ab2ccf: imul %eax,%eax      ;*imul

thread 1
==========================

local1 = shared; // 5

local2 = shared; // 6
x = local1 * local2; // 30

thread 2
==========================
shared = 5;

shared = 6;
  • Single thread consistency
  • Removing the redundant load is also legal and gives us nice squares by accident
  • We will see this code again!!

Most Important Optimization

Avoid memory access as much as possible

// you wrote
// array is on the heap 
for (int x : this.array)
{
    sum = sum + x;
}

// javac turns this into
for (int i = 0; i < this.array.length; i++)
{
    sum = sum + this.array[i];
}
// Hotspot turns this into
int[] local = this.array;
int length = local.length;

for (int i = 0; i < length; i++)
{
    sum = sum + local[i];
}

And the reality presents...

We will get to the why in a minute


int[] local = this.array;
int length = local.length;

for (int i = 0; i < length; i++)
{
    sum = sum + local[i];
}

int[] local = this.array;
int length = local.length;

int i = 0;
// loop unrolling to the rescue plus
// some auto-vectorization (limited set of possibilities)
for (; i + 3 < length; i += 4)
{
    sum = sum + local[i];
    sum = sum + local[i + 1];
    sum = sum + local[i + 2];
    sum = sum + local[i + 3];
}

// post-loop takes care of the remaining elements
for (; i < length; i++)
{
    sum = sum + local[i];
}

And even worse...

// done is a shared boolean
// you wrote
while (!done)
{
    // logic here
}

// done is a shared boolean
// the compiler produces
local = !done;
while (local)
{
    // logic here
}

And the JLS states...

To some programmers, this behavior may seem "broken". However, it should be noted that this code is improperly synchronized.

The semantics of the Java programming language allow compilers and microprocessors to perform optimizations that can interact with incorrectly synchronized code in ways that can produce behaviors that seem paradoxical.

  • This is where Rust and Go shine
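
A minimal sketch of the usual fix for the while (!done) loop above, covered in detail later: declaring the flag volatile tells the compiler the field is shared, so the read cannot be hoisted out of the loop.

class Worker {
    // volatile: the read may not be hoisted, every iteration sees a fresh value
    private volatile boolean done;

    void run() {
        while (!done) {
            // logic here
        }
    }

    void stop() {
        done = true; // guaranteed to become visible to run()
    }
}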

Shared Access is not Concurrency

Concurrency and visibility are not only about modifying things at the same time (a race condition); they are about data being seen by more than one thread. The operations involved can be seconds or even minutes apart, and the outcome might still be wrong!

  • Happens-before is the magic word here
  • Can I see things that have happened earlier with certainty?

And there is Reordering

Data and Control Dependence


sharedX = 2;
sharedY = 3;

if (sharedX > 0)
{
    print(sharedX);
}

Possible legal outcomes

This is all correct code


sharedX = 2;

if (sharedX > 0)
{
    print(sharedX);
}
sharedY = 3;

print(2);
sharedX = 2;
sharedY = 3;

sharedX = 2;

if (sharedX > 0)
{
    print(sharedX);
    sharedY = 3;
}
else
{
    sharedY = 3;
}

sharedY = 3;
sharedX = 2;

if (2 > 0)
{
    print(2);
}

Modern Machines

A Quick Detour

Detour: Cost

What Does it Cost to Leave the CPU?

                      Real        Humanized
CPU Cycle             0.4 ns      1 s
L1 Cache              0.9 ns      2 s
L2 Cache              2.8 ns      7 s
L3 Cache              28 ns       1 min
Memory Access         100 ns      4 min
NVM SSD               25 μs       17 h
SSD                   50-150 μs   1.5-4 days
HDD                   1-10 ms     1-9 months
TCP SF to NY          65 ms       5 years
TCP SF to Hong Kong   141 ms      11 years
  • It costs 100 to 200 cycles when we have to go to main memory
  • Hence Java and the CPU avoid main memory access at all cost
  • Java has no notion of where data physically lives, hence this is fully transparent to you... but you can still work with it :)

The modern world

Everything happens in parallel

  • To fully utilize modern computer architectures, compilers and processors are free to do whatever it takes...
  • ...as long as your defined semantics stay the same
  • Javac, JIT, and the CPU reorder and optimize
  • If you don't declare things shared, there is no effort put into it
  • Current Intels/AMDs can do 3 to 6 instructions per cycle

https://en.wikichip.org/wiki/intel/microarchitectures/kaby_lake

Back to NEW

See Where it Fails and Why

Reordered New

Back to the new Example

// pseudo-code
local = calloc(sizeof(Point));
local.<init>(x, y);
    Object.<init>();
    this.x = x;
    this.y = y;
shared = local;
// a possible optimization
local = calloc(sizeof(Point));
shared = local;
local.<init>(x, y);
    Object.<init>();
    this.x = x;
    this.y = y;
// this keeps the CPU better utilized and starts
// the expensive store to memory earlier
  • Single thread consistency
  • The compilers and the CPU change your code

Double-Checked Locking

Now you might understand why double-checked locking is broken

// what you wrote 
public static Singleton getInstance()
{
    if (instance == null)
    {
        synchronized (Singleton.class)
        {
            if (instance == null)
            {
                instance = new Singleton();
            }
        }
    }
    
    return instance;
}


// what you get
public static Singleton getInstance()
{
    if (instance == null)
    {
        synchronized (Singleton.class)
        {
            if (instance == null)
            {
                local = calloc(sizeof(Singleton));
                instance = local;
                local.<init>;
            }
        }
    }
    
    return instance;
}

Bring the Fences Up

Synchronization to the Rescue

No Fences

Let the CPU and Compiler optimize the hell out of your code

==== Producer       ==== Consumer
...                 ...
sharedData = ...;   while (!sharedDone)
sharedDone = true;  {
...                     ...
                    }
                    print(sharedData);
==== Producer       ==== Consumer
...                 ...
sharedDone = true;  while (!sharedDone)
sharedData = ...;   {
...                     ...
                    }
                    print(sharedData);

Fences

Make clear what the rules are

==== Producer               ==== Consumer
...                         ...
sharedData = ...;           while (!sharedDone)
--- store_store_fence();    {
sharedDone = true;              --- load_load_fence();
...                             ...
                            }
                            --- load_load_fence();
                            print(sharedData);
  • Hardware supports different types of fences
  • Prevents reordering of instructions
  • Ensures consistent state
  • Not all hardware has the same reorder properties but we don't care
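
Since Java 9 such fences also exist as real methods on java.lang.invoke.VarHandle. A sketch that mirrors the pseudo-code above, not a pattern to copy; in real code you would simply make sharedDone volatile, as the next slides show.

import java.lang.invoke.VarHandle;

class Handoff {
    int sharedData;      // plain fields on purpose, to show the raw fences
    boolean sharedDone;

    void produce() {
        sharedData = 42;
        VarHandle.storeStoreFence(); // sharedData is stored before sharedDone
        sharedDone = true;
    }

    void consume() {
        while (!sharedDone) {
            VarHandle.loadLoadFence(); // force a fresh read of sharedDone
        }
        VarHandle.loadLoadFence();
        System.out.println(sharedData); // expected to print 42
    }
}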

Synchronization Actions

What the spec offers to overcome this

  • volatile
  • synchronized
  • final/freeze
  • Atomics
  • var handles
  • Each "method" has different properties such as speed, scope, and hardware support
  • All establish a happens-before relation
  • All prevent reordering
  • All will ensure consistent reads and writes

Synchronization Actions

What do they do?

  • final: You see only one state
  • volatile: Ensures consistent reads and writes - main memory instead of register
  • synchronized: Ensures consistent reads and writes and prevents multiple threads from running the same block at the same time; establishes single-thread semantics
  • Atomics: Ensure consistent reads and writes and encapsulate read-modify-write operations, BUT they are not the same as synchronized, which conceptually reads all memory at the beginning and writes it all back at the end
  • var handles: Atomic, synchronized, and final as code! (see the sketch after this list)
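
A minimal sketch of what "as code" means (the Counter class is made up): a VarHandle bound to a plain int field lets you pick volatile semantics or an atomic read-modify-write per call, without any keyword on the field.

import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class Counter {
    private int value; // plain field, the VarHandle chooses the semantics

    private static final VarHandle VALUE;
    static {
        try {
            VALUE = MethodHandles.lookup()
                                 .findVarHandle(Counter.class, "value", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    int readVolatile()        { return (int) VALUE.getVolatile(this); }
    void writeVolatile(int v) { VALUE.setVolatile(this, v); }

    void increment() {         // atomic read-modify-write, like AtomicInteger
        VALUE.getAndAdd(this, 1);
    }
}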

Our Toolbox discussed

What Each "Consistency Tool" Offers

Final/Freeze for Immutability

Final has some unknown characteristics

class Point
{
    private final int x;
    private final int y;
    
    public Point(int x, int y)
    {
        this.x = x;
        this.y = y;
    }
}
    
Point shared = new Point(x, y);
// pseudo-code
local = calloc(sizeof(Point));
local.<init>(x, y);
    Object.<init>();
    this.x = x;
    this.y = y;
--- freeze(); ---
shared = local;
  • Final takes care of reordering
  • Makes sure immutable is immutable
  • Warning: don't publish your object from within the constructor (e.g. by putting it into a cache); final does not protect you here, because reordering inside the constructor is still permitted (see the sketch after this list)
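
A hypothetical sketch of that warning (class and cache are made up): this escapes into a cache before the constructor finishes, so another thread can reach the object before the final fields are frozen, and the final guarantee is gone.

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class CachedPoint {
    static final List<CachedPoint> CACHE = new CopyOnWriteArrayList<>();

    private final int x;
    private final int y;

    CachedPoint(int x, int y) {
        CACHE.add(this); // DON'T: "this" escapes before x and y are written
        this.x = x;
        this.y = y;
    }                    // the freeze only protects readers that obtain the
                         // reference after the constructor has completed
}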

Mutability - Volatile

Make mutability safe

The Java volatile keyword is used to mark a Java variable as "being stored in main memory". More precisely that means, that every read of a volatile variable will be read from the computer's main memory, and not from the CPU cache, and that every write to a volatile variable will be written to main memory, and not just to the CPU cache.

http://tutorials.jenkov.com/java-concurrency/volatile.html

  • CPU cache is kinda incorrect... register is more precise
// that now works
private static volatile Singleton instance; 
 
public static Singleton getInstance()
{
    // volatile ensures that we read from main memory now
    if (instance == null) // a fence
    {
        synchronized (Singleton.class) // another fence
        {
            if (instance == null) // another fence
            {
                local = calloc(sizeof(Singleton));
                local.<init>;
                --- fence();
                instance = local; // write to main memory
            }
        }
    }
    
    return instance; // yeah... read main memory again too
}

Volatile at work

Read-Modify-Write Problem

// what you wrote
private volatile int shared;

public void inc()
{
    shared += 1;
}
// what the machine does
private volatile int shared;

public void inc()
{
    local = shared; // read from main memory
    --- fence();
    local = local + 1;
    shared = local; // flush to main memory
    --- fence();
}

0xc4c2fcc: mov    0xc(%rsi),%r11d
0xc4c2fd0: inc    %r11d
0xc4c2fd3: mov    %r11d,0xc(%rsi)      ;*putfield shared
0xc4c2fd7: lock addl $0x0,-0x40(%rsp)  ;*prevent reordering
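
A short sketch of the two standard fixes, both covered in the following slides: make the whole read-modify-write one region with synchronized, or hand it to the hardware with an Atomic.

import java.util.concurrent.atomic.AtomicInteger;

class Counters {
    private int guarded;
    private final AtomicInteger atomic = new AtomicInteger();

    // fix 1: the read-modify-write becomes one indivisible region
    public synchronized void incGuarded() {
        guarded += 1;
    }

    // fix 2: hardware-supported read-modify-write (see Atomics below)
    public void incAtomic() {
        atomic.incrementAndGet();
    }
}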

Our Square Problem Again


private int shared = 0;

public int square()
{
    int x = shared * shared;
    return x;
}

0xcab294c: mov  0xc(%rsi),%eax ;*getfield shared
0xcab294f: imul %eax,%eax      ;*imul

private volatile int shared = 0;

0xcab294c: mov    0xc(%rsi),%eax  ;*getfield shared
0xcab294f: mov    0xc(%rsi),%r10d ;*getfield shared
0xcab2953: imul   %r10d,%eax      ;*imul

The famous one - synchronized

Create bigger regions

A Java synchronized block marks a method or a block of code as synchronized. Java synchronized blocks can be used to avoid race conditions.

http://tutorials.jenkov.com/java-concurrency/synchronized.html

  • Only synchronizes on objects
  • Java does not synchronize threads, it synchronizes memory!
private int shared;

public synchronized void inc()
{
    shared += 1;
}
private int shared;

public void inc()
{
    synchronized (this) 
    {
        shared += 1;
    }
}

Lock Coarsening

Make things even more efficient

synchronized (buffer)
{
    buffer.add(x);
}

foo = bar;

synchronized (buffer)
{
    buffer.add(y);
}
synchronized (buffer)
{
    buffer.add(x);
    foo = bar;
    buffer.add(y);
}
// even that is legal
synchronized (buffer)
{
    foo = bar;
    buffer.add(x);
    buffer.add(y);
}

More optimization

Make things even more efficient

synchronized (buffer)
{
    buffer.add(x);
}
foo = bar;

synchronized (buffer)
{
    buffer.add(x);
    foo = bar;
}
// Remember: Feed the CPU!

Atomics

Lock free updates... what?

private AtomicInteger atomic = new AtomicInteger(0);

public void increment()
{
    atomic.getAndIncrement();
}
// under the hood
boolean applied = false;
do
{
    int value = atomic.get();
    applied = atomic.compareAndSet(value, value + 1);
}
while (!applied);
  • Hardware supported
  • Version controlled updates
  • Update only if nobody else updated it in the meantime, otherwise read again and retry
  • Retries can take a while, hence it is lock-free but not wait-free

Misunderstandings

Things you might have misunderstood

  • You HAVE TO tell Java that something is shared
  • Java builds code to read and write to memory instead of optimizing things to the edge
  • Java tells the CPU not to reorder
  • Java doesn't manage the cache, the CPU does
  • Shared data is more expensive but might be worth it
  • Immutable data is great but also not free
  • Go for immutable, unless you know what you do
  • Virtual Threads (Loom) are just threads, all rules apply
  • Testing is hard, reviewing is easier
  • You might never see a problem... until...

Conclusion

  • Know what the JMM is
  • Know that your code is not the code that runs
  • Know what optimizations are permitted
  • Know how to make things consistent
  • Know that it is difficult to do it right and fast
  • Don't try to outsmart the machine or other programmers (java.util)
  • P.S. We have not talked about ThreadLocal and static!

Questions and Answers