Java Memory Model

Knowing the JMM is important

An atypical Motivation

Programming language power consumption aka efficiency

Java in a Nutshell

Just a quick knowledge quiz

  • Compiled or Interpreted?
    • Compiled AND Interpreted
  • How many compilers?
    • Three - javac, C1, C2
  • How close to the hardware?
    • Very close, once JIT-compiled by C1 and C2
  • Better or worse than static compiling?
    • Better, because it optimizes during runtime

Concurrency and Compiler Tricks

What to know when you want to get the most out of Java

  • Did anyone read the JLS - the Java Language Specification?
  • It defines what compiler and JVM implementers have to obey
  • Java was the first language that tried to specify such a memory model
  • Chapter 17 - Threads and Locks
  • In preparation for Java performance programming
  • Know what is legal and what is illegal
  • Know where most of the speed comes from
  • Know that you might not want to program at that level of detail
A lot of the content has been shamelessly borrowed from "Concurrency Concepts in Java by Douglas Hawkins": https://www.youtube.com/watch?v=ADxUsCkWdbE

TL;DR

The one rule that tells Java to optimize the hell out of your code

As if there's one thread, unless you say otherwise.

  • In English: I don't care about other threads or programs unless you tell me.
  • Everything that fits this statement is legal

Atomicity

What operations are indivisible?

Compact != Atomic

long shared = 0;

Thread 1                      Thread 2
==========================    ==========================
shared = 2L;                  shared = -1L;
set_hi shared, 0000 0000
                              set_hi shared, ffff ffff
                              set_lo shared, ffff ffff
set_lo shared, 0000 0002

// a reader may now observe the torn value ffff ffff 0000 0002
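A hedged sketch of the fix: JLS §17.7 permits this tearing only for non-volatile long and double fields, so declaring the field volatile makes every read and write atomic (class name and the harness are mine):

```java
public class TornLong {
    // volatile guarantees atomic 64-bit reads and writes (JLS 17.7)
    static volatile long shared = 0L;

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 1_000_000; i++) {
                shared = (i % 2 == 0) ? 2L : -1L; // alternate the two writes from the slide
            }
        });
        writer.start();
        for (int i = 0; i < 1_000_000; i++) {
            long seen = shared;
            // with volatile we can only ever see a fully written value
            if (seen != 0L && seen != 2L && seen != -1L) {
                System.out.println("torn read: " + Long.toHexString(seen));
                return;
            }
        }
        writer.join();
        System.out.println("no torn reads observed");
    }
}
```

Without the volatile modifier, a 32-bit JVM would be allowed to produce a torn read here.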

Atomic?

int shared = 0;
=================================================
shared++;
local = shared;
local = local + 1;
shared = local;
aload_0         
dup             
getfield        #2   // Field shared:I
iconst_1        
iadd            
putfield        #2   // Field shared:I
0x00007f2af910f20c: mov 0xc(%rsi),%edi  ;*getfield shared
0x00007f2af910f20f: inc %edi
0x00007f2af910f211: mov %edi,0xc(%rsi)  ;*putfield shared
long local = 0;
=================================================
local += 2;
IINC 1 2
0x00007ff49cfde350: movabs $0x2,%r10
0x00007ff49cfde35a: add    %r10,%rax
  • This is called a read-modify-write operation
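Those three machine steps can interleave between threads. The resulting lost update can be replayed deterministically by writing out one unlucky interleaving by hand (this simulation is mine, not JVM output):

```java
public class LostUpdate {
    static int shared = 0;

    public static void main(String[] args) {
        // both "threads" intend to execute shared++, i.e. read-modify-write
        int local1 = shared;        // thread 1: getfield -> 0
        int local2 = shared;        // thread 2: getfield -> 0 (before thread 1 stored)
        local1 = local1 + 1;        // thread 1: iadd     -> 1
        local2 = local2 + 1;        // thread 2: iadd     -> 1
        shared = local1;            // thread 1: putfield -> 1
        shared = local2;            // thread 2: putfield -> 1, thread 1's update is lost
        System.out.println(shared); // 1, not 2: one increment disappeared
    }
}
```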

Atomic?

public void foo(int i)
{
    int x = new Random().nextInt();
    x++; // atomic?
    x = x + 2; // atomic
    
    i++; // atomic?
    
    x = x + i; // atomic?
}
  • Trick question... operations on local variables are always atomic in effect, because locals are never shared between threads
  • ...even though a single statement might be more than one operation in the end

Is new atomic?

Is a NEW Operation Atomic?

class Point
{
    private int x;
    private int y;
    
    public Point(int x, int y)
    {
        this.x = x;
        this.y = y;
    }
}
    
Point shared = new Point(x, y);
// pseudo-code
local = calloc(sizeof(Point));
local.<init>(x, y);
    Object.<init>();
    this.x = x;
    this.y = y;
shared = local;
  • It is not atomic and we will learn why later

Visibility

What do other threads see?

No garbage

What the spec guarantees

  • Java will never present us garbage values
  • Because everything is initialized
  • Every primitive is initialized (0, 0L, false, ...)
  • Every array slot
  • Every reference (null)

No out-of-thin-air reads: Each read of a variable must see a value written by a write to that variable.
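These default values are easy to observe (names are mine):

```java
public class Defaults {
    static int number;        // defaults to 0
    static boolean flag;      // defaults to false
    static Object reference;  // defaults to null

    public static void main(String[] args) {
        int[] array = new int[4]; // every slot is initialized to 0
        System.out.println(number + " " + flag + " " + reference + " " + array[2]);
    }
}
```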

Immutable

Is this an immutable object?

class Point
{
    private int x;
    private int y;
    
    public Point(int x, int y)
    {
        this.x = x;
        this.y = y;
    }
    
    public int getX()
    {
        return x;
    }

    public int getY()
    {
        return y;
    }
}

Nope, it isn't - without final fields there is no JMM guarantee! This is one.

class Point
{
    private final int x;
    private final int y;
    
    public Point(int x, int y)
    {
        this.x = x;
        this.y = y;
    }
    
    public int getX()
    {
        return x;
    }

    public int getY()
    {
        return y;
    }
}
class Point
{
    public final int x;
    public final int y;

    public Point(int x, int y)
    {
        this.x = x;
        this.y = y;
    }
}

What is this final?

Final guarantees one possible outcome

class Point
{
    private final int x;
    private final int y;
    
    public Point(int x, int y)
    {
        this.x = x;
        this.y = y;
    }
}
    
Point shared = new Point(x, y);
  • The spec ensures that we never see 0
  • Only the assigned values of x and y
  • Hence immutable objects are truly immutable, i.e. always observed fully constructed
  • final adds a fence (more later)

Not everything will be visible

shared = 20;
shared = 40;

print(shared);
  • Java builds data dependency graph
  • Line 4 depends on line 2, but nothing depends on line 1
  • Remove line 1
  • Single thread consistency
// remove dead store
shared = 40;

print(shared);
// even fancier
print(40);
shared = 40;
  • This is also totally legal

Not everything will be stored

shared = 0;
for (int x : array)
{
    shared += x;
}
local = 0;
for (int x : array)
{
    local += x;
}

shared = local;
  • Single thread consistency

When is data published?

  • Single-thread data is published when:
    • The thread ends and another thread calls Thread.isAlive and gets false returned (the ending thread flushes to main memory)
    • Thread.join on that thread returns
    • Not on Thread.yield
    • Not on Thread.sleep (HotSpot flushes here, but that is not guaranteed)
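Of these publication points, Thread.join is the easiest one to use correctly: everything the finished thread wrote is visible once join returns. A minimal sketch (class name is mine):

```java
public class JoinPublish {
    static int shared; // deliberately not volatile

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> shared = 42);
        writer.start();
        writer.join(); // join establishes happens-before: all of writer's writes are visible
        System.out.println(shared); // guaranteed 42, no further synchronization needed
    }
}
```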

More on reads

// Can x be 30?
x = shared * shared;
// Compiler might do that
local1 = shared;
local2 = shared;
x = local1 * local2;

Thread 1                      Thread 2
==========================    ==========================
                              shared = 5;
local1 = shared; // 5
                              shared = 6;
local2 = shared; // 6
x = local1 * local2; // 30
  • Single thread consistency
  • Removing the redundant load is also legal and gives us nice squares by accident

Important Optimizations

Avoid memory access as much as possible

// you wrote
// array is on the heap, e.g. a member variable
for (int x : this.array)
{
    // do fancy stuff
    sum = sum + x;
}

// javac turns this into
for (int i = 0; i < this.array.length; i++)
{
    // do fancy stuff
    sum = sum + this.array[i];
}
// Hotspot turns this into
int[] local = this.array;
for (int i = 0; i < local.length; i++)
{
    // do fancy stuff
    sum = sum + local[i];
}

And even worse...

// done is a shared boolean
// you wrote
while (!done)
{
    // logic here
}

// done is a shared boolean
// the compiler produces
local = !done;
while (local)
{
    // logic here
}
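The fix is to declare done volatile: the read can then no longer be hoisted out of the loop, so the loop terminates once the writer publishes the flag. A minimal sketch (names and the sleep are mine, purely for illustration):

```java
public class VolatileFlag {
    static volatile boolean done; // volatile: the read cannot be hoisted out of the loop

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (!done) {          // re-reads the flag on every iteration
                Thread.onSpinWait();
            }
            System.out.println("worker saw done");
        });
        worker.start();
        Thread.sleep(50); // give the worker time to enter the loop (timing is illustrative)
        done = true;      // volatile write: guaranteed to become visible to the worker
        worker.join();
    }
}
```

With a plain boolean the worker loop could legally spin forever.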

And the JLS states...

JLS 17.4 Memory Model

To some programmers, this behavior may seem "broken". However, it should be noted that this code is improperly synchronized.

The semantics of the Java programming language allow compilers and microprocessors to perform optimizations that can interact with incorrectly synchronized code in ways that can produce behaviors that seem paradoxical.

And even more freedom... Ordering

Data and Control Dependence

sharedX = 2;
sharedY = 3;
if (sharedX > 0)
{
    print(sharedX);
}
  • print(sharedX) has a data dependence on sharedX and a control dependence on the if
  • Nothing depends on sharedY, so its store may be moved freely relative to the rest

The modern world

Everything happens in parallel

  • To utilize modern computer architectures, the compilers and the processor are free to do whatever it takes...
  • ...as long as your defined semantics stay the same
  • javac, the JIT, and the CPU all reorder and optimize
  • As long as the result at the end of the method is the same regardless of reordering or other optimizations, they may be applied
  • If you don't declare things shared, no effort is spent on cross-thread consistency
  • Current Intel CPUs can execute 3 to 4 instructions per cycle (Spectre mitigations set aside)

Reordered New

Back to the new Example

// pseudo-code
local = calloc(sizeof(Point));
local.<init>(x, y);
    Object.<init>();
    this.x = x;
    this.y = y;
shared = local;
// a possible optimization
local = calloc(sizeof(Point));
shared = local;
local.<init>(x, y);
    Object.<init>();
    this.x = x;
    this.y = y;
// this will permit to load the CPU better and
// start the expensive store to memory earlier

Double-Checked Locking

Now you might understand why double-checked locking is broken

// what you wrote 
public static Singleton getInstance()
{
    if (instance == null)
    {
        synchronized (Singleton.class)
        {
            if (instance == null)
            {
                instance = new Singleton();
            }
        }
    }
    
    return instance;
}


// what you get
public static Singleton getInstance()
{
    if (instance == null)
    {
        synchronized (Singleton.class)
        {
            if (instance == null)
            {
                local = calloc(sizeof(Singleton));
                instance = local;
                local.<init>();
            }
        }
    }
    
    return instance;
}

Bring the Fences Up

Synchronization to the Rescue

No Fences

Let the CPU and Compiler do what they want

==== Producer       ==== Consumer
...                 ...
sharedData = ...;   while (!sharedDone)
sharedDone = true;  {
...                     ...
                    }
                    print(sharedData);
==== Producer       ==== Consumer
...                 ...
sharedDone = true;  while (!sharedDone)
sharedData = ...;   {
...                     ...
                    }
                    print(sharedData);

Fences

Make clear what the rules are

==== Producer               ==== Consumer
...                         ...
sharedData = ...;           while (!sharedDone)
--- store_store_fence();    {
sharedDone = true;              --- load_load_fence();
...                             ...
                            }
                            print(sharedData);
  • Hardware supports different types of fences
  • A fence prevents reordering of instructions across it
  • Not all hardware has the same reordering properties, but the JMM abstracts that away for us
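Since Java 9 the JDK exposes this fence vocabulary directly through java.lang.invoke.VarHandle: a release write keeps earlier stores above it, and an acquire read keeps later loads below it. A sketch of the fenced producer/consumer above, with class and field names of my own choosing:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class FenceDemo {
    static final VarHandle DONE;
    static {
        try {
            DONE = MethodHandles.lookup()
                    .findVarHandle(FenceDemo.class, "done", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    int data;
    boolean done;

    public static void main(String[] args) throws InterruptedException {
        FenceDemo d = new FenceDemo();
        Thread producer = new Thread(() -> {
            d.data = 42;              // plain write: sharedData = ...
            DONE.setRelease(d, true); // release write: the data store cannot sink below it
        });
        producer.start();
        while (!(boolean) DONE.getAcquire(d)) { // acquire read: later loads cannot rise above it
            Thread.onSpinWait();
        }
        System.out.println(d.data); // guaranteed to see 42
        producer.join();
    }
}
```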

Synchronization Actions

What the spec offers to overcome this

  • volatile
  • synchronized
  • final/freeze
  • Atomics

Final/Freeze for Immutability

Final has some lesser-known characteristics

class Point
{
    private final int x;
    private final int y;
    
    public Point(int x, int y)
    {
        this.x = x;
        this.y = y;
    }
}
    
Point shared = new Point(x, y);
// pseudo-code
local = calloc(sizeof(Point));
local.<init>(x, y);
    Object.<init>();
    this.x = x;
    this.y = y;
--- freeze(); ---
shared = local;
  • final takes care of the reordering
  • Makes sure that immutable really is immutable
  • Warning: don't publish the object from inside its constructor (e.g. by putting this into a cache); final does not protect you there, and reordering within the constructor is still permitted
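A sketch of that warning: if this escapes the constructor, e.g. into a shared cache, the freeze has not happened yet for whoever reads the cache; publishing only after construction is safe (Registry and the factory method are hypothetical names of mine):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class Registry {
    // a hypothetical shared cache that another thread might read concurrently
    static final List<Point> CACHE = new CopyOnWriteArrayList<>();
}

class Point {
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
        // BROKEN: 'this' would escape before the constructor finishes,
        // so the final-field guarantee would not apply for a concurrent reader
        // Registry.CACHE.add(this);
    }

    // SAFE: publish only after the constructor has completed (and final fields froze)
    static Point create(int x, int y) {
        Point p = new Point(x, y);
        Registry.CACHE.add(p);
        return p;
    }
}

public class SafePublication {
    public static void main(String[] args) {
        Point p = Point.create(1, 2);
        System.out.println(p.x + "," + p.y);
    }
}
```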

Mutability - Volatile

Make mutability safe

The Java volatile keyword is used to mark a Java variable as "being stored in main memory". More precisely that means, that every read of a volatile variable will be read from the computer's main memory, and not from the CPU cache, and that every write to a volatile variable will be written to main memory, and not just to the CPU cache.

http://tutorials.jenkov.com/java-concurrency/volatile.html

  • Fixes double-checked locking (final or the lazy holder pattern would be the better choices)
// that now works
private volatile Singleton instance; 
 
public static Singleton getInstance()
{
    // volatile ensures that we read from main memory now
    if (instance == null) // a fence
    {
        synchronized (Singleton.class) // another fence
        {
            if (instance == null) // another fence
            {
                local = calloc(sizeof(Singleton));
                local.<init>();
                --- fence();
                instance = local; // write to main memory
            }
        }
    }
    
    return instance; // yeah... read main memory again too
}

Volatile again

Full behavior

  • volatile: Write "happens before" all subsequent reads
  • If Thread A writes to a volatile variable and Thread B subsequently reads the same volatile variable, then all variables visible to Thread A before writing the volatile variable, will also be visible to Thread B after it has read the volatile variable.
  • If Thread A reads a volatile variable, then all variables visible to Thread A when reading the volatile variable will also be re-read from main memory.
  • In other words: volatile is a performance penalty
public class MyClass 
{
    private int years;
    private int months;
    private volatile int days;

    public void update(int years, int months, int days)
    {
        this.years  = years;
        this.months = months;
        this.days   = days;
    }
}
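The field order in the class above is deliberate: because days is volatile and written last, the plain writes to years and months are published together with it, and a reader that reads days first sees them too. A hedged sketch of that piggybacking (class and method names are mine):

```java
public class DateHolder {
    private int years;
    private int months;
    private volatile int days; // written last: publishes years and months along with it

    public void update(int years, int months, int days) {
        this.years  = years;   // plain write
        this.months = months;  // plain write
        this.days   = days;    // volatile write: everything above becomes visible
    }

    public int totalDays() {
        int d = days;                         // volatile read first...
        return years * 365 + months * 30 + d; // ...then the plain reads are safe
    }

    public static void main(String[] args) throws InterruptedException {
        DateHolder holder = new DateHolder();
        Thread writer = new Thread(() -> holder.update(1, 2, 3));
        writer.start();
        writer.join();
        System.out.println(holder.totalDays()); // 1*365 + 2*30 + 3 = 428
    }
}
```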

Detour: Cost

What does it cost to leave the CPU

  Operation            Real       Humanized
  CPU cycle            0.4 ns     1 s
  L1 cache             0.9 ns     2 s
  L2 cache             2.8 ns     7 s
  L3 cache             28 ns      1 min
  Memory access        100 ns     4 min
  NVM SSD              25 μs      17 h
  SSD                  50-150 μs  1.5-4 days
  HDD                  1-10 ms    1-9 months
  TCP SF to NY         65 ms      5 years
  TCP SF to Hong Kong  141 ms     11 years
  • It costs 100 to 200 cycles every time we have to go to main memory
  • Hence Java and the CPU avoid main-memory access at all cost
  • Java has no notion of a memory location in the language, so all of this stays transparent to the programmer

Volatile is not the magic bullet

Compact statements stay broken

// what you wrote
private volatile int shared;

public void inc()
{
    shared += 1;
}
// what the machine does
private volatile int shared;

public void inc()
{
    local = shared; // read from main memory
    --- fence();
    local = local + 1;
    --- fence();
    shared = local; // flush to main memory
}

The famous one - synchronized

Create bigger regions

A Java synchronized block marks a method or a block of code as synchronized. Java synchronized blocks can be used to avoid race conditions.

http://tutorials.jenkov.com/java-concurrency/synchronized.html

  • You can only synchronize on objects
  • Java does not synchronize threads, it synchronizes memory!
private int shared;

public synchronized void inc()
{
    shared += 1;
}
private int shared;

public void inc()
{
    synchronized (this) 
    {
        shared += 1;
    }
}
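Synchronizing on the same monitor makes the read-modify-write from the atomicity section safe: only one thread at a time can be inside inc(), and each unlock publishes the new value to the next locker. A runnable sketch (the harness around the slide's code is mine):

```java
public class SyncCounter {
    private int shared;

    public synchronized void inc() {
        shared += 1; // the whole read-modify-write runs under one lock
    }

    public synchronized int get() {
        return shared;
    }

    public static void main(String[] args) throws InterruptedException {
        SyncCounter counter = new SyncCounter();
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 100_000; i++) {
                    counter.inc();
                }
            });
            threads[t].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(counter.get()); // always 400000 - no lost updates
    }
}
```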

Happen-Before

The key to the kingdom

  • JLS defines the happens-before relation on memory operations such as reads and writes of shared variables
  • The synchronized and volatile constructs, as well as the Thread.start() and Thread.join() methods, can form happens-before relationships
  • You will mostly get this wrong, hence use java.util.concurrent
  • Concurrent means thread-safe but not necessarily locking/blocking
  • Each action in a thread happens-before every action in that thread that comes later in the program's order
  • An unlock of a monitor happens-before every subsequent lock of that same monitor
  • A write to a volatile field happens-before every subsequent read of that same field. Writes and reads of volatile fields have similar memory consistency effects as entering and exiting monitors, but do not entail mutual exclusion locking
  • A call to start on a thread happens-before any action in the started thread.
  • All actions in a thread happen-before any other thread when successfully returning from a join on that thread

Correct Wait and Notify

How to avoid busy spinning

// Producer
synchronized (lock)
{
    sharedData = ...;
    sharedDone = true;
    
    lock.notify();
}
// Consumer
synchronized (lock)
{
    while (!sharedDone)
    {
        lock.wait();
    }
    print(sharedData);
}
  • notify does not start the other thread!
  • notify could be at the start of the producer block and it would still work
  • You can be woken up without anyone notifying you (spurious wakeup), hence the while loop
  • Java and the JIT can reorder anything within the synchronized block
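Put together as a self-contained sketch (names are mine), with the while loop around wait guarding against spurious wakeups:

```java
public class WaitNotify {
    static final Object lock = new Object();
    static int sharedData;
    static boolean sharedDone;

    public static void main(String[] args) throws InterruptedException {
        Thread producer = new Thread(() -> {
            synchronized (lock) {
                sharedData = 42;
                sharedDone = true;
                lock.notify(); // wakes the consumer once we release the lock
            }
        });
        producer.start();
        // consumer
        synchronized (lock) {
            while (!sharedDone) { // loop: guards against spurious wakeups
                lock.wait();      // releases the lock while waiting
            }
            System.out.println(sharedData);
        }
        producer.join();
    }
}
```

Whichever thread grabs the lock first, the consumer always prints the published data: either it finds sharedDone already true, or it waits and is notified.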

Lock Coarsening

Make things even more efficient

synchronized (buffer)
{
    buffer.add(x);
}

foo = bar;

synchronized (buffer)
{
    buffer.add(y);
}
synchronized (buffer)
{
    buffer.add(x);
    foo = bar;
    buffer.add(y);
}
// even that is legal
synchronized (buffer)
{
    foo = bar;
    buffer.add(x);
    buffer.add(y);
}

More optimization

Make things even more efficient

synchronized (buffer)
{
    buffer.add(x);
}
foo = bar;

synchronized (buffer)
{
    buffer.add(x);
    foo = bar;
}
// Remember: Feed the CPU!

Atomics

Lock free updates... what?

private AtomicInteger atomic = new AtomicInteger(0);

public void increment() // note: 'do' is a reserved word and cannot be a method name
{
    atomic.getAndIncrement();
}
// under the hood
boolean applied = false;
do
{
    int value = atomic.get();
    applied = atomic.compareAndSet(value, value + 1);
}
while (!applied);
  • Hardware supported (compare-and-swap instructions)
  • Version-controlled updates
  • Update only if nobody else updated the value in the meantime; otherwise read again and try again
  • Retries can take long, hence it is lock-free but not wait-free
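The same compare-and-set retry loop can implement any lock-free read-modify-write, not just increments. A sketch of an atomic maximum (class and method names are mine):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicMax {
    static final AtomicInteger max = new AtomicInteger(Integer.MIN_VALUE);

    // lock-free: retry until our compareAndSet wins or the value is already big enough
    static void updateMax(int candidate) {
        int current = max.get();
        while (candidate > current && !max.compareAndSet(current, candidate)) {
            current = max.get(); // someone else won the race; re-read and try again
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            final int base = t * 1000;
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    updateMax(base + i);
                }
            });
            threads[t].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(max.get()); // largest value written: 3 * 1000 + 999 = 3999
    }
}
```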

Shared Mutability

  • Mutability is difficult
  • Shared mutability is even harder
  • Always design for non-shared mutability
  • Immutability rules

Immutable objects have a very compelling list of positive qualities. Without question, they are among the simplest and most robust kinds of classes you can possibly build. When you create immutable classes, entire categories of problems simply disappear.

http://www.javapractices.com/topic/TopicAction.do?Id=29

Classes should be immutable unless there's a very good reason to make them mutable... If a class cannot be made immutable, limit its mutability as much as possible.

Effective Java by Joshua Bloch

Conclusion

  • Know what the JMM is
  • Know that your code is not the code that runs
  • Know what optimizations are permitted
  • Know what you need to make things consistent
  • Know that it is difficult to do it right
  • Know that you mostly should not build it yourself

Questions and Answers