Knowing the JMM is Important
Wrong code only
public class Test {
private static Test instance;
public static Test getInstance() {
if (instance == null) {
synchronized (Test.class) {
if (instance == null) {
instance = new Test();
}
}
}
return instance;
}
}
public class Test {
private static final Day[] d = new Day[365];
public static Day get(int doy) {
return d[doy];
}
public static synchronized Day getOrInit(int doy) {
if(d[doy] == null) {
d[doy] = new Day();
}
return get(doy);
}
}
public class Test {
private volatile int a;
public int dec() {
return --a;
}
public int inc() {
return ++a;
}
}
public class Test {
private Map<String, String> m = new ConcurrentHashMap<>();
public String getOrInit(String k, String v) {
if (m.get(k) == null) {
m.put(k, v);
}
return m.get(k);
}
}
public class Test {
private List list;
public synchronized void add(String s) {
list.add(s);
}
public int size() {
return list.size();
}
}
public class Test {
private volatile List list = new ArrayList();
public void add(Object o) {
// local for safety
var l = list;
l.add(o);
// publish
list = l;
}
}
All You Have to Know in Two Slides or Less
There is Only One Truth
The Java Memory Model specifies how the Java virtual machine works with the computer's memory (RAM)...
It is very important to understand the Java memory model if you want to design correctly behaving concurrent programs. ...specifies how and when threads can see values written to shared variables by other threads, and how to synchronize access to shared variables...
Jakob Jenkov
https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html
https://jenkov.com/tutorials/java-concurrency/java-memory-model.html
The One Rule that Tells Java to Optimize the Hell Out of Your Code, Without Breaking it
As if there's one thread, unless you say otherwise.
[1] https://www.youtube.com/watch?v=ADxUsCkWdbE
Always attribute other people's work
A lot of the content has been borrowed, validated, and modified from Concurrency Concepts in Java by Douglas Hawkins [1]
[1] https://www.youtube.com/watch?v=ADxUsCkWdbE
[2] https://shipilev.net/
Memory is Everywhere
Our World (JVM) vs. Real World (Hardware and OS)
Oversimplified and not to scale
public class Test {
// we all always live on the heap
private Map<String, String> aMap;
private int counter;
private static final Any<Integer> ANY = new Any<>();
/**
* @param a - stack only (call-by-value)
* @param b - you don't know; the reference is passed on the stack,
* but the object itself?
* (the reference is passed by value)
*/
public void foo(int a, Any<?> b) {
int c = a++; // stack
var d = b; // ref d on stack,
// but b might be anywhere
// reference on stack, but instance?
// when small and not escaping: maybe on the stack
// when larger: always on the heap
int[] e = new int[a];
// reference stack, instance on heap
List<Foo> l = new ArrayList<>();
}
}
What operations are indivisible?
long shared = 0;
Thread 1
=================================================
shared = 2L;
set_hi shared, 0000 0000
set_lo shared, 0000 0002
Thread 2
=================================================
shared = -1L;
set_hi shared, ffff ffff
set_lo shared, ffff ffff
This is a 32-bit problem, but as a Java programmer you don't know what you are running on, so don't rely on it! 64-bit VMs on x86 are fine.
Possible torn outcomes (in addition to the expected 2 and -1): -4294967294 or 4294967295
Source: Douglas Hawkins
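A minimal sketch of the portable fix, assuming the standard rule from JLS 17.7 that reads and writes of a volatile long are always atomic. The class name TearFreeLong and the method countTornReads are made up for this illustration. Two writers race with the values from the slide while the calling thread samples the field and counts anything that is neither 0, 2 nor -1.

```java
// Hypothetical demo class: two racing writers, one sampling reader.
final class TearFreeLong {
    // volatile makes every 64-bit read and write indivisible (JLS 17.7)
    private volatile long shared = 0L;

    /** Returns how many sampled values were neither 0, 2 nor -1. */
    static int countTornReads(int iterations) {
        TearFreeLong holder = new TearFreeLong();
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                holder.shared = 2L;
            }
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                holder.shared = -1L;
            }
        });
        t1.start();
        t2.start();

        int torn = 0;
        for (int i = 0; i < iterations; i++) {
            long seen = holder.shared;
            if (seen != 0L && seen != 2L && seen != -1L) {
                torn++; // a half-written value such as 4294967295
            }
        }
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return torn;
    }
}
```

With the volatile modifier the count has to be 0 on every conforming JVM. Without it, a 32-bit VM would be allowed to show the torn values listed above.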
int shared = 0;
=================================================
shared++;
local = shared;
local = local + 1;
shared = local;
getfield #2 // Field shared:I
iconst_1
iadd
putfield #2 // Field shared:I
0x00007f2af910f20c: mov 0xc(%rsi),%edi ;*getfield shared
0x00007f2af910f20f: inc %edi
0x00007f2af910f211: mov %edi,0xc(%rsi) ;*putfield shared
long local = 0;
=================================================
local += 2;
IINC 1 2
0x00007ff49cfde350: movabs $0x2,%r10
0x00007ff49cfde35a: add %r10,%rax
That is the same for 64bit and 32bit
Source: Douglas Hawkins
Is a NEW Operation Atomic?
class Point
{
private int x;
private int y;
public Point(int x, int y)
{
this.x = x;
this.y = y;
}
}
Point shared = new Point(x, y);
// pseudo-code
local = calloc(sizeof(Point));
local.<init>(x, y);
Object.<init>();
this.x = x;
this.y = y;
shared = local;
Source: Douglas Hawkins
What Do Other Threads See?
What the Spec Guarantees
No out-of-thin-air reads: Each read of a variable must see a value written by a write to that variable.
This hurts performance big time (think cutting-edge optimizations), because every single bit is written (initialized) before the object is made available. That's why reuse might still be a good thing in Java... but then the object is not immutable... darn...
Source: Douglas Hawkins
Is This an Immutable Object?
class Point {
private int x;
private int y;
public Point(int x, int y) {
this.x = x;
this.y = y;
}
public int getX() {
return x;
}
public int getY() {
return y;
}
}
Nope, it isn't! This is one.
class Point {
private final int x;
private final int y;
public Point(int x, int y) {
this.x = x;
this.y = y;
}
public int getX() {
return x;
}
public int getY() {
return y;
}
}
class Point {
public final int x;
public final int y;
public Point(int x, int y) {
this.x = x;
this.y = y;
}
}
That coding style is not mine! Opening curly brace -> newline!
Final guarantees one possible outcome
class Point
{
private final int x;
private final int y;
public Point(int x, int y)
{
this.x = x;
this.y = y;
}
}
Point shared = new Point(x, y);
https://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html#jls-17.5
Source: Douglas Hawkins
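A small sketch of the final-field version in use. The names ImmutablePoint and withX are made up for this illustration. It assumes the rule from JLS 17.5 that the guarantee only holds if this does not escape during construction. "Changing" a value means creating a new object.

```java
// Hypothetical immutable value class; all state is final and set once.
final class ImmutablePoint {
    private final int x;
    private final int y;

    ImmutablePoint(int x, int y) {
        this.x = x;
        this.y = y;
        // Do not hand out `this` here (listeners, registries, threads):
        // the final-field guarantee requires that it does not escape.
    }

    int getX() {
        return x;
    }

    int getY() {
        return y;
    }

    // "Modification" returns a fresh object; the original stays untouched.
    ImmutablePoint withX(int newX) {
        return new ImmutablePoint(newX, y);
    }
}
```

Any thread that obtains a reference to such an object sees the constructor's values for x and y, even if the reference itself was published without synchronization.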
shared = 20;
shared = 40;
print(shared);
// remove dead store
shared = 40;
print(shared);
// even fancier
print(40);
shared = 40;
Source: Douglas Hawkins
shared = 0;
for (int x : array)
{
shared += x;
}
local = 0;
for (int x : array)
{
local += x;
}
shared = local;
Source: Douglas Hawkins
// Can X be 30?
x = shared * shared;
// Compiler might do that
local1 = shared;
local2 = shared;
x = local1 * local2;
// In reality, this happens...
local = shared;
x = local * local;
0x00007f7740ab2ccc: mov 0xc(%rsi),%eax ;*getfield shared
0x00007f7740ab2ccf: imul %eax,%eax ;*imul
thread 1
==========================
local1 = shared; // 5
local2 = shared; // 6
x = local1 * local2; // 30
thread 2
==========================
shared = 5;
shared = 6;
Source: Douglas Hawkins
Avoid memory access as much as possible
// you wrote
// array is on the heap such as a member variable
for (int x : arrayOfInt)
{
sum = sum + x;
}
// javac turns this into
for (int i = 0; i < this.array.length; i++)
{
sum = sum + this.array[i];
}
// Hotspot turns this into
int[] local = this.array;
int length = local.length;
for (int i = 0; i < length; i++)
{
sum = sum + local[i];
}
Source: Douglas Hawkins
We will get to the why in a minute
int[] local = this.array;
int length = local.length;
for (int i = 0; i < length; i++)
{
sum = sum + local[i];
}
int[] local = this.array;
int length = local.length;
int last = 0;
// loop unrolling to the rescue plus
// some auto-vectorization (limited set of possibilities)
for (int i = 0; i < length; i = i + 4)
{
sum = sum + local[i];
sum = sum + local[i + 1];
sum = sum + local[i + 2];
sum = sum + local[i + 3];
last = i + 3;
}
for (int i = last + 1; i < length; i++)
{
sum = sum + local[i];
}
// done is a shared boolean
// you wrote
while (!done)
{
// logic here
}
// done is a shared boolean
// the compiler produces
local = !done;
while (local)
{
// logic here
}
Source: Douglas Hawkins
To some programmers, this behavior may seem "broken". However, it should be noted that this code is improperly synchronized.
The semantics of the Java programming language allow compilers and microprocessors to perform optimizations that can interact with incorrectly synchronized code in ways that can produce behaviors that seem paradoxical.
It is your fault!
Source: Douglas Hawkins
Concurrency and visibility are not only about modifying things at the same time (race conditions); they are about more than one thread seeing the same data. The operations can be seconds or even minutes apart, and the outcome might still be wrong!
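One way to make the while (!done) loop from the earlier slide behave is to declare the flag volatile, which forbids hoisting the read out of the loop. The following is a sketch only; the class StoppableWorker and the method stopsWithin are invented for this example.

```java
// Hypothetical demo: a spinning worker that reliably sees the stop signal.
final class StoppableWorker {
    // volatile: every loop iteration performs a real read of the flag
    private volatile boolean done = false;

    /** Starts a worker, asks it to stop, reports whether it ended in time. */
    static boolean stopsWithin(long timeoutMillis) {
        StoppableWorker worker = new StoppableWorker();
        Thread t = new Thread(() -> {
            while (!worker.done) {
                Thread.onSpinWait(); // logic here
            }
        });
        t.setDaemon(true); // never keep the JVM alive because of the demo
        t.start();

        worker.done = true; // volatile write, visible to the worker
        try {
            t.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !t.isAlive();
    }
}
```

If the volatile modifier is removed, the JIT is free to turn the loop into the hoisted form shown above, and the worker may never stop.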
Data and Control Dependence
sharedX = 2;
sharedY = 3;
if (sharedX > 0)
{
print(sharedX);
}
Source: Douglas Hawkins
This is all correct code
sharedX = 2;
if (sharedX > 0)
{
print(sharedX);
}
sharedY = 3;
print(2);
sharedX = 2;
sharedY = 3;
sharedX = 2;
if (sharedX > 0)
{
print(sharedX);
sharedY = 3;
}
else
{
sharedY = 3;
}
sharedY = 3;
sharedX = 2;
if (2 > 0)
{
print(2);
}
A Quick Detour to Explain Why All That is Important
What Does it Cost to Leave the CPU?
Operation | Real | Humanized |
---|---|---|
CPU Cycle | 0.4 ns | 1 s |
L1 Cache | 0.9 ns | 2 s |
L2 Cache | 2.8 ns | 7 s |
L3 Cache | 28 ns | 1 min |
Memory Access | 100 ns | 4 min |
NVM SSD | 25 μs | 17 h |
SSD | 50–150 μs | 1.5-4 days |
HDD | 1–10 ms | 1-9 months |
TCP SF to NY | 65 ms | 5 years |
TCP SF to Hong Kong | 141 ms | 11 years |
http://www.prowesscorp.com/computer-latency-at-a-human-scale/
Everything happens in parallel
https://en.wikichip.org/wiki/intel/microarchitectures/kaby_lake
NEW
See Where it Fails and Why
Back to the new
Example
// pseudo-code
local = calloc(sizeof(Point));
local.<init>(x, y);
Object.<init>();
this.x = x;
this.y = y;
shared = local;
// a possible optimization
local = calloc(sizeof(Point));
shared = local;
local.<init>(x, y);
Object.<init>();
this.x = x;
this.y = y;
// this keeps the CPU busier and
// starts the expensive store to memory earlier
Source: Douglas Hawkins
Now you might understand why double-checked locking is broken
// what you wrote
public static Singleton getInstance()
{
if (instance == null)
{
synchronized (Singleton.class)
{
if (instance == null)
{
instance = new Singleton();
}
}
}
return instance;
}
// what you get
public static Singleton getInstance()
{
if (instance == null)
{
synchronized (Singleton.class)
{
if (instance == null)
{
local = calloc(sizeof(Singleton));
instance = local;
local.<init>;
}
}
}
return instance;
}
Synchronization to the Rescue
Let the CPU and Compiler optimize the hell out of your code
==== Producer ==== Consumer
... ...
sharedData = ...; while (!sharedDone)
sharedDone = true; {
... ...
}
print(sharedData);
==== Producer ==== Consumer
... ...
sharedDone = true; while (!sharedDone)
sharedData = ...; {
... ...
}
print(sharedData);
Source: Douglas Hawkins
Make clear what the rules are
==== Producer ==== Consumer
... ...
sharedData = ...; while (!sharedDone)
--- store_store_fence(); {
sharedDone = true; --- load_load_fence();
... ...
}
--- load_load_fence();
print(sharedData);
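The fence pseudo-code above can be approximated in real Java with a VarHandle in release/acquire mode (Java 9 and later). This is a sketch, not the slide author's code: the class Handoff and its methods are invented here, and release/acquire ordering is used in place of the explicit store_store_fence and load_load_fence calls.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical hand-off between one producer and one consumer.
final class Handoff {
    private static final VarHandle DONE;

    static {
        try {
            DONE = MethodHandles.lookup()
                    .findVarHandle(Handoff.class, "sharedDone", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private int sharedData;     // plain field, ordered by the flag accesses
    private boolean sharedDone; // only accessed through DONE

    void produce(int value) {
        sharedData = value;
        // release: the write to sharedData cannot move below this store
        DONE.setRelease(this, true);
    }

    int consume() {
        // acquire: the read of sharedData cannot move above this load
        while (!(boolean) DONE.getAcquire(this)) {
            Thread.onSpinWait();
        }
        return sharedData;
    }

    /** Runs one producer thread and consumes on the calling thread. */
    static int demo(int value) {
        Handoff handoff = new Handoff();
        new Thread(() -> handoff.produce(value)).start();
        return handoff.consume();
    }
}
```

Declaring sharedDone as volatile gives at least the same ordering with less ceremony. The VarHandle form only makes the two "fences" explicit at the call sites.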
What the spec offers to overcome this
volatile
synchronized
final/freeze
Atomics
var handles
What do they do?
final
: You see only one state
volatile
: Ensures consistent reads and writes
synchronized
: Ensures consistent reads and writes, prevents multiple threads from running the same code at the same time, and establishes single-thread semantics
Atomics
: Ensure consistent reads and writes and encapsulate read-modify-write operations, BUT they are not the same as synchronized, which reads all memory at the beginning and writes everything back at the end
var handles
: Atomic, synchronized and final as code!
Final has some unknown characteristics
class Point
{
private final int x;
private final int y;
public Point(int x, int y)
{
this.x = x;
this.y = y;
}
}
Point shared = new Point(x, y);
// pseudo-code
local = calloc(sizeof(Point));
local.<init>(x, y);
Object.<init>();
this.x = x;
this.y = y;
--- freeze(); ---
shared = local;
Make mutability safe
The Java volatile keyword is used to mark a Java variable as "being stored in main memory". More precisely that means, that every read of a volatile variable will be read from the computer's main memory, and not from the CPU cache, and that every write to a volatile variable will be written to main memory, and not just to the CPU cache.
http://tutorials.jenkov.com/java-concurrency/volatile.html
// that now works
private static volatile Singleton instance;
public static Singleton getInstance()
{
// volatile ensures that we read from main memory now
if (instance == null) // a fence
{
synchronized (Singleton.class) // another fence
{
if (instance == null) // another fence
{
local = calloc(sizeof(Singleton));
local.<init>;
--- fence();
instance = local; // write to main memory
}
}
}
return instance; // yeah... read main memory again too
}
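A different technique that avoids both volatile and explicit locking is the initialization-on-demand holder idiom. It is not what the slide shows. It relies on the JVM's class initialization rules (JLS 12.4), which run a static initializer exactly once and make its result visible to every thread. The class names below are placeholders.

```java
// Hypothetical singleton using the initialization-on-demand holder idiom.
final class LazySingleton {
    private LazySingleton() {
    }

    // Holder is not initialized until getInstance() touches it for the
    // first time; the JVM serializes that initialization across threads.
    private static final class Holder {
        static final LazySingleton INSTANCE = new LazySingleton();
    }

    static LazySingleton getInstance() {
        return Holder.INSTANCE;
    }
}
```

After initialization, getInstance() is a plain static field read, so there is no fence on the hot path.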
Full behavior
volatile
: Write "happens before" all subsequent reads
volatile carries a performance penalty
public class MyClass
{
private int years;
private int months;
private volatile int days;
public void update(int years, int months, int days)
{
this.years = years;
this.months = months;
this.days = days;
}
}
# days last
0x00007f1040ab294c: mov %edx,0x10(%rsi) ;*putfield years
0x00007f1040ab294f: mov %ecx,0x14(%rsi) ;*putfield month
0x00007f1040ab2952: mov %r8d,0x18(%rsi)
0x00007f1040ab2956: lock addl $0x0,-0x40(%rsp) ;*putfield days
# days first
0x00007f012cab3c4c: mov %r8d,0x18(%rsi)
0x00007f012cab3c50: lock addl $0x0,-0x40(%rsp) ;*putfield days
0x00007f012cab3c56: mov %edx,0x10(%rsi) ;*putfield years
0x00007f012cab3c59: mov %ecx,0x14(%rsi) ;*putfield months
Read-Modify-Write Problem
// what you wrote
private volatile int shared;
public void inc()
{
shared += 1;
}
// what the machine does
private volatile int shared;
public void inc()
{
local = shared; // read from main memory
local = local + 1;
shared = local; // flush to main memory
--- fence();
}
0xc4c2fcc: mov 0xc(%rsi),%r11d
0xc4c2fd0: inc %r11d
0xc4c2fd3: mov %r11d,0xc(%rsi)
0xc4c2fd7: lock addl $0x0,-0x40(%rsp) ;*putfield shared
Our Square Problem Again
private int shared = 0;
public int square()
{
int x = shared * shared;
return x;
}
0xcab294c: mov 0xc(%rsi),%eax ;*getfield shared
0xcab294f: imul %eax,%eax ;*imul
private volatile int shared = 0;
0xcab294c: mov 0xc(%rsi),%eax ;*getfield shared
0xcab294f: mov 0xc(%rsi),%r10d ;*getfield shared
0xcab2953: imul %r10d,%eax ;*imul
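With a volatile field, the two loads can observe different values, so the result may not be a square at all. A minimal sketch of the usual remedy is to read the field once into a local variable. The class Squarer is invented for this example.

```java
// Hypothetical example: one volatile read, then pure local arithmetic.
final class Squarer {
    private volatile int shared = 0;

    void set(int value) {
        shared = value;
    }

    int square() {
        int local = shared;   // exactly one read of the shared field
        return local * local; // both factors are the same value
    }
}
```

The result is always the square of some value that shared actually held, regardless of concurrent writers.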
Create bigger regions
A Java synchronized block marks a method or a block of code as synchronized. Java synchronized blocks can be used to avoid race conditions.
http://tutorials.jenkov.com/java-concurrency/synchronized.html
private int shared;
public synchronized void inc()
{
shared += 1;
}
private int shared;
public void inc()
{
synchronized (this)
{
shared += 1;
}
}
The key to the kingdom
java.util.concurrent
https://docs.oracle.com/javase/8/docs/api/?java/util/concurrent/package-summary.html
How to avoid busy spinning
// Producer
synchronized (lock)
{
sharedData = ...;
sharedDone = true;
lock.notify();
}
// Consumer
synchronized (lock)
{
while (!sharedDone)
{
lock.wait();
}
print(sharedData);
}
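The same hand-off can be written with a java.util.concurrent helper instead of wait and notify. The sketch below uses CountDownLatch, whose documentation states that actions before countDown() happen-before a successful return from await(). The class LatchHandoff and its methods are invented for this example.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical one-shot hand-off without busy spinning.
final class LatchHandoff {
    private final CountDownLatch done = new CountDownLatch(1);
    private String sharedData; // plain field, published through the latch

    void produce(String data) {
        sharedData = data;
        done.countDown(); // releases every thread blocked in await()
    }

    String consume() {
        try {
            done.await(); // parks the thread until countDown() was called
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sharedData;
    }

    /** Produces on a new thread and consumes on the calling thread. */
    static String demo(String data) {
        LatchHandoff handoff = new LatchHandoff();
        new Thread(() -> handoff.produce(data)).start();
        return handoff.consume();
    }
}
```

A latch only works once. For repeated hand-offs, a BlockingQueue from the same package is the more natural fit.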
Make things even more efficient
synchronized (buffer)
{
buffer.add(x);
}
foo = bar;
synchronized (buffer)
{
buffer.add(y);
}
synchronized (buffer)
{
buffer.add(x);
foo = bar;
buffer.add(y);
}
// even that is legal
synchronized (buffer)
{
foo = bar;
buffer.add(x);
buffer.add(y);
}
Make things even more efficient
synchronized (buffer)
{
buffer.add(x);
}
foo = bar;
synchronized (buffer)
{
buffer.add(x);
foo = bar;
}
// Remember: Feed the CPU!
Lock free updates... what?
private AtomicInteger atomic = new AtomicInteger(0);
public void increment()
{
atomic.getAndIncrement();
}
// under the hood
boolean applied = false;
do
{
int value = atomic.get();
applied = atomic.compareAndSet(value, value + 1);
}
while (!applied);
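The same retry loop works for any update that depends on the current value. As a sketch, here is a lock-free "remember the maximum" built on compareAndSet. The class MaxTracker is invented for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical lock-free maximum tracker using a compare-and-set loop.
final class MaxTracker {
    private final AtomicInteger max = new AtomicInteger(Integer.MIN_VALUE);

    void offer(int candidate) {
        int current;
        do {
            current = max.get();
            if (candidate <= current) {
                return; // nothing to update, no write needed
            }
            // retry if another thread changed max between get() and the CAS
        } while (!max.compareAndSet(current, candidate));
    }

    int get() {
        return max.get();
    }
}
```

In production code, max.accumulateAndGet(candidate, Math::max) expresses the same update with the loop hidden inside AtomicInteger.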
Immutable objects have a very compelling list of positive qualities. Without question, they are among the simplest and most robust kinds of classes you can possibly build. When you create immutable classes, entire categories of problems simply disappear.
http://www.javapractices.com/topic/TopicAction.do?Id=29
Classes should be immutable unless there's a very good reason to make them mutable... If a class cannot be made immutable, limit its mutability as much as possible.
Effective Java by Joshua Bloch
Douglas Hawkins: https://www.youtube.com/watch?v=ADxUsCkWdbE
What many people do
public class MyClass
{
private int sum;
public synchronized void add(int i)
{
this.sum += i;
}
public int get()
{
return this.sum;
}
}
get() is not properly synchronized: without a lock or volatile, the CPU has no reason to go to main memory for a fresh state
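A minimal sketch of one possible fix, assuming that locking on every access is acceptable: let the reader take the same lock as the writer, so that the unlock in add() happens-before the lock in get(). The class name SafeSum is invented for this example.

```java
// Hypothetical corrected version: reads and writes use the same monitor.
final class SafeSum {
    private int sum;

    synchronized void add(int i) {
        this.sum += i;
    }

    // Same lock as add(): the reader is guaranteed to see the latest sum.
    synchronized int get() {
        return this.sum;
    }
}
```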
public class MyClass
{
private volatile List list;
public void add(String s)
{
list.add(s);
}
public void remove(Object o)
{
this.list.remove(o);
}
}
Correct Code for Counting
public class MyClass
{
// still not correct: ++ and -- on a volatile are non-atomic read-modify-write operations
private volatile int a;
public void inc()
{
this.a++;
}
public void dec()
{
this.a--;
}
}
public class MyClass
{
private int a;
// likely never inlined!
public synchronized void inc()
{
this.a++;
}
public synchronized void dec()
{
this.a--;
}
}
public class MyClass
{
// yes, a final should always be here, best practice
private final AtomicInteger a = new AtomicInteger(0);
// this can be now inlined
public void inc()
{
this.a.getAndIncrement();
}
public void dec()
{
this.a.getAndDecrement();
}
}
Don't forget: Concurrency Concepts in Java by Douglas Hawkins - https://www.youtube.com/watch?v=ADxUsCkWdbE