Why you really need synchronization, and it is not what you think
Why you should stick around
I want to clear up common misconceptions when programming with more than one thread. I am not a JDK developer, just a power user, so some details might be overly simplified or slightly incorrect.
If this doesn't scare you away, what will?
Java is highly efficient - you want to make your programs efficient too
Source: https://sites.google.com/view/energy-efficiency-languages/home
Write correct code, make it scale, keep it efficient
Always attribute other people's work and state your sources
Some of the content has been inspired by or borrowed and modified from Concurrency Concepts in Java by Douglas Hawkins [1]
[1] https://www.youtube.com/watch?v=ADxUsCkWdbE
[2] https://shipilev.net/
All You Have to Know in Three Slides or Less
The spec, no fun
The Java Memory Model specifies how the Java virtual machine works with the computer's memory (RAM)...
It is very important to understand the Java memory model if you want to design correctly behaving concurrent programs. ...specifies how and when threads can see values written to shared variables by other threads, and how to synchronize access to shared variables...
Jakob Jenkov
https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html
https://jenkov.com/tutorials/java-concurrency/java-memory-model.html
https://shipilev.net/blog/2016/close-encounters-of-jmm-kind/
https://gee.cs.oswego.edu/dl/jmm/cookbook.html
The one rule that tells Java to optimize the hell out of your code without breaking it
As if there's one thread, unless you say otherwise.
[1] https://www.youtube.com/watch?v=ADxUsCkWdbE
When we blew it all up
When Java adjusted the JLS and made the JVM more performant with Java 1.5, a lot of programs stopped behaving correctly.
Not because the JVM was broken, but because those programs had broken synchronization and had just worked fine up to that point.
Everything from Before in a Few More Words
Memory is Everything
Our World (JVM) vs. Real World (Hardware and OS)
Oversimplified and not to scale
public class Test
{
// we all always live on the heap
private Map<String, String> aMap;
private int counter;
private static final Any<Integer> ANY = new Any<>();
/**
* @param a - stack only (call-by-value)
* @param b - the reference is passed by value on the stack,
* but where does the object itself live?
* (feels like call-by-reference)
*/
public void foo(int a, Any<?> b)
{
int c = a++; // register or stack or both
var d = b; // ref d on stack or in register,
// but b might be anywhere
// reference on stack/in register, but instance?
// when small and it does not escape on stack
// when larger always on heap
int[] e = new int[a];
// reference stack, instance on heap
List<Foo> l = new ArrayList<>();
}
}
Forgot to define important terms
Sharing means that another thread can see and/or change data that someone else created or owns.
The data access does not have to be at the same time to create a potential data race.
A data race occurs when: two or more threads in a single process access the same memory location concurrently, and at least one of the accesses is for writing, and the threads are not using any exclusive locks to control their accesses to that memory [1].
[1] https://docs.oracle.com/cd/E19205-01/820-0619/geojs/index.html
It is in the word, isn't it?
Concurrency means that data is accessed by two or more threads.
At least one is writing.
There is no time component in that definition!
Concurrency must be interpreted more liberally nowadays.
P.S. You share your data with the garbage collector.
Let's look behind the scenes and bust some myths
One-liners are always safe, aren't they?
Is this small enough for an atomic (indivisible) operation?
int shared = 0; // field of an instance
=================================================
shared++;
getfield #2 // Field shared:I
iconst_1
iadd
putfield #2 // Field shared:I
local = shared;
local = local + 1;
shared = local;
0x00007f2af910f20c: mov 0xc(%rsi),%edi ; getfield shared
0x00007f2af910f20f: inc %edi
0x00007f2af910f211: mov %edi,0xc(%rsi) ; putfield shared
# also possible
0x00007f73c03a18cc: incl 0xc(%rsi) ; increment at address*
int local = 0; // stack aka method scope
=================================================
local += 2;
IINC 1 2
# "local" already lives in %rax
0x00007ff49cfde350: movabs $0x2,%r10
0x00007ff49cfde35a: add %r10,%rax
Example by Douglas Hawkins
*) This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.
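The non-atomic read-modify-write above can be observed directly. Here is a minimal sketch (class name and iteration counts are my own, not from the talk): two threads each run `shared++` 100,000 times, and the final value usually falls short of 200,000 because the four-step increments overlap and updates get lost.

```java
// Hypothetical demo: two threads race on the unsynchronized shared++ from above.
class LostUpdateDemo
{
    static int shared = 0;

    static int run()
    {
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++)
            {
                shared++; // getfield, iconst_1, iadd, putfield: four steps, not one
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        try
        {
            t1.join();
            t2.join();
        }
        catch (InterruptedException e)
        {
            throw new RuntimeException(e);
        }
        return shared; // usually less than 200_000: increments were lost
    }

    public static void main(String[] args)
    {
        System.out.println(LostUpdateDemo.run());
    }
}
```

Run it a few times: the result varies from run to run, which is the signature of a data race.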
Is a new atomic?
class Point
{
private int x;
private int y;
public Point(int x, int y)
{
this.x = x;
this.y = y;
}
}
Point shared = new Point(x, y);
// pseudo-code
local = calloc(sizeof(Point));
local.<init>(x, y);
Object.<init>();
this.x = x;
this.y = y;
shared = local;
Source: Douglas Hawkins
When do others see my changes?
Clean hands all the time guaranteed by the spec
No out-of-thin-air reads: Each read of a variable must see a value written by a write to that variable.
This hurts big time (it blocks cutting-edge performance optimizations), because every single bit is written (initialized) first before it is made available. That is why reuse might still be a good thing in Java... but then it is not immutable... darn...
#cache #bandwidth
shared = 20;
shared = 40;
print(shared);
// remove dead store
shared = 40;
print(shared);
// even fancier
print(40);
shared = 40;
Source: Douglas Hawkins
// your code
shared = 0;
for (int x : array)
{
shared += x;
}
// "translated"
local = 0; // likely a register
for (int x : array)
{
local += x;
}
shared = local;
Source: Douglas Hawkins
// Can x be 30?
x = shared * shared;
// Compiler might do that
local1 = shared;
local2 = shared;
x = local1 * local2;
// In reality, this happens...
local = shared;
x = local * local;
0x00007f7740ab2ccc: mov 0xc(%rsi),%eax ;*getfield shared
0x00007f7740ab2ccf: imul %eax,%eax ;*imul
thread 1
==========================
local1 = shared; // 5
local2 = shared; // 6
x = local1 * local2; // 30
thread 2
==========================
shared = 5;
shared = 6;
Example: Douglas Hawkins
Data and Control Dependence
// our code
sharedX = 2;
sharedY = 3;
if (sharedX > 0)
{
print(sharedX);
}
// legal versions by the compiler
sharedX = 2;
if (sharedX > 0)
{
print(sharedX);
}
sharedY = 3;
print(2);
sharedX = 2;
sharedY = 3;
sharedY = 3;
sharedX = 2;
if (2 > 0)
{
print(2);
}
Example: Douglas Hawkins
To some programmers, this behavior may seem "broken". However, it should be noted that this code is improperly synchronized.
The semantics of the Java programming language allow compilers and microprocessors to perform optimizations that can interact with incorrectly synchronized code in ways that can produce behaviors that seem paradoxical.
It is your fault!
A Quick Detour
What Does it Cost to Leave the CPU?
| | Real | Humanized |
|---|---|---|
| CPU Cycle | 0.4 ns | 1 s |
| L1 Cache | 0.9 ns | 2 s |
| L2 Cache | 2.8 ns | 7 s |
| L3 Cache | 28 ns | 1 min |
| Memory Access | 100 ns | 4 min |
| NVM SSD | 25 μs | 17 h |
| SSD | 50–150 μs | 1.5–4 days |
| HDD | 1–10 ms | 1–9 months |
| TCP SF to NY | 65 ms | 5 years |
| TCP SF to Hong Kong | 141 ms | 11 years |
http://www.prowesscorp.com/computer-latency-at-a-human-scale/
Everything happens in parallel
https://en.wikichip.org/wiki/intel/microarchitectures/kaby_lake
Implicit concurrent execution of your code
public class Foo
{
private double total;
private double fees;
public void transact(double m)
{
total += m;
fees += Math.abs(m * 0.01);
}
}
# rM is m in a register
load total, rTotal
add rM, rTotal
store rTotal, total
load 0.01, rT1
mul rM, rT1
abs rT1
load fees, rFees
add rT1, rFees
store rFees, fees
# Execution Unit 1
load total, rTotal
add rM, rTotal
store rTotal, total
# Execution Unit 2
load 0.01, rT1
mul rM, rT1
abs rT1
# Execution Unit 3
load fees, rFees
# Execution Unit 4
add rT1, rFees
store rFees, fees
Extremely simplified. ABS is about three ASM lines.
Predict and speculate to be ahead of the crowd
private static final int COUNT = 10_000;
// Contains random numbers from -50 to 50
private int[] sorted;
private int[] reversed;
public int doIt(int[] array) {
    int sum = 0;
    for (int v : array) {
        if (v > 0) {
            sum = sum + 2;
        }
        else {
            sum = sum + 1;
        }
    }
    return sum;
}
# T14s
# Benchmark Mode Cnt Score Error Units
sorted avgt 2 3,200 ns/op
unsorted avgt 2 14,927 ns/op
# Benchmark Mode Cnt Score Units
BranchPrediction1.sorted avgt 2 3,200.529 ns/op
BranchPrediction1.sorted:IPC avgt 3.912 insns/clk
BranchPrediction1.sorted:branch-misses avgt 5.800 #/op
BranchPrediction1.sorted:branches avgt 17,827.359 #/op
BranchPrediction1.sorted:cycles avgt 13,969.192 #/op
BranchPrediction1.sorted:instructions avgt 53,254.559 #/op
BranchPrediction1.unsorted avgt 2 14,927.107 ns/op
BranchPrediction1.unsorted:IPC avgt 0.875 insns/clk
BranchPrediction1.unsorted:branch-misses avgt 1,072.056 #/op
BranchPrediction1.unsorted:branches avgt 17,693.113 #/op
BranchPrediction1.unsorted:cycles avgt 60,857.134 #/op
BranchPrediction1.unsorted:instructions avgt 53,320.560 #/op
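The effect can be tried without JMH, although a plain nanoTime loop is only a rough sketch (no proper warmup control, no blackholes, so treat the numbers with suspicion; class name, seed, and sizes are my own). One thing is deterministic, though: branch prediction changes the speed, never the result, so a sorted and an unsorted copy of the same data must produce the same sum.

```java
import java.util.Arrays;
import java.util.Random;

class BranchDemo
{
    static volatile long sinkHole; // defeat dead-code elimination

    // same loop as in the benchmark above
    static int doIt(int[] array)
    {
        int sum = 0;
        for (int v : array)
        {
            if (v > 0)
            {
                sum = sum + 2;
            }
            else
            {
                sum = sum + 1;
            }
        }
        return sum;
    }

    static long timeIt(int[] array)
    {
        long start = System.nanoTime();
        long sink = 0;
        for (int i = 0; i < 1_000; i++)
        {
            sink += doIt(array); // keep the result alive so the JIT cannot drop the call
        }
        long elapsed = System.nanoTime() - start;
        sinkHole = sink;
        return elapsed;
    }

    public static void main(String[] args)
    {
        int[] unsorted = new Random(42).ints(10_000, -50, 51).toArray();
        int[] sorted = unsorted.clone();
        Arrays.sort(sorted);
        // a little warmup so both runs see compiled code
        for (int i = 0; i < 10_000; i++)
        {
            doIt(sorted);
            doIt(unsorted);
        }
        System.out.println("sorted:   " + timeIt(sorted) + " ns");
        System.out.println("unsorted: " + timeIt(unsorted) + " ns");
    }
}
```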
What did we learn in the show tonight, Craig... er, René?
Loop unrolling, inlining, reordering, dead code removal, branch prediction, caching, speculative execution, pipelining, memory access optimization, location aware memory, register machine, ahead of time evaluation, thread scheduling and moving, on-the-fly code replacement ...
Once again, overly simplified for your viewing pleasure.
What we have to do to get things working
Concurrency and visibility are not about modifying things at the same time; it is all about reading data that was modified by another thread. These operations could be seconds or even (theoretically) minutes apart.
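That "reading later" aspect can be sketched with the classic flag hand-off (a hypothetical demo; it already uses volatile, which comes up later in this deck, to make the hand-off correct): the producer writes the data, then flips a volatile flag; any thread that sees the flag as true is guaranteed to also see the data.

```java
class VisibilityDemo
{
    static volatile boolean done = false; // the volatile flag publishes the payload
    static int payload = 0;

    static int run()
    {
        Thread producer = new Thread(() -> {
            payload = 42; // 1. write the data (plain field)
            done = true;  // 2. volatile write: happens-before any read that sees true
        });
        producer.start();
        while (!done)
        {
            Thread.onSpinWait(); // busy-wait until the flag flips
        }
        return payload; // guaranteed 42, thanks to the happens-before edge
    }

    public static void main(String[] args)
    {
        System.out.println(VisibilityDemo.run());
    }
}
```

Without the volatile on `done`, the compiler may hoist the read out of the loop and the consumer may spin forever.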
NEW
See Where it Fails and Why
What all the previous things will do to NEW
// Am I immutable?
class Point {
private int x;
private int y;
public Point(int x, int y) {
this.x = x;
this.y = y;
}
public int getX() {
return x;
}
public int getY() {
return y;
}
}
// pseudo-code
local = calloc(sizeof(Point));
local.<init>(x, y);
Object.<init>();
this.x = x;
this.y = y;
shared = local;
That coding style is not mine! Opening curly brace -> newline!
But why can I see the 0 values?
// pseudo-code
local = calloc(sizeof(Point));
local.<init>(x, y);
Object.<init>();
this.x = x;
this.y = y;
shared = local;
// a possible optimization
local = calloc(sizeof(Point));
shared = local;
local.<init>(x, y);
Object.<init>();
this.x = x;
this.y = y;
// this will permit to load the CPU better and
// start the store to memory earlier
Source: Douglas Hawkins
Now you might understand why double-checked locking is broken
// what you wrote
public static Singleton getInstance()
{
if (instance == null)
{
synchronized (Singleton.class)
{
if (instance == null)
{
instance = new Singleton();
}
}
}
return instance;
}
// what you get
public static Singleton getInstance()
{
if (instance == null)
{
synchronized (Singleton.class)
{
if (instance == null)
{
local = calloc(sizeof(Singleton));
instance = local;
local.<init>;
}
}
}
return instance;
}
Silence! Or I will synchronize you.
Let the CPU and compiler optimize the hell out of your code
==== Producer ====
...
sharedData = ...;
sharedDone = true;
...

==== Consumer ====
...
while (!sharedDone)
{
    ...
}
print(sharedData);
==== Producer ====
...
sharedDone = true;
sharedData = ...;
...

==== Consumer ====
...
local = !sharedDone;
while (local)
{
    ...
}
print(sharedData);
Source: Douglas Hawkins
Make clear what the rules are
==== Producer ====
...
sharedData = ...;
--- store_store_fence();
sharedDone = true;
...

==== Consumer ====
...
while (!sharedDone)
{
    --- load_load_fence();
    ...
}
--- load_load_fence();
print(sharedData);
But there are no fences in the Java language, mum!?
volatile
synchronized
final
Atomics
varhandles
What can they do for us?
final
: You see only one state, plus javac checks

volatile
: Ensures consistent reads and writes - main memory instead of registers

synchronized
: Ensures consistent reads and writes and prevents multiple threads from running the same code; establishes single-thread semantics

Atomics
: Ensure consistent reads and writes, encapsulate read-modify-write operations

varhandles
: Atomic, synchronized, and final as code (extremely simplified definition)!

A VarHandle is a dynamically strongly typed reference to a variable, or to a parametrically-defined family of variables, including static fields, non-static fields, array elements, or components of an off-heap data structure.
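To make that VarHandle definition concrete, here is a minimal sketch (class and field names are my own): a counter whose increment is an atomic read-modify-write through a VarHandle, similar to what AtomicInteger gives you, but expressed directly against your own field.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class Counter
{
    private volatile int value;

    private static final VarHandle VALUE;
    static
    {
        try
        {
            // bind the handle to the 'value' field once, at class load time
            VALUE = MethodHandles.lookup().findVarHandle(Counter.class, "value", int.class);
        }
        catch (ReflectiveOperationException e)
        {
            throw new ExceptionInInitializerError(e);
        }
    }

    int incrementAndGet()
    {
        // atomic read-modify-write; getAndAdd returns the previous value
        return (int) VALUE.getAndAdd(this, 1) + 1;
    }

    int get()
    {
        return value;
    }
}
```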
What Each "Consistency Tool" Offers
Final has some little-known characteristics
class Point
{
private final int x;
private final int y;
public Point(int x, int y)
{
this.x = x;
this.y = y;
}
}
Point shared = new Point(x, y);
// pseudo-code with final
local = calloc(sizeof(Point));
local.<init>(x, y);
Object.<init>();
this.x = x;
this.y = y;
--- freeze(); ---
shared = local;
// pseudo-code no final
local = calloc(sizeof(Point));
shared = local;
local.<init>(x, y);
Object.<init>();
this.x = x;
this.y = y;
Make mutability safe
The Java volatile keyword is used to mark a Java variable as being stored in main memory. More precisely that means, that every read of a volatile variable will be read from the computer's main memory, and not from the CPU cache, and that every write to a volatile variable will be written to main memory, and not just to the CPU cache.
http://tutorials.jenkov.com/java-concurrency/volatile.html
"...and not from the CPU cache ... not just to the CPU cache". The part with the cache is horribly wrong.
René Schwietzke
// this will work now
private static volatile Singleton instance;
public static Singleton getInstance()
{
// volatile ensures that we read freshly from memory
if (instance == null) // a fence
{
synchronized (Singleton.class) // another fence
{
if (instance == null) // another fence
{
local = calloc(sizeof(Singleton));
local.<init>;
--- fence();
instance = local; // write to memory
}
}
}
return instance; // read from memory again
}
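Putting the corrected pattern together as a compilable sketch: the fence from the pseudo-code is exactly what the volatile write provides, and the local variable avoids a second volatile read on the fast path.

```java
class Singleton
{
    private static volatile Singleton instance;

    private Singleton()
    {
    }

    public static Singleton getInstance()
    {
        Singleton local = instance; // one volatile read for the fast path
        if (local == null)
        {
            synchronized (Singleton.class)
            {
                local = instance; // re-read under the lock
                if (local == null)
                {
                    // the volatile write publishes a fully constructed object
                    instance = local = new Singleton();
                }
            }
        }
        return local;
    }
}
```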
Read-Modify-Write Problem
// what you wrote
private volatile int shared;
public void inc()
{
shared += 1;
}
// what the machine does
private volatile int shared;
public void inc()
{
local = shared; // read from memory
--- fence();
local = local + 1;
shared = local; // write to memory
--- fence();
}
0xc4c2fcc: mov 0xc(%rsi),%r11d
0xc4c2fd0: inc %r11d
0xc4c2fd3: mov %r11d,0xc(%rsi) ;*putfield shared
0xc4c2fd7: lock addl $0x0,-0x40(%rsp) ;*prevent reordering
Our Square Problem Again
private int shared = 0;
public int square()
{
int x = shared * shared;
return x;
}
0xcab294c: mov 0xc(%rsi),%eax ;*getfield shared
0xcab294f: imul %eax,%eax ;*imul
private volatile int shared = 0;
0xcab294c: mov 0xc(%rsi),%eax ;*getfield shared
0xcab294f: mov 0xc(%rsi),%r10d ;*getfield shared
0xcab2953: imul %r10d,%eax ;*imul
private volatile int shared = 0;
public int square()
{
int x = shared;
return x * x;
}
Create bigger regions
A Java synchronized block marks a method or a block of code as synchronized. Java synchronized blocks can be used to avoid race conditions.
http://tutorials.jenkov.com/java-concurrency/synchronized.html
private int shared;
public synchronized void inc()
{
shared += 1;
}
private int shared;
public void inc()
{
synchronized (this)
{
shared += 1;
}
}
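In contrast to the unsynchronized counter earlier, the monitor makes the result deterministic. A small sketch (thread and iteration counts are my own): two threads each call the synchronized inc() 100,000 times and the total is always exactly 200,000, because the lock provides both mutual exclusion and visibility.

```java
class SyncCounter
{
    private int shared = 0;

    public synchronized void inc()
    {
        shared += 1; // mutual exclusion and visibility via the monitor
    }

    public synchronized int get()
    {
        return shared;
    }

    static int run()
    {
        SyncCounter counter = new SyncCounter();
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++)
            {
                counter.inc();
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        try
        {
            t1.join();
            t2.join();
        }
        catch (InterruptedException e)
        {
            throw new RuntimeException(e);
        }
        return counter.get(); // always exactly 200_000
    }
}
```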
Nothing in life is free, especially not synchronization
0x00007f80cc3a18c0: sub $0x18,%rsp
0x00007f80cc3a18c7: mov %rbp,0x10(%rsp) ;*synchronization entry
; - org.sample.Test3::doIt2@-1 (line 17)
0x00007f80cc3a18cc: incl 0xc(%rsi) ;*putfield shared
; - org.sample.Test3::doIt2@7 (line 17)
0x00007f80cc3a18cf: add $0x10,%rsp
0x00007f80cc3a18d3: pop %rbp
0x00007f80cc3a18d4: mov 0x108(%r15),%r10
0x00007f80cc3a18db: test %eax,(%r10) ; {poll_return} *** SAFEPOINT POLL ***
0x00007f80cc3a18de: ret
0x00007f80cc3a1440: mov %eax,-0x14000(%rsp)
0x00007f80cc3a1447: push %rbp
0x00007f80cc3a1448: sub $0x20,%rsp
0x00007f80cc3a144c: mov %rsi,%rbp
0x00007f80cc3a144f: mov (%rsi),%rax
0x00007f80cc3a1452: mov %rax,%r10
0x00007f80cc3a1455: and $0x7,%r10
0x00007f80cc3a1459: cmp $0x5,%r10
0x00007f80cc3a145d: jne L0003
0x00007f80cc3a145f: mov $0x60040,%r11d ; {metadata('org/sample/Test3')}
0x00007f80cc3a1465: movabs $0x800000000,%r10
0x00007f80cc3a146f: add %r11,%r10
0x00007f80cc3a1472: mov 0xb8(%r10),%r10
0x00007f80cc3a1479: mov %r10,%r11
0x00007f80cc3a147c: or %r15,%r11
0x00007f80cc3a147f: mov %r11,%r8
0x00007f80cc3a1482: xor %rax,%r8
0x00007f80cc3a1485: test $0xffffffffffffff87,%r8
0x00007f80cc3a148c: jne L000d ;*synchronization entry
; - org.sample.Test3::doIt@-1 (line 13)
L0000: incl 0xc(%rbp)
0x00007f80cc3a1495: mov $0x7,%r10d
0x00007f80cc3a149b: and 0x0(%rbp),%r10
0x00007f80cc3a149f: cmp $0x5,%r10
0x00007f80cc3a14a3: jne L0007 ;*return {reexecute=0 rethrow=0 return_oop=0}
; - org.sample.Test3::doIt@10 (line 14)
L0001: add $0x20,%rsp
0x00007f80cc3a14a9: pop %rbp
0x00007f80cc3a14aa: mov 0x108(%r15),%r10
0x00007f80cc3a14b1: test %eax,(%r10) ; {poll_return} *** SAFEPOINT POLL ***
0x00007f80cc3a14b4: ret
L0002: lock cmpxchg %r10,(%rsi)
L0003: lea 0x10(%rsp),%rbx
0x00007f80cc3a14bf: mov (%rsi),%rax
0x00007f80cc3a14c2: test $0x2,%rax
0x00007f80cc3a14c8: jne L0004
0x00007f80cc3a14ca: or $0x1,%rax
0x00007f80cc3a14ce: mov %rax,(%rbx)
0x00007f80cc3a14d1: lock cmpxchg %rbx,(%rsi)
0x00007f80cc3a14d6: je L0005
0x00007f80cc3a14dc: sub %rsp,%rax
0x00007f80cc3a14df: and $0xfffffffffffff007,%rax
0x00007f80cc3a14e6: mov %rax,(%rbx)
0x00007f80cc3a14e9: jmp L0005
L0004: mov %rax,%r10
0x00007f80cc3a14f1: xor %rax,%rax
0x00007f80cc3a14f4: lock cmpxchg %r15,0x7e(%r10)
0x00007f80cc3a14fa: movq $0x3,(%rbx)
L0005: je L0000
L0006: lea 0x10(%rsp),%rdx
0x00007f80cc3a1508: data16 xchg %ax,%ax
0x00007f80cc3a150b: call 0x00007f80c48f5a00 ; ImmutableOopMap{rbp=Oop }
;*synchronization entry
; - org.sample.Test3::doIt@-1 (line 13)
; {runtime_call _complete_monitor_locking_Java}
0x00007f80cc3a1510: jmp L0000
L0007: lea 0x10(%rsp),%rax
0x00007f80cc3a151a: cmpq $0x0,(%rax)
0x00007f80cc3a1521: je L000c
0x00007f80cc3a1527: mov 0x0(%rbp),%r10
0x00007f80cc3a152b: test $0x2,%r10
0x00007f80cc3a1532: je L000b
0x00007f80cc3a1534: xor %rax,%rax
0x00007f80cc3a1537: or 0x8e(%r10),%rax
0x00007f80cc3a153e: jne L000c
0x00007f80cc3a1540: mov 0x9e(%r10),%rax
0x00007f80cc3a1547: or 0x96(%r10),%rax
0x00007f80cc3a154e: jne L0008
0x00007f80cc3a1550: movq $0x0,0x7e(%r10)
0x00007f80cc3a1558: jmp L000c
L0008: cmpq $0x0,0xa6(%r10)
0x00007f80cc3a1565: je L0009
0x00007f80cc3a1567: xor %rax,%rax
0x00007f80cc3a156a: movq $0x0,0x7e(%r10)
0x00007f80cc3a1572: lock addl $0x0,(%rsp)
0x00007f80cc3a1577: cmpq $0x0,0xa6(%r10)
0x00007f80cc3a1582: jne L000a
0x00007f80cc3a1584: lock cmpxchg %r15,0x7e(%r10)
0x00007f80cc3a158a: jne L000a
L0009: or $0x1,%eax
0x00007f80cc3a158f: jmp L000c
L000a: test $0x0,%eax
0x00007f80cc3a1596: jmp L000c
L000b: mov (%rax),%r10
0x00007f80cc3a159b: lock cmpxchg %r10,0x0(%rbp)
L000c: je L0001
0x00007f80cc3a15a7: mov %rbp,%rdi
0x00007f80cc3a15aa: lea 0x10(%rsp),%rsi ;*synchronization entry
; - org.sample.Test3::doIt@-1 (line 13)
0x00007f80cc3a15af: mov %r15,%rdx
0x00007f80cc3a15b2: movabs $0x7f80e35353b0,%r10
0x00007f80cc3a15bc: call *%r10 ;*return {reexecute=0 rethrow=0 return_oop=0}
; - org.sample.Test3::doIt@10 (line 14)
0x00007f80cc3a15bf: jmp L0001
L000d: test $0x7,%r8
0x00007f80cc3a15cb: jne L0002
0x00007f80cc3a15d1: test $0x300,%r8
0x00007f80cc3a15d8: jne L000e
0x00007f80cc3a15da: and $0x37f,%rax
0x00007f80cc3a15e1: mov %rax,%r11
0x00007f80cc3a15e4: or %r15,%r11
L000e: lock cmpxchg %r11,(%rsi)
0x00007f80cc3a15ec: jne L0006
0x00007f80cc3a15f2: jmp L0000
Make things even more efficient
synchronized (buffer)
{
buffer.add(x);
}
foo = bar;
synchronized (buffer)
{
buffer.add(y);
}
synchronized (buffer)
{
buffer.add(x);
foo = bar;
buffer.add(y);
}
synchronized (buffer)
{
buffer.add(x);
}
foo = bar;
synchronized (buffer)
{
buffer.add(x);
foo = bar;
}
Lock free updates... what?
private AtomicInteger atomic = new AtomicInteger(0);
public void increment()
{
    atomic.getAndIncrement();
}
// under the hood
boolean applied = false;
do
{
    int value = atomic.get();
    applied = atomic.compareAndSet(value, value + 1);
}
while (!applied);
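A compilable version of that retry loop (class and method names are my own): the compareAndSet only succeeds if nobody changed the value in between; otherwise the loop simply tries again. No lock, yet no lost updates.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CasDemo
{
    static final AtomicInteger atomic = new AtomicInteger(0);

    // hand-rolled getAndIncrement(): retry until our CAS wins
    static void increment()
    {
        boolean applied = false;
        do
        {
            int value = atomic.get();
            applied = atomic.compareAndSet(value, value + 1);
        }
        while (!applied);
    }

    static int run()
    {
        atomic.set(0);
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++)
            {
                increment();
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        try
        {
            t1.join();
            t2.join();
        }
        catch (InterruptedException e)
        {
            throw new RuntimeException(e);
        }
        return atomic.get(); // always 200_000: lock-free, but nothing gets lost
    }
}
```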
We skip this today, because that is a talk on its own
55 min in 30 seconds
Your head is single-threaded!