Concurrency & Multithreading
Production outages from concurrency rarely look exotic—they look like a counter that is almost right, a pool that accepted work until the JVM ran out of memory, or a thread that never noticed shutdown. This chapter goes deep on threads, the Java Memory Model, java.util.concurrent, executors, CompletableFuture, and virtual threads—the material senior interviews and on-call expect.
Thread fundamentals
A thread is a lightweight unit of execution within a process—shared heap, separate stack. The OS schedules many threads onto fewer CPU cores; Java maps platform threads 1:1 to OS threads (until virtual threads).
Thread lifecycle
NEW ──start()──→ RUNNABLE ⇄ (OS scheduler: running on CPU)
                    │
        ┌───────────┼───────────┐
        ▼           ▼           ▼
     BLOCKED     WAITING   TIMED_WAITING
    (monitor)  (wait/join) (sleep, timed wait)
        │           │           │
        └───────────┴───────────┘
                    │
     run() completes / uncaught exception
                    ▼
               TERMINATED
| State | Typical cause |
|---|---|
| NEW | Constructed, not started |
| RUNNABLE | Eligible to run (includes actually running or ready) |
| BLOCKED | Waiting to enter synchronized monitor |
| WAITING | wait(), join(), LockSupport.park() |
| TIMED_WAITING | sleep(ms), timed wait, join(timeout) |
| TERMINATED | Completed or died with exception |
Creating threads
Subclassing Thread overrides run() on the Thread object itself—works but couples task to thread type. Implementing Runnable (or lambda) and passing to Thread or an executor separates concerns and is the standard style. Callable adds a return value and checked exceptions—always submit to an ExecutorService to obtain a Future.
| Approach | When |
|---|---|
| Subclass Thread | Rare; couples task to Thread type; prefer Runnable |
| Runnable | void task: no result, no checked exception in signature |
| Callable<V> | Returns a value, throws checked exceptions; use with ExecutorService |
| Executor / virtual thread factory | Production default; pooling and lifecycle managed |
Thread t = Thread.ofPlatform().name("worker-1").start(() -> {
System.out.println("running on " + Thread.currentThread().getName());
});
Callable<Integer> task = () -> 42;
ExecutorService pool = Executors.newSingleThreadExecutor();
Future<Integer> future = pool.submit(task);
Integer result = future.get(); // blocks until done
pool.shutdown();
Key thread methods
| Method | Purpose |
|---|---|
| start() | Begins a new thread that calls run() on a new stack; never call run() directly for concurrency |
| join() / join(ms) | Wait for thread termination |
| sleep(ms) | TIMED_WAITING; does not release held locks |
| interrupt() | Sets the interrupt flag; cooperative cancellation |
| isInterrupted() | Check flag without clearing |
| Thread.interrupted() | Static; checks and clears the current thread's flag |
void pollUntilDone(Runnable work) throws InterruptedException {
    Thread worker = Thread.startVirtualThread(work);
    while (worker.isAlive()) {
        try {
            worker.join(500);
        } catch (InterruptedException e) {
            worker.interrupt(); // propagate shutdown
            throw e;
        }
    }
}
start() vs run()
Calling start() asks the JVM to schedule a new call stack that enters run(). Calling run() directly on a Thread object executes on the current thread—no parallelism. This mistake appears in tutorials and still shows up in code review.
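The difference is easy to observe by recording which thread actually executes the task. The following is a minimal sketch; the class name StartVsRunDemo and the thread name "worker" are illustrative, not from any library.

```java
import java.util.concurrent.atomic.AtomicReference;

class StartVsRunDemo {
    /** Invokes run() directly and returns the name of the thread that executed the task. */
    static String viaRun() {
        AtomicReference<String> seen = new AtomicReference<>();
        Thread t = new Thread(() -> seen.set(Thread.currentThread().getName()), "worker");
        t.run(); // plain method call: executes on the caller's thread
        return seen.get();
    }

    /** Invokes start() and returns the name of the thread that executed the task. */
    static String viaStart() {
        AtomicReference<String> seen = new AtomicReference<>();
        Thread t = new Thread(() -> seen.set(Thread.currentThread().getName()), "worker");
        t.start(); // new call stack; run() executes on "worker"
        try {
            t.join(); // wait for the worker to finish before reading the result
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println("run():   " + viaRun());
        System.out.println("start(): " + viaStart());
    }
}
```

With run(), the recorded name is the caller's thread; with start(), it is the new thread's name.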
Cooperative interruption
interrupt() sets a boolean flag on the target thread. It does not forcibly stop execution (deprecated stop() was unsafe). Blocking methods like sleep, wait, join, and BlockingQueue.take respond by throwing InterruptedException and clearing the interrupt flag—catch blocks should usually call Thread.currentThread().interrupt() again to preserve the signal for upstream shutdown logic.
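A small sketch of that pattern: a consumer blocked in BlockingQueue.take() is interrupted, catches the exception, and restores the flag so code further up the stack can still see the shutdown request. The class and method names are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class InterruptibleWorker {
    /**
     * Starts a consumer that blocks in take(), interrupts it, and reports
     * whether the consumer's interrupt flag was set when it exited.
     */
    static boolean interruptRestoresFlag() {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        boolean[] flagOnExit = new boolean[1];
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    queue.take(); // blocks; throws InterruptedException on interrupt
                }
            } catch (InterruptedException e) {
                // The exception cleared the flag; restore it for upstream shutdown logic.
                Thread.currentThread().interrupt();
            }
            flagOnExit[0] = Thread.currentThread().isInterrupted();
        });
        consumer.start();
        consumer.interrupt();
        try {
            consumer.join(); // also makes the consumer's write to flagOnExit visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return flagOnExit[0];
    }

    public static void main(String[] args) {
        System.out.println("flag restored: " + interruptRestoresFlag());
    }
}
```

Without the call to Thread.currentThread().interrupt() in the catch block, the flag would read false and callers would have no way to tell that cancellation was requested.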
Daemon threads
Daemon threads do not keep the JVM alive: when all user (non-daemon) threads end, the VM exits. Use them for background housekeeping (metrics flush), not for work that must complete on shutdown, unless you register a hook with Runtime.getRuntime().addShutdownHook(...) that flushes buffers and closes pools. Main business work should run on non-daemon threads (the default).
Thread daemon = Thread.ofPlatform().daemon().start(() -> {
while (!Thread.currentThread().isInterrupted()) {
// background tick
}
});
Synchronization
Correctness requires mutual exclusion and visibility—two threads can corrupt data without ever holding the same lock if they only read/write a shared field without happens-before edges.
Race condition — broken counter
class BrokenCounter {
private int count = 0;
void increment() {
count++; // read-modify-write — NOT atomic
}
}
// Two threads each increment 1_000_000 times — result often < 2_000_000
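Data race (JMM): two accesses to the same variable, at least one of them a write, not ordered by happens-before. Unlike C or C++, Java does not treat this as undefined behavior, but the guarantees are weak: a read may see a stale value, updates can be lost, and writes to non-volatile long and double fields may tear.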
synchronized — intrinsic monitor lock
Every object has an associated monitor. One thread holds the lock; others block in BLOCKED.
class SafeCounter {
    private int count = 0;
    synchronized void increment() { // lock = this
        count++;
    }
    int get() {
        synchronized (this) { // block-level: finer control
            return count;
        }
    }
}
// Static synchronized — lock = Class object
static synchronized void global() { }
Method-level locks the whole instance—can reduce parallelism. Block-level lets separate locks for independent fields (dedicated final Object lock = new Object() guards are clearer than locking on this when external code might also synchronize on your instance).
Reentrancy: the same thread may re-enter a synchronized method on the same monitor (recursive calls or synchronized methods calling each other on the same object)—the JVM tracks entry count and only releases when the outermost scope exits.
volatile — visibility, not atomicity
A volatile read or write participates in the JMM: a write to a volatile field happens-before any subsequent read of that field by another thread.
Compilers and CPUs cannot cache volatile reads in registers across potential concurrent writes in ways that break visibility.
It does not make compound operations atomic—count++ is still read-modify-write with a race between threads.
Valid uses: one-shot shutdown flags, double-checked locking when combined with correct synchronization (prefer holder idiom or static init), status fields where only visibility matters. For counters, use atomics or locks.
class Status {
    private volatile boolean shutdown = false;
    void shutdown() { shutdown = true; } // volatile write: visible to the poller's next read
    void loop() {
        while (!shutdown) { /* work */ } // OK flag pattern
    }
}
volatile int counter;
void bad() { counter++; } // still racy: use AtomicInteger or synchronized
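The two lazy-initialization options mentioned above can be sketched side by side. LazyRegistry is a hypothetical class; the point is that the holder idiom needs no volatile or explicit lock, while double-checked locking is only correct when the field is volatile.

```java
class LazyRegistry {
    private LazyRegistry() { }

    // Holder idiom: the JVM initializes Holder lazily and thread-safely on first access.
    private static final class Holder {
        static final LazyRegistry INSTANCE = new LazyRegistry();
    }

    static LazyRegistry viaHolder() {
        return Holder.INSTANCE;
    }

    // Double-checked locking: the field must be volatile, otherwise another thread
    // may observe a non-null reference to a partially constructed object.
    private static volatile LazyRegistry lazy;

    static LazyRegistry viaDoubleChecked() {
        LazyRegistry local = lazy;          // first check, no lock
        if (local == null) {
            synchronized (LazyRegistry.class) {
                local = lazy;               // second check, under the lock
                if (local == null) {
                    local = new LazyRegistry();
                    lazy = local;           // volatile write publishes the object
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        System.out.println(viaHolder() == viaHolder());
        System.out.println(viaDoubleChecked() == viaDoubleChecked());
    }
}
```

Both accessors return the same instance on every call; the holder version is shorter and harder to get wrong.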
Java Memory Model & liveness failures
The Java Memory Model (JMM) defines when writes by one thread are visible to another. Without happens-before edges, the CPU and compiler may reorder or cache reads/writes in ways that break your mental sequential model—even if every read “looks” like it sees latest memory.
Happens-before rules (selected)
If action A happens-before B, then A’s effects are visible to B and ordered before B for readers that observe the relationship. Key sources:
- An unlock on a monitor happens-before every subsequent lock on the same monitor
- A volatile write happens-before every subsequent read of that volatile
- Thread.start() happens-before any action in the started thread
- All actions in a thread happen-before join() on that thread returns
- java.util.concurrent utilities document their own happens-before edges (e.g. ConcurrentHashMap, CountDownLatch)
Why “just use a lock” is not enough alone
Locks provide both mutual exclusion and release/acquire visibility. A bug pattern is: thread A writes under lock, releases; thread B reads the same field without acquiring the lock—B may see a stale value. Every read that must see published writes needs a matching synchronization edge (same lock, volatile read after volatile write, or concurrent utility with documented ordering).
Deadlock — Coffman conditions (all four required)
Deadlock is a state where a set of threads each wait for a resource held by another in the set—no thread can proceed. All four Coffman conditions must hold simultaneously; breaking any one prevents deadlock.
| Condition | Meaning |
|---|---|
| Mutual exclusion | Resource held exclusively |
| Hold and wait | Hold one lock while waiting for another |
| No preemption | Cannot force release |
| Circular wait | Cycle in wait graph |
Object lockA = new Object();
Object lockB = new Object();
Thread t1 = Thread.startVirtualThread(() -> {
    synchronized (lockA) {
        synchronized (lockB) { work(); }
    }
});
Thread t2 = Thread.startVirtualThread(() -> {
    synchronized (lockB) { // opposite order → circular wait possible
        synchronized (lockA) { work(); }
    }
});
Prevention & detection strategies
- Lock ordering: assign a global order to resources; always acquire in ascending ID order
- Try-lock with timeout: back off, log, and retry with jitter instead of waiting forever
- Shrink critical sections: fewer nested locks, fewer resources per transaction
- Detection: jcmd <pid> Thread.print or jstack reports Java-level deadlocks; in APM, watch for thread pool metrics stuck at max
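The first two strategies can be sketched on a toy money-transfer example. BankAccount and SafeTransfers are hypothetical names, and the 50 ms timeout and 1 to 20 ms jitter are arbitrary values chosen for illustration.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class BankAccount {
    final long id;
    long balance;
    final ReentrantLock lock = new ReentrantLock();

    BankAccount(long id, long balance) { this.id = id; this.balance = balance; }
}

class SafeTransfers {
    /** Lock ordering: always lock the account with the smaller id first. */
    static void transferOrdered(BankAccount from, BankAccount to, long amount) {
        BankAccount first = from.id < to.id ? from : to;
        BankAccount second = first == from ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                from.balance -= amount;
                to.balance += amount;
            } finally { second.lock.unlock(); }
        } finally { first.lock.unlock(); }
    }

    /** Try-lock with timeout: release everything and back off with jitter on failure. */
    static boolean transferWithTryLock(BankAccount from, BankAccount to, long amount, int maxAttempts)
            throws InterruptedException {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (from.lock.tryLock(50, TimeUnit.MILLISECONDS)) {
                try {
                    if (to.lock.tryLock(50, TimeUnit.MILLISECONDS)) {
                        try {
                            from.balance -= amount;
                            to.balance += amount;
                            return true;
                        } finally { to.lock.unlock(); }
                    }
                } finally { from.lock.unlock(); }
            }
            Thread.sleep(ThreadLocalRandom.current().nextInt(1, 20)); // jittered backoff
        }
        return false; // caller decides whether to log, retry later, or fail
    }

    /** Runs opposing ordered transfers on two threads; returns the combined balance. */
    static long demo() {
        BankAccount a = new BankAccount(1, 1_000);
        BankAccount b = new BankAccount(2, 1_000);
        Thread t1 = new Thread(() -> { for (int i = 0; i < 1_000; i++) transferOrdered(a, b, 1); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 1_000; i++) transferOrdered(b, a, 1); });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return a.balance + b.balance;
    }

    public static void main(String[] args) {
        System.out.println("total after transfers: " + demo());
    }
}
```

Both threads request the locks in the same global order, so the circular-wait condition can never form, even though the transfers run in opposite directions.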
Livelock & starvation
Livelock — threads are not blocked on locks but keep reacting to each other without forward progress (two people swapping sides in a hallway forever). Fix by introducing backoff, randomness, or passing priority.
Starvation: a thread never gains the lock because others always win (unfair lock, high arrival rate). new ReentrantLock(true) enables roughly FIFO fairness; throughput drops because barging is disallowed, so a running thread cannot grab a just-released lock ahead of parked waiters and each handoff pays for a wake-up.
Locks, conditions & StampedLock
synchronized is built into every object and is easy to reason about for simple critical sections.
The java.util.concurrent.locks package adds explicit locks with timeouts, interruptible waiting,
fairness, multiple condition queues, and read/write separation—useful when intrinsic monitors are not flexible enough.
| synchronized | ReentrantLock |
|---|---|
| Automatic release on exit (even on exception) | Must unlock() in finally |
| No timeouts | tryLock(timeout) |
| Not interruptible while waiting | lockInterruptibly() |
| Single wait set per monitor | Multiple Condition objects per lock |
| Non-fair by default | Optional fair ordering |
ReentrantLock
A reentrant lock: the thread that already holds it may acquire again (recursion, or nested methods on the same object).
An internal hold count tracks depth; each unlock() decrements until the lock is fully released.
Implementation uses CAS + park/unpark when contended—similar goals to synchronized but with a richer API surface.
- lock(): blocks until acquired; does not respond to interrupt while waiting (use lockInterruptibly instead if you need that)
- tryLock(): returns immediately with success/failure; good for avoiding deadlock (try both locks, release and retry on failure)
- tryLock(timeout, unit): bounded wait; useful for SLA-bound work (“wait at most 200ms for shard lock”)
- lockInterruptibly(): if interrupted while waiting, throws InterruptedException and does not acquire
- Fair lock: new ReentrantLock(true) grants in roughly arrival order; reduces starvation, lowers throughput
ReentrantLock lock = new ReentrantLock();
lock.lock();
try {
// critical section
} finally {
lock.unlock(); // always in finally
}
if (lock.tryLock(100, TimeUnit.MILLISECONDS)) {
try { /* work */ } finally { lock.unlock(); }
}
lock.lockInterruptibly(); // responds to interrupt while waiting
Always pair lock() with try/finally/unlock. Never call lock() twice in one path without two unlocks—easy to break with early returns. For virtual-thread-heavy code, profile for pinning: long synchronized or native blocks on carrier; consider ReentrantLock in hot paths if pinning appears.
ReadWriteLock
Splits access into a read lock (shared, many threads) and a write lock (exclusive).
ReentrantReadWriteLock is ideal when reads dominate (configuration caches, feature flags, in-memory indexes)
and writes are rare. Readers block only when a writer holds the write lock; writers block all readers and other writers.
Lock downgrading is possible (while holding the write lock, acquire the read lock, then release the write lock), but upgrading from read to write is not, and both patterns are easy to misuse; most apps stick to simple lock/unlock per method. Callers can still observe later mutations if you return mutable objects without copying: locking does not replace defensive copies.
ReadWriteLock rw = new ReentrantReadWriteLock();
Map<String, Config> cache = new HashMap<>();
Config get(String key) {
rw.readLock().lock();
try {
return cache.get(key);
} finally {
rw.readLock().unlock();
}
}
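The read path above has a matching write path, and the downgrade pattern can be shown in a few lines. This is a sketch under the assumption of a simple String-valued cache; ConfigCache and putAndRead are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class ConfigCache {
    private final ReadWriteLock rw = new ReentrantReadWriteLock();
    private final Map<String, String> cache = new HashMap<>();

    String get(String key) {
        rw.readLock().lock();           // shared: many readers at once
        try {
            return cache.get(key);
        } finally {
            rw.readLock().unlock();
        }
    }

    void put(String key, String value) {
        rw.writeLock().lock();          // exclusive: blocks readers and writers
        try {
            cache.put(key, value);
        } finally {
            rw.writeLock().unlock();
        }
    }

    /** Downgrade: take the read lock while still holding the write lock, then drop the write lock. */
    String putAndRead(String key, String value) {
        rw.writeLock().lock();
        try {
            cache.put(key, value);
            rw.readLock().lock();       // allowed: the writer may also acquire the read lock
        } finally {
            rw.writeLock().unlock();    // from here on only the read lock is held
        }
        try {
            return cache.get(key);      // no other writer can have slipped in between
        } finally {
            rw.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        ConfigCache c = new ConfigCache();
        c.put("mode", "fast");
        System.out.println(c.get("mode"));
        System.out.println(c.putAndRead("mode", "safe"));
    }
}
```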
StampedLock Java 8+ — optimistic reads
StampedLock is not a replacement for every ReadWriteLock—it trades API complexity for throughput on read-heavy workloads.
Three modes: write lock, read lock, and optimistic read (no lock acquired).
1. tryOptimisticRead() returns a stamp (version token)
2. Read data without holding a lock
3. validate(stamp): if no write happened since step 1, the read was valid
4. If validation fails, acquire the read lock (or write lock) and retry
Optimistic paths win when writes are very rare; under heavy write contention, validation fails often and performance can fall below a plain read lock. StampedLock is not reentrant and does not support Condition—design carefully.
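class Point {
    private final StampedLock sl = new StampedLock();
    private double x;

    double readX() {
        long stamp = sl.tryOptimisticRead();   // no lock taken
        double current = x;                    // read fields into locals
        if (!sl.validate(stamp)) {             // a write intervened: fall back
            stamp = sl.readLock();
            try {
                current = x;
            } finally {
                sl.unlockRead(stamp);
            }
        }
        return current;
    }
}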
Condition — wait/notify replacement
Object.wait/notify ties one wait set to the intrinsic monitor. Condition (from ReentrantLock.newCondition())
gives multiple wait sets per lock—e.g. “not empty” and “not full” on the same bounded buffer.
await() atomically releases the lock and parks; signal() wakes one waiter; signalAll() wakes all.
Always use await in a while (predicate) { await(); } loop, not if—spurious wakeups and barging require re-checking the condition.
ReentrantLock lock = new ReentrantLock();
Condition notEmpty = lock.newCondition();
Queue<Task> queue = new ArrayDeque<>();
void put(Task t) {
    lock.lock();
    try {
        queue.add(t);
        notEmpty.signal(); // or signalAll()
    } finally {
        lock.unlock();
    }
}
Task take() throws InterruptedException {
    lock.lock();
    try {
        while (queue.isEmpty()) {
            notEmpty.await(); // releases lock, waits for signal
        }
        return queue.remove();
    } finally {
        lock.unlock();
    }
}
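The queue above is unbounded, so it only needs one condition. A bounded buffer uses the two wait sets the text describes, "not empty" and "not full", on the same lock. A minimal sketch, with BoundedBuffer as a hypothetical class name:

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class BoundedBuffer<T> {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notEmpty = lock.newCondition();
    private final Condition notFull = lock.newCondition();
    private final Queue<T> items = new ArrayDeque<>();
    private final int capacity;

    BoundedBuffer(int capacity) { this.capacity = capacity; }

    void put(T item) throws InterruptedException {
        lock.lock();
        try {
            while (items.size() == capacity) {
                notFull.await();      // only producers wait on this condition
            }
            items.add(item);
            notEmpty.signal();        // wake one waiting consumer
        } finally {
            lock.unlock();
        }
    }

    T take() throws InterruptedException {
        lock.lock();
        try {
            while (items.isEmpty()) {
                notEmpty.await();     // only consumers wait on this condition
            }
            T item = items.remove();
            notFull.signal();         // wake one waiting producer
            return item;
        } finally {
            lock.unlock();
        }
    }

    /** Pushes 1..n through a capacity-2 buffer from a producer thread; returns the sum consumed. */
    static long demo(int n) {
        BoundedBuffer<Integer> buf = new BoundedBuffer<>(2);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= n; i++) buf.put(i);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        long sum = 0;
        try {
            for (int i = 0; i < n; i++) sum += buf.take();
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(demo(100));
    }
}
```

With separate conditions, a put never wakes another producer by mistake, which a single wait set with notify() could do.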
Coordination utilities
These classes encode common “wait for N things” and “only M at a time” patterns with correct memory visibility—
better than hand-rolled wait/notify on a private lock. They live in java.util.concurrent
and document happens-before between their methods and your threads.
Semaphore
Maintains a set of permits. acquire() blocks until a permit is available; release() returns one. A binary semaphore (1 permit) behaves like a mutex, but semaphores are not reentrant and can be released by a thread other than the acquirer (unlike locks)—usually you still release in the same thread that acquired.
Use case: cap concurrent outbound HTTP calls to 50 so you do not overwhelm a partner API; cap DB sessions to match pool size when work is submitted from many platform threads.
CountDownLatch
One-shot counter initialized to N. Threads call countDown() to decrement; await() blocks until the count hits zero. The latch cannot be reset—create a new instance for another batch.
Use case: integration test waits for embedded Kafka + DB + app to bind ports; ETL driver waits for N worker shards to finish before merging results.
CyclicBarrier
Reusable barrier: N parties await(); when the Nth arrives, optional barrier action runs and all parties proceed. If a party times out or is interrupted, the barrier breaks and others get BrokenBarrierException.
Use case: parallel matrix multiply phases—each row worker finishes step k, barriers, then starts step k+1; game simulation tick where all agents finish “think” before “act”.
Phaser
Flexible phases: parties can register/deregister dynamically; arriveAndAwaitAdvance() moves to next phase when all registered parties arrive. Supports hierarchical phasers for tree-structured parallelism.
Use case: variable number of forked subtasks per stage in a pipeline where worker count changes between phases.
Exchanger
Two-thread rendezvous: each exchange(v) hands its object to the other. Blocks until the partner arrives.
Use case: double-buffered logging—producer fills buffer A while consumer drains B, then they swap references at the exchanger.
Semaphore dbSlots = new Semaphore(20);
void query() throws InterruptedException {
dbSlots.acquire();
try { runQuery(); } finally { dbSlots.release(); }
}
CountDownLatch ready = new CountDownLatch(3);
// each service calls ready.countDown() when up
ready.await(); // main continues once the count reaches zero
CyclicBarrier barrier = new CyclicBarrier(4, () -> mergePartialResults());
// 4 worker threads call barrier.await() per phase
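Phaser and Exchanger have no snippet above, so here is a hedged sketch of each. The class and method names are illustrative; the worker bodies are left empty where real per-phase work would go.

```java
import java.util.concurrent.Exchanger;
import java.util.concurrent.Phaser;

class PhaserExchangerDemo {
    /** Runs `workers` threads through `phases` phases; returns the phaser's phase number afterwards. */
    static int runPhases(int workers, int phases) {
        Phaser phaser = new Phaser(1);  // one party for the coordinating (calling) thread
        for (int w = 0; w < workers; w++) {
            phaser.register();          // dynamic registration, one party per worker
            new Thread(() -> {
                for (int p = 0; p < phases; p++) {
                    // ... this phase's work would go here ...
                    phaser.arriveAndAwaitAdvance(); // wait for every registered party
                }
                phaser.arriveAndDeregister();       // leave without holding others back
            }).start();
        }
        for (int p = 0; p < phases; p++) {
            phaser.arriveAndAwaitAdvance();
        }
        int phase = phaser.getPhase();  // phases start at 0 and advance by one each round
        phaser.arriveAndDeregister();
        return phase;
    }

    /** Two threads rendezvous and swap strings; returns what the calling thread received. */
    static String swap(String mine, String theirs) {
        Exchanger<String> exchanger = new Exchanger<>();
        Thread partner = new Thread(() -> {
            try {
                exchanger.exchange(theirs); // blocks until the caller arrives
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        partner.start();
        try {
            return exchanger.exchange(mine);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println("final phase: " + runPhases(3, 4));
        System.out.println("received: " + swap("buffer-A", "buffer-B"));
    }
}
```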
Atomic classes & CAS
When contention is low and you only need a single variable updated atomically, locks are heavier than necessary.
java.util.concurrent.atomic exposes CAS-based types the JVM can implement with a few CPU instructions—
foundation of lock-free counters, concurrent queue nodes, and ConcurrentHashMap bin updates.
Compare-And-Swap (CAS)
Hardware primitive: read value V; if V == expected, write newValue; else fail. Java code typically retries in a loop on failure. No mutex is taken, but under high contention many threads spin and burn CPU. A CAS retry loop is lock-free (some thread always makes progress) but not wait-free (an individual thread may retry indefinitely).
ABA problem: value changes A→B→A; CAS thinks nothing happened. Rare for integers; addressed in AtomicStampedReference with version stamps.
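The ABA effect and the stamped fix can be illustrated deterministically by simulating the interleaving on a single thread. This is only a sketch: in real code the intermediate updates would come from other threads, and AbaDemo is a hypothetical class.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.AtomicStampedReference;

class AbaDemo {
    /** Plain CAS succeeds after A -> B -> A because only the reference is compared. */
    static boolean plainCasAfterAba() {
        String a = "A", b = "B";
        AtomicReference<String> ref = new AtomicReference<>(a);
        String observed = ref.get();             // "slow" thread reads A
        ref.compareAndSet(a, b);                 // meanwhile: A -> B
        ref.compareAndSet(b, a);                 // and back:  B -> A
        return ref.compareAndSet(observed, "C"); // succeeds; the change went unnoticed
    }

    /** Stamped CAS fails because every update also bumped the stamp. */
    static boolean stampedCasAfterAba() {
        String a = "A", b = "B";
        AtomicStampedReference<String> ref = new AtomicStampedReference<>(a, 0);
        int[] stampHolder = new int[1];
        String observed = ref.get(stampHolder);  // reads A together with stamp 0
        int observedStamp = stampHolder[0];
        ref.compareAndSet(a, b, 0, 1);           // A -> B, stamp 0 -> 1
        ref.compareAndSet(b, a, 1, 2);           // B -> A, stamp 1 -> 2
        // Reference matches, stamp does not (0 vs 2), so the CAS is rejected.
        return ref.compareAndSet(observed, "C", observedStamp, observedStamp + 1);
    }

    public static void main(String[] args) {
        System.out.println("plain CAS succeeded:   " + plainCasAfterAba());
        System.out.println("stamped CAS succeeded: " + stampedCasAfterAba());
    }
}
```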
| Class | Typical use |
|---|---|
| AtomicInteger / AtomicLong | Counters, sequence IDs, stats |
| AtomicReference<V> | Atomic swap of immutable config snapshot |
| AtomicIntegerArray | Per-slot atomics without boxing |
| LongAdder / LongAccumulator | High-write metrics; striped cells, sum at end |
| AtomicReferenceFieldUpdater | CAS on a volatile field inside a node; less allocation |
AtomicInteger counter = new AtomicInteger();
counter.incrementAndGet();
counter.compareAndSet(old, old + 1);
AtomicReference<State> ref = new AtomicReference<>(State.INIT);
ref.compareAndSet(State.INIT, State.RUNNING);
// High-contention metrics — prefer LongAdder over AtomicLong
LongAdder requests = new LongAdder();
requests.increment();
long total = requests.sum();
LongAdder maintains a base value plus a set of striped cells that threads spread across under contention; sum() adds them up. This is much faster than hammering one AtomicLong on a hot metrics path. Trade-off: sum() is not a linearizable snapshot under concurrent updates unless updates quiesce.
AtomicReferenceFieldUpdater — reflection-based CAS on a named volatile field inside a class—used in ConcurrentLinkedQueue-style node structures to avoid an extra AtomicReference object per node.
HotSpot lowers CAS to CPU instructions (cmpxchg on x86). VarHandle (Java 9+) generalizes atomic access with fences—foundation for future APIs and off-heap work.
Executor framework
Prefer submitting tasks to an ExecutorService over raw new Thread—centralized queuing,
reuse, naming, metrics, and shutdown policies. The framework separates task definition (Runnable/Callable)
from execution policy (how many threads, which queue, what to do when saturated).
Runnable vs Callable vs Future
| Type | Returns | Exceptions |
|---|---|---|
| Runnable | void | Unchecked only (in signature) |
| Callable<V> | V | May throw checked exceptions; they surface from Future.get() |
| Future<V> | Handle to async result | get() blocks; wraps the task's exception in ExecutionException |
submit(Callable) returns Future; execute(Runnable) is fire-and-forget. Cancel with future.cancel(true)—interrupts running task if started; may not stop non-interruptible blocking I/O immediately.
Graceful shutdown
shutdown() stops accepting new tasks; previously submitted work continues. awaitTermination waits for completion. shutdownNow() attempts to cancel queued tasks and interrupt workers—use on timeout during deploy. Do not let pools leak on redeploy in long-lived containers.
ExecutorService pool = Executors.newFixedThreadPool(8);
Future<Report> future = pool.submit(() -> generateReport());
Report r = future.get(30, TimeUnit.SECONDS);
pool.shutdown();
if (!pool.awaitTermination(60, TimeUnit.SECONDS)) {
pool.shutdownNow();
}
ThreadPoolExecutor internals
ThreadPoolExecutor is the workhorse implementation behind most Executors factory methods.
Understanding the interaction of core size, max size, queue type, and rejection policy is how you prevent
“the service accepted traffic until it died” incidents.
submit(task)
│
├─ running < corePoolSize → create thread, run
├─ else enqueue if queue not full
├─ else if running < maxPoolSize → create extra thread
└─ else RejectedExecutionHandler (AbortPolicy, CallerRuns, Discard, custom)
Idle threads above corePoolSize expire after keepAliveTime; core threads stay alive unless allowCoreThreadTimeOut(true) is set
| Parameter | Role |
|---|---|
| corePoolSize | Baseline threads kept alive |
| maximumPoolSize | Cap when queue is full (bounded queue) |
| keepAliveTime | Idle non-core thread TTL |
| workQueue | LinkedBlockingQueue, ArrayBlockingQueue, SynchronousQueue |
| RejectedExecutionHandler | Policy when saturated |
Work queue choice
| Queue | Behavior |
|---|---|
| LinkedBlockingQueue (unbounded) | Tasks never rejected until OOM; latency grows under load |
| ArrayBlockingQueue (bounded) | Backpressure when full; pairs with max pool + rejection policy |
| SynchronousQueue | Zero capacity; handoff to an idle thread or a new thread; used by cached pool |
RejectedExecutionHandler policies
- AbortPolicy (default): throws RejectedExecutionException; fail fast, caller handles
- CallerRunsPolicy: runs the task on the submitting thread; slows producers, natural backpressure
- DiscardPolicy: silent drop; only for non-critical telemetry
- DiscardOldestPolicy: drops the head of the queue (the oldest waiting task); rare, can lose work unpredictably
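To see saturation and AbortPolicy in action, the sketch below uses a deliberately tiny pool (1 core thread, 2 max, queue of 1) and tasks that block until released. The sizes and the class name SaturationDemo are assumptions for illustration only.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class SaturationDemo {
    /** Submits `tasks` blocking tasks to a 1-core / 2-max pool with a queue of 1; returns the rejection count. */
    static int countRejections(int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 2, 30, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(1),
                new ThreadPoolExecutor.AbortPolicy());
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await(); // hold the worker so the pool stays saturated
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++;              // in production: count this in a metric
                }
            }
        } finally {
            release.countDown();             // let the accepted tasks finish
            pool.shutdown();
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("rejected: " + countRejections(5));
    }
}
```

The first task starts the core thread, the second waits in the queue, the third forces the extra thread, and every later submission is rejected, so the pool accepts at most maximumPoolSize plus queue capacity tasks at once.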
Executors factories — production cautions
Factory methods hide the queue and bounds. Treat them as shortcuts for prototypes, not production defaults without reading the JDK source behavior.
| Factory | Risk |
|---|---|
| newFixedThreadPool | Unbounded LinkedBlockingQueue; tasks pile up, memory pressure |
| newCachedThreadPool | Unbounded threads + SynchronousQueue; a spike creates thousands of threads → OOM, context switch storm |
| newSingleThreadExecutor | Serialized execution; OK for ordered jobs; unbounded queue |
| newScheduledThreadPool | Timers/cron-like tasks; size the pool for overlap |
ThreadPoolExecutor pool = new ThreadPoolExecutor(
4, 16,
60, TimeUnit.SECONDS,
new ArrayBlockingQueue<>(1000),
Thread.ofPlatform().name("api-", 0).factory(),
new ThreadPoolExecutor.CallerRunsPolicy()); // backpressure on caller
Never use Executors.newCachedThreadPool() for bursty server request handling without limits. Define explicit pool sizes, bounded queues, and rejection metrics. Spring Boot exposes pool tuning for its task executor through the spring.task.execution.pool.* properties.
CompletableFuture Java 8+
Composable async pipelines without nested Future.get() blocking.
A CompletableFuture<T> is both a Future and a completion stage you can chain.
Default async stages use ForkJoinPool.commonPool()—fine for CPU-light transforms; dangerous for blocking JDBC/HTTP unless you pass an explicit executor.
Creating and transforming
| Method | Role |
|---|---|
| supplyAsync(Supplier) | Start async computation with result |
| runAsync(Runnable) | Async void task |
| thenApply | Map result T → U (sync on completion thread unless *Async) |
| thenCompose | Flat-map when next step returns another CF (avoid nested CF) |
| thenCombine | Merge two CFs when both complete |
| allOf / anyOf | Wait for all or first completion |
Error handling stages
exceptionally(ex -> fallback) runs only if the stage completed exceptionally—like catch. handle((result, ex) -> ...) sees both success and failure in one function. whenComplete is for side effects (logging) without changing the value—exceptions from the consumer can still complete the next stage exceptionally.
Unchecked exceptions in supplyAsync complete the CF exceptionally; checked exceptions cannot be thrown from Supplier without wrapping.
ExecutorService ioPool = Executors.newFixedThreadPool(32);
CompletableFuture<User> userFuture = CompletableFuture
.supplyAsync(() -> http.fetchUser(id), ioPool);
CompletableFuture<Orders> ordersFuture = userFuture.thenCompose(u ->
CompletableFuture.supplyAsync(() -> http.fetchOrders(u.id()), ioPool));
CompletableFuture<String> summary = userFuture.thenCombine(ordersFuture,
(u, o) -> u.name() + " has " + o.count() + " orders");
// Error handling
CompletableFuture<User> safe = userFuture
.exceptionally(ex -> User.guest())
.handle((u, ex) -> ex == null ? u : User.guest());
userFuture.whenComplete((u, ex) -> {
if (ex != null) log.error("failed", ex);
});
CompletableFuture<Void> all = CompletableFuture.allOf(userFuture, ordersFuture);
CompletableFuture<Object> first = CompletableFuture.anyOf(userFuture, ordersFuture);
Blocking at the edge
Inside service code, chain freely. At the boundary (HTTP controller, main), use cf.get(timeout), join(), or convert to reactive type with explicit timeout. join() on failed CF throws unchecked CompletionException wrapping the cause—unwrap getCause() for the real failure.
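A short sketch of the two blocking styles at the boundary and how each reports failure. EdgeBlocking and its helper methods are hypothetical; the one-second timeout is arbitrary.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class EdgeBlocking {
    /** Builds a future that has already failed with the given message. */
    static CompletableFuture<String> failing(String message) {
        CompletableFuture<String> cf = new CompletableFuture<>();
        cf.completeExceptionally(new IllegalStateException(message));
        return cf;
    }

    /** join(): failure arrives as unchecked CompletionException; the real error is its cause. */
    static String viaJoin(CompletableFuture<String> cf) {
        try {
            return cf.join();
        } catch (CompletionException e) {
            return "failed: " + e.getCause().getMessage();
        }
    }

    /** get(timeout): failure arrives as checked ExecutionException; the wait is bounded. */
    static String viaGet(CompletableFuture<String> cf) {
        try {
            return cf.get(1, TimeUnit.SECONDS);
        } catch (ExecutionException e) {
            return "failed: " + e.getCause().getMessage();
        } catch (TimeoutException e) {
            return "timed out";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        System.out.println(viaJoin(failing("boom")));
        System.out.println(viaGet(failing("boom")));
        System.out.println(viaJoin(CompletableFuture.completedFuture("ok")));
    }
}
```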
Chaining patterns & anti-patterns
| Good | Avoid |
|---|---|
| thenCompose when next step returns CF (flatMap) | thenApply returning CF → nested CompletableFuture<CompletableFuture<T>> |
| Dedicated executor for blocking JDBC/HTTP | Blocking call inside supplyAsync on common pool |
| orTimeout / completeOnTimeout (Java 9+) | Infinite get() without timeout |
| One exceptionally at pipeline end | Log in every stage and rethrow |
Microservice orchestration: fetch user, then orders, then recommendations with thenCompose and a shared bounded IO executor. Set client timeouts with orTimeout on each stage. Aggregate with allOf for parallel fan-out to downstream APIs.
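A sketch of per-stage timeouts combined with allOf fan-out, assuming Java 9 or later. The downstream calls are simulated with Thread.sleep; slowCall, the pool size, and the timeout values are all made up for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class FanOutDemo {
    /** Simulated downstream call that takes `millis` to answer. */
    static String slowCall(String name, long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return name;
    }

    /** Fans out two calls on a dedicated executor; the slow one falls back to a default after 100 ms. */
    static String fetchBoth() {
        ExecutorService ioPool = Executors.newFixedThreadPool(4);
        try {
            CompletableFuture<String> fast = CompletableFuture
                    .supplyAsync(() -> slowCall("user", 10), ioPool)
                    .orTimeout(2, TimeUnit.SECONDS);   // fail the stage if it is too slow
            CompletableFuture<String> slow = CompletableFuture
                    .supplyAsync(() -> slowCall("recs", 2_000), ioPool)
                    .completeOnTimeout("no-recs", 100, TimeUnit.MILLISECONDS); // default instead of failure
            return CompletableFuture.allOf(fast, slow)
                    .thenApply(ignored -> fast.join() + "+" + slow.join()) // both already complete here
                    .join();
        } finally {
            ioPool.shutdownNow(); // interrupt the still-sleeping simulated call
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchBoth());
    }
}
```

Note that orTimeout and completeOnTimeout only complete the future; they do not interrupt the task still running on the executor, so the underlying client call needs its own timeout as well.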
ForkJoinPool & divide-and-conquer
ForkJoinPool implements work-stealing: each worker thread maintains a deque of tasks.
When idle, a worker steals the oldest task from the far end of another worker’s deque (the owner pushes and pops at the other end), which balances CPU-bound fork/join workloads
without a central queue bottleneck. Parallel streams default to the common pool (ForkJoinPool.commonPool()).
Fork / join / compute
- fork() — schedule subtask asynchronously on the pool
- compute() — run subtask on current thread (or join stolen work)
- join() — wait for forked subtask result
Rule of thumb: fork only when the split is large enough to amortize scheduling overhead—often thousands of elements for array work. RecursiveTask<V> returns a value; RecursiveAction is void (side-effect only if you manage merging carefully).
Do not block inside ForkJoin tasks on I/O—blocks a worker and reduces steal efficiency; use an Executor for I/O instead.
class SumTask extends RecursiveTask<Long> {
    private final int[] arr;
    private final int lo, hi;

    SumTask(int[] arr, int lo, int hi) {
        this.arr = arr;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= 10_000) {               // small enough: sum sequentially
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += arr[i];
            return sum;
        }
        int mid = (lo + hi) >>> 1;
        SumTask left = new SumTask(arr, lo, mid);
        SumTask right = new SumTask(arr, mid, hi);
        left.fork();                           // left half runs asynchronously
        long rightVal = right.compute();       // right half runs on this thread
        return left.join() + rightVal;
    }
}
ForkJoinPool pool = new ForkJoinPool();
long total = pool.invoke(new SumTask(array, 0, array.length));
pool.shutdown();
Virtual threads Java 21 — Project Loom
Platform threads = OS threads (expensive, ~MB stack). Virtual threads = JVM-managed, cheap to create millions—mounted on a small pool of carrier platform threads. When virtual thread blocks on I/O, it unmounts so the carrier runs another virtual thread.
Carrier platform threads (few)
│
├─ mount virtual thread A → JDBC read blocks → unmount A
├─ mount virtual thread B → runs
└─ mount virtual thread C → ...
Millions of virtual threads OK when most are blocked on I/O, not CPU-spinning
Thread v = Thread.ofVirtual().name("req-42").start(() -> {
String body = httpClient.sendBlocking(request); // blocks virtual thread cheaply
});
try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
IntStream.range(0, 10_000).forEach(i ->
exec.submit(() -> handleRequest(i)));
}
Pinning virtual threads
When a virtual thread blocks while a native frame is on its stack, or (on JDK 21 to 23) while inside a synchronized block or method, it cannot unmount and pins its carrier thread, reducing scalability. JDK 24 removed most monitor-related pinning (JEP 491). The JDK Flight Recorder event jdk.VirtualThreadPinned helps diagnose. Mitigations: reduce synchronized in hot paths, prefer ReentrantLock, update drivers/libraries that block carriers, avoid long CPU loops on virtual threads.
Structured concurrency Java 21+
StructuredTaskScope models child tasks as nested within a parent scope—if the parent ends (success, failure, or cancel),
children are cancelled together. This fixes “fire off async tasks and lose track” bugs in imperative code.
Shutdown policies like ShutdownOnFailure fail fast when any child fails; ShutdownOnSuccess supports racing alternatives.
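StructuredTaskScope is still a preview API: it requires --enable-preview, and its class and method names have changed between JDK releases (the names used here follow the JDK 21 preview), so check the javadoc for your target JDK.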
// Conceptual shape — API evolved across JDK versions; check your JDK javadoc
try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
Subtask<User> user = scope.fork(() -> fetchUser(id));
Subtask<Orders> orders = scope.fork(() -> fetchOrders(id));
scope.join();
scope.throwIfFailed();
return combine(user.get(), orders.get());
}
When virtual threads shine vs when they do not
| Good fit | Poor fit |
|---|---|
| I/O-bound: HTTP, JDBC, messaging | CPU-bound compute (use platform pool or ForkJoin) |
| High fan-out request-per-thread style | Heavy synchronized on hot paths (pins carrier—"pinning") |
| Simpler than reactive for many teams | Native code / some blocking in libraries being updated |
Spring Boot 3.2+ & reactive vs imperative
Spring Boot 3.2+ can enable virtual threads for request handling (spring.threads.virtual.enabled=true).
Embedded servers dispatch each request on a virtual thread; blocking JDBC, JPA, and RestClient calls release the carrier while waiting on the network or DB—
so thread count is no longer the limiting factor for I/O concurrency in many apps.
Limits remain: connection pool size, downstream rate limits, heap, and CPU for serialization still cap throughput. WebFlux/reactive stacks still win when you need end-to-end reactive streams, backpressure, and non-blocking drivers throughout. Virtual threads are the pragmatic middle path: keep imperative code, gain I/O concurrency—without claiming they help CPU-bound thread pools.
Contrast: platform thread pool size ≈ CPU cores for CPU work; virtual threads ≈ number of concurrent blocking operations. Explain pinning, why synchronized can hurt virtual threads (use ReentrantLock in hot paths when profiling shows pinning). Draw happens-before for volatile + lock.
Instrument pool queue depth, rejection counts, and thread dump on deadlock. Use timeouts on all external calls inside async chains. For Kubernetes, align pool sizes with CPU limits; virtual threads reduce thread count metrics but JDBC pool size still caps DB concurrency.