Concurrency & Multithreading

Thread fundamentals

A thread is a lightweight unit of execution within a process—shared heap, separate stack. The OS schedules many threads onto fewer CPU cores; Java maps platform threads 1:1 to OS threads (until virtual threads).

Thread lifecycle

NEW ──start()──→ RUNNABLE ⇄ (OS scheduler: running on CPU)
                    │
        ┌───────────┼───────────┐
        ▼           ▼           ▼
    BLOCKED     WAITING    TIMED_WAITING
    (monitor)   (wait/join) (sleep, timed wait)
        │           │           │
        └───────────┴───────────┘
                    │
              run() completes / uncaught exception
                    ▼
               TERMINATED

State	Typical cause
`NEW`	Constructed, not started
`RUNNABLE`	Eligible to run (includes actually running or ready)
`BLOCKED`	Waiting to enter `synchronized` monitor
`WAITING`	`wait()`, `join()`, `LockSupport.park()`
`TIMED_WAITING`	`sleep(ms)`, timed `wait`, `join(timeout)`
`TERMINATED`	Completed or died with exception

Creating threads

Subclassing Thread overrides run() on the Thread object itself—works but couples task to thread type. Implementing Runnable (or lambda) and passing to Thread or an executor separates concerns and is the standard style. Callable adds a return value and checked exceptions—always submit to an ExecutorService to obtain a Future.

Approach	When
Subclass `Thread`	Rare—couples task to Thread type; prefer Runnable
`Runnable`	void task—no result, no checked exception in signature
`Callable<V>`	Returns value, throws checked exceptions—use with ExecutorService
Executor / virtual thread factory	Production default—pooling and lifecycle managed

Thread t = Thread.ofPlatform().name("worker-1").start(() -> {
    System.out.println("running on " + Thread.currentThread().getName());
});

Callable<Integer> task = () -> 42;
ExecutorService pool = Executors.newSingleThreadExecutor();
Future<Integer> future = pool.submit(task);
Integer result = future.get();  // blocks until done
pool.shutdown();

Key thread methods

Method	Purpose
`start()`	Begin new thread—calls `run()` on new stack; never call `run()` directly for concurrency
`join()` / `join(ms)`	Wait for thread termination
`sleep(ms)`	TIMED_WAITING—does not release locks held
`interrupt()`	Sets interrupt flag—cooperative cancellation
`isInterrupted()`	Check flag without clearing
`Thread.interrupted()`	Static—check and clear current thread flag

void pollUntilDone(Runnable work) throws InterruptedException {
    Thread worker = Thread.startVirtualThread(work);
    while (worker.isAlive()) {
        try {
            worker.join(500);
        } catch (InterruptedException e) {
            worker.interrupt();  // propagate shutdown
            throw e;
        }
    }
}

`start()` vs `run()`

Calling start() asks the JVM to schedule a new call stack that enters run(). Calling run() directly on a Thread object executes on the current thread—no parallelism. This mistake appears in tutorials and still shows up in code review.

Cooperative interruption

interrupt() sets a boolean flag on the target thread. It does not forcibly stop execution (deprecated stop() was unsafe). Blocking methods like sleep, wait, join, and BlockingQueue.take respond by throwing InterruptedException and clearing the interrupt flag—catch blocks should usually call Thread.currentThread().interrupt() again to preserve the signal for upstream shutdown logic.

Daemon threads

Daemon threads do not keep the JVM alive—when all user (non-daemon) threads end, the VM exits. Use for background housekeeping (metrics flush), not for work that must complete on shutdown unless you register a Runtime.addShutdownHook that flushes buffers and closes pools. Main business work should run on non-daemon threads (default).

Thread daemon = Thread.ofPlatform().daemon().start(() -> {
    while (!Thread.currentThread().isInterrupted()) {
        // background tick
    }
});

Synchronization

Correctness requires mutual exclusion and visibility—two threads can corrupt data without ever holding the same lock if they only read/write a shared field without happens-before edges.

Race condition — broken counter

class BrokenCounter {
    private int count = 0;

    void increment() {
        count++;  // read-modify-write — NOT atomic
    }
}

// Two threads each increment 1_000_000 times — result often < 2_000_000

Data race (JMM): two accesses to same variable, at least one write, not ordered by happens-before—undefined behavior for non-volatile non-atomic fields.

`synchronized` — intrinsic monitor lock

Every object has an associated monitor. One thread holds the lock; others block in BLOCKED.

class SafeCounter {
    private int count = 0;

    synchronized void increment() {  // lock = this
        count++;
    }

    int get() {
        synchronized (this) {        // block-level — finer control
            return count;
        }
    }
}

// Static synchronized — lock = Class object
static synchronized void global() { }

Method-level locks the whole instance—can reduce parallelism. Block-level lets separate locks for independent fields (dedicated final Object lock = new Object() guards are clearer than locking on this when external code might also synchronize on your instance).

Reentrancy: the same thread may re-enter a synchronized method on the same monitor (recursive calls or synchronized methods calling each other on the same object)—the JVM tracks entry count and only releases when the outermost scope exits.

`volatile` — visibility, not atomicity

A volatile read or write participates in the JMM: a write to a volatile field happens-before any subsequent read of that field by another thread. Compilers and CPUs cannot cache volatile reads in registers across potential concurrent writes in ways that break visibility. It does not make compound operations atomic—count++ is still read-modify-write with a race between threads.

Valid uses: one-shot shutdown flags, double-checked locking when combined with correct synchronization (prefer holder idiom or static init), status fields where only visibility matters. For counters, use atomics or locks.

class Status {
    private volatile boolean shutdown = false;

    void shutdown() { shutdown = true; }  // visible to poller immediately

    void loop() {
        while (!shutdown) { /* work */ }   // OK flag pattern
    }
}

volatile int counter;
void bad() { counter++; }  // still racy — use AtomicInteger or synchronized

Java Memory Model & liveness failures

The Java Memory Model (JMM) defines when writes by one thread are visible to another. Without happens-before edges, the CPU and compiler may reorder or cache reads/writes in ways that break your mental sequential model—even if every read “looks” like it sees latest memory.

Happens-before rules (selected)

If action A happens-before B, then A’s effects are visible to B and ordered before B for readers that observe the relationship. Key sources:

Unlock on monitor happens-before subsequent lock on same monitor
volatile write happens-before subsequent read of that volatile
Thread.start() happens-before any action in started thread
Actions in thread happens-before join() returning
java.util.concurrent utilities document their happens-before (e.g. ConcurrentHashMap, CountDownLatch)

Why “just use a lock” is not enough alone

Locks provide both mutual exclusion and release/acquire visibility. A bug pattern is: thread A writes under lock, releases; thread B reads the same field without acquiring the lock—B may see a stale value. Every read that must see published writes needs a matching synchronization edge (same lock, volatile read after volatile write, or concurrent utility with documented ordering).

Deadlock — Coffman conditions (all four required)

Deadlock is a state where a set of threads each wait for a resource held by another in the set—no thread can proceed. All four Coffman conditions must hold simultaneously; breaking any one prevents deadlock.

Condition	Meaning
Mutual exclusion	Resource held exclusively
Hold and wait	Hold one lock while waiting for another
No preemption	Cannot force release
Circular wait	Cycle in wait graph

Object lockA = new Object();
Object lockB = new Object();

Thread t1 = Thread.startVirtualThread(() -> {
    synchronized (lockA) {
        synchronized (lockB) { work(); }
    }
});
Thread t2 = Thread.startVirtualThread(() -> {
    synchronized (lockB) {  // opposite order → circular wait possible
        synchronized (lockA) { work(); }
    }
});

Prevention & detection strategies

Lock ordering — assign global order to resources; always acquire in ascending ID order
Try-lock with timeout — back off, log, retry with jitter instead of waiting forever
Shrink critical sections — fewer nested locks, fewer resources per transaction
Detection — jcmd <pid> Thread.print / jstack reports Java-level deadlocks; APM thread pool metrics stuck at max

Livelock & starvation

Livelock — threads are not blocked on locks but keep reacting to each other without forward progress (two people swapping sides in a hallway forever). Fix by introducing backoff, randomness, or passing priority.

Starvation — a thread never gains the lock because others always win (unfair lock, high arrival rate). new ReentrantLock(true) enables FIFO-ish fairness; throughput drops because the JVM cannot reorder lock grants for performance.

Locks, conditions & StampedLock

synchronized is built into every object and is easy to reason about for simple critical sections. The java.util.concurrent.locks package adds explicit locks with timeouts, interruptible waiting, fairness, multiple condition queues, and read/write separation—useful when intrinsic monitors are not flexible enough.

`synchronized`	`ReentrantLock`
Automatic release on exit (even on exception)	Must `unlock()` in `finally`
No timeouts	`tryLock(timeout)`
Not interruptible while waiting	`lockInterruptibly()`
Single wait set per monitor	Multiple `Condition` objects per lock
Non-fair by default	Optional fair ordering

`ReentrantLock`

A reentrant lock: the thread that already holds it may acquire again (recursion, or nested methods on the same object). An internal hold count tracks depth; each unlock() decrements until the lock is fully released. Implementation uses CAS + park/unpark when contended—similar goals to synchronized but with a richer API surface.

lock() — blocks until acquired; does not respond to interrupt while waiting (use lockInterruptibly instead if you need that)
tryLock() — returns immediately with success/failure—good for avoiding deadlock (try both locks, release and retry if fail)
tryLock(timeout, unit) — bounded wait; useful for SLA-bound work (“wait at most 200ms for shard lock”)
lockInterruptibly() — if interrupted while waiting, throws InterruptedException and does not acquire
Fair lock — new ReentrantLock(true) grants in roughly arrival order; reduces starvation, lowers throughput

ReentrantLock lock = new ReentrantLock();

lock.lock();
try {
    // critical section
} finally {
    lock.unlock();  // always in finally
}

if (lock.tryLock(100, TimeUnit.MILLISECONDS)) {
    try { /* work */ } finally { lock.unlock(); }
}

lock.lockInterruptibly();  // responds to interrupt while waiting

💡 Pro Tip

Always pair lock() with try/finally/unlock. Never call lock() twice in one path without two unlocks—easy to break with early returns. For virtual-thread-heavy code, profile for pinning: long synchronized or native blocks on carrier; consider ReentrantLock in hot paths if pinning appears.

`ReadWriteLock`

Splits access into a read lock (shared, many threads) and a write lock (exclusive). ReentrantReadWriteLock is ideal when reads dominate (configuration caches, feature flags, in-memory indexes) and writes are rare. Readers block only when a writer holds the write lock; writers block all readers and other writers.

Downgrade is possible in advanced patterns (hold write lock, release to read lock) but easy to misuse—most apps stick to simple lock/unlock per method. Stale reads are still possible if you return mutable objects without copying—locking does not replace defensive copies.

ReadWriteLock rw = new ReentrantReadWriteLock();
Map<String, Config> cache = new HashMap<>();

Config get(String key) {
    rw.readLock().lock();
    try {
        return cache.get(key);
    } finally {
        rw.readLock().unlock();
    }
}

`StampedLock` Java 8+ — optimistic reads

StampedLock is not a replacement for every ReadWriteLock—it trades API complexity for throughput on read-heavy workloads. Three modes: write lock, read lock, and optimistic read (no lock acquired).

tryOptimisticRead() returns a stamp (version token)
Read data without holding a lock
validate(stamp) — if no write happened since step 1, read was valid
If validation fails, acquire read lock (or write lock) and retry

Optimistic paths win when writes are very rare; under heavy write contention, validation fails often and performance can fall below a plain read lock. StampedLock is not reentrant and does not support Condition—design carefully.

StampedLock sl = new StampedLock();
double x;

long stamp = sl.tryOptimisticRead();
x = this.x;  // read fields
if (!sl.validate(stamp)) {
    stamp = sl.readLock();
    try {
        x = this.x;
    } finally {
        sl.unlockRead(stamp);
    }
}

`Condition` — wait/notify replacement

Object.wait/notify ties one wait set to the intrinsic monitor. Condition (from ReentrantLock.newCondition()) gives multiple wait sets per lock—e.g. “not empty” and “not full” on the same bounded buffer. await() atomically releases the lock and parks; signal() wakes one waiter; signalAll() wakes all. Always use await in a while (predicate) { await(); } loop, not if—spurious wakeups and barging require re-checking the condition.

ReentrantLock lock = new ReentrantLock();
Condition notEmpty = lock.newCondition();
Queue<Task> queue = new ArrayDeque<>();

void put(Task t) {
    lock.lock();
    try {
        queue.add(t);
        notEmpty.signal();  // or signalAll()
    } finally {
        lock.unlock();
    }
}

Task take() throws InterruptedException {
    lock.lock();
    try {
        while (queue.isEmpty()) {
            notEmpty.await();  // releases lock, waits for signal
        }
        return queue.remove();
    } finally {
        lock.unlock();
    }
}

Coordination utilities

These classes encode common “wait for N things” and “only M at a time” patterns with correct memory visibility— better than hand-rolled wait/notify on a private lock. They live in java.util.concurrent and document happens-before between their methods and your threads.

`Semaphore`

Maintains a set of permits. acquire() blocks until a permit is available; release() returns one. A binary semaphore (1 permit) behaves like a mutex, but semaphores are not reentrant and can be released by a thread other than the acquirer (unlike locks)—usually you still release in the same thread that acquired.

Use case: cap concurrent outbound HTTP calls to 50 so you do not overwhelm a partner API; cap DB sessions to match pool size when work is submitted from many platform threads.

`CountDownLatch`

One-shot counter initialized to N. Threads call countDown() to decrement; await() blocks until the count hits zero. The latch cannot be reset—create a new instance for another batch.

Use case: integration test waits for embedded Kafka + DB + app to bind ports; ETL driver waits for N worker shards to finish before merging results.

`CyclicBarrier`

Reusable barrier: N parties await(); when the Nth arrives, optional barrier action runs and all parties proceed. If a party times out or is interrupted, the barrier breaks and others get BrokenBarrierException.

Use case: parallel matrix multiply phases—each row worker finishes step k, barriers, then starts step k+1; game simulation tick where all agents finish “think” before “act”.

`Phaser`

Flexible phases: parties can register/deregister dynamically; arriveAndAwaitAdvance() moves to next phase when all registered parties arrive. Supports hierarchical phasers for tree-structured parallelism.

Use case: variable number of forked subtasks per stage in a pipeline where worker count changes between phases.

`Exchanger`

Two-thread rendezvous: each exchange(v) hands its object to the other. Blocks until the partner arrives.

Use case: double-buffered logging—producer fills buffer A while consumer drains B, then they swap references at the exchanger.

Semaphore dbSlots = new Semaphore(20);
void query() throws InterruptedException {
    dbSlots.acquire();
    try { runQuery(); } finally { dbSlots.release(); }
}

CountDownLatch ready = new CountDownLatch(3);
// each service calls ready.countDown() when up
ready.await();  // main continues when 3 reached zero

CyclicBarrier barrier = new CyclicBarrier(4, () -> mergePartialResults());
// 4 worker threads call barrier.await() per phase

Atomic classes & CAS

When contention is low and you only need a single variable updated atomically, locks are heavier than necessary. java.util.concurrent.atomic exposes CAS-based types the JVM can implement with a few CPU instructions— foundation of lock-free counters, concurrent queue nodes, and ConcurrentHashMap bin updates.

Compare-And-Swap (CAS)

Hardware primitive: read value V; if V == expected, write newValue; else fail. Java loops retry on failure (compare-and-swap spin). No mutex, but under high contention many threads spin—CPU burn. Progress guarantees differ from lock-based “lock-free” vs “wait-free” definitions used in textbooks.

ABA problem: value changes A→B→A; CAS thinks nothing happened. Rare for integers; addressed in AtomicStampedReference with version stamps.

Class	Typical use
`AtomicInteger` / `AtomicLong`	Counters, sequence IDs, stats
`AtomicReference<V>`	Atomic swap of immutable config snapshot
`AtomicIntegerArray`	Per-slot atomics without boxing
`LongAdder` / `LongAccumulator`	High-write metrics—striped cells, sum at end
`AtomicReferenceFieldUpdater`	CAS on volatile field inside node—less allocation

AtomicInteger counter = new AtomicInteger();
counter.incrementAndGet();
counter.compareAndSet(old, old + 1);

AtomicReference<State> ref = new AtomicReference<>(State.INIT);
ref.compareAndSet(State.INIT, State.RUNNING);

// High-contention metrics — prefer LongAdder over AtomicLong
LongAdder requests = new LongAdder();
requests.increment();
long total = requests.sum();

LongAdder maintains per-thread cells that accumulate under contention; sum() merges them—much faster than hammering one AtomicLong on a hot metrics path. Trade-off: sum() is not a linearizable snapshot under concurrent updates unless updates quiesce.

AtomicReferenceFieldUpdater — reflection-based CAS on a named volatile field inside a class—used in ConcurrentLinkedQueue-style node structures to avoid an extra AtomicReference object per node.

🔬 Under the Hood

HotSpot lowers CAS to CPU instructions (cmpxchg on x86). VarHandle (Java 9+) generalizes atomic access with fences—foundation for future APIs and off-heap work.

Executor framework

Prefer submitting tasks to an ExecutorService over raw new Thread—centralized queuing, reuse, naming, metrics, and shutdown policies. The framework separates task definition (Runnable/Callable) from execution policy (how many threads, which queue, what to do when saturated).

Runnable vs Callable vs Future

Type	Returns	Exceptions
`Runnable`	void	Unchecked only (in signature)
`Callable<V>`	`V`	May throw checked exceptions—propagate to `Future.get()`
`Future<V>`	Handle to async result	`get()` blocks; wraps execution exception

submit(Callable) returns Future; execute(Runnable) is fire-and-forget. Cancel with future.cancel(true)—interrupts running task if started; may not stop non-interruptible blocking I/O immediately.

Graceful shutdown

shutdown() stops accepting new tasks; previously submitted work continues. awaitTermination waits for completion. shutdownNow() attempts to cancel queued tasks and interrupt workers—use on timeout during deploy. Do not let pools leak on redeploy in long-lived containers.

ExecutorService pool = Executors.newFixedThreadPool(8);

Future<Report> future = pool.submit(() -> generateReport());
Report r = future.get(30, TimeUnit.SECONDS);

pool.shutdown();
if (!pool.awaitTermination(60, TimeUnit.SECONDS)) {
    pool.shutdownNow();
}

ThreadPoolExecutor internals

ThreadPoolExecutor is the workhorse implementation behind most Executors factory methods. Understanding the interaction of core size, max size, queue type, and rejection policy is how you prevent “the service accepted traffic until it died” incidents.

submit(task)
    │
    ├─ running < corePoolSize → create thread, run
    ├─ else enqueue if queue not full
    ├─ else if running < maxPoolSize → create extra thread
    └─ else RejectedExecutionHandler (AbortPolicy, CallerRuns, Discard, custom)

Idle threads > core → expire after keepAliveTime (if allowCoreThreadTimeOut false, core stay)

Parameter	Role
`corePoolSize`	Baseline threads kept alive
`maximumPoolSize`	Cap when queue is full (bounded queue)
`keepAliveTime`	Idle non-core thread TTL
`workQueue`	`LinkedBlockingQueue`, `ArrayBlockingQueue`, `SynchronousQueue`
`RejectedExecutionHandler`	Policy when saturated

Work queue choice

Queue	Behavior
`LinkedBlockingQueue` (unbounded)	Tasks never rejected until OOM—latency grows under load
`ArrayBlockingQueue` (bounded)	Backpressure when full—pairs with max pool + rejection policy
`SynchronousQueue`	Zero capacity—handoff to thread or new thread; used by cached pool

RejectedExecutionHandler policies

AbortPolicy (default) — throw RejectedExecutionException—fail fast, caller handles
CallerRunsPolicy — run task on submitting thread—slows producers, natural backpressure
DiscardPolicy — silent drop—only for non-critical telemetry
DiscardOldestPolicy — drop head of queue—rare, can lose oldest work unpredictably

Executors factories — production cautions

Factory methods hide the queue and bounds. Treat them as shortcuts for prototypes, not production defaults without reading the JDK source behavior.

Factory	Risk
`newFixedThreadPool`	Unbounded `LinkedBlockingQueue`—tasks pile up, memory pressure
`newCachedThreadPool`	Unbounded threads + `SynchronousQueue`—under spike creates thousands of threads → OOM, context switch storm
`newSingleThreadExecutor`	Serialized execution—OK for ordered jobs; unbounded queue
`newScheduledThreadPool`	Timers/cron-like tasks—size the pool for overlap

ThreadPoolExecutor pool = new ThreadPoolExecutor(
    4, 16,
    60, TimeUnit.SECONDS,
    new ArrayBlockingQueue<>(1000),
    Thread.ofPlatform().name("api-", 0).factory(),
    new ThreadPoolExecutor.CallerRunsPolicy());  // backpressure on caller

⚠️ Pitfall

Never use Executors.newCachedThreadPool() for bursty server request handling without limits. Define explicit pool sizes, bounded queues, and rejection metrics—Spring Boot exposes task.execution pool tuning.

CompletableFuture Java 8+

Composable async pipelines without nested Future.get() blocking. A CompletableFuture<T> is both a Future and a completion stage you can chain. Default async stages use ForkJoinPool.commonPool()—fine for CPU-light transforms; dangerous for blocking JDBC/HTTP unless you pass an explicit executor.

Creating and transforming

Method	Role
`supplyAsync(Supplier)`	Start async computation with result
`runAsync(Runnable)`	Async void task
`thenApply`	Map result T → U (sync on completion thread unless *Async)
`thenCompose`	Flat-map when next step returns another CF (avoid nested CF)
`thenCombine`	Merge two CFs when both complete
`allOf` / `anyOf`	Wait for all or first completion

Error handling stages

exceptionally(ex -> fallback) runs only if the stage completed exceptionally—like catch. handle((result, ex) -> ...) sees both success and failure in one function. whenComplete is for side effects (logging) without changing the value—exceptions from the consumer can still complete the next stage exceptionally.

Unchecked exceptions in supplyAsync complete the CF exceptionally; checked exceptions cannot be thrown from Supplier without wrapping.

ExecutorService ioPool = Executors.newFixedThreadPool(32);

CompletableFuture<User> userFuture = CompletableFuture
    .supplyAsync(() -> http.fetchUser(id), ioPool);

CompletableFuture<Orders> ordersFuture = userFuture.thenCompose(u ->
    CompletableFuture.supplyAsync(() -> http.fetchOrders(u.id()), ioPool));

CompletableFuture<String> summary = userFuture.thenCombine(ordersFuture,
    (u, o) -> u.name() + " has " + o.count() + " orders");

// Error handling
CompletableFuture<User> safe = userFuture
    .exceptionally(ex -> User.guest())
    .handle((u, ex) -> ex == null ? u : User.guest());

userFuture.whenComplete((u, ex) -> {
    if (ex != null) log.error("failed", ex);
});

CompletableFuture<Void> all = CompletableFuture.allOf(userFuture, ordersFuture);
CompletableFuture<Object> first = CompletableFuture.anyOf(userFuture, ordersFuture);

Blocking at the edge

Inside service code, chain freely. At the boundary (HTTP controller, main), use cf.get(timeout), join(), or convert to reactive type with explicit timeout. join() on failed CF throws unchecked CompletionException wrapping the cause—unwrap getCause() for the real failure.

Chaining patterns & anti-patterns

Good	Avoid
`thenCompose` when next step returns CF (flatMap)	`thenApply` returning CF → nested `CompletableFuture<CompletableFuture<T>>>`
Dedicated executor for blocking JDBC/HTTP	Blocking call inside `supplyAsync` on common pool
`orTimeout` / `completeOnTimeout` 9+	Infinite `get()` without timeout
One `exceptionally` at pipeline end	Log in every stage and rethrow

📦 Real World

Microservice orchestration: fetch user, then orders, then recommendations with thenCompose and a shared bounded IO executor. Set client timeouts with orTimeout on each stage. Aggregate with allOf for parallel fan-out to downstream APIs.

ForkJoinPool & divide-and-conquer

ForkJoinPool implements work-stealing: each worker thread maintains a deque of tasks. When idle, a worker steals from the tail of another worker’s deque—balances CPU-bound fork/join workloads without a central queue bottleneck. Parallel streams default to the common pool (ForkJoinPool.commonPool()).

Fork / join / compute

fork() — schedule subtask asynchronously on the pool
compute() — run subtask on current thread (or join stolen work)
join() — wait for forked subtask result

Rule of thumb: fork only when the split is large enough to amortize scheduling overhead—often thousands of elements for array work. RecursiveTask<V> returns a value; RecursiveAction is void (side-effect only if you manage merging carefully).

Do not block inside ForkJoin tasks on I/O—blocks a worker and reduces steal efficiency; use an Executor for I/O instead.

class SumTask extends RecursiveTask<Long> {
    private final int[] arr;
    private final int lo, hi;

    @Override
    protected Long compute() {
        if (hi - lo <= 10_000) {
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += arr[i];
            return sum;
        }
        int mid = (lo + hi) >>> 1;
        SumTask left = new SumTask(arr, lo, mid);
        SumTask right = new SumTask(arr, mid, hi);
        left.fork();
        long rightVal = right.compute();
        return left.join() + rightVal;
    }
}

ForkJoinPool pool = new ForkJoinPool();
long total = pool.invoke(new SumTask(array, 0, array.length));
pool.shutdown();

Virtual threads Java 21 — Project Loom

Platform threads = OS threads (expensive, ~MB stack). Virtual threads = JVM-managed, cheap to create millions—mounted on a small pool of carrier platform threads. When virtual thread blocks on I/O, it unmounts so the carrier runs another virtual thread.

Carrier platform threads (few)
    │
    ├─ mount virtual thread A → JDBC read blocks → unmount A
    ├─ mount virtual thread B → runs
    └─ mount virtual thread C → ...

Millions of virtual threads OK when most are blocked on I/O, not CPU-spinning

Thread v = Thread.ofVirtual().name("req-42").start(() -> {
    String body = httpClient.sendBlocking(request);  // blocks virtual thread cheaply
});

try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
    IntStream.range(0, 10_000).forEach(i ->
        exec.submit(() -> handleRequest(i)));
}

Pinning virtual threads

When a virtual thread blocks in native code or on a monitor while holding a carrier, it may pin the carrier thread—reducing scalability. JDK flight recorder event jdk.VirtualThreadPinned helps diagnose. Mitigations: reduce synchronized in hot paths, prefer ReentrantLock, update drivers/libraries that block carriers, avoid long CPU loops on virtual threads.

Structured concurrency Java 21+

StructuredTaskScope models child tasks as nested within a parent scope—if the parent ends (success, failure, or cancel), children are cancelled together. This fixes “fire off async tasks and lose track” bugs in imperative code. Shutdown policies like ShutdownOnFailure fail fast when any child fails; ShutdownOnSuccess supports racing alternatives. API stabilized across JDK 21–23—check your target JDK javadoc for exact class names and preview flags.

// Conceptual shape — API evolved across JDK versions; check your JDK javadoc
try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
    Subtask<User> user = scope.fork(() -> fetchUser(id));
    Subtask<Orders> orders = scope.fork(() -> fetchOrders(id));
    scope.join();
    scope.throwIfFailed();
    return combine(user.get(), orders.get());
}

When virtual threads shine vs when they do not

Good fit	Poor fit
I/O-bound: HTTP, JDBC, messaging	CPU-bound compute (use platform pool or ForkJoin)
High fan-out request-per-thread style	Heavy `synchronized` on hot paths (pins carrier—"pinning")
Simpler than reactive for many teams	Native code / some blocking in libraries being updated

Spring Boot 3.2+ & reactive vs imperative

Spring Boot 3.2+ can enable virtual threads for request handling (spring.threads.virtual.enabled=true). Embedded servers dispatch each request on a virtual thread; blocking JDBC, JPA, and RestClient calls release the carrier while waiting on the network or DB— so thread count is no longer the limiting factor for I/O concurrency in many apps.

Limits remain: connection pool size, downstream rate limits, heap, and CPU for serialization still cap throughput. WebFlux/reactive stacks still win when you need end-to-end reactive streams, backpressure, and non-blocking drivers throughout. Virtual threads are the pragmatic middle path: keep imperative code, gain I/O concurrency—without claiming they help CPU-bound thread pools.

🎯 Interview Tip

Contrast: platform thread pool size ≈ CPU cores for CPU work; virtual threads ≈ number of concurrent blocking operations. Explain pinning, why synchronized can hurt virtual threads (use ReentrantLock in hot paths when profiling shows pinning). Draw happens-before for volatile + lock.