JVM & Platform Architecture
Production outages blamed on “memory leaks” and “random pauses” usually come down to misunderstood JVM mechanics: class loaders pinning Metaspace, promotion flooding Old Gen, or threads blocked on pools you cannot see in logs alone. This chapter goes deep (it is not a glossary) into how HotSpot actually runs your bytecode.
JVM architecture
The JVM specification describes a machine that loads verified bytecode, executes it through an engine, stores state in defined memory areas, and calls native code when Java ends and the OS begins. HotSpot (OpenJDK’s default VM) is the implementation most servers run.
At runtime, four subsystems interact in a fixed logical order—this is the diagram worth memorizing:
YOUR CODE (.java → .class bytecode)
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ ① CLASS LOADER SUBSYSTEM │
│ Bootstrap → Platform → Application (+ custom loaders) │
│ Load · Link (verify, prepare, resolve) · Initialize │
└────────────────────────────┬────────────────────────────────────┘
│ types & metadata stored in
▼
┌─────────────────────────────────────────────────────────────────┐
│ ② RUNTIME DATA AREAS │
│ Heap (objects) · Metaspace (class metadata) │
│ Per-thread: Java stack, PC register, native method stack │
└────────────────────────────┬────────────────────────────────────┘
│ bytecode executed by
▼
┌─────────────────────────────────────────────────────────────────┐
│ ③ EXECUTION ENGINE │
│ Interpreter · JIT (C1 / C2) · GC threads · runtime stubs │
└────────────────────────────┬────────────────────────────────────┘
│ JNI calls into
▼
┌─────────────────────────────────────────────────────────────────┐
│ ④ NATIVE METHOD INTERFACE (JNI) │
│ C/C++ libraries, OS APIs, GPU drivers, compression codecs │
└─────────────────────────────────────────────────────────────────┘
What each layer does in production
- Class loaders — bring types into the VM; wrong loader boundaries cause ClassNotFoundException, LinkageError, and Metaspace leaks on redeploy.
- Runtime data areas — where object graphs (heap), per-thread call state (stacks), and class metadata (Metaspace) live; OOM types tell you which area failed.
- Execution engine — interprets cold bytecode, JIT-compiles hot paths, runs GC; CPU profiles and pause times originate here.
- JNI / native: I/O, clocks, DNS lookups, and some crypto providers eventually reach native code; JNI misuse can corrupt the heap or leak native memory outside the GC’s view.
A Spring Boot fat JAR starts with LaunchedURLClassLoader (LaunchedClassLoader since Spring Boot 3.2) to read nested JARs, allocates millions of request-scoped objects on the heap, keeps class metadata in Metaspace, JIT-compiles controller hot paths after warm-up, and reaches native code through the JDK for socket I/O (including the PostgreSQL driver's connections), DNS resolution, and file access.
Class loading
A class is not “loaded” in one step. The VM runs loading → linking → initialization. Loaders form a tree; the delegation model (parent-first) ensures core JDK classes cannot be spoofed by application JARs.
The three phases
| Phase | What happens |
|---|---|
| Loading | Binary .class (or generated bytes) becomes a Class<?> object; a specific loader records ownership. |
| Linking | Verify bytecode (type safety, stack map tables). Prepare static fields (default values). Resolve symbolic references to direct pointers (may be lazy). |
| Initialization | Run <clinit> — static initializers and static field assignments in source order. |
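A small experiment makes the preparation/initialization split visible. In this sketch (class names are illustrative), a static initializer reads a field whose own initializer has not run yet. It sees the default value 0 assigned during preparation, because `<clinit>` executes assignments strictly in source order. The helper method is needed because javac rejects a direct forward reference in a static initializer.

```java
// Illustrative demo: preparation sets every static field to its default (0),
// then <clinit> runs the static initializers top to bottom.
final class InitOrder {
    static final class Config {
        static int first = readSecond(); // runs while `second` still holds its prepared default 0
        static int second = 10;          // assigned only after `first` has been initialized

        private static int readSecond() {
            return second + 1;
        }
    }

    public static void main(String[] args) {
        // First active use of Config triggers its initialization
        System.out.println("first = " + Config.first);   // first = 1
        System.out.println("second = " + Config.second); // second = 10
    }
}
```

Declaring `second` as a `static final` compile-time constant would change the result to 11, because constant fields are initialized before the other static initializers run.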
Bootstrap → Platform → Application
JDK 8 and earlier had an “Extension class loader” (jre/lib/ext); JDK 9 replaced it with the platform class loader as part of the module system, but the delegation idea persists.
| Loader | Implementation | Typical contents |
|---|---|---|
| Bootstrap | Native code inside the VM (C++); it has no ClassLoader object, so getClassLoader() returns null for its classes and the getParent() chain ends at null | java.base (java.lang, java.util, ...) and other core modules such as java.logging and java.xml |
| Platform (Extension in JDK 8) | PlatformClassLoader: JDK modules not defined to bootstrap | e.g. java.sql, java.compiler, java.scripting |
| Application (System) | AppClassLoader — classpath and module path for your JARs | Your services, Spring, drivers, test classes |
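To see the hierarchy on a running JVM, a sketch like the following walks getParent() from a class's defining loader (JDK 9+ APIs; LoaderChain is an illustrative name). The bootstrap loader has no Java object, so the walk ends when getParent() returns null.

```java
import java.util.ArrayList;
import java.util.List;

// Walks a class's loader chain upward; the bootstrap loader shows up as null.
final class LoaderChain {
    static List<String> chainOf(Class<?> type) {
        List<String> chain = new ArrayList<>();
        for (ClassLoader loader = type.getClassLoader(); loader != null; loader = loader.getParent()) {
            chain.add(loader.getName()); // JDK 9+: "app" and "platform" for the built-in loaders
        }
        chain.add("bootstrap (null)");
        return chain;
    }

    public static void main(String[] args) {
        // From a compiled class on the classpath this is typically [app, platform, bootstrap (null)];
        // the single-file source launcher puts its own in-memory loader at the front.
        System.out.println(chainOf(LoaderChain.class));
        System.out.println(chainOf(String.class)); // [bootstrap (null)]
        System.out.println(ClassLoader.getSystemClassLoader().getParent() == ClassLoader.getPlatformClassLoader());
    }
}
```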
Delegation model (parent-first)
When asked to load com.example.PaymentService, the application loader first delegates to its parent (platform), which delegates to bootstrap. Only if parents cannot find the class does the child define it.
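Application ClassLoader   loadClass("com.example.Foo")
   │ 1. delegate to parent first
   ▼
Platform ClassLoader      2. delegate to parent first
   │
   ▼
Bootstrap ClassLoader     3. search core modules (java.base, ...): not found
   │
   ▼ 4. back in Platform: search platform modules: not found
Application ClassLoader   5. search the classpath: found, define com.example.Foo
For java.lang.String the walk stops at step 3: bootstrap returns the class it already defined, and the children never search.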
Why it matters: You cannot replace java.lang.String from a malicious JAR on the classpath—bootstrap already defined the real one. Containers sometimes use child-first (inverse) for servlet isolation so webapp copies of libraries win over shared ones—powerful but easy to misconfigure.
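For contrast, a child-first loader inverts the search order for everything except JDK classes. This is a minimal sketch assuming a URLClassLoader over the webapp's own JAR URLs; ChildFirstClassLoader is a hypothetical name, and real containers (Tomcat's WebappClassLoader, for instance) add package filters, resource-lookup rules, and leak protection that it omits.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Minimal child-first sketch: search this loader's own URLs first, then fall back to the parent.
final class ChildFirstClassLoader extends URLClassLoader {
    ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            // Never define java.* here: defineClass rejects those packages, and they must come from bootstrap
            if (c == null && !name.startsWith("java.")) {
                try {
                    c = findClass(name); // child first: our own URLs
                } catch (ClassNotFoundException notLocal) {
                    // not ours: fall through to the parent
                }
            }
            if (c == null) {
                c = super.loadClass(name, false); // standard parent-first path (handles a bootstrap parent)
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    public static void main(String[] args) throws Exception {
        try (ChildFirstClassLoader loader =
                 new ChildFirstClassLoader(new URL[0], ChildFirstClassLoader.class.getClassLoader())) {
            System.out.println(loader.loadClass("java.lang.String") == String.class); // true
        }
    }
}
```

The `java.` guard is the piece that keeps child-first safe: without it, a webapp JAR could try to shadow core classes, which defineClass refuses anyway with a SecurityException.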
Dynamic class loading
Classes appear after startup through:
- Reflection — Class.forName("com.plugin.Rule") triggers loading if absent.
- Bytecode generation — Hibernate, Mockito, Spring CGLIB create synthetic subclasses at runtime.
- Agents — -javaagent: transforms classes at load time (metrics, tracing).
- Custom loaders — OSGi, plugin architectures, URLClassLoader over HTTP (rare, security risk).
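```java
// Context class loader: critical in thread pools and EE containers.
// myPluginLoader is assumed to be a loader that can see com.vendor.Plugin (e.g. a URLClassLoader).
static void runPlugin(ClassLoader myPluginLoader) throws ReflectiveOperationException {
    Thread current = Thread.currentThread();
    ClassLoader previous = current.getContextClassLoader();
    try {
        // Code that consults the TCCL (ServiceLoader.load, JAXB, many frameworks) now sees plugin classes
        current.setContextClassLoader(myPluginLoader);
        Class<?> plugin = Class.forName("com.vendor.Plugin", true, myPluginLoader);
        Runnable task = (Runnable) plugin.getDeclaredConstructor().newInstance();
        task.run();
    } finally {
        // Always restore: pooled threads outlive the task and would otherwise pin the plugin loader
        current.setContextClassLoader(previous);
    }
}
```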
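Metaspace leak on redeploy: after a Tomcat WAR hot-deploy, any lingering reference from outside the webapp (a JDBC driver still registered in DriverManager, a ThreadLocal on a pooled thread, a static cache in a shared library) keeps the old WebappClassLoader, and its thousands of classes, pinned in Metaspace. Wrong TCCL on pooled threads: the JDBC driver is registered through one loader, the job runs with another, and DriverManager fails with SQLException: No suitable driver because it only hands out drivers visible to the caller's loader.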
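Draw parent-first vs child-first delegation. Explain why class initialization is thread-safe: JLS §12.4.2 initializes every class under a per-class lock, which is why the lazy holder idiom is a safe alternative to double-checked locking. Know the difference between Class.forName(name, initialize, loader), which runs static initialization when initialize is true, and ClassLoader.loadClass(name), which loads without initializing.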
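The initialization difference is easy to check. In this sketch (names are illustrative), a nested class records when its static initializer runs: loadClass and Class.forName(name, false, loader) leave it uninitialized, while Class.forName(name, true, loader) runs `<clinit>` exactly once.

```java
import java.util.ArrayList;
import java.util.List;

// Shows which calls trigger <clinit>: loading alone does not; forName(..., true, ...) does.
final class InitTrigger {
    static final List<String> events = new ArrayList<>();

    static final class Plugin {
        static {
            events.add("Plugin <clinit> ran");
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        ClassLoader loader = InitTrigger.class.getClassLoader();
        String name = Plugin.class.getName(); // a class literal does not initialize the class

        loader.loadClass(name);               // load only, no initialization
        Class.forName(name, false, loader);   // explicitly skip initialization
        System.out.println("before: " + events); // before: []

        Class.forName(name, true, loader);    // initializes: the static block runs
        Class.forName(name, true, loader);    // already initialized, so it does not run again
        System.out.println("after: " + events);  // after: [Plugin <clinit> ran]
    }
}
```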
Runtime data areas
The specification divides JVM memory into areas with different lifetimes and threading rules. Misattributing an OOM to “heap” when Metaspace or native memory is the culprit wastes hours of tuning.
| Area | Shared? | Stores | Typical OOM |
|---|---|---|---|
| Heap | All threads | Object instances, arrays | Java heap space |
| Metaspace | All threads | Class metadata, method bytecode, constant pools | Metaspace |
| Java stack | Per thread | Stack frames (locals, operand stack) | StackOverflowError |
| PC register | Per thread | Address of current bytecode instruction | — |
| Native method stack | Per thread | JNI / native method frames | Native OOM / signal |
Method area (Metaspace) Java 8+
Before Java 8, class metadata lived in PermGen, a fixed-size region managed as part of the Java heap, so redeploy-heavy apps regularly hit OutOfMemoryError: PermGen space. Java 8 moved metadata to Metaspace in native memory.
Stored per loaded class (not per object instance):
- Full bytecode of methods, field/method tables, inner class tables
- Runtime constant pool (strings, class references, method handles)
- Annotations visible to reflection (unless stripped by ProGuard)
- vtable / itable structures for virtual dispatch
Metaspace grows by committing native memory in chunks. Limits:
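```
-XX:MetaspaceSize=256m      # first-GC threshold: growth past it triggers a GC that can unload classes
-XX:MaxMetaspaceSize=512m   # hard cap: defining classes beyond it fails with OutOfMemoryError: Metaspace
-Xlog:class+load=info       # verbose class-load log, dev/troubleshooting only (JDK 9+ form of -XX:+TraceClassLoading)
```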
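Each class loader owns a ClassLoaderData structure in native memory that holds its classes' metadata, so that metadata is freed only when the loader and all of its classes become unreachable and are unloaded. Generated classes (proxies) multiply metadata cost: 100 Hibernate entities with lazy proxies add at least 100 synthetic proxy classes. jcmd <pid> VM.metaspace summarizes usage.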
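Metaspace and class counts are also readable in-process through the standard java.lang.management beans, which is handy for a redeploy-leak check: if the loaded count climbs on every redeploy while the unloaded count stays flat, old loaders are being pinned. A minimal sketch; the pool name "Metaspace" is what HotSpot reports and may differ on other VMs.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

// Reads Metaspace usage and class-loading counters from inside the process.
final class MetaspaceProbe {
    /** Used bytes of the "Metaspace" pool, or -1 if this JVM exposes no pool by that name. */
    static long metaspaceUsedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getName().equals("Metaspace")) { // HotSpot's pool name
                return pool.getUsage().getUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();
        System.out.printf("loaded now: %d, total loaded: %d, unloaded: %d%n",
                classes.getLoadedClassCount(),
                classes.getTotalLoadedClassCount(),
                classes.getUnloadedClassCount());
        System.out.printf("Metaspace used: %d KB%n", metaspaceUsedBytes() / 1024);
    }
}
```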
Heap: Young Gen → Old Gen
Almost every new places an object on the heap (escape analysis can elide some). Generational collectors exploit the weak generational hypothesis: most objects die young.
HEAP ( -Xmx caps total )
┌──────────────────────────────────────────────────────────┐
│ YOUNG GENERATION │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ EDEN — every new object starts here (TLAB bump) │ │
│ └─────────────────────────────────────────────────────┘ │
│ │ minor GC (copying) when Eden full │
│ ▼ │
│ ┌──────────────┐ copy live objects ┌──────────────┐ │
│ │ Survivor S0 │ ◄──────────────────────► │ Survivor S1 │ │
│ │ (empty swap)│ age++ each survive │ │ │
│ └──────────────┘ └──────────────┘ │
└────────────────────────────┬─────────────────────────────┘
│ age reaches tenuring threshold (MaxTenuringThreshold, default 15)
▼ promotion
┌──────────────────────────────────────────────────────────┐
│ OLD GENERATION — long-lived caches, singletons, pools │
│ major / mixed / full GC collects here │
└──────────────────────────────────────────────────────────┘
Object allocation lifecycle
- TLAB allocation — each thread bumps a pointer in a Thread-Local Allocation Buffer in Eden (fast path, lock-free).
- Eden fill — when Eden cannot fit the object, a minor GC (Young collection) runs.
- Copy collection — live objects copied to the empty survivor space; dead objects vanish (no sweep in young gen).
- Aging — each survivor round increments age; when age ≥ threshold, object is promoted to Old Gen.
- Old Gen pressure — long-lived graphs (caches, HTTP client pools, static collections) fill Old; collector runs concurrent or STW phases depending on algorithm.
Object layout (64-bit HotSpot, compressed oops and compressed class pointers on): an 8-byte mark word, a 4-byte klass pointer, then fields; arrays add a 4-byte length. Every object is padded to a multiple of 8 bytes, which is why even “small” objects cost 16+ bytes.
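The padding arithmetic can be sketched as a back-of-envelope model. It assumes 64-bit HotSpot with compressed oops and compressed class pointers (a 12-byte header), 8-byte alignment, and no compact object headers. It ignores HotSpot's field reordering, so treat the results as estimates and confirm real layouts with a tool such as JOL.

```java
// Shallow-size estimate under the assumptions above (12-byte header, 8-byte alignment).
final class ShallowSize {
    static final int HEADER = 12;       // 8-byte mark word + 4-byte compressed klass pointer
    static final int ARRAY_LENGTH = 4;  // arrays store an int length after the header
    static final int ALIGN = 8;

    static long alignUp(long bytes) {
        return (bytes + ALIGN - 1) / ALIGN * ALIGN;
    }

    /** Plain object whose instance fields total fieldBytes. */
    static long object(int fieldBytes) {
        return alignUp(HEADER + fieldBytes);
    }

    /** Array of `length` elements of elementBytes each. */
    static long array(int elementBytes, int length) {
        return alignUp(HEADER + ARRAY_LENGTH + (long) elementBytes * length);
    }

    public static void main(String[] args) {
        System.out.println(object(0));     // new Object(): 12 -> 16
        System.out.println(object(4));     // Integer (one int): 16
        System.out.println(object(8));     // Long (one long): 20 -> 24
        System.out.println(array(1, 0));   // byte[0]: 16
        System.out.println(array(1, 100)); // byte[100]: 116 -> 120
    }
}
```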
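Premature promotion: allocation bursts (for example, large short-lived arrays) can overflow the survivor spaces, so objects that would have died young are promoted into Old Gen, where only an expensive old collection can reclaim them. Committed heap is also not returned to the OS readily: after a spike the JVM may keep memory committed up to -Xmx (G1 on JDK 12+ can uncommit it through periodic collections enabled with -XX:G1PeriodicGCInterval).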
Java stack: frames, locals, operand stack
Each thread has its own Java Virtual Machine stack. A method call pushes a stack frame; return pops it. Instance methods receive this at local slot 0.
A frame contains:
- Local variable array — fixed-size table: parameters, int i, reference variables, long/double take two slots.
- Operand stack — bytecode is stack-oriented: iload_1, iload_2, iadd pops two ints, pushes sum.
- Constant pool reference — for instructions like ldc, invokevirtual.
- Return address — where to continue after return (used by interpreter/JIT).
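```java
int add(int a, int b) {   // instance method: local slot 0 holds `this`
    return a + b;
}
// javac emits:
//   iload_1   // push a (slot 1)
//   iload_2   // push b (slot 2)
//   iadd      // pop both ints, push their sum
//   ireturn
// Frame locals: [this, a, b]; a static version uses [a, b] with iload_0, iload_1
```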
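The default stack size is set by -Xss (typically 1 MB per thread on 64-bit Linux). A service with 500 platform threads reserves ≈ 500 MB of address space for stacks, although only the pages each thread actually touches become resident. Virtual threads (Java 21) shrink this cost for I/O workloads because an unmounted virtual thread's frames are stored as small objects on the heap.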
Stack overflow is not a heap OOM: deep recursion or infinite mutual calls throw StackOverflowError. Java has no stack-allocated arrays: a local array variable holds only a reference on the stack, while the array itself lives on the heap (unless escape analysis eliminates the allocation).
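Stack depth is easy to observe directly. This sketch (names are illustrative) recurses until StackOverflowError on a dedicated thread. The four-argument Thread constructor takes a per-thread stack size that the JVM treats as a hint and may ignore on some platforms. Expect the measured depth to vary between runs, since JIT-compiled frames are smaller than interpreted ones.

```java
// Measures how deep a trivial recursion gets before StackOverflowError.
final class StackDepth {
    private static int depth;

    private static void recurse() {
        depth++;
        recurse(); // each call pushes a new frame until the thread's stack is exhausted
    }

    /** Runs the recursion on a new thread; 0 means "use the -Xss default". */
    static int maxDepth(long stackSizeBytes) throws InterruptedException {
        int[] result = new int[1];
        Thread t = new Thread(null, () -> {
            depth = 0;
            try {
                recurse();
            } catch (StackOverflowError expected) {
                result[0] = depth; // frames have unwound; the thread itself survives
            }
        }, "depth-probe", stackSizeBytes);
        t.start();
        t.join(); // join makes result[0] visible to the caller
        return result[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("default stack: " + maxDepth(0));
        System.out.println("8 MB stack:    " + maxDepth(8L * 1024 * 1024));
    }
}
```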
PC register & native method stack
PC (program counter) register
Per thread, holds the address of the current bytecode instruction being executed. After an instruction completes, the PC advances. If the thread is running native code (JNI), the PC is undefined—native code uses real CPU program counters on the native stack.
Native method stack
When you call System.arraycopy or perform socket I/O, HotSpot may run a JIT intrinsic or enter a native (JNI) implementation written in C/C++. That execution uses the native method stack; on HotSpot this is the same OS thread stack that holds the thread's Java frames, with native and Java frames interleaved. Native allocations are invisible to the Java heap GC, so leaks from malloc inside JNI code require native profilers.
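Netty, compression libraries, and database drivers can spend significant time in native code. The GC cannot save you from a native OOM: a direct ByteBuffer's off-heap memory is released only after the buffer object itself is collected, and memory malloc'd inside native libraries is never tracked at all. -XX:MaxDirectMemorySize caps ByteBuffer.allocateDirect; for the JVM's own native allocations, enable Native Memory Tracking (-XX:NativeMemoryTracking=summary) and inspect it with jcmd <pid> VM.native_memory summary.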
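Direct buffer usage is visible through the "direct" BufferPoolMXBean, which counts ByteBuffer.allocateDirect memory that neither -Xmx nor a heap histogram shows. A sketch (DirectMemoryProbe is an illustrative name); it does not cover memory malloc'd by native libraries, or allocators such as Netty's that may bypass ByteBuffer accounting.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Direct ByteBuffers live outside the Java heap; the "direct" buffer pool bean tracks them.
final class DirectMemoryProbe {
    static BufferPoolMXBean directPool() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if (pool.getName().equals("direct")) {
                return pool;
            }
        }
        throw new IllegalStateException("no 'direct' buffer pool exposed");
    }

    public static void main(String[] args) {
        BufferPoolMXBean pool = directPool();
        long before = pool.getTotalCapacity();
        // 16 MB off-heap: counted against -XX:MaxDirectMemorySize, not against -Xmx
        ByteBuffer buffer = ByteBuffer.allocateDirect(16 * 1024 * 1024);
        long after = pool.getTotalCapacity();
        System.out.println("direct capacity bytes: " + before + " -> " + after
                + " (buffers: " + pool.getCount() + ", this one: " + buffer.capacity() + ")");
        // The native memory is released only after `buffer` is unreachable and a GC runs its cleaner
    }
}
```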
JIT compilation
HotSpot does not keep interpreting hot code. It profiles bytecode execution counts and branch bias, then compiles hot methods to native machine code stored in the code cache.
Interpreter → C1 → C2
| Tier | Compiler | Characteristics |
|---|---|---|
| 0 | Interpreter | Template interpreter; collects profiling data (invocation counts, branch taken %) |
| 1-3 | C1 (Client) | Fast compile (~ms); light inlining; tier 3 adds the full profiling that feeds the C2 decision |
| 4 | C2 (Server) | Slow compile; aggressive inlining, escape analysis, loop unrolling, SIMD auto-vectorization |
Tiered compilation (-XX:+TieredCompilation, on by default since JDK 8): methods start interpreted, are compiled quickly by C1 (usually at tier 3, with profiling), then are recompiled by C2 at tier 4 once hot enough. Short-lived CLI apps may never reach C2.
Warm-up and thresholds
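- -XX:CompileThreshold: only used when tiered compilation is off. Tiered mode instead uses per-tier thresholds (e.g. Tier3InvocationThreshold, Tier4InvocationThreshold) that combine invocation and loop back-edge counters.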
- OSR (On-Stack Replacement) — compile hot loops inside cold methods (loop in rarely called handler).
- Deoptimization — rare path assumption fails (e.g. class hierarchy change) → fall back to interpreter, fix assumptions, maybe recompile.
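```bash
# Code cache: when it fills, HotSpot warns "CodeCache is full. Compiler has been disabled."
# and newly hot methods stay interpreted
-XX:ReservedCodeCacheSize=256m
-XX:InitialCodeCacheSize=64m

# Observe compilations. Each -XX:+PrintCompilation line reads roughly:
#   <ms since start> <compile id> <flags> <tier> <method> (<bytecode size>)
#   e.g.  123  4  !  3  java.util.HashMap::getNode
# flags: % = OSR, s = synchronized, ! = has exception handlers, n = native wrapper
java -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions \
  -XX:CompileCommand=print,com.example.HotPath::process \
  -jar service.jar   # readable disassembly requires the hsdis plugin
```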
Latency spikes right after a deploy often mean “cold code” still running interpreted or at C1. Send warm-up traffic (or have readiness probes exercise real endpoints) before taking SLA traffic, and confirm compilation progress with JFR compilation events (jdk.Compilation).
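A warm-up step can be sketched with the CompilationMXBean, which reports cumulative JIT time. Here `handle` stands in for a real request handler (hypothetical), and the call count is an assumption chosen to comfortably exceed the default tiered thresholds.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Warm-up sketch: drive a hot path before declaring readiness, watching cumulative JIT time.
final class WarmUp {
    // Hypothetical stand-in for a real request handler
    static long handle(int request) {
        long acc = 0;
        for (int i = 0; i < 100; i++) {
            acc += (request ^ i) * 31L;
        }
        return acc;
    }

    /** Cumulative JIT compilation time in ms, or -1 if this JVM cannot report it. */
    static long jitMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        return (jit != null && jit.isCompilationTimeMonitoringSupported())
                ? jit.getTotalCompilationTime()
                : -1;
    }

    public static void main(String[] args) {
        long before = jitMillis();
        long sink = 0;
        for (int i = 0; i < 200_000; i++) {
            sink += handle(i); // far more calls than the default tiered thresholds require
        }
        long after = jitMillis();
        System.out.println("JIT ms before/after warm-up: " + before + " / " + after + " (sink " + sink + ")");
        // A real service would mark itself ready only at this point
    }
}
```

Using the result (`sink`) keeps the JIT from discarding the warm-up loop as dead code.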
Garbage collection
GC answers one question: which objects are still reachable from GC roots? Everything else is reclaimable. Algorithms differ in how they walk the graph, move objects, and pause application threads.
GC roots & reachability
Root set (starting points for tracing)—if there is no path from any root to an object, it is garbage:
- Local variables and operand stacks of active stack frames (per thread)
- Static fields of loaded classes
- JNI global references (native code holding Java objects)
- Objects used as monitors for synchronized
- JVM internal tables (JNI handles, code cache oops, etc.)
Mark-and-sweep (foundation)
Mark: traverse from the roots and set a mark bit on every reachable object (concurrent collectors describe this with the tri-color abstraction: white = not yet seen, grey = seen but its references not yet scanned, black = fully scanned). Sweep: walk the heap and free unmarked objects, which leaves fragmentation in the old gen. Production collectors add copying (used for the young gen, which avoids fragmentation) and compaction (slide live objects together and update references).
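The mark and sweep phases can be modeled on an explicit object graph. This is a toy, not how HotSpot stores or walks its heap: the worklist plays the role of the grey set, scanned objects become black, and anything still white after marking is swept.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy tracing collector over an explicit object graph.
final class ToyMarkSweep {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    /** Mark: everything reachable from the roots. The worklist holds grey objects (seen, not scanned). */
    static Set<Obj> mark(List<Obj> roots) {
        Set<Obj> marked = new HashSet<>(roots); // grey + black
        Deque<Obj> grey = new ArrayDeque<>(roots);
        while (!grey.isEmpty()) {
            Obj o = grey.pop(); // scanning o turns it black
            for (Obj child : o.refs) {
                if (marked.add(child)) { // white -> grey
                    grey.push(child);
                }
            }
        }
        return marked;
    }

    /** Sweep: drop every unmarked object from the "heap"; returns the names that were freed. */
    static List<String> sweep(List<Obj> heap, Set<Obj> marked) {
        List<String> freed = new ArrayList<>();
        heap.removeIf(o -> {
            boolean dead = !marked.contains(o);
            if (dead) freed.add(o.name);
            return dead;
        });
        return freed;
    }

    public static void main(String[] args) {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c"), d = new Obj("d");
        a.refs.add(b); // root a -> b
        c.refs.add(d); // c <-> d: a cycle with no path from any root
        d.refs.add(c);
        List<Obj> heap = new ArrayList<>(List.of(a, b, c, d));
        System.out.println("freed " + sweep(heap, mark(List.of(a)))); // freed [c, d]
    }
}
```

The unreachable cycle `c ↔ d` is reclaimed even though each object still references the other, which reference counting alone could not do.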
Collector reference
| Collector | Mechanism | Pause profile | When to choose |
|---|---|---|---|
| Serial | Single GC thread, STW young + old | Pause grows with heap | Single-core embedded, tiny heaps |
| Parallel (Throughput) | Multi-threaded STW copying + compaction | Pauses scale with heap size; highest throughput | Batch ETL, offline analytics: throughput over latency |
| CMS | Concurrent mark, STW remark/sweep | Low average, fragmentation risk | Legacy only — removed Java 14+ |
| G1 | Heap split into regions; incremental evacuation; default Java 9+ | Target pause ms via MaxGCPauseMillis | General-purpose services, 4–32 GB heaps typical |
| ZGC | Colored pointers, concurrent relocate | Sub-ms pauses at large heaps | Latency SLO strict, heaps 8 GB+ Java 15+ |
| Shenandoah | Concurrent compaction via load-reference barriers (early versions used Brooks forwarding pointers) | Similar to ZGC goals | OpenJDK builds that ship it (e.g. Red Hat builds; Oracle JDK omits it) |
Minor GC vs Major GC vs Full GC
- Minor GC / Young GC — collects Eden + survivors (all algorithms). Frequency high; duration often milliseconds on healthy heaps.
- Major GC / Old GC — old generation collection. With G1, often appears as “Mixed” evacuation of old regions with garbage.
- Full GC — stop-the-world collection of entire heap (and often class unloading). Triggers: Metaspace pressure, System.gc(), heap exhaustion, promotion failure. P99 killers—minimize frequency.
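Promotion failure: during a minor GC, objects that must be promoted (because they aged out or overflowed undersized survivors) do not fit in old gen, which is too full or, for non-compacting collectors such as CMS, too fragmented. The VM then falls back to a full GC; G1 reports the equivalent condition as an evacuation failure (“to-space exhausted”). Usual causes: survivor sizing, an allocation burst, or an old gen that is already too full.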
Explain tri-color concurrent marking: mutator must not hide objects from collector (write barriers). Contrast throughput (Parallel) vs latency (G1/ZGC). “GC is automatic” ≠ no tuning—allocation rate dominates.
GC tuning flags & reading logs
Tune from metrics: allocation rate (MB/s), GC pause P99, old gen occupancy after full GC, Metaspace slope. Flags below are starting points—always validate on load tests mirroring production shape.
Essential JVM flags
| Flag | Effect |
|---|---|
| -Xms / -Xmx | Initial and maximum heap. Setting -Xms equal to -Xmx in containers avoids heap resizing and OS commit churn. |
| -XX:+UseG1GC | Enable G1 (explicitly, in case your JDK's default differs). |
| -XX:MaxGCPauseMillis=200 | Soft pause target (200 ms is G1's default): G1 adjusts young gen size and how many old regions each mixed GC takes. Not a hard guarantee. |
| MaxGCPauseMillis tuning | Too low: G1 shrinks the young gen, so GCs run more often and throughput drops. Too high: long pauses. 100–300 ms is common for APIs. |
| -XX:+UseZGC | Enable ZGC; pair with generous -Xmx sizing for large heaps. |
| -XX:InitiatingHeapOccupancyPercent | G1: start concurrent marking when old-gen occupancy exceeds this percentage of the total heap (default 45; since JDK 9 adaptive IHOP treats it as the starting value). |
| -XX:+HeapDumpOnOutOfMemoryError | Write a heap dump on heap OOM for post-mortem analysis in Eclipse MAT. |
```bash
# Production-style G1 logging (Java 9+ unified JVM logging)
java -Xms4g -Xmx4g \
  -XX:+UseG1GC \
  -XX:MaxGCPauseMillis=200 \
  -XX:+HeapDumpOnOutOfMemoryError \
  -Xlog:gc*,gc+age=trace:file=gc-%t.log:time,uptime,level:filecount=5,filesize=50m \
  -jar service.jar

# ZGC example (on Java 21, add -XX:+ZGenerational for generational ZGC; it is the default from JDK 23)
java -XX:+UseZGC -Xms8g -Xmx8g -Xlog:gc*:file=gc.log -jar service.jar
```
Reading GC log lines
Young collection (G1):
```
[2.145s][info][gc] GC(12) Pause Young (Normal) (G1 Evacuation Pause)
[2.145s][info][gc] GC(12) Eden regions: 24->0(20)
[2.145s][info][gc] GC(12) Survivor regions: 2->3(4)
[2.145s][info][gc] GC(12) Old regions: 10->10
[2.145s][info][gc] GC(12) Pause Young (Normal) (G1 Evacuation Pause) 24M->8M(256M) 12.345ms
```
- Eden regions 24→0(20): Eden was evacuated completely; 20 is the number of Eden regions G1 sized for the next cycle.
- 24M→8M(256M) — heap used before → after (committed capacity).
- 12.345ms — STW pause—what hits your API latency for this event.
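Pulling these numbers out of logs makes it easy to chart pause percentiles. A minimal sketch, assuming the unified-logging summary format of the sample with sizes printed in M; other units or collectors may need a wider pattern. GcLineParser is an illustrative name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extracts heap before/after/committed and pause time from a G1 pause summary line.
final class GcLineParser {
    static final class Pause {
        final String kind; // "Young", "Full", "Remark", ...
        final long beforeMb, afterMb, committedMb;
        final double pauseMs;

        Pause(String kind, long beforeMb, long afterMb, long committedMb, double pauseMs) {
            this.kind = kind;
            this.beforeMb = beforeMb;
            this.afterMb = afterMb;
            this.committedMb = committedMb;
            this.pauseMs = pauseMs;
        }
    }

    private static final Pattern SUMMARY = Pattern.compile(
            "Pause (\\w+)\\b.*?(\\d+)M->(\\d+)M\\((\\d+)M\\) ([\\d.]+)ms");

    /** Returns null for lines that are not pause summaries (e.g. the per-region detail lines). */
    static Pause parse(String line) {
        Matcher m = SUMMARY.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new Pause(m.group(1), Long.parseLong(m.group(2)), Long.parseLong(m.group(3)),
                Long.parseLong(m.group(4)), Double.parseDouble(m.group(5)));
    }

    public static void main(String[] args) {
        Pause p = parse("[2.145s][info][gc] GC(12) Pause Young (Normal) (G1 Evacuation Pause) 24M->8M(256M) 12.345ms");
        System.out.println(p.kind + ": " + p.beforeMb + " MB -> " + p.afterMb + " MB of "
                + p.committedMb + " MB, " + p.pauseMs + " ms");
    }
}
```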
Mixed GC (G1): collects some old regions with high garbage alongside young—key to G1 old-gen hygiene without full heap pause.
Full GC: log line containing Pause Full — investigate Metaspace, System.gc(), or allocation failure immediately.
Correlate logs with jstat -gcutil <pid> 1000ms: watch O (old) climb toward 100% and FGC count increase.
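Inside the process, GarbageCollectorMXBean exposes cumulative counters comparable to jstat's YGC/FGC columns (collection count and accumulated time per collector). A sketch; bean names depend on the collector, for example "G1 Young Generation" and "G1 Old Generation" under G1.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// In-process counterpart to `jstat -gcutil`: cumulative counts and times per collector.
final class GcCounters {
    static String snapshot() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(String.format("%-24s count=%d time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 3; i++) {
            System.out.print(snapshot());
            Thread.sleep(1000); // same 1 s cadence as `jstat -gcutil <pid> 1000ms`
        }
    }
}
```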
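The Kubernetes memory limit must exceed the sum of -Xmx, Metaspace, thread stacks (-Xss × thread count), code cache, direct memory, and the JVM's own native overhead (GC data structures, JIT, malloc arenas). A pod OOMKilled while the heap sits at 60% points to native or off-heap overhead, not a Java heap leak.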
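The budget is simple addition, but writing it down catches surprises. Every input below is a hypothetical example value, not a recommendation; the stack term is an upper bound because reserved stack is not all committed.

```java
// Rough container-limit budget; all inputs are hypothetical example values.
final class MemoryBudget {
    static long budgetMb(long heapMb, long metaspaceMb, int threads, long stackKbPerThread,
                         long codeCacheMb, long directMb, long otherNativeMb) {
        long stacksMb = threads * stackKbPerThread / 1024; // upper bound: reserved, not all committed
        return heapMb + metaspaceMb + stacksMb + codeCacheMb + directMb + otherNativeMb;
    }

    public static void main(String[] args) {
        long total = budgetMb(
                4096,  // -Xmx4g
                256,   // observed Metaspace peak
                300,   // platform threads (request pools, GC, JIT, ...)
                1024,  // -Xss1m
                240,   // code cache reservation
                512,   // direct buffers (NIO, Netty)
                300);  // GC structures, malloc arenas, JNI libraries
        System.out.println(total + " MB"); // 5704 MB
    }
}
```

With these numbers the heap is only about 72% of the total, which is why a limit set close to -Xmx gets OOMKilled.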
JVM flags & diagnostic tools
When the process is alive but slow, or CPU is pegged, you need a toolkit that maps symptoms to layers: threads (stuck?), heap (what objects?), GC (pauses?), native (off-heap?).
JDK CLI tools (same JDK as the process)
| Tool | Command | Use when |
|---|---|---|
| jps | jps -lvm | List JVM PIDs, main class, and args; find the right process on shared hosts. |
| jstack | jstack -l <pid> | Thread dump: deadlocks, pool exhaustion, stuck I/O. For virtual threads (JDK 21+) use jcmd <pid> Thread.dump_to_file <file>. |
| jmap | jmap -histo:live <pid> | Forces a full GC, then prints a class histogram: what dominates the heap (char[], byte[], your DTOs). |
| jmap | jmap -dump:format=b,file=heap.hprof <pid> | Full heap dump for Eclipse MAT / VisualVM; heavy (pauses the JVM), capture during an incident. |
| jstat | jstat -gcutil <pid> 1000ms | Live S0/S1/E/O percentages and YGC/FGC counts: a cheap GC dashboard. |
| jcmd | jcmd <pid> help | Swiss army knife: JFR.start, VM.flags, GC.heap_info, Thread.print, VM.metaspace. |
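```bash
#!/usr/bin/env bash
# Incident capture: run the steps back to back so the snapshots describe the same moment
PID=$(jps -l | awk '/com.example.Application/{print $1}')
TS=$(date +%s)
jstack -l "$PID" > "threads-$TS.txt"
# histo:live forces a full GC (a STW pause); drop ":live" on a latency-critical node
jmap -histo:live "$PID" | head -80 > "histo-$TS.txt"
jcmd "$PID" VM.flags > "flags-$TS.txt"
jcmd "$PID" JFR.start name=incident duration=120s filename="incident-$TS.jfr"

# In the thread dump, look for:
#   java.lang.Thread.State: BLOCKED (on object monitor)
#   pool threads ("pool-1-thread-3") parked in java.util.concurrent.locks

# Then watch GC live every second (Ctrl-C to stop)
jstat -gcutil "$PID" 1000ms
```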
jconsole
JDK-bundled JMX GUI. Attach to local PID or remote process with JMX ports open. Graphs heap, threads, classes, CPU. Good for quick visual trends; less detail than JFR for deep dives. Enable JMX remotely only with authentication and TLS in production.
VisualVM
Desktop profiler (standalone or via GraalVM downloads). Heap walker, CPU sampler, thread snapshots, heap dump analysis. Compare histograms between two points in time to find leak suspects (objects growing without bound).
async-profiler
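Low-overhead sampling profiler for HotSpot. On Linux it samples through perf_events (falling back to timer-based sampling where perf_events is unavailable, and on macOS). Because it does not wait for safepoints to take samples, its flame graphs avoid the safepoint bias that inflates certain JVM stacks. Modes:
- cpu: where CPU time goes (hot methods after JIT)
- alloc: allocation hotspots (who creates byte[]?)
- lock: contended locks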
```bash
# async-profiler 2.x launcher; 3.x ships the same options as bin/asprof
./profiler.sh -d 60 -e cpu -f /tmp/cpu.html <pid>
./profiler.sh -d 60 -e alloc -f /tmp/alloc.html <pid>
```
Java Flight Recorder (JFR)
Built into OpenJDK. Low overhead when configured (-XX:StartFlightRecording or jcmd JFR.start). Events: GC pauses, socket I/O, method samples, allocation outside TLAB, lock inflation. Analyze in JDK Mission Control (JMC)—timeline correlates GC pause with thread blocked.
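Recordings can also be started from code through the jdk.jfr API (JDK 11+), for example around a suspect code path in a load test. A minimal sketch: it enables GC and method-sample events, runs a workload, dumps to a temporary file, and reads the events back with RecordingFile. The workload and the JfrSnapshot name are stand-ins.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.List;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

// Programmatic JFR recording around a workload, then reading the events back.
final class JfrSnapshot {
    static List<RecordedEvent> record(Runnable workload) throws Exception {
        Path file = Files.createTempFile("incident-", ".jfr");
        try (Recording recording = new Recording()) {
            recording.enable("jdk.GarbageCollection");                                 // one event per GC
            recording.enable("jdk.ExecutionSample").withPeriod(Duration.ofMillis(20)); // method samples
            recording.start();
            workload.run();
            recording.stop();
            recording.dump(file);
        }
        try {
            return RecordingFile.readAllEvents(file);
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws Exception {
        List<RecordedEvent> events = record(() -> {
            for (int i = 0; i < 1_000; i++) {
                byte[] garbage = new byte[1 << 20]; // 1 MB of short-lived garbage per iteration
                garbage[0] = 1;
            }
        });
        for (RecordedEvent e : events) {
            System.out.println(e.getEventType().getName() + " " + e.getDuration().toMillis() + " ms");
        }
    }
}
```

The dumped file can equally be opened in JMC instead of being parsed in code.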
Workflow: jstat confirms GC pressure → GC logs identify pause type → jmap histo finds dominant classes → heap dump if leak → async-profiler alloc finds allocator → JFR for time-correlated story. Never change -Xmx without this evidence chain.