JVM & Platform Architecture

JVM architecture

The JVM specification describes a machine that loads verified bytecode, executes it through an engine, stores state in defined memory areas, and calls native code when Java ends and the OS begins. HotSpot (OpenJDK’s default VM) is the implementation most servers run.

At runtime, four subsystems interact in a fixed logical order—this is the diagram worth memorizing:

  YOUR CODE (.java → .class bytecode)
              │
              ▼
┌─────────────────────────────────────────────────────────────────┐
│  ① CLASS LOADER SUBSYSTEM                                       │
│     Bootstrap → Platform → Application (+ custom loaders)       │
│     Load · Link (verify, prepare, resolve) · Initialize         │
└────────────────────────────┬────────────────────────────────────┘
                             │ types & metadata stored in
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│  ② RUNTIME DATA AREAS                                           │
│     Heap (objects) · Metaspace (class metadata)                 │
│     Per-thread: Java stack, PC register, native method stack    │
└────────────────────────────┬────────────────────────────────────┘
                             │ bytecode executed by
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│  ③ EXECUTION ENGINE                                             │
│     Interpreter · JIT (C1 / C2) · GC threads · runtime stubs    │
└────────────────────────────┬────────────────────────────────────┘
                             │ JNI calls into
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│  ④ NATIVE METHOD INTERFACE (JNI)                                │
│     C/C++ libraries, OS APIs, GPU drivers, compression codecs   │
└─────────────────────────────────────────────────────────────────┘

What each layer does in production

Class loaders — bring types into the VM; wrong loader boundaries cause ClassNotFoundException, LinkageError, and Metaspace leaks on redeploy.
Runtime data areas — where object graphs (heap), per-thread call state (stacks), and class metadata (Metaspace) live; OOM types tell you which area failed.
Execution engine — interprets cold bytecode, JIT-compiles hot paths, runs GC; CPU profiles and pause times originate here.
JNI / native — every I/O, clock, and crypto call eventually hits native code; JNI misuse can corrupt heap or leak native memory outside GC’s view.

📦 Real World

A Spring Boot fat JAR starts with LaunchedURLClassLoader (nested JARs), allocates millions of request-scoped objects on the heap, caches class metadata in Metaspace, JIT-compiles controller hot paths after warm-up, and uses JNI for PostgreSQL drivers, DNS, and TLS via the OS.

Class loading

A class is not “loaded” in one step. The VM runs loading → linking → initialization. Loaders form a tree; the delegation model (parent-first) ensures core JDK classes cannot be spoofed by application JARs.

The three phases

Phase	What happens
Loading	Binary .class (or generated bytes) becomes a Class<?> object; a specific loader records ownership.
Linking	Verify bytecode (type safety, stack map tables). Prepare static fields (default values). Resolve symbolic references to direct pointers (may be lazy).
Initialization	Run <clinit> — static initializers and static field assignments in source order.
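
The split between loading and initialization in the table above can be observed directly. The following is a minimal sketch, not a reference implementation: the class names InitPhaseDemo and Lazy are made up for illustration. It loads a nested class with initialize=false and shows that its static initializer only runs on first active use.

```java
import java.util.ArrayList;
import java.util.List;

class InitPhaseDemo {
    // Records the order in which the phases become visible.
    static final List<String> EVENTS = new ArrayList<>();

    // Hypothetical class whose static initializer leaves a trace.
    static class Lazy {
        static int value = compute();   // assignment runs inside <clinit>

        static int compute() {
            EVENTS.add("Lazy.<clinit>");
            return 42;
        }
    }

    static List<String> run() {
        EVENTS.clear();
        String name = InitPhaseDemo.class.getName() + "$Lazy";
        try {
            // initialize=false: the class is loaded, but <clinit> does not run.
            Class.forName(name, false, InitPhaseDemo.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        EVENTS.add("loaded");

        // First active use (a static field read) triggers initialization.
        int v = Lazy.value;
        EVENTS.add("read " + v);
        return new ArrayList<>(EVENTS);
    }

    public static void main(String[] args) {
        // First run in a fresh JVM prints [loaded, Lazy.<clinit>, read 42]
        System.out.println(run());
    }
}
```

Calling Class.forName(name) without the boolean argument would initialize the class immediately, so "Lazy.<clinit>" would be recorded before "loaded".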

Bootstrap → Platform → Application

Through JDK 8 the middle tier was called the Extension class loader; JDK 9 replaced it with the Platform class loader as part of the module system. The delegation idea is unchanged.

Loader	Implementation	Typical contents
Bootstrap	C++ inside the VM; classes it defines report a null loader, and the parent chain from ClassLoader.getSystemClassLoader() ends there	java.* and core javax.* APIs, VM intrinsic classes
Platform (Extension)	PlatformClassLoader — JDK modules not on the bootstrap path	XML, SQL, management agents, compiler tools when on module path
Application (System)	AppClassLoader — classpath and module path for your JARs	Your services, Spring, drivers, test classes

Delegation model (parent-first)

When asked to load com.example.PaymentService, the application loader first delegates to its parent (platform), which delegates to bootstrap. Only if parents cannot find the class does the child define it.

Application ClassLoader  "com.example.Foo?"
        │  not found in parent chain for com.example.*
        ▼
Platform ClassLoader     (delegates upward first)
        ▼
Bootstrap ClassLoader    java.* only
        │
        └──► returns java.lang.String if already loaded

Child defines com.example.Foo only when parents pass.
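
The parent chain can be printed from any running program. This is a small sketch assuming JDK 9 or later (ClassLoader.getPlatformClassLoader() does not exist before that); the class name LoaderChainDemo is illustrative only, and the concrete loader class names printed depend on how the program is launched.

```java
import java.util.ArrayList;
import java.util.List;

class LoaderChainDemo {
    /** Loader class names from the given type's defining loader up to bootstrap. */
    static List<String> chainOf(Class<?> type) {
        List<String> chain = new ArrayList<>();
        ClassLoader loader = type.getClassLoader();
        while (loader != null) {
            chain.add(loader.getClass().getName());
            loader = loader.getParent();
        }
        chain.add("bootstrap (null)");   // bootstrap has no ClassLoader object
        return chain;
    }

    public static void main(String[] args) {
        System.out.println("String:     " + chainOf(String.class));
        System.out.println("This class: " + chainOf(LoaderChainDemo.class));
        boolean parentIsPlatform =
                ClassLoader.getSystemClassLoader().getParent()
                        == ClassLoader.getPlatformClassLoader();
        System.out.println("system loader's parent is the platform loader: " + parentIsPlatform);
    }
}
```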

Why it matters: You cannot replace java.lang.String from a malicious JAR on the classpath—bootstrap already defined the real one. Containers sometimes use child-first (inverse) for servlet isolation so webapp copies of libraries win over shared ones—powerful but easy to misconfigure.

Dynamic class loading

Classes appear after startup through:

Reflection — Class.forName("com.plugin.Rule") triggers loading if absent.
Bytecode generation — Hibernate, Mockito, Spring CGLIB create synthetic subclasses at runtime.
Agents — -javaagent: transforms classes at load time (metrics, tracing).
Custom loaders — OSGi, plugin architectures, URLClassLoader over HTTP (rare, security risk).

// Context class loader — critical in thread pools + EE containers
ClassLoader previous = Thread.currentThread().getContextClassLoader();
try {
    Thread.currentThread().setContextClassLoader(myPluginLoader);
    Class<?> plugin = Class.forName("com.vendor.Plugin", true, myPluginLoader);
    Runnable task = (Runnable) plugin.getDeclaredConstructor().newInstance();
    task.run();
} finally {
    Thread.currentThread().setContextClassLoader(previous);
}
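
Runtime bytecode generation does not require a third-party library to observe. As a hedged illustration, the JDK's own java.lang.reflect.Proxy defines a new class at runtime; the Greeter interface and RuntimeClassDemo class below are hypothetical names used only for this sketch, and frameworks such as Hibernate or Mockito use their own generators rather than this API.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

class RuntimeClassDemo {
    // Hypothetical interface; any interface works.
    interface Greeter {
        String greet(String name);
    }

    static Greeter newGreeter() {
        // The handler receives every call made on the generated class.
        InvocationHandler handler = (proxy, method, args) ->
                method.getName().equals("greet") ? "hello " + args[0] : null;
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),   // loader that will define the new class
                new Class<?>[] { Greeter.class },
                handler);
    }

    public static void main(String[] args) {
        Greeter g = newGreeter();
        System.out.println(g.greet("jvm"));
        // The class name is generated at runtime and did not exist in any JAR.
        System.out.println(g.getClass().getName());
        System.out.println(Proxy.isProxyClass(g.getClass()));
    }
}
```

Each generated class adds metadata to the defining loader, which is why proxy-heavy applications show up in Metaspace usage.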

⚠️ Pitfall

Metaspace leak on redeploy: Tomcat WAR hot-deploy without releasing the webapp WebappClassLoader pins thousands of classes. Wrong TCCL on pooled threads: JDBC driver registered with one loader, job runs with another → SQLException: No suitable driver.

🎯 Interview Tip

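Draw parent-first vs child-first. Explain why class initialization is thread-safe (the JLS guarantees <clinit> runs once under an initialization lock), which is why the holder idiom can replace double-checked locking. Know the difference between Class.forName(name, initialize, loader) and loadClass (no initialization).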

Runtime data areas

The specification divides JVM memory into areas with different lifetimes and threading rules. Misattributing an OOM to “heap” when Metaspace or native memory is the culprit wastes hours of tuning.

Area	Shared?	Stores	Typical failure
Heap	All threads	Object instances, arrays	Java heap space
Metaspace	All threads	Class metadata, method bytecode, constant pools	Metaspace
Java stack	Per thread	Stack frames (locals, operand stack)	StackOverflowError
PC register	Per thread	Address of current bytecode instruction	—
Native method stack	Per thread	JNI / native method frames	Native OOM / signal
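
The heap and non-heap areas in the table are visible at runtime through the standard java.lang.management API. The sketch below lists the memory pools the running JVM reports; pool names (for example "Metaspace" or "G1 Eden Space") are HotSpot- and collector-specific, so treat the exact output as an assumption about your runtime. MemoryAreasDemo is an illustrative class name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

class MemoryAreasDemo {
    /** Names of all pools of the given type (HEAP or NON_HEAP). */
    static List<String> poolNames(MemoryType type) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();          // may be null for an invalid pool
            long used = usage == null ? -1 : usage.getUsed();
            System.out.printf("%-16s %-36s used=%,d bytes%n",
                    pool.getType(), pool.getName(), used);
        }
    }
}
```

Per-thread stacks and the PC register do not appear as pools; they are not exposed through this API.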

Method area (Metaspace) Java 8+

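Before Java 8, class metadata lived in PermGen, a separately sized region managed alongside the heap and capped by -XX:MaxPermSize, so PermGen space OOMs were frequent. Java 8 moved metadata to Metaspace in native memory, which grows on demand unless -XX:MaxMetaspaceSize is set.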

Stored per loaded class (not per object instance):

Full bytecode of methods, field/method tables, inner class tables
Runtime constant pool (strings, class references, method handles)
Annotations visible to reflection (unless stripped by ProGuard)
vtable / itable structures for virtual dispatch

Metaspace grows by committing native memory in chunks. Limits:

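-XX:MetaspaceSize=256m          # initial high-water mark: crossing it triggers a GC to unload classes (a higher value delays that first GC)
-XX:MaxMetaspaceSize=512m       # hard cap: class definition beyond this fails with OutOfMemoryError: Metaspace
-Xlog:class+load=info           # verbose class-load logging on JDK 9+ (replaces -XX:+TraceClassLoading); dev/troubleshooting only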

🔬 Under the Hood

Each class loader has a ClassLoaderData graph in native memory. Generated classes (proxies) multiply metadata cost—100 Hibernate entities with lazy proxies means hundreds of synthetic classes. jcmd <pid> VM.metaspace summarizes usage.

Heap: Young Gen → Old Gen

Almost every new places an object on the heap (escape analysis can elide some). Generational collectors exploit the weak generational hypothesis: most objects die young.

                    HEAP ( -Xmx caps total )
    ┌──────────────────────────────────────────────────────────┐
    │  YOUNG GENERATION                                         │
    │  ┌─────────────────────────────────────────────────────┐ │
    │  │  EDEN  — every new object starts here (TLAB bump)    │ │
    │  └─────────────────────────────────────────────────────┘ │
    │         │ minor GC (copying) when Eden full              │
    │         ▼                                                │
    │  ┌──────────────┐    copy live objects    ┌──────────────┐ │
    │  │ Survivor S0  │ ◄──────────────────────► │ Survivor S1  │ │
    │  │  (empty swap)│      age++ each survive │              │ │
    │  └──────────────┘                         └──────────────┘ │
    └────────────────────────────┬─────────────────────────────┘
                                 │ age ≥ tenuring threshold (MaxTenuringThreshold, default 15)
                                 ▼ promotion
    ┌──────────────────────────────────────────────────────────┐
    │  OLD GENERATION — long-lived caches, singletons, pools   │
    │  major / mixed / full GC collects here                    │
    └──────────────────────────────────────────────────────────┘

Object allocation lifecycle

TLAB allocation — each thread bumps a pointer in a Thread-Local Allocation Buffer in Eden (fast path, lock-free).
Eden fill — when Eden cannot fit the object, a minor GC (Young collection) runs.
Copy collection — live objects copied to the empty survivor space; dead objects vanish (no sweep in young gen).
Aging — each survivor round increments age; when age ≥ threshold, object is promoted to Old Gen.
Old Gen pressure — long-lived graphs (caches, HTTP client pools, static collections) fill Old; collector runs concurrent or STW phases depending on algorithm.

Object layout (HotSpot 64-bit, compressed oops on): mark word + klass pointer + fields; arrays add length. Alignment padding affects footprint—why “small objects” still cost 16+ bytes.
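
The "16+ bytes" figure follows from simple arithmetic. The sketch below assumes a 12-byte header (8-byte mark word plus 4-byte compressed klass pointer) and 8-byte object alignment, and it ignores gaps that field alignment can introduce, so it is a lower-bound estimate rather than what HotSpot will report in every case. A tool such as OpenJDK's JOL gives authoritative layouts. ObjectSizeSketch is a made-up name.

```java
class ObjectSizeSketch {
    static final int HEADER_BYTES = 12;  // assumption: mark word (8) + compressed klass pointer (4)
    static final int ALIGNMENT = 8;      // assumption: default object alignment

    /** Estimated shallow size for an object whose instance fields total fieldBytes. */
    static long estimate(long fieldBytes) {
        long raw = HEADER_BYTES + fieldBytes;
        // Round up to the next multiple of ALIGNMENT.
        return (raw + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    public static void main(String[] args) {
        System.out.println("no fields:  " + estimate(0));   // 16
        System.out.println("one int:    " + estimate(4));   // 16
        System.out.println("one long:   " + estimate(8));   // 24
        System.out.println("int + long: " + estimate(12));  // 24
    }
}
```

An object with no fields and an object with a single int cost the same 16 bytes because the int fits in the padding after the header.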

⚠️ Pitfall

Premature promotion: a burst of large short-lived allocations can fill Eden and overflow the survivor space, pushing objects into Old Gen shortly before they die. They then sit there as garbage until an expensive old collection runs. Separately, heap memory is not readily returned to the OS: after a spike, committed heap often stays committed up to -Xmx.

Java stack: frames, locals, operand stack

Each thread has its own Java Virtual Machine stack. A method call pushes a stack frame; return pops it. Instance methods receive this at local slot 0.

A frame contains:

Local variable array — fixed-size table: parameters, int i, reference variables, long/double take two slots.
Operand stack — bytecode is stack-oriented: iload_1, iload_2, iadd pops two ints, pushes sum.
Constant pool reference — for instructions like ldc, invokevirtual.
Return address — where to continue after return (used by interpreter/JIT).

int add(int a, int b) {
    return a + b;
}
// Compiled bytecode conceptually:
//   iload_1, iload_2, iadd, ireturn
// Frame locals: [this?] a b  (static methods omit this)

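Default stack size is set by -Xss (often 1 MB per thread on 64-bit Linux). A service with 500 platform threads therefore reserves about 500 MB of address space for stacks; only the pages actually touched are committed, but the reservation still belongs in your sizing. Virtual threads (Java 21) shrink this cost for I/O workloads.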

🎯 Interview Tip

Stack overflow is not heap OOM. Deep recursion or infinite mutual calls → StackOverflowError. Large local arrays on stack are rare in Java (arrays are heap objects); the reference is on the stack, data is not.
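
To make the stack limit concrete, the sketch below recurses until StackOverflowError on a thread created with an explicit stack size. The stackSize argument of the Thread constructor is only a hint that the VM may round or ignore, so the depths printed vary by platform and JIT state; StackDepthDemo is an illustrative name.

```java
class StackDepthDemo {
    private static int depth;

    private static void recurse() {
        depth++;
        recurse();          // each call pushes one more frame
    }

    /** Recurses until overflow on a thread with the requested stack size; returns frames reached. */
    static int maxDepth(long stackBytes) {
        int[] result = new int[1];
        Thread probe = new Thread(null, () -> {
            depth = 0;
            try {
                recurse();
            } catch (StackOverflowError expected) {
                result[0] = depth;   // the heap is untouched; only this thread's stack ran out
            }
        }, "depth-probe", stackBytes);
        probe.start();
        try {
            probe.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println("512 KiB stack: " + maxDepth(512 * 1024) + " frames");
        System.out.println("4 MiB stack:   " + maxDepth(4L * 1024 * 1024) + " frames");
    }
}
```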

PC register & native method stack

PC (program counter) register

Per thread, holds the address of the current bytecode instruction being executed. After an instruction completes, the PC advances. If the thread is running native code (JNI), the PC is undefined—native code uses real CPU program counters on the native stack.

Native method stack

When you call System.arraycopy or socket I/O, HotSpot may enter an intrinsic or JNI implementation written in C++. That execution uses the native method stack (often the same pthread stack as the Java stack on HotSpot, interleaved). Native allocations are invisible to Java heap GC—memory leaks in JNI malloc require native profilers.

📦 Real World

Netty, compression libraries, and database drivers spend significant time in native code. GC cannot save you from a native OOM caused by leaked off-heap buffers; cap direct buffer usage with -XX:MaxDirectMemorySize and prefer pooled buffers so usage stays accounted for.
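
Direct buffer usage is one part of off-heap memory that the JVM does account for. A minimal sketch using the standard BufferPoolMXBean follows; the pool name "direct" is what HotSpot reports, which is an assumption about the runtime, and memory obtained through raw JNI malloc does not show up here at all. DirectMemoryDemo is an illustrative name.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

class DirectMemoryDemo {
    /** Finds the JVM's accounting bean for direct (off-heap) ByteBuffers. */
    static BufferPoolMXBean directPool() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool;
            }
        }
        throw new IllegalStateException("no 'direct' buffer pool reported by this JVM");
    }

    public static void main(String[] args) {
        BufferPoolMXBean pool = directPool();
        long before = pool.getMemoryUsed();
        ByteBuffer buffer = ByteBuffer.allocateDirect(1 << 20);   // 1 MiB outside the Java heap
        long after = pool.getMemoryUsed();
        System.out.println("direct memory before: " + before + " bytes");
        System.out.println("direct memory after:  " + after + " bytes");
        System.out.println("buffer capacity:      " + buffer.capacity());
    }
}
```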

JIT compilation

HotSpot does not interpret everything forever. It profiles invocation counts, loop back-edges, and branch bias, then compiles hot methods to native machine code stored in the code cache.

Interpreter → C1 → C2

Tier	Compiler	Characteristics
0	Interpreter	Template interpreter / C++ loop; collects profiling data (invocation counts, branch taken %)
1–3	C1 (Client)	Fast compile (~ms); light inlining; tier 3 adds the full profiling that feeds the C2 decision
4	C2 (Server)	Slow compile; aggressive inlining, escape analysis, loop unrolling, SIMD auto-vectorization

Tiered compilation (-XX:+TieredCompilation, on by default since Java 8): methods start interpreted, get C1 quickly for speed, then "upgrade" to C2 when hot enough. Short-lived CLI apps may never reach C2.

Warm-up and thresholds

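-XX:CompileThreshold: invocation count before compilation when tiered compilation is off. With tiered compilation (the default) it is ignored in favor of per-tier counters such as -XX:Tier3InvocationThreshold and -XX:Tier4InvocationThreshold.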
OSR (On-Stack Replacement) — compile hot loops inside cold methods (loop in rarely called handler).
Deoptimization — rare path assumption fails (e.g. class hierarchy change) → fall back to interpreter, fix assumptions, maybe recompile.
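
Warm-up can be watched from inside the process. The sketch below times repeated rounds of a small hot method and then reads the accumulated JIT time from the standard CompilationMXBean. Timings are machine-dependent and deliberately not asserted; this is not a benchmark harness (use JMH for real measurements), and JitWarmupDemo is an illustrative name.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

class JitWarmupDemo {
    /** Small, hot method that the JIT is likely to compile after enough calls. */
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    public static void main(String[] args) {
        long sink = 0;   // consume results so the work cannot be discarded
        for (int round = 0; round < 5; round++) {
            long start = System.nanoTime();
            for (int i = 0; i < 20_000; i++) {
                sink += sumOfSquares(1_000);
            }
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("round " + round + ": " + micros + " us");
        }
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();   // null if the VM has no compiler
        if (jit != null && jit.isCompilationTimeMonitoringSupported()) {
            System.out.println(jit.getName() + " total compile time: "
                    + jit.getTotalCompilationTime() + " ms");
        }
        System.out.println("sink = " + sink);
    }
}
```

On a typical HotSpot run the first round is noticeably slower than later ones, which is the warm-up effect described above.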

# Code cache: when it fills, HotSpot logs "CodeCache is full. Compiler has been disabled." and new code stays interpreted
-XX:ReservedCodeCacheSize=256m
-XX:InitialCodeCacheSize=64m

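# Observe compilations. PrintCompilation columns: timestamp (ms), compile id,
# attribute flags (% = OSR, ! = has exception handlers), tier, method, bytecode size
#   123    4   !   3   java.util.HashMap::getNode (size in bytes)

# CompileCommand=print dumps assembly for one method and needs the hsdis disassembler library
java -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions \
     -XX:CompileCommand=print,com.example.HotPath::process \
     -jar service.jar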

💡 Pro Tip

Latency spikes after deploy often mean “cold code” (interpreted + C1). Warmup traffic, readiness probes hitting real endpoints, or JFR jdk.CompilerPhase events validate compile progress before taking SLA traffic.

Garbage collection

GC answers one question: which objects are still reachable from GC roots? Everything else is reclaimable. Algorithms differ in how they walk the graph, move objects, and pause application threads.

GC roots & reachability

Root set (starting points for tracing)—if there is no path from any root to an object, it is garbage:

Local variables and operand stacks of active stack frames (per thread)
Static fields of loaded classes
JNI global references (native code holding Java objects)
Objects used as monitors for synchronized
JVM internal tables (JNI handles, code cache oops, etc.)
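
Reachability from a root can be probed with a WeakReference, which the collector clears only once no strong path to the object remains. The sketch below is hedged accordingly: System.gc() is a request that the VM may ignore, so only the "still rooted" half is guaranteed, while the "orphan cleared" line is typical HotSpot behavior rather than a promise. ReachabilityDemo is an illustrative name.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;

class ReachabilityDemo {
    /** An object held by a local variable (a GC root) is never cleared from a weak reference. */
    static boolean aliveWhileRooted() {
        Object payload = new Object();                       // strong reference from this frame
        WeakReference<Object> weak = new WeakReference<>(payload);
        System.gc();                                         // request a collection
        boolean alive = weak.get() != null;
        Reference.reachabilityFence(payload);                // keep payload rooted until here (JDK 9+)
        return alive;
    }

    public static void main(String[] args) {
        System.out.println("alive while rooted: " + aliveWhileRooted());

        // No strong reference survives the constructor call, so no root reaches the object.
        WeakReference<Object> orphan = new WeakReference<>(new Object());
        System.gc();
        System.out.println("orphan cleared (usual, not guaranteed): " + (orphan.get() == null));
    }
}
```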

Mark-and-sweep (foundation)

Mark: traverse from roots, set mark bit on reachable objects (tri-color abstraction: white/grey/black in concurrent collectors). Sweep: walk heap, free unmarked objects—creates fragmentation in old gen. Production collectors add copying (young gen eliminates fragmentation) and compaction (slide live objects, update references).
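
The mark and sweep phases can be sketched on a toy object graph. This is a teaching model under simplifying assumptions (a list stands in for the heap, nodes carry an explicit mark bit, everything is stop-the-world); it is not how HotSpot represents objects. All names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

class MarkSweepSketch {
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        boolean marked;
        Node(String name) { this.name = name; }
    }

    /** Mark: visit everything reachable from the roots using a worklist (the "grey" set). */
    static void mark(Collection<Node> roots) {
        Deque<Node> grey = new ArrayDeque<>(roots);
        while (!grey.isEmpty()) {
            Node node = grey.pop();
            if (node.marked) continue;       // already black
            node.marked = true;
            grey.addAll(node.refs);
        }
    }

    /** Sweep: remove unmarked nodes from the heap, reset marks; returns names of freed nodes. */
    static List<String> sweep(List<Node> heap) {
        List<String> freed = new ArrayList<>();
        for (Iterator<Node> it = heap.iterator(); it.hasNext();) {
            Node node = it.next();
            if (node.marked) {
                node.marked = false;         // ready for the next cycle
            } else {
                freed.add(node.name);
                it.remove();
            }
        }
        return freed;
    }

    /** Builds a -> b (rooted at a) plus an unreachable cycle c <-> d, then collects. */
    static List<String> demo() {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b);
        c.refs.add(d);
        d.refs.add(c);
        List<Node> heap = new ArrayList<>(List.of(a, b, c, d));
        mark(List.of(a));
        return sweep(heap);
    }

    public static void main(String[] args) {
        System.out.println("freed: " + demo());   // freed: [c, d]
    }
}
```

The cycle c <-> d is freed even though each node is referenced by the other, which is the practical advantage of tracing over reference counting.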

Collector reference

Collector	Mechanism	Pause profile	When to choose
Serial	Single GC thread, STW young + old	Pause grows with heap	Single-core embedded, tiny heaps
Parallel (Throughput)	Multi-thread STW copy + compact	Long pauses, max CPU on GC	Batch ETL, offline analytics—throughput over latency
CMS	Concurrent mark and sweep; STW initial mark and remark; no compaction	Low average, fragmentation risk	Legacy only: deprecated in Java 9, removed in Java 14
G1	Heap split into regions; incremental evacuation; default Java 9+	Target pause ms via MaxGCPauseMillis	General-purpose services, 4–32 GB heaps typical
ZGC	Colored pointers, concurrent relocate	Sub-ms pauses at large heaps	Latency SLO strict, heaps 8 GB+ Java 15+
Shenandoah	Brooks forwarding pointers, concurrent compact	Similar to ZGC goals	Red Hat builds / OpenJDK with Shenandoah enabled

Minor GC vs Major GC vs Full GC

Minor GC / Young GC: collects Eden + survivors (every generational collector). Frequency is high; duration is often milliseconds on healthy heaps.
Major GC / Old GC — old generation collection. With G1, often appears as “Mixed” evacuation of old regions with garbage.
Full GC — stop-the-world collection of entire heap (and often class unloading). Triggers: Metaspace pressure, System.gc(), heap exhaustion, promotion failure. P99 killers—minimize frequency.

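Promotion failure: during a minor GC, objects that must move to the old generation (survivor overflow or age threshold reached) find no free or contiguous space there, so the collector falls back to a full GC. It signals undersized survivors, an allocation burst, or an old gen that is already too full or fragmented.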
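
Collection counts for the young and old phases are exposed through GarbageCollectorMXBean, which is a cheap in-process alternative to parsing logs. The sketch below allocates some garbage and prints each collector bean. Bean names such as "G1 Young Generation" depend on the active collector, and whether any collection happens during the run depends on heap size, so neither is asserted here. GcCountersDemo is an illustrative name.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class GcCountersDemo {
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** Allocates short-lived 1 KiB arrays; returns a checksum so the work is observable. */
    static long churn(int arrays) {
        long checksum = 0;
        for (int i = 0; i < arrays; i++) {
            byte[] junk = new byte[1024];
            junk[i % junk.length] = 1;
            checksum += junk[i % junk.length];
        }
        return checksum;
    }

    public static void main(String[] args) {
        System.out.println("checksum: " + churn(2_000_000));   // roughly 2 GB of garbage
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount / getCollectionTime return -1 when undefined for a collector.
            System.out.printf("%-28s count=%d time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```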

🎯 Interview Tip

Explain tri-color concurrent marking: mutator must not hide objects from collector (write barriers). Contrast throughput (Parallel) vs latency (G1/ZGC). “GC is automatic” ≠ no tuning—allocation rate dominates.

GC tuning flags & reading logs

Tune from metrics: allocation rate (MB/s), GC pause P99, old gen occupancy after full GC, Metaspace slope. Flags below are starting points—always validate on load tests mirroring production shape.

Essential JVM flags

Flag	Effect
`-Xms` / `-Xmx`	Initial and maximum heap. Set `-Xms` equal to `-Xmx` (e.g. `-Xms4g -Xmx4g`) in containers to avoid resize and OS commit churn.
`-XX:+UseG1GC`	Enable G1 (explicit if your JDK defaults differ).
`-XX:MaxGCPauseMillis=200`	Soft pause target (200 ms is also the default). G1 adjusts young generation size and mixed GC cadence to meet it; not a hard guarantee.
`-XX:MaxGCPauseMillis` too low	Small young gen, frequent collections, lost throughput; too high → long pauses. 100–300 ms common for APIs.
`-XX:+UseZGC`	Enable ZGC; pair with `-Xmx` sizing for large heaps.
`-XX:InitiatingHeapOccupancyPercent`	G1: start concurrent marking when old % exceeds threshold (default 45). By default G1 treats this as an initial value and adapts it at runtime (`-XX:+G1UseAdaptiveIHOP`).
`-XX:+HeapDumpOnOutOfMemoryError`	Post-mortem MAT analysis on heap OOM.

# Production-style G1 logging (Java 9+ unified JVM logging)
java -Xms4g -Xmx4g \
     -XX:+UseG1GC \
     -XX:MaxGCPauseMillis=200 \
     -XX:+HeapDumpOnOutOfMemoryError \
     -Xlog:gc*,gc+age=trace:file=gc-%t.log:time,uptime,level:filecount=5,filesize=50m \
     -jar service.jar

# ZGC example (Java 21)
java -XX:+UseZGC -Xms8g -Xmx8g -Xlog:gc*:file=gc.log -jar service.jar

Reading GC log lines

Young collection (G1):

[2.145s][info][gc] GC(12) Pause Young (Normal) (G1 Evacuation Pause)
  [2.145s][info][gc] GC(12) Eden regions: 24->0(20)
  [2.145s][info][gc] GC(12) Survivor regions: 2->3(4)
  [2.145s][info][gc] GC(12) Old regions: 10->10
  [2.145s][info][gc] GC(12) Pause Young (Normal) (G1 Evacuation Pause) 24M->8M(256M) 12.345ms

Eden regions: 24→0 — Eden evacuated completely.
24M→8M(256M) — heap used before → after (committed capacity).
12.345ms — STW pause—what hits your API latency for this event.
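
Pulling those numbers out of a log programmatically takes one regular expression. The sketch below parses only the summary pause line shown above; the pattern is an assumption fitted to that sample, and unified logging output varies with the configured decorators and JDK version, so treat it as a starting point. GcPauseLineParser and Pause are made-up names.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GcPauseLineParser {
    static final class Pause {
        final int gcId;
        final String kind;
        final int beforeMiB, afterMiB, capacityMiB;
        final double millis;

        Pause(int gcId, String kind, int beforeMiB, int afterMiB, int capacityMiB, double millis) {
            this.gcId = gcId;
            this.kind = kind;
            this.beforeMiB = beforeMiB;
            this.afterMiB = afterMiB;
            this.capacityMiB = capacityMiB;
            this.millis = millis;
        }
    }

    // GC(<id>) <Pause ...> <before>M-><after>M(<capacity>M) <duration>ms
    private static final Pattern LINE = Pattern.compile(
            "GC\\((\\d+)\\) (Pause .+?) (\\d+)M->(\\d+)M\\((\\d+)M\\) (\\d+(?:\\.\\d+)?)ms");

    /** Returns the parsed pause, or empty for lines without a heap summary. */
    static Optional<Pause> parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new Pause(
                Integer.parseInt(m.group(1)), m.group(2),
                Integer.parseInt(m.group(3)), Integer.parseInt(m.group(4)),
                Integer.parseInt(m.group(5)), Double.parseDouble(m.group(6))));
    }

    public static void main(String[] args) {
        String line = "[2.145s][info][gc] GC(12) Pause Young (Normal) (G1 Evacuation Pause) "
                + "24M->8M(256M) 12.345ms";
        parse(line).ifPresent(p -> System.out.println(
                p.kind + ": freed " + (p.beforeMiB - p.afterMiB) + "M in " + p.millis + " ms"));
    }
}
```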

Mixed GC (G1): collects some old regions with high garbage alongside young—key to G1 old-gen hygiene without full heap pause.

Full GC: log line containing Pause Full — investigate Metaspace, System.gc(), or allocation failure immediately.

Correlate logs with jstat -gcutil <pid> 1000ms: watch O (old) climb toward 100% and FGC count increase.

⚠️ Pitfall

Kubernetes memory limit must exceed -Xmx + Metaspace + thread stacks (-Xss × thread count) + code cache + direct memory, plus headroom for GC and other native structures. An OOMKilled pod whose heap sits at 60% points to native memory overhead, not a Java heap leak.
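
The sum in that rule is easy to get wrong by hand, so here is the arithmetic as a small sketch. The input figures in main are illustrative assumptions, not recommendations, and the result is a floor: it counts every thread stack as fully used and leaves out GC bookkeeping and other native allocations, which need extra headroom on top. ContainerBudgetSketch is a made-up name.

```java
class ContainerBudgetSketch {
    /** Lower bound for the container memory limit, in MiB, from the main JVM consumers. */
    static long requiredMiB(long heapMiB, long metaspaceMiB, int threads,
                            long stackKiB, long codeCacheMiB, long directMiB) {
        // -Xss x thread count, rounded up to whole MiB
        long stacksMiB = ((long) threads * stackKiB + 1023) / 1024;
        return heapMiB + metaspaceMiB + stacksMiB + codeCacheMiB + directMiB;
    }

    public static void main(String[] args) {
        // Example: -Xmx4g, MaxMetaspaceSize=512m, 500 threads at -Xss1m,
        // ReservedCodeCacheSize=256m, MaxDirectMemorySize=512m
        long floor = requiredMiB(4096, 512, 500, 1024, 256, 512);
        System.out.println("limit must exceed " + floor + " MiB");   // 5876
    }
}
```

With these inputs a 6 GiB limit (6144 MiB) leaves under 300 MiB for everything the formula does not count.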

JVM flags & diagnostic tools

When the process is alive but slow, or CPU is pegged, you need a toolkit that maps symptoms to layers: threads (stuck?), heap (what objects?), GC (pauses?), native (off-heap?).

JDK CLI tools (same JDK as the process)

Tool	Command	Use when
jps	`jps -lvm`	List JVM PIDs, main class, args—find the right process on shared hosts.
jstack	`jstack -l <pid>`	Thread dump: deadlocks, pool exhaustion, stuck I/O. Virtual threads (21+) are not listed; use `jcmd <pid> Thread.dump_to_file <file>` for those.
jmap	`jmap -histo:live <pid>`	Class histogram of live objects (forces a full GC first): what dominates heap (char[], byte[], your DTOs).
jmap	`jmap -dump:format=b,file=heap.hprof <pid>`	Full heap dump for Eclipse MAT / VisualVM—heavy, capture during incident.
jstat	`jstat -gcutil <pid> 1000ms`	Live S0/S1/E/O percentages, YGC/FGC counts—cheap GC dashboard.
jcmd	`jcmd <pid> help`	Swiss army: JFR, VM.flags, GC.heap_info, Thread.print.

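# Incident capture script (run the steps back to back)
PID=$(jps -l | awk '/com\.example\.Application/{print $1}')
TS=$(date +%s)                                          # one timestamp so the files group together
jstack -l "$PID" > "threads-$TS.txt"                    # thread dump first: cheapest, least intrusive
jcmd "$PID" VM.flags > "flags-$TS.txt"
jcmd "$PID" JFR.start name=incident duration=120s filename="incident-$TS.jfr"
jmap -histo:live "$PID" | head -80 > "histo-$TS.txt"    # last: :live forces a full GC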

# Thread dump — look for:
#   java.lang.Thread.State: BLOCKED (on object monitor)
#   "pool-1-thread-3" waiting on java.util.concurrent.locks

jstat -gcutil $PID 1000ms
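
The thread information that jstack prints is also available in-process through the standard ThreadMXBean, which helps when attaching a CLI tool is not possible. A minimal sketch follows; ThreadDumpDemo is an illustrative name, and ThreadInfo.toString() truncates deep stacks, so a real dump endpoint would format frames itself.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

class ThreadDumpDemo {
    /** Number of threads currently deadlocked on monitors or ownable synchronizers. */
    static int deadlockedThreadCount() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads();   // null when there is no deadlock
        return ids == null ? 0 : ids.length;
    }

    /** Text dump of all live platform threads, including lock ownership. */
    static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            out.append(info);   // name, state, lock info, and the top stack frames
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + deadlockedThreadCount());
        System.out.println(dump());
    }
}
```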

jconsole

JDK-bundled JMX GUI. Attach to local PID or remote process with JMX ports open. Graphs heap, threads, classes, CPU. Good for quick visual trends; less detail than JFR for deep dives. Enable JMX remotely only with authentication and TLS in production.

VisualVM

Desktop profiler (standalone or via GraalVM downloads). Heap walker, CPU sampler, thread snapshots, heap dump analysis. Compare histograms between two points in time to find leak suspects (objects growing without bound).

async-profiler

Low-overhead sampling profiler built on AsyncGetCallTrace, combined with perf_events on Linux (macOS is supported with a reduced feature set). Produces flame graphs without the safepoint bias that skews conventional JVM sampling profilers. Modes:

cpu — where CPU time goes (hot methods after JIT)
alloc — allocation hotspots (who creates byte[]?)
lock — contended locks

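# async-profiler 2.x ships profiler.sh; 3.x renamed the launcher to bin/asprof with similar options
./profiler.sh -d 60 -e cpu -f /tmp/cpu.html <pid>
./profiler.sh -d 60 -e alloc -f /tmp/alloc.html <pid>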

Java Flight Recorder (JFR)

Built into OpenJDK. Low overhead when configured (-XX:StartFlightRecording or jcmd JFR.start). Events: GC pauses, socket I/O, method samples, allocation outside TLAB, lock inflation. Analyze in JDK Mission Control (JMC)—timeline correlates GC pause with thread blocked.

💡 Pro Tip

Workflow: jstat confirms GC pressure → GC logs identify pause type → jmap histo finds dominant classes → heap dump if leak → async-profiler alloc finds allocator → JFR for time-correlated story. Never change -Xmx without this evidence chain.