JVM performance watch roundup: April 2025

I post notes on JVM performance-related news on social media whenever a patch, JEP, or benchmark catches my eye. This blog post collects everything shared in April 2025. Each section rewrites the original thread in plain text.
Follow me on social media to get these updates as soon as they go out.
1. JFR to gain CPU-time-aware sampling
A new JEP draft proposes CPU-time profiling in JFR via a new jdk.CPUTimeSample event. Unlike today’s JFR sampling, which can’t distinguish between running and waiting threads, the new event records the actual CPU time a thread consumed between timestamps.
Samples will include thread ID, timing, and CPU delta, and can be configured by interval or frequency. This allows more accurate attribution of CPU usage in profiling tools.
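To give a rough idea of how tools might consume the new event, here is a minimal sketch that scans a finished recording for jdk.CPUTimeSample events with the standard jdk.jfr.consumer API. The event name comes from the JEP draft and its fields may still change, so the sketch just prints each matching event as-is.

```java
import java.nio.file.Path;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

public class CpuTimeSampleDump {
    public static void main(String[] args) throws Exception {
        // Path to a .jfr file passed on the command line.
        Path recording = Path.of(args[0]);
        for (RecordedEvent event : RecordingFile.readAllEvents(recording)) {
            // "jdk.CPUTimeSample" is the event name proposed in the JEP draft;
            // the final name and field layout may differ.
            if (event.getEventType().getName().equals("jdk.CPUTimeSample")) {
                System.out.println(event);
            }
        }
    }
}
```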
JProfiler will use this to highlight real CPU hot spots in JFR snapshots by identifying code that was actually executing rather than just present in stack traces.
2. Scoped values finalized in Java 25
Scoped values are replacing ThreadLocal as a safer and faster way to pass context, especially in structured concurrency and virtual threads. Finalized by JEP 506 after an incubator round and four previews, they arrive in Java 25.
Unlike ThreadLocal, scoped values are immutable and don’t require copying to child threads. This eliminates overhead in virtual thread hierarchies and prevents bugs caused by unintended sharing of mutable state.
They also reduce the risk of memory leaks. ThreadLocal often leads to memory leaks when large objects or class loaders are accidentally retained. Scoped values avoid this pattern by enforcing scoped, immutable context.
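For illustration, here is a minimal sketch of the scoped-value pattern; the RequestContext record and the handleRequest/process methods are made up for the example. It compiles on Java 25, or on earlier releases with --enable-preview.

```java
public class ScopedValueExample {
    // Immutable per-request context; the record and its contents are illustrative.
    record RequestContext(String user) {}

    private static final ScopedValue<RequestContext> CONTEXT = ScopedValue.newInstance();

    static void handleRequest(String user) {
        // The binding is visible only for the dynamic extent of run()
        // and is inherited by threads forked in structured scopes.
        ScopedValue.where(CONTEXT, new RequestContext(user))
                   .run(ScopedValueExample::process);
    }

    static void process() {
        // Read the value anywhere down the call chain without passing parameters.
        System.out.println("Processing request for " + CONTEXT.get().user());
    }

    public static void main(String[] args) {
        handleRequest("alice");
    }
}
```

Unlike a ThreadLocal, there is no set()/remove() pair to forget about: the binding simply disappears when run() returns.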
JProfiler’s heap walker already includes an inspection for thread-local leaks. Scoped value support is planned for a future version.
3. Compact Object Headers proposed as production feature
JEP 450 introduced Compact Object Headers in Java 24 as an experimental feature: shrinking object headers to 64 bits on 64‑bit systems. A new JEP draft now proposes promoting it to a product feature.
This change cuts live heap usage by 10–20%, improves cache locality, and reduces GC frequency and duration. In production tests (for example at Amazon), it led to:
- 20%+ heap reduction (SPECjbb2015)
- 8–10% lower CPU usage
- 15% fewer GCs
The mechanism packs the compressed class pointer and identity hash code into a single 64-bit header word. The trade-off is more complex GC and locking logic in exchange for memory savings that add up in large heaps.
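If you want to try it before the promotion lands, JEP 450 gates the feature in JDK 24 behind two experimental flags. A sketch of the launch command, where my-app.jar stands in for your own application:

```
java -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders -jar my-app.jar
```

Once it becomes a product feature, the unlock flag should no longer be required.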