
The weirder the variable name, the faster the JVM?

Published: 2025-11-28


The consensus in software engineering is that clearer variable names are better: names with clear intent and complete semantics reduce communication costs, prevent misunderstandings, and improve maintainability. Almost every style guide lists "meaningful naming" as a first principle.

But the article I read today, "Java Performs Better When You Misspell Variable Names," overturns the performance half of this iron rule: in some Java stacks, deliberately shortening or even "misspelling" variable names may genuinely make a service faster. The business logic does not change; shorter, more "random" names are simply cheaper on the string constant pool, hashing, and reflection paths. In the author's stress tests, the largest throughput improvement was close to 49%. It sounds counterintuitive, but he turned it into a serious proposition with microbenchmarks, load tests, and profilers.

How this was discovered

The story begins with an accident. While refactoring, the author accidentally typed variables such as custEmil, ordrRegistry, and totlAmnt instead of customerEmail, orderHistory, and totalAmount.

The next day, monitoring showed that average latency had dropped from 127ms to 80ms. The author first suspected a lucky cache hit, network fluctuation, or measurement error, but after rolling back to the "clean" names the latency returned to 127ms. That forced him to take the matter seriously.

So he verified it systematically. He wrote a controlled experiment with JMH: two versions of the code with completely identical logic, the only variable being the length and form of the names. One version uses standard, complete, readable names; the other removes vowels, shortens prefixes, and occasionally makes names more random. Then came validation closer to production: applying the same strategy to a Spring Boot service and comparing the throughput and latency of the two versions under a 60-second JMeter stress test with 1,000 concurrent users. Finally, he used a profiler (such as YourKit) to check whether the string-related hotspots actually shrank.
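To make the setup concrete, here is a minimal JMH sketch of the kind of controlled comparison described above. It is my own illustration, not the author's actual benchmark: the class and field names are hypothetical, and reflective field lookup is used because that is one path where the name string is actually touched at runtime.

```java
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

// Minimal sketch: compares reflective field lookup by a long, readable name
// versus a short, vowel-stripped name. All names here are illustrative.
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Thread)
@Warmup(iterations = 5, time = 1)
@Measurement(iterations = 5, time = 1)
@Fork(1)
public class NamingBenchmark {

    // Two structurally identical holder classes; only the field names differ.
    static class ReadableOrder  { public String customerEmailAddress = "a@b.c"; }
    static class ShortenedOrder { public String cstmrEml             = "a@b.c"; }

    @Benchmark
    public Object readableName() throws Exception {
        // Lookup by name exercises the string hashing / comparison path.
        return ReadableOrder.class.getField("customerEmailAddress");
    }

    @Benchmark
    public Object shortenedName() throws Exception {
        return ShortenedOrder.class.getField("cstmrEml");
    }
}
```

Returning the Field object lets JMH consume the result, so the JIT cannot treat the lookup as dead code.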

Data and analysis: not mysticism, but an overlooked path in the cost stack

In the microbenchmarks, the author reports that removing vowels alone brings roughly a 26% improvement, and the shorter and "messier" the name (three- or four-character abbreviations, or meaningless combinations), the bigger the gain. In the load test, average response time fell from 143ms to 91ms, throughput rose from 6,847 req/s to 10,234 req/s, and the error rate was unchanged. The profiler shows a clear drop in total time spent in String.hashCode(): the call count is the same, but with short names the cumulative time is nearly one second lower over a 60-second window.

Why could this hold? Because the JVM's string constant pool (the string table) is a hash table, and reflection, debugging, and stack/frame introspection keep triggering lookups and hash computations on these strings. Long names with similar prefixes are more prone to hash collisions, longer probe chains, and worse cache locality, and the cost of the GC scanning and keeping these strings alive during the mark-sweep phase is also higher. The JIT can optimize computation, but it cannot eliminate the fixed costs of the string table, reflection, and GC. Short, more "random" names tend to have a better hash distribution, fewer collisions, and friendlier cache behavior.
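To see where those costs live, here is a toy sketch (my own, not from the article) using nothing but the standard library: String.hashCode() walks every character of the name, and String.intern() goes through the shared string table that the article points to.

```java
// Toy illustration of the two costs named above; the variable names are made up.
public class NameCostDemo {
    public static void main(String[] args) {
        String readable  = "customerEmailAddressForNotification";
        String shortened = "cstEml";

        // hashCode() is O(length) on the first call: 31*h + c over every
        // character (the result is cached on the String afterwards).
        System.out.println(readable.hashCode());
        System.out.println(shortened.hashCode());

        // intern() consults the JVM's shared table of interned strings,
        // a hash table that lookups must probe and the GC must keep consistent.
        String pooled = new String(readable).intern();
        System.out.println(pooled == readable); // true: the literal was already in the pool
    }
}
```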

This also explains an uncomfortable reality: in reflection-heavy stacks (Spring, Hibernate, Jackson, and so on), names are not "free at runtime". On certain paths, the length and distribution of names become a measurable cost.

What should we do: Naming is no longer just a matter of style

Now that we know this conclusion, should we adjust our naming strategy? I think we should, but only in the right places and with clear boundaries.

Profile first, then act: use a profiler to locate string-related hotspots (reflection entry points, serialization/deserialization, frame introspection, the StringTable) and confirm they are actually eating your time.

Rename only at hotspots: limit the strategy to high-frequency reflected or serialized types, fields, and methods (see the sketch after this list); keep domain models and business rules readable, and don't turn team collaboration into a puzzle game.

Start conservative, escalate only with evidence:

Conservative: remove obvious vowels and shorten prefixes (customerEmailAddress → cstmrEmlAdr), with a target gain of 8-12%.

Aggressive: shorten more actively and break up similar prefixes (orderHistoryList → ordrStryList), with a target gain of 18-24%.

Extreme: hard three- or four-character abbreviations (totalAmountPaid → tAP), which may yield higher gains but are not recommended in the core business domain of a production system.

Alternative solutions: replace reflection with code generation or annotation processors; choose a more efficient implementation for the serialization layer; if necessary, tune -XX:StringTableSize and verify with comparative measurements.

Engineering verification: set up reliable benchmarks (JIT warm-up, fixed parameters, shielded I/O interference), observe the changes in p95/p99 latency and throughput, and only then decide whether to roll the change out.
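For the "rename only at hotspots" advice, one way to limit the blast radius is sketched below with Jackson (my own illustration; the DTO and field names are hypothetical): the hot Java field carries the short name, while @JsonProperty keeps the external JSON key readable, so the API contract does not change.

```java
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.ObjectMapper;

// Sketch of confining a shortened name to a serialization hotspot: the Java
// field is short, but the wire format stays readable for API consumers.
public class OrderDto {

    @JsonProperty("customerEmailAddress")   // readable external key
    public String cstmrEml;                 // shortened internal field name

    public static void main(String[] args) throws Exception {
        OrderDto dto = new OrderDto();
        dto.cstmrEml = "jane@example.com";

        // Prints: {"customerEmailAddress":"jane@example.com"}
        System.out.println(new ObjectMapper().writeValueAsString(dto));
    }
}
```

This keeps the trade-off internal to the codebase: reviewers see the short field, consumers of the API never do.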

Reflection: between data and dogma, which should give way?

If naming is a cost in certain stacks, should we establish a 'hotspot naming strategy' that, like a performance budget, allows sacrificing some readability on a few critical paths in exchange for throughput?

Is the effect consistent across different JVM versions, GC strategies, and framework combinations? Can it be made into a reproducible experiment using a toolchain (renaming tool, lint rule, benchmark kit)?

As a team grows, how do we put naming's readability benefits and performance's throughput benefits on the same cost sheet? Shouldn't that trade-off be driven by data rather than by style uniformity?

Summary

This article made me re-examine a premise I had held for years: that naming is only a readability concern. The author turned it into a performance question with microbenchmarks, load tests, and profilers. In systems chasing extreme throughput, names may no longer be just "for people to read"; they also affect how the machine runs. My answer: adjust naming strategically, but only on hotspot paths, and decide based on data rather than intuition. After all, in engineering, beautiful code is not necessarily the fastest code, and sometimes what we need is a real boost that can withstand traffic.