How to optimize Java startup and runtime performance

In this guide, we cover how to use strategies like AppCDS, native compilation, and CRaC to improve the performance of your Java applications.

By Josh Cummings

Apr 18, 2024 • 8 Minute Read

In this article, we’ll look at the state of performance optimization in the Java ecosystem. Specifically, we’ll look at older technologies that are coming of age, like AppCDS, and newer technologies, like compiling natively with GraalVM and taking warmed snapshots with CRaC. By the end, you’ll better understand the differences between these technologies and which performance issues each seeks to solve.

An Overview of Java’s Performance Concerns

While most of your performance challenges are likely due to I/O, network, or your application logic, Java has earned a reputation over the years for being slower than other programming languages, mainly due to two design decisions: performing garbage collection and interpreting bytecode.

In the last decade, the Java community has put substantial effort into creating increasingly efficient garbage collectors, developing tools for analyzing memory management, and evangelizing collection-friendly practices for developers to follow. Fantastic progress has been made, and today the conversation tends to treat the remaining operational cost of garbage collection as an acceptable price for the security benefits of managed memory.

Java is still on the hook, though, for running through an interpreter, a necessary constraint to ensure portability. One issue is how long it can take the JVM to warm up as it loads and initializes the classes on its classpath. Another is that a layer of abstraction tends to be slower than talking directly to the operating system.

And while there are tools like javap that you can use to inspect and try to optimize your bytecode, that approach does not easily scale to dozens or hundreds of individual microservices that interoperate. As that architectural pattern gains ground in the industry, the Java ecosystem has introduced cross-cutting technologies that let you take snapshots of the JVM warmup step or bypass the interpreter altogether. These can bring your performance characteristics closer to those of systems languages like C, Go, and Rust.

So let’s jump in and take a look at three common strategies (AppCDS, native compilation, and CRaC) and see how they compare. For each, we’ll use Spring’s PetClinic, an application often used in the Java community to benchmark a non-trivial Java application.

Capturing a baseline

The two metrics that will be of most interest to us for comparison are PetClinic’s average startup time and its average request time. I ran PetClinic on a Dell Precision 5560, Intel i7, Ubuntu 22.04 machine and achieved an average startup time of 8.12 seconds and an average request time of 0.012 seconds.
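The article does not say how these averages were collected. One plausible way to get an average request time is to repeat the same operation many times and divide the total elapsed time by the number of runs. The sketch below shows that idea only; AverageTimer and averageSeconds are hypothetical names, and the workload in main is a placeholder for an HTTP request against the running application.

```java
final class AverageTimer {

    /** Runs the task the given number of times and returns the mean wall-clock time in seconds. */
    static double averageSeconds(Runnable task, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1_000_000_000.0 / runs;
    }

    public static void main(String[] args) {
        // Placeholder workload (assumption): swap in a real request to your application.
        double average = averageSeconds(() -> Math.sqrt(12345.678), 1_000);
        System.out.println("average seconds per run: " + average);
    }
}
```

Averaging over many runs smooths out one-off delays, which matters on a JVM where the first few requests are usually slower than later ones.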

Working with AppCDS

As mentioned earlier, one cost shared by all Java applications is the time the class loader needs to load and initialize classes. In 2004, JDK 5 introduced Class Data Sharing (CDS), a mechanism for storing loaded JDK classes on the filesystem in such a way that they can be shared between JVM instances. Its extension to application classes, AppCDS, started out as a commercial feature of Oracle’s JDK. It was made freely available in 2018 with JDK 10, and then further simplified in JDK 13.

To use AppCDS, you must first record your application in use so that the archive can be built from the classes your application actually loads. You can achieve this by adding the -XX:ArchiveClassesAtExit option like so:

      java -XX:ArchiveClassesAtExit=build/classes.jsa -jar petclinic.jar

Here, build/classes.jsa is the path where the archive will be written.

This places the JVM in a kind of “recording mode” where it will note each time it loads a class. You can gracefully shut down the application at this point and the archive will be written with the classes that were loaded so far. Or, you can exercise various sections of the application to induce additional class loading.
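To see why exercising the application matters, it helps to watch the JVM load classes lazily. The sketch below is not part of PetClinic: ClassLoadProbe and its methods are hypothetical names, and date formatting merely stands in for "a section of the application that has not been used yet". It reads the JVM's running total of loaded classes through the standard management API before and after touching that feature.

```java
import java.lang.management.ManagementFactory;
import java.text.SimpleDateFormat;
import java.util.Date;

final class ClassLoadProbe {

    /** Total number of classes this JVM has loaded since it started. */
    static long totalLoadedClasses() {
        return ManagementFactory.getClassLoadingMXBean().getTotalLoadedClassCount();
    }

    /** Stand-in for exercising a feature; its classes are loaded on first use. */
    static String exerciseDateFormatting() {
        return new SimpleDateFormat("yyyy-MM-dd").format(new Date(0L));
    }

    public static void main(String[] args) {
        long before = totalLoadedClasses();
        exerciseDateFormatting();
        long after = totalLoadedClasses();
        System.out.println("classes loaded by exercising the feature: " + (after - before));
    }
}
```

Classes that are only loaded after you shut down the recording run will not be in the archive, so the more representative the recorded usage, the more the archive can help.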

You could, for example, copy this memory-mapped archive into the Docker image you build for your application, so that every container instance starts with it.

For every subsequent run of the application, you include the archive like so:

      java -XX:SharedArchiveFile=build/classes.jsa -jar petclinic.jar

With the archive in place, much of the work the JVM would normally do to load those classes is already done, improving startup time.

On my local machine, startup times were reduced to 7.314 seconds for a 10% performance improvement without changing or adjusting any of my application code.
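The percentage comes from comparing the new startup time with the 8.12-second baseline: (8.12 - 7.314) / 8.12 is roughly 0.099, or about 10%. As a small sketch (Improvement and percentFaster are hypothetical names used only for illustration), the same arithmetic looks like this:

```java
final class Improvement {

    /** Percentage by which the optimized time is lower than the baseline time. */
    static double percentFaster(double baselineSeconds, double optimizedSeconds) {
        return (baselineSeconds - optimizedSeconds) / baselineSeconds * 100.0;
    }

    public static void main(String[] args) {
        // Startup times reported in this article: baseline vs. AppCDS.
        System.out.println("AppCDS startup improvement (%): " + percentFaster(8.12, 7.314));
    }
}
```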

Limitations

While the idea of improving startup time by 10-50% without changing any code is lovely, it has its limitations as well. For one, the memory mapping is specific to the operating system it is running on. For another, it’s also JVM-version-specific.

These two reduce the cache’s reusability across disparate Java applications that may be on different JVM versions or different operating systems. As such, you may consider only sharing a given AppCDS cache between the instances of the same application.
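One way to avoid accidentally pairing an archive with the wrong JVM or platform is to encode those details in the archive's file name. The naming scheme below is purely an assumption for illustration, as are the names ArchiveName and archiveFor; the system properties it reads are standard.

```java
final class ArchiveName {

    /** Builds a hypothetical archive file name keyed by application, JVM version, OS, and CPU architecture. */
    static String archiveFor(String app, String vmVersion, String osName, String osArch) {
        String key = String.join("-", app, vmVersion, osName, osArch);
        // Replace characters that are awkward in file names with an underscore.
        return key.replaceAll("[^A-Za-z0-9._-]", "_") + ".jsa";
    }

    public static void main(String[] args) {
        String name = archiveFor(
                "petclinic",
                System.getProperty("java.vm.version"),
                System.getProperty("os.name"),
                System.getProperty("os.arch"));
        System.out.println("-XX:SharedArchiveFile=build/" + name);
    }
}
```

With a scheme like this, an instance running on a different JVM version or operating system simply looks for a different file rather than picking up an archive that does not match it.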

Compiling Natively with GraalVM

Overview

An early optimization in the JVM was the JIT compiler. Instead of interpreting every bytecode instruction each time it runs, the JVM can compile frequently used chunks of bytecode into machine code at runtime, as needed (“just in time”). This optimization struck an enticing balance between having architecture-independent bytecode and an efficient runtime.
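If you are curious whether the JIT compiler is doing this work on your own machine, the JDK's management API can report it. The sketch below is illustrative only: JitProbe and sumOfSquares are made-up names, and the loop counts are arbitrary values chosen to make the method "hot". The reported compiler name and time vary by JVM, so no output is shown.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

final class JitProbe {

    /** A small, frequently called method: a typical candidate for JIT compilation. */
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    public static void main(String[] args) {
        long checksum = 0;
        for (int round = 0; round < 20_000; round++) {
            checksum += sumOfSquares(1_000);
        }
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit != null && jit.isCompilationTimeMonitoringSupported()) {
            System.out.println(jit.getName() + " has spent "
                    + jit.getTotalCompilationTime() + " ms compiling so far");
        } else {
            System.out.println("This JVM does not report JIT compilation time");
        }
        System.out.println("checksum: " + checksum);
    }
}
```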

Instead of JIT compiling, GraalVM can perform ahead-of-time (AOT) compilation, which is to say that it compiles the bytecode completely into machine code before the application is run. At that point, the resulting binary is in effect no different from any other executable, since it no longer needs a JVM to run it.

GraalVM provides other interesting benefits as well. For example, as part of its compilation process, it performs reachability analysis to remove code on the classpath that won’t be reached by the application. The benefit here is that the effective classpath is much smaller, making for a smaller executable, less memory overhead, and a more secure binary.

To see the benefits, you build your application with GraalVM instead of a standard JDK. With PetClinic, that works like this:

      ./gradlew :bootBuildImage

This process takes much longer than a regular build. For example, PetClinic takes about 7 minutes to compile on my machine. Enterprise applications could take much longer than that. However, you can still compile the traditional way locally, and only do this extra compilation at deployment time.

The resulting benefit is quite startling. Again in the case of PetClinic you can see it by running the resulting Docker image like so:

      docker run --rm -p 8080:8080 docker.io/library/spring-petclinic:latest

On my machine, compiling PetClinic into a native image brought the average startup time down to 1.128 seconds and the average request time to 0.055 seconds, for performance improvements of 86% and 51% respectively!

Limitations

Certainly, such improvements come with significant tradeoffs. The first, already mentioned, is that compilation time is much longer. The second is that it can be hard to debug when a given class is pruned due to unexpected behavior in the reachability analysis. There is also only limited support for the Reflection API, which delayed some of GraalVM’s adoption while it waited for frameworks like Spring to remove substantial amounts of reflection use.
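To make the reflection problem concrete, consider code that picks a class by a name only known at runtime. The sketch below uses hypothetical names (ReflectiveLoad, newListByName) and runs fine on a regular JVM. A build-time reachability analysis has no obvious way to tell which class the string will name, which is why native image builds typically need to be told about such classes through extra configuration.

```java
import java.util.List;

final class ReflectiveLoad {

    /** Instantiates a List implementation whose class name is only known at runtime. */
    @SuppressWarnings("unchecked")
    static List<String> newListByName(String className) throws ReflectiveOperationException {
        Class<?> type = Class.forName(className);
        return (List<String>) type.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        // The class name could just as well come from a config file or a request.
        String className = args.length > 0 ? args[0] : "java.util.ArrayList";
        List<String> pets = newListByName(className);
        pets.add("Leo");
        System.out.println(pets.getClass().getName() + " holds " + pets);
    }
}
```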

Taking Snapshots with CRaC

The newest of the trio is CRaC (Coordinated Restore at Checkpoint). CRaC goes a step further than AppCDS: instead of only archiving loaded classes, you can use it to make a snapshot of an entire warmed-up JVM. You can think of this as being like placing your laptop into hibernation by closing the lid.

The setup for CRaC is similar to AppCDS in that you supply an argument to a cold JVM, warm up your application through usage, and then have CRaC create the snapshot and write it to the filesystem.

To do this, you’ll need a JVM that has CRaC integrated. With that in place, you can run:

      java -XX:CRaCCheckpointTo=cr -jar petclinic.jar

When ready, you send the following command to create the snapshot:

      jcmd petclinic.jar JDK.checkpoint

With the snapshot taken, you can now restart the application from that checkpoint like so:

      java -XX:CRaCRestoreFrom=cr

With this arrangement, the startup time is an astonishing 0.061 seconds! Also, request time, while not the same as with a native image, is much improved since the application is running on a warm JVM.

Limitations

While CRaC is still in its early stages, it has some significant limitations to be aware of. Primarily, since the snapshot captures the entire VM, anything held in memory at that moment, including passwords and other sensitive information in the string pool, is written into the snapshot. You may need to be strategic about when you take the snapshot, say before sensitive properties have been loaded into memory. Also, CRaC will likely require some code changes to ensure that ports and other I/O resources are closed before the snapshot is taken and reopened when the snapshot is restored.
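The kind of code change this implies can be sketched as a pair of lifecycle callbacks: release the resource just before the snapshot, then acquire it again after the restore. To stay self-contained, the sketch below does not use the CRaC API; SnapshotAware is a hypothetical stand-in for the callback interface a CRaC-enabled JDK provides, and main calls the callbacks by hand where the runtime would normally do so.

```java
import java.io.IOException;
import java.net.ServerSocket;

final class SnapshotLifecycle {

    /** Hypothetical stand-in for the checkpoint/restore callbacks of a CRaC-enabled JDK. */
    interface SnapshotAware {
        void beforeCheckpoint() throws IOException;
        void afterRestore() throws IOException;
    }

    /** Owns a listening socket that must not be open while the snapshot is taken. */
    static final class Listener implements SnapshotAware {
        private final int requestedPort;
        private ServerSocket socket;

        Listener(int requestedPort) throws IOException {
            this.requestedPort = requestedPort;
            this.socket = new ServerSocket(requestedPort);
        }

        boolean isOpen() {
            return !socket.isClosed();
        }

        @Override
        public void beforeCheckpoint() throws IOException {
            socket.close(); // release the port before the snapshot is written
        }

        @Override
        public void afterRestore() throws IOException {
            socket = new ServerSocket(requestedPort); // listen again once restored
        }
    }

    public static void main(String[] args) throws IOException {
        Listener listener = new Listener(0); // port 0 lets the OS pick a free port
        listener.beforeCheckpoint();
        System.out.println("open while snapshotting: " + listener.isOpen()); // false
        listener.afterRestore();
        System.out.println("open after restore: " + listener.isOpen()); // true
        listener.beforeCheckpoint(); // tidy up before exiting
    }
}
```

The same pattern applies to database connections, open files, and cached secrets: anything that should not be frozen into the snapshot is dropped in the first callback and re-created in the second.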

Conclusion

In this article, you received a brief overview of three technologies that are gaining traction in the Java community. As you consider which best suits your needs, remember that there is more to weigh than a shiny startup time. With GraalVM, there are significant investments in compilation time and in possibly rewriting code to ensure reachability; it can also be difficult to debug. With CRaC, there are notable security limitations that your platform needs to mitigate before it is safe for production. Also, as with GraalVM, you will likely need to rewrite some of your code to support being snapshotted.

Still, these are exciting days for the Java community! I encourage you to try these technologies out today for yourself and see which can be used to improve performance in your Java applications.

Further learning

If you found this article helpful, why not check out Pluralsight's dedicated learning paths on Java? Written for tech professionals, by tech professionals, each pathway is scaled so you can start at your current proficiency level.

If you're interested in Java, you may also want to check out these resources on the Spring framework, an open-source framework for building robust and scalable Java applications. Pluralsight offers paths on core Spring as well as Spring Security.

Josh C.

Like many software craftsmen, Josh eats, sleeps, and dreams in code. He codes for fun, and his kids code for fun! Right now, Josh works as a full-time committer on Spring Security and loves every minute. Application Security holds a special place in his heart, a place diametrically opposed to and cosmically distant from his unending hatred for checked exceptions.
