Upgrading to Java 8 at Scale

The vast majority of HubSpot's backend code is written in Java. We have over 350 separate Java deployables including Dropwizard APIs, Kafka consumers, Hadoop jobs, cron jobs, and more. So when we saw the features that Java 8 brings to the table (lambdas, streams, method references, CompletableFuture, and more) we couldn't wait to upgrade and start using it. A lot of companies our size don't make these types of upgrades; the usual thinking is that it's too hard or too risky. But the further you fall behind, the harder and riskier the upgrade becomes, and eventually you end up stuck with no upgrade path. This accumulation of technical debt slows down the entire development organization, so keeping our stack current is a top priority at HubSpot.
 
In our case, the upgrade took nearly 5 months to complete and it was full of hurdles, roadblocks, and other surprises. We kept notes on our migration strategy and all of the potholes we hit to hopefully make the process a little bit easier for the next brave souls.

Step 1: Make JDK8 available everywhere

The first step is to install JDK8 everywhere you run Java. This includes build servers, application servers, and developers' machines. At this point JDK8 isn't used for anything, but you want to make sure that it is available in all of these environments. Thankfully, we use Puppet to manage our production servers and Boxen to manage our development environment, which makes it easy to push out this type of change. We installed JDK8 and made the Java 8 executable available as java8, which will come in handy later.
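If you don't have configuration management in place, the java8 alias can be approximated with a symlink. The following is a minimal sketch rather than what our Puppet and Boxen setup actually does; the function name link_java8 and the target directory are assumptions made for illustration.

```shell
#!/bin/bash

# Hypothetical helper: expose <jdk_home>/bin/java as "java8" inside <bin_dir>.
link_java8() {
  local jdk_home="$1" bin_dir="$2"
  # Refuse to create a dangling link if the JDK is not where we expect it.
  if [ ! -f "$jdk_home/bin/java" ]; then
    echo "no java executable under $jdk_home" >&2
    return 1
  fi
  # -s: symbolic link, -f: replace an existing link, -n: do not follow it.
  ln -sfn "$jdk_home/bin/java" "$bin_dir/java8"
}
```

For example, link_java8 /usr/java/jdk1.8.0_40 /usr/local/bin would make java8 resolvable on a machine where /usr/local/bin is on the PATH.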

Step 2: Start building with JDK8

The next step is to start building with JDK8. We build with Maven, which reads the JAVA_HOME environment variable, so building with JDK8 just involved updating our Java buildpack to set JAVA_HOME to /usr/java/jdk1.8.0_40 (or wherever you put JDK8 in the previous step). It's important to note that you're not building Java 8 class files at this point, and you can't start using Java 8 features. You're still compiling Java 7 class files; you're just using JDK8 to do it (we control the language level via the source/target properties on the maven-compiler-plugin).
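If you want to convince yourself that the build is still producing Java 7 class files, the class file header is easy to inspect. After the four-byte magic number and a two-byte minor version comes a two-byte major version, which is 51 for Java 7 and 52 for Java 8. Here is a minimal sketch; the function name class_major_version is made up for illustration and is not part of our tooling.

```shell
#!/bin/bash

# Hypothetical helper: print the major version of a compiled .class file.
class_major_version() {
  local bytes
  # Skip magic (4 bytes) and minor version (2 bytes), then read the
  # two-byte big-endian major version as unsigned decimal values.
  bytes=($(od -An -t u1 -j 6 -N 2 "$1"))
  echo $(( bytes[0] * 256 + bytes[1] ))
}
```

Running it against any class under a project's target/classes directory should print 51 at this stage and 52 once you switch the language level in step 6.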

Step 3: Start running with JDK8

Next you want to start actually running Java applications with JDK8. At HubSpot our Java apps invoke a command called java-wrapper to start up instead of calling java directly. This is just a small shell script we wrote that adds a bunch of JVM settings (memory settings, GC settings, tmp dir, etc.) and then calls java. Since java-wrapper is managed by Puppet, we could just change the script to invoke java8 instead of java. We rolled this out to our test environment first and let it run for a few weeks before changing production.
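To give a feel for the idea, a wrapper of this kind might look roughly like the sketch below. This is not our actual java-wrapper: the function names, the environment variables (JAVA_BIN, JAVA_HEAP, JAVA_TMPDIR), and the default values are all assumptions chosen for illustration.

```shell
#!/bin/bash

# Hypothetical sketch of a java-wrapper style launcher.

# Fill the global JAVA_CMD array with the launcher, a few JVM settings,
# and whatever arguments the caller passed in.
build_java_cmd() {
  JAVA_CMD=(
    "${JAVA_BIN:-java8}"                     # switch JDKs by changing one name
    "-Xmx${JAVA_HEAP:-512m}"                 # placeholder heap size
    "-Djava.io.tmpdir=${JAVA_TMPDIR:-/tmp}"  # placeholder tmp dir
    "$@"
  )
}

# Replace the current shell process with the JVM.
java_wrapper() {
  build_java_cmd "$@"
  exec "${JAVA_CMD[@]}"
}
```

A real script would end with java_wrapper "$@". Because the executable name lives in one place, moving every app from java to java8 is a one-line change.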

We noticed that some of our apps were occasionally OOMing after this change. We're still in the process of tweaking memory and GC settings for Java 8, but decreasing the ratio of heap to off-heap memory has helped (probably because Java 8 replaces PermGen with Metaspace, which is allocated out of native memory).
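As a rough illustration of what adjusting that ratio means, the sketch below derives a heap size from the total memory available to a process and leaves the remainder for native allocations. The function name, the percentages, and the Metaspace cap are placeholders, not our production settings; -Xmx and -XX:MaxMetaspaceSize are standard Java 8 HotSpot flags.

```shell
#!/bin/bash

# Hypothetical helper: print heap and Metaspace flags for a process that may
# use total_mb megabytes, giving heap_percent of it to the Java heap.
jvm_memory_flags() {
  local total_mb="$1" heap_percent="$2" metaspace_mb="${3:-128}"
  # Integer arithmetic; whatever is not heap stays available as native memory.
  local heap_mb=$(( total_mb * heap_percent / 100 ))
  echo "-Xmx${heap_mb}m -XX:MaxMetaspaceSize=${metaspace_mb}m"
}
```

With 1024 MB available, dropping heap_percent from 75 to 60 shrinks the heap from 768 MB to 614 MB and leaves that much more room for Metaspace and other native allocations.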

Step 4: Find incompatible dependencies

So now you're ready to start compiling as Java 8 and writing lambdas, right? Not so fast! Now comes the hard part: making sure all of your dependencies are compatible with Java 8. We started by googling around to find incompatible dependencies (guice java 8, jersey java 8, etc.). After sifting through a bunch of bug reports, the good news is that the incompatibilities all boiled down to a single library: ASM. The bad news is that ASM is used all over, making it very difficult to upgrade. ASM is a bytecode generation library; it lets you read, write, or modify Java class files at runtime. The latest version (5.0.3) works well with Java 8, but unfortunately older versions don't play well with some Java 8 bytecode. ASM is pretty low-level so libraries often use a more user-friendly wrapper, most commonly CGLIB. Unfortunately, the latest released version of CGLIB (3.1) still depends on ASM 4.2. However, they did merge our PR into master, so if you want a Java 8 compatible version of CGLIB you can build and run master for now.

UPDATE: CGLIB 3.2.0 has been released!

You will need to identify all of your dependencies that rely on ASM or CGLIB. To do this we wrote a script to build the Maven dependency tree for each of our projects and write it to a single file. It looked something like this:

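#!/bin/bash

# Append the verbose dependency tree of every Maven project to a single file.
for d in "$JENKINS_HOME"/workspace/*/ ; do
  # Run in a subshell so the cd does not leak into the next iteration.
  (cd "$d" && [ -f pom.xml ] && mvn -B dependency:tree -Dverbose=true -Dexcludes=com.hubspot -DoutputFile=/deptrees.txt -DappendOutput=true)
done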

You can then grep this file to see where you have transitive dependencies on ASM or CGLIB. However, many libraries bundle and relocate ASM and CGLIB using JarJar or the Maven Shade Plugin. You also need to find libraries in this category, but the previous strategy won't work because the bundled dependencies don't show up in the Maven dependency tree. Instead, we used a script that copied all of our dependencies into a single folder. It looked something like this:

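#!/bin/bash

# Copy the dependencies of every Maven project into /jars, prefixing each
# file name with its groupId.
for d in "$JENKINS_HOME"/workspace/*/ ; do
  (cd "$d" && [ -f pom.xml ] && mvn -B dependency:copy-dependencies -Dmdep.prependGroupId=true -DoutputDirectory=/jars)
done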

Once you have all of the JARs in a single folder, list the contents of each one with the jar tf command, using a script like this:

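#!/bin/bash

# Print "<jar path> <class name>" for every class file in every JAR.
for j in /jars/*.jar ; do
  # nl prefixes each line with a 6-character line number followed by "$j ";
  # cut then drops the number, leaving the JAR path and the class name.
  jar tf "$j" | grep '\.class$' | nl -s "$j " | cut -c7-
done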

This prints the name of every class as well as the JAR file containing it. You can grep this output for ASM or CGLIB class files, which will reveal which libraries have bundled ASM and/or CGLIB. ClassVisitor.class is a pretty good search term for ASM, and Enhancer.class works well for CGLIB. If you copy the previous snippet into a file called explodejar.sh, the command to find JARs that bundle CGLIB would look like ./explodejar.sh | grep -E '/\$?Enhancer.class' (the optional $ match is needed because some libraries, such as Guice, prepend a $ character to bundled class names).
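To go from the raw grep output to a de-duplicated list of offending JARs, something like the following could be piped after explodejar.sh. The helper name find_bundled_jars is hypothetical, and the sketch assumes the "JAR path, space, class name" line format produced by the script above, with no spaces in the JAR paths.

```shell
#!/bin/bash

# Hypothetical helper: read "<jar> <class>" lines on stdin and print each JAR
# that contains an ASM ClassVisitor or a CGLIB Enhancer, once.
find_bundled_jars() {
  # \$? allows for relocated classes whose names start with a $ character.
  grep -E '/\$?(ClassVisitor|Enhancer)\.class$' \
    | awk '{ print $1 }' \
    | sort -u
}
```

Usage would be along the lines of ./explodejar.sh | find_bundled_jars after sourcing the function.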

Step 5: Upgrade incompatible dependencies

Now that you've found all of the dependencies that use ASM or CGLIB, you need to make sure they're using ASM 5.0.3 and CGLIB 3.2.0. The best way to accomplish this needs to be determined on a case-by-case basis for each library (this turned out to be the most time-consuming part of the Java 8 migration).

Some libraries can simply be excluded completely. For example, some of our projects had dependencies on jruby 1.6.5 (which bundles ASM 3.3.1). Before trying to upgrade this dependency, we noticed that it was being brought in transitively by HBase. We knew that our HBase clients didn't use the jruby features, so it could safely be excluded, avoiding the need to upgrade it.

Of course, many libraries need to be upgraded. One example is jersey-server, which in our case needed to be upgraded from 1.17.1 to 1.19. Given the number of Java projects we have, a big concern when upgrading libraries is backwards-compatibility. There are three types of compatibility: source, binary, and behavioral (covered in detail here). Because many of our libraries were already compiled against jersey-server 1.17.1, source compatibility isn't strong enough. We need to ensure binary compatibility, otherwise we will get exceptions at runtime.

There is an excellent tool for this called the Java API Compliance Checker. You give it the old and new JAR files and it tells you whether they're binary or source compatible. An invocation to test jersey-server compatibility might look something like this (ironically it won't run with JDK8 so we set the jdk-path to JDK7):

perl japi-compliance-checker.pl -lib jersey-server -jdk-path $JAVA7_HOME -old jersey-server-1.17.1.jar -new jersey-server-1.19.jar

When it finishes you will see some output like this:

creating compatibility report ...
result: COMPATIBLE
total "Binary" compatibility problems: 0, warnings: 0
total "Source" compatibility problems: 0, warnings: 0
see detailed report:
compat_reports/jersey-server/1.17.1_to_1.19/compat_report.html

In this case the new version was fully binary-compatible with the previous version, but if it wasn't you could look at the compatibility report it generated to see exactly which methods and classes were removed or mangled between versions.
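If you're checking a lot of libraries, the result line can be used to gate an upgrade from a script. Here is a minimal sketch that relies only on the "result: COMPATIBLE" line shown above; the name report_is_compatible is hypothetical, any other result is treated as a failure, and it assumes the summary is written to standard output.

```shell
#!/bin/bash

# Hypothetical helper: succeed only if the checker output on stdin contains
# a line that reads exactly "result: COMPATIBLE".
report_is_compatible() {
  grep -qx 'result: COMPATIBLE'
}
```

You could then pipe the checker's output into report_is_compatible and only proceed with the version bump when it succeeds, falling back to reading the HTML report otherwise.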

This is an awesome tool for checking binary and source compatibility, but unfortunately it doesn't get you anywhere with behavioral compatibility. Some libraries make detailed release notes for each version which can be very helpful. And depending on how active the project is, it might also be feasible to go through the commits on GitHub since the previous version and check manually for any changes in behavior. Ultimately, however, it's very difficult to spot behavioral changes between versions ahead of time, so the best approach is usually to upgrade the dependency in some of your non-mission-critical services to test the waters before rolling it out everywhere.

In some cases, the latest version of a library still depended on an old version of ASM. In this case, you could open an issue suggesting that the library upgrade its version of ASM to 5.0.3 for Java 8 compatibility (or even better, send a PR). For projects that fall in this category, our approach was to run our own fork until the library released a Java 8 compatible version, and then upgrade to that.

Step 6: Compile your code as Java 8

Now you're ready to actually compile your code as Java 8. At HubSpot we have over 1,000 POM files, so managing dependency versions and build plugins can be a challenge. To help with this, all of our Maven projects extend from a shared HubSpot parent POM. This allows us to centralize dependency information using a dependencyManagement section, configure build plugins using pluginManagement, and add validation plugins to enforce best practices. To start compiling at a Java 8 language level, we updated the configuration of the maven-compiler-plugin to set source and target to 1.8. We also added the -parameters flag to enable parameter name reflection. Then we crossed our fingers and rebuilt all of our Java projects. Because we put in so much work up front, this step went pretty smoothly. There were a few unit test failures due to the change in HashMap ordering in Java 8 (documented here), but our tests shouldn't have been relying on HashMap ordering and were easily fixed.
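For reference, a maven-compiler-plugin configuration along these lines would look roughly like the following in a parent POM. This is a sketch rather than a copy of our actual POM, and the compilerArgs element assumes version 3.1 or later of the plugin.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <configuration>
    <!-- Accept Java 8 syntax and emit Java 8 class files -->
    <source>1.8</source>
    <target>1.8</target>
    <compilerArgs>
      <!-- Keep method parameter names available to reflection -->
      <arg>-parameters</arg>
    </compilerArgs>
  </configuration>
</plugin>
```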

Once we got everything building successfully, we added a maven-enforcer-plugin configuration to our base POM to make sure dependencies incompatible with Java 8 didn't sneak back into any of our projects.
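A configuration of that kind could be built on the enforcer's bannedDependencies rule. The sketch below is an assumption about what such a rule might look like, not our actual base POM: the execution id is made up, and the coordinates and version ranges simply encode "ASM older than 5.0.3" and "CGLIB older than 3.2.0". Note that a rule like this only sees declared dependencies, so it won't catch copies of ASM or CGLIB that a library has bundled and relocated.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-enforcer-plugin</artifactId>
  <executions>
    <execution>
      <!-- Hypothetical execution id -->
      <id>ban-java8-incompatible-dependencies</id>
      <goals>
        <goal>enforce</goal>
      </goals>
      <configuration>
        <rules>
          <bannedDependencies>
            <excludes>
              <!-- ASM under its old and new groupIds, anything below 5.0.3 -->
              <exclude>asm:*:(,5.0.3)</exclude>
              <exclude>org.ow2.asm:*:(,5.0.3)</exclude>
              <!-- CGLIB below 3.2.0 -->
              <exclude>cglib:*:(,3.2.0)</exclude>
            </excludes>
          </bannedDependencies>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>
```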

Step 7: Profit!

Now that you're compiling as Java 8, it's time to enjoy all those new features! At HubSpot, we're taking full advantage of lambdas, streams, and method references to make our code more concise and readable. So far it's been great. The only pseudo-gotcha we hit was with the new Optional class. Previously we were using Guava's Optional, but we started shifting to the native Java 8 version going forward. As we did this, we ran into cases where we had custom handling of Guava Optionals that needed to be ported to the Java 8 version. Specifically, we needed to add Java 8 Optional handling to Jackson, jDBI, and Jersey to reach feature parity with our custom handling of Guava's Optional. Other than that, it's been smooth sailing and clear skies.


 Did you hit a different roadblock upgrading to Java 8? Did it go smoothly? Let us know in the comments below.
Written by Jonathan Haber

Jonathan Haber is an engineer on HubSpot's Platform team
