Written by Charles Connell, Senior Software Engineer II @ HubSpot
HubSpot stores most of our customers’ data in a data store named HBase. We currently have about 5 petabytes of this data in our larger region (North America) and less than 1 petabyte in our other region (Europe). The HBase team at HubSpot maintains two processes to back up this data regularly to ensure we do not lose our customers’ data in the event of a failure.
One of these processes is the “write-ahead-log persister”, which copies HBase’s write-ahead-logs to an S3 bucket continuously. This ensures that writes to HBase are backed up to a location outside of the database cluster within minutes. The other process is HBase’s built-in backup tool, which we run daily for every HBase table. We are in the process of switching to this tool from a home-made backup system.
Noticing a Problem
Over the course of 2023, the HBase team began to require that any process accessing files inside an HBase cluster (in other words, using HDFS) do so over a secure connection. HDFS supports Kerberos out-of-the-box, so we began using that to authenticate and encrypt connections to HDFS. As soon as we started doing this, we noticed that our WAL persisters began crashing periodically. This was not a major problem for us, because they would start back up automatically and continue working. Later, when we attempted to replace our home-made backup system with HBase’s built-in tool, these processes began crashing too, which was a much bigger problem for us. I was asked to investigate the issue.
Collecting Information
To start with, I looked at the logs of the crashing processes. In every case, the logs indicated that everything was normal, until something like this suddenly appeared:
This shows that the Java Virtual Machine terminated because it received a segmentation violation signal from the kernel. Java handles this signal by saving debug information and then exiting. All Java code immediately stops executing when this happens, with no cleanup. The Java language does not provide a path for Java code to cause a segmentation fault, so its occurrence means that the Java interpreter, or some library linked into it, must have a bug. To find the bug, I needed more information. The logs indicate that more information is saved in the file hs_err_pid2.log (full copy). I grabbed a copy of that file, and a copy of the core dump file associated with this crash. I’ll also note here that HubSpot uses a mix of x86-64 and ARM machines, and this crash only happened on x86-64.
Before talking about those files, a recap on what causes segmentation violations. On many computer hardware architectures, including the x86-64 architecture used here, memory is divided into “pages.” When the operating system wishes to allocate a page of memory to the currently running process, it does so by modifying state in the memory management unit (MMU), a component of the computer hardware. For each page, the MMU tracks the relationship between the address used by the executing program to refer to that page, and the address at which the page is stored in the memory hardware, which can be different. It also stores some permissions metadata indicating whether the page should be readable, writable, and/or executable. Any CPU instruction that accesses memory must check in with the MMU in order to resolve the correct hardware address, and also check the permissions metadata. If a CPU is given an instruction that reads or writes to a memory address that is not known to the MMU, or is known but not allowed by the containing page’s permissions, then the CPU will refuse to execute that instruction. The CPU will then notify the operating system kernel of this event by raising a “fault,” one of several flavors of notification that the hardware can give to the kernel. Unix-based kernels will then, in turn, send a Unix signal to the process that triggered the fault.
Back to the error log file, which contains a wealth of information. To start with, this line contains the details of the segmentation violation:
SIGSEGV confirms the type of signal received. SEGV_ACCERR means that the sub-category of violation is permission-related. The memory that the program attempted to access is known to the MMU (meaning it belongs to this program), but we violated permissions with what we were attempting to do with it. Finally, si_addr: 0x000000073a200000 tells us the address that the program tried to access that triggered the violation.
The error log file has details of the memory regions used by Java’s heap (a Java concern), and the pages mapped to the program by the kernel (an OS / hardware concern). A snippet of this file shows that the illegally accessed address lies one byte outside the end of a heap region (first line):
And the next line shows that the next heap region is non-contiguous:
Turning to the mapped memory listing,
We can see that the illegally accessed address 0x000000073a200000 lies at the start of a page that is not readable, writable, or executable. That must be why we got the SEGV_ACCERR flavor of SIGSEGV, for violating a permission.
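To make the distinction between the two flavors of SIGSEGV concrete, here is a minimal toy model of the lookup described above. It is only a sketch: the 4 KiB page size is an assumption (the common x86-64 default), and the page table layout is invented for illustration rather than taken from the crash log. Only the faulting address comes from the log.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages

FAULT_ADDR = 0x000000073A200000  # si_addr reported in the error log

# Hypothetical page table: page number -> set of permitted access kinds.
# This layout is made up for the example; it is not the real mapping.
page_table = {
    FAULT_ADDR // PAGE_SIZE - 1: {"r", "w"},  # last usable page of a region
    FAULT_ADDR // PAGE_SIZE: set(),           # mapped, but with no permissions
}


def classify_access(addr, kind):
    """Return what a toy MMU plus kernel would report for one access."""
    page = addr // PAGE_SIZE
    if page not in page_table:
        return "SEGV_MAPERR"   # address is not mapped at all
    if kind not in page_table[page]:
        return "SEGV_ACCERR"   # mapped, but this kind of access is forbidden
    return "ok"


print(classify_access(FAULT_ADDR - 1, "r"))
print(classify_access(FAULT_ADDR, "r"))
print(FAULT_ADDR % PAGE_SIZE == 0)  # is the address at the start of a page?
```

In this model, a read of the byte just before the faulting address succeeds, while a read of the faulting address itself is classified as a permission violation, which matches the shape of what the log describes.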
Understanding the Bug
We still don’t know what the program was actually doing when it triggered the violation. From the summary, we know that it was running in a function named StubRoutines::counterMode_AESCrypt. I looked at the source of Java 17.0.8+7 for this, and found that it boils down to a routine that performs AES-CTR encryption in hand-written pseudo-assembly. The error log includes the state of the program counter, also known as the instruction pointer, which you can also see in the snippet above: pc=0x00007f82b09d94d2. To find the instruction at this address, I turned to the debugger gdb. I launched gdb with two arguments: the java executable and the core dump file. I ensured that the executable was for exactly the same version of Java that was involved in the crash. Then, I disassembled the memory at the position of the instruction pointer:
Which resulted in:
So, we know that it’s a vpxorq instruction that is triggering the segmentation violation, because that is the instruction at the location of the instruction pointer. The excellent felixcloutier.com has an easily readable version of the Intel instruction manual explaining what this instruction does. It’s a fancy exclusive-or operation that can operate on extra-wide registers. In this case, it’s taking inputs from a memory location and the xmm0 register, XORing them together, and storing the result in xmm0. XMM registers are 128 bits (16 bytes) wide, so the operation operates on 16 bytes.
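As a rough illustration of what this 128-bit form of the instruction computes, the sketch below XORs a 16-byte register value with 16 bytes loaded from memory. The function name xor128 is made up for this example, and real hardware does this in a single instruction rather than byte by byte.

```python
def xor128(register_value, memory_value):
    """Byte-wise XOR of two 16-byte values, like a 128-bit vpxorq."""
    if len(register_value) != 16 or len(memory_value) != 16:
        raise ValueError("both operands must be exactly 16 bytes")
    return bytes(r ^ m for r, m in zip(register_value, memory_value))


xmm0 = bytes(range(16))            # stand-in for the register contents
loaded = bytes([0xFF]) * 16        # stand-in for the 16 bytes read from memory
xmm0 = xor128(xmm0, loaded)        # the result overwrites xmm0
print(xmm0.hex())
```

The important detail for this investigation is the memory operand: the instruction always loads a full 16 bytes, regardless of how many of them the surrounding code ends up using.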
We’ve narrowed down the problem slightly: we now know that the segmentation violation is caused by a memory read, not a memory write. If an instruction is reading memory that it shouldn’t, there must be a bug in the code. To understand the bug, I needed to find that instruction in the JDK’s source code. I compared the source code with the disassembled code that gdb showed me. Eventually I found where they lined up. I’m showing the gdb disassembly on the left, and the JDK source code on the right:
| Disassembled machine code (AT&T style, source then destination) | JDK source code (roughly Intel style, destination then source) |
| --- | --- |
| | bind(EXTRACT_TAILBYTES); |
| | // Save encrypted counter value in xmm0 for next invocation, before XOR operation |
| | movdqu(Address(saved_encCounter_start, 0), xmm0); |
| | // XOR encrypted block cipher in xmm0 with PT to produce CT |
| vpxorq (%rdi,%r12,1),%xmm0,%xmm0 | evpxorq(xmm0, xmm0, Address(src_addr, pos, Address::times_1, 0), Assembler::AVX_128bit); |
| | // extract up to 15 bytes of CT from xmm0 as specified by length register |
| test $0x8,%r8 | testptr(len_reg, 8); |
| je 0x7fdb989d94f6 | jcc(Assembler::zero, EXTRACT_TAIL_4BYTES); |
| vpextrq $0x0,%xmm0,(%rsi,%r12,1) | pextrq(Address(dest_addr, pos), xmm0, 0); |
| vpsrldq $0x8,%xmm0,%xmm0 | psrldq(xmm0, 8); |
| add $0x8,%r12d | addl(pos, 8); |
| | bind(EXTRACT_TAIL_4BYTES); |
| test $0x4,%r8 | testptr(len_reg, 4); |
| je 0x7fdb989d9513 | jcc(Assembler::zero, EXTRACT_TAIL_2BYTES); |
| vpextrd $0x0,%xmm0,(%rsi,%r12,1) | pextrd(Address(dest_addr, pos), xmm0, 0); |
| vpsrldq $0x4,%xmm0,%xmm0 | psrldq(xmm0, 4); |
| add $0x4,%r12 | addq(pos, 4); |
| | bind(EXTRACT_TAIL_2BYTES); |
| test $0x2,%r8 | testptr(len_reg, 2); |
| je 0x7fdb989d9530 | jcc(Assembler::zero, EXTRACT_TAIL_1BYTE); |
| vpextrw $0x0,%xmm0,(%rsi,%r12,1) | pextrw(Address(dest_addr, pos), xmm0, 0); |
| vpsrldq $0x2,%xmm0,%xmm0 | psrldq(xmm0, 2); |
| add $0x2,%r12d | addl(pos, 2); |
| | bind(EXTRACT_TAIL_1BYTE); |
| test $0x1,%r8 | testptr(len_reg, 1); |
| je 0x7fdb989d9548 | jcc(Assembler::zero, END); |
| vpextrb $0x0,%xmm0,(%rsi,%r12,1) | pextrb(Address(dest_addr, pos), xmm0, 0); |
The code on the left and right doesn’t look exactly the same, but it represents the same operations. After a close reading, you might see what it’s attempting to do. Sixteen bytes are read from memory, XORed with the contents of xmm0, and the result is stored in xmm0. Then, over the course of the remaining instructions, 15 or fewer bytes from xmm0 are copied into memory at dest_addr + pos. By comparing the registers named in the left column with the variables named in the right column, we can see which registers are mapped to which variables. In this case, RDI is src_addr and R12 is pos. The error log file records the register values at the time of the crash, so we know that RDI = src_addr = 0x00000007b5dff000 and R12 = pos = 0x0000000000000ff3. Our (e)vpxorq instruction, therefore, reads from memory starting at address 0x00000007b5dffff3. 15 bytes past that is 0x00000007b5e00002, so the range of memory read is 0x00000007b5dffff3 - 0x00000007b5e00002 inclusive. The last three bytes of this range, 0x00000007b5e00000 - 0x00000007b5e00002, fall into the non-readable memory page we found earlier. The error log file also shows us that R8 = len_reg = 0x000000000000000d, or 13 in decimal. Following the code, the bits of R8 select the 8-byte, 4-byte, and 1-byte copies (8 + 4 + 1 = 13), so only the first 13 bytes of xmm0 are written out. Those correspond to source addresses 0x00000007b5dffff3 - 0x00000007b5dfffff, and the data read from 0x00000007b5e00000 - 0x00000007b5e00002 is discarded.
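The arithmetic in this paragraph can be checked with a few lines of code. This is a sketch that uses the register values quoted from the error log; the 4 KiB page size is an assumption, and tail_chunks is a made-up helper that mirrors the test-and-jump sequence in the table rather than anything in the JDK.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages

src_addr = 0x00000007B5DFF000  # RDI from the error log
pos = 0x0000000000000FF3       # R12 from the error log
len_reg = 0x000000000000000D   # R8 from the error log

read_start = src_addr + pos
read_end = read_start + 16 - 1          # the 128-bit load covers 16 bytes, inclusive end
next_page = (read_start // PAGE_SIZE + 1) * PAGE_SIZE
bytes_past_page = max(0, read_end - next_page + 1)


def tail_chunks(length):
    """Sizes of the stores the EXTRACT_TAIL steps perform for this length."""
    return [size for size in (8, 4, 2, 1) if length & size]


print(hex(read_start), hex(read_end))
print(bytes_past_page)
print(tail_chunks(len_reg), sum(tail_chunks(len_reg)))
print(hex(read_start + len_reg - 1))    # last byte that is actually needed
```

The last byte that is actually needed is the final byte of the readable page, so every byte the load fetches from the next page is unnecessary.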
The author of the code knew exactly how many bytes they needed, and ultimately used that exact number, but thought it would be okay to temporarily grab some extra from memory along the way. In almost every case, this assumption worked out fine. Unfortunately, if the src_addr buffer is positioned at the very end of readable memory, then this assumption leads to invalid memory access.
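The pattern can be reproduced in miniature. The sketch below uses a made-up GuardedBuffer class to stand in for memory that ends at a protection boundary, and compares a tail routine that always loads 16 bytes with one that loads only what it needs. It illustrates the class of bug and one possible way to avoid it; it is not a description of the JDK code or of the eventual OpenJDK patch.

```python
class GuardedBuffer:
    """Toy memory region: any read past the end raises, like a fault."""

    def __init__(self, data):
        self._data = bytes(data)

    def read(self, offset, size):
        if offset < 0 or offset + size > len(self._data):
            raise MemoryError(
                f"read of {size} bytes at offset {offset:#x} leaves readable memory"
            )
        return self._data[offset:offset + size]


def xor_tail_overread(buf, pos, length, keystream):
    """Load a full 16 bytes, XOR, then keep only `length` bytes."""
    block = buf.read(pos, 16)  # may reach past the end of the buffer
    return bytes(a ^ b for a, b in zip(block, keystream))[:length]


def xor_tail_exact(buf, pos, length, keystream):
    """Load only the `length` bytes that are needed, then XOR."""
    tail = buf.read(pos, length)
    return bytes(a ^ b for a, b in zip(tail, keystream))  # zip stops at the tail's length


page = GuardedBuffer(bytes(range(256)) * 16)  # exactly 4096 readable bytes
keystream = bytes([0xAA]) * 16                # placeholder for the encrypted counter

print(xor_tail_exact(page, 0xFF3, 13, keystream).hex())
try:
    xor_tail_overread(page, 0xFF3, 13, keystream)
except MemoryError as err:
    print("fault:", err)
```

When the buffer has at least three spare bytes after the tail, both routines return the same result, which is why a bug like this can go unnoticed for a long time.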
Fixing the Bug
I initially misunderstood the issue, and thought that the vpxorq instruction was writing to memory that it shouldn’t write to. This would have represented a buffer overflow bug that could have potentially been exploitable. Being cautious, I reported the bug as a security vulnerability with the Adoptium project using GitHub’s process for making private vulnerability reports. Later, when I realized there was no vulnerability, I closed this report. However, the Adoptium maintainers coordinated to open a bug against OpenJDK. In parallel, I worked on a patch myself, which was slow going, because I hadn’t written assembly in perhaps 17 years, and also because I’d never contributed to the JDK before, so the build setup was new to me. Before I could finish that, the experienced OpenJDK maintainers posted a patch of their own. This has been merged into the OpenJDK master branch, and will appear in future releases of Java 23. The fix is also a candidate for backporting into Java 21 and 22. I appreciate everyone who worked on fixing the problem so promptly, including Aleksey Shipilev, Christian Hagedorn, Martin Balao Alonso, Goetz Lindenmaier, Andrew Haley, Vladimir Kozlov, Sandhya Viswanathan, Smita Kamath, and Emanuel Peter.
In the Meantime
HubSpot uses LTS versions of Java, so we will not enjoy the bug fix until it’s backported to Java 21. In the meantime, we still want to do backups. I mentioned earlier that HubSpot uses a mix of x86 and ARM machines. Thankfully, we enabled multi-architecture last year, which gave us a simple workaround when we ran into this crash. We are now running the affected processes on ARM machines exclusively, which do not suffer from the bug. This is an acceptable solution for now, but in the long run, we want to be able to use both hardware architectures without worry. We could also have disabled the custom assembly implementation of AES-CTR, falling back to a pure-Java implementation, by using the JVM argument -XX:-UseAESCTRIntrinsics. As soon as we switched to ARM, we discovered that the HBase backup tool has a memory leak and runs out of Java heap space! On to the next bug.