When running on EC2, you often can't win when it comes to instance types. One of the more cost-effective types available is the c1.xlarge. It has enough CPU to handle compactions, a decent amount of disk, and high network I/O. However, we've found that the relatively low memory of 7GB on the c1.xlarge often leads to stability issues in highly concurrent HBase clusters. While there are other more expensive options, this HBase tutorial will help you make the most of your c1.xlarge RegionServers.
1. Reduce the number of regions per RegionServer
Ideally you should have fewer than 100 regions per RegionServer. The memstore space is shared by all active regions, and each region adds (by default) 2MB of heap overhead for the MSLAB. Cutting this number down will help things run smoothly, and not just from a memory standpoint.
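One way to keep the region count down is to let each region grow larger before it splits. A minimal hbase-site.xml sketch (the 10GB value is an illustrative assumption; size it to your own data and write patterns):

```xml
<!-- hbase-site.xml: let regions grow larger so tables split into fewer of them -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <!-- 10GB shown as an example; the stock default is much smaller -->
  <value>10737418240</value>
</property>
```

Pre-splitting tables at creation time to a deliberate region count works toward the same goal.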
2. Steal memory from other services
You definitely shouldn't be running a TaskTracker with your RegionServer on these instance types, but you are most likely running a local DataNode. A typical configuration calls for 1GB of memory for a DataNode, but we've found that you don't need that much in a lot of cases. Verify your metrics before rolling this out, but we were perfectly safe cutting DataNode heap down to 400MB. This nice 624MB chunk will help HBase get a little further.
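The DataNode heap cut described above might look like the following in hadoop-env.sh. Treat this as a sketch, since the exact environment variable names vary across Hadoop versions:

```shell
# hadoop-env.sh -- illustrative DataNode heap reduction.
# Verify DataNode heap usage and GC metrics before rolling this out.
export HADOOP_DATANODE_OPTS="-Xmx400m $HADOOP_DATANODE_OPTS"
```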
3. Tune or turn off MSLAB
If after stealing memory and cutting back regions you are still having issues, you can go a step further. As mentioned, the MSLAB feature adds 2MB of heap overhead by default for each region. You can tune this buffer down with hbase.hregion.memstore.mslab.chunksize. The lower you go, the less effective MSLAB is, but the less memory overhead as well. You can also turn it off altogether by setting hbase.hregion.memstore.mslab.enabled to false.
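Both knobs live in hbase-site.xml; a sketch (the 1MB chunk size is an illustrative value, not a recommendation):

```xml
<!-- hbase-site.xml: shrink the MSLAB chunk size (default is 2MB per region) -->
<property>
  <name>hbase.hregion.memstore.mslab.chunksize</name>
  <value>1048576</value> <!-- 1MB, shown as an example -->
</property>

<!-- ...or disable MSLAB entirely -->
<property>
  <name>hbase.hregion.memstore.mslab.enabled</name>
  <value>false</value>
</property>
```

Remember that MSLAB exists to reduce memstore heap fragmentation, so disabling it trades steady memory overhead for a higher risk of long garbage collection pauses.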
4. Be aggressive about caching and batching
Caching (Scan#setCaching(int)) and batching (Scan#setBatch(int)) are great for limiting the effect of network latency on large scans. Unfortunately, they also require more memory on both the client and server side. Keep in mind the speed trade-off, but enjoy a bit more stability by tuning these down, as close to a value of 1 as necessary.
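In the Java client, that tuning might look like the following sketch. The specific values are illustrative only; pick them based on your cell sizes and latency budget:

```java
import org.apache.hadoop.hbase.client.Scan;

// Conservative scan settings for a memory-constrained cluster.
Scan scan = new Scan();
scan.setCaching(10); // rows fetched per RPC -- lower means less memory held per scanner
scan.setBatch(5);    // columns returned per Result -- bounds memory use on wide rows
```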
The RegionServer also has to have enough memory to handle all of your concurrent writes. If you are heavily batching your writes, or sending a few very large cell values, you are likely to run into OutOfMemoryErrors. Lower your batching here as well, or otherwise find a way to shrink the size of your cell values.
5. Control load from Hadoop
If you are running Hadoop jobs against your HBase data, you are essentially running a lot of large scans. In an HBase MapReduce job, each region becomes a mapper. If you have more than one region per RegionServer, chances are that at some point a few mappers will be scanning the same RegionServer concurrently. Each of these scans takes memory, disk, and CPU resources, and when several pile up they can cause some pain.
hbase.regionserver.handler.count will help limit the number of active connections taking memory, but you could still have an issue if all handlers are serving large full-region scans. Using our extension of TableInputFormat, you can easily control how many concurrent mappers run against a single RegionServer, providing more predictable memory usage.
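Capping the handler count is a one-line change in hbase-site.xml (the value below is illustrative; start from your current setting and step down while watching request queue metrics):

```xml
<property>
  <name>hbase.regionserver.handler.count</name>
  <!-- fewer handlers = fewer concurrent requests holding memory at once -->
  <value>8</value>
</property>
```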
If you are writing to HBase from a reducer, you are going to want to control partitioning there as well. This is easily implemented using Hadoop's Partitioner interface, with HBase's HBaseAdmin interface providing the region to RegionServer mappings.
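A rough sketch of such a partitioner against the older (0.94-era) client API is below. Everything here is hypothetical scaffolding, and the start keys are assumed to be loaded once up front (e.g. via HTable#getStartKeys()); note also that HBase ships an HRegionPartitioner in its mapreduce package that covers the common case:

```java
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Partitioner;

// Hypothetical partitioner: route each row to the reducer that matches its
// target region, so a given reducer writes to (roughly) one RegionServer.
public class RegionAwarePartitioner extends Partitioner<ImmutableBytesWritable, Put> {

  // Region start keys, assumed to be fetched once from the target table.
  private byte[][] startKeys;

  public void setStartKeys(byte[][] keys) {
    this.startKeys = keys;
  }

  @Override
  public int getPartition(ImmutableBytesWritable key, Put value, int numPartitions) {
    // Find the last region whose start key is <= this row key.
    int region = 0;
    for (int i = 0; i < startKeys.length; i++) {
      if (Bytes.compareTo(key.get(), startKeys[i]) >= 0) {
        region = i;
      }
    }
    return region % numPartitions;
  }
}
```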
Running HBase on low memory is hard, but not impossible
With these tips in hand, you should be well on your way to surviving operations in your low memory environment. It can be frustrating to fight for every megabyte of memory in an age of extremely cheap, fast RAM. But follow these tips and your CFO will thank you for making the most of this cost-effective instance type. For those with a little more financial freedom, we will explore the impact of Amazon's new I2 instance types in a future post. So stay tuned!