Using HBase Quotas to Share Resources at Scale

Written by Ray Mattingly, Engineering Lead, HBase @ HubSpot | Oct 2, 2024

At HubSpot, managing resource usage across our HBase clusters is a critical and difficult problem. Our HBase clusters serve over 25 million requests per second at our daily peak traffic. Thousands of distinct background processes and user-facing web services generate traffic against our most critical HBase clusters, and this traffic is often routed through a small number of internal micro-services that act as a dedicated proxy layer for these HBase clusters. With this architecture, a single bad job or worker could cause pathological load on the underlying HBase cluster, and that load could cause reliability issues across all features that depend on said cluster.

A concrete example of this applications-to-proxy-to-HBase pipeline is our CRM. The CRM is the centerpiece of the HubSpot platform, and its basic data type is a “CRM object.” HubSpot CRM has your standard CRM objects: contacts, companies, deals, etc., but it also has support for “custom objects” that can be flexibly designed to meet each customer’s unique requirements. This framework is powerful, customizable, and is used to some degree across virtually every HubSpot feature. Here is a very simplified example of how many different systems at HubSpot may interact with the CRM objects that are stored in HBase:

In reality, we have hundreds of clusters and more than 10,000 distinct systems generating workloads against them.

In this post I will explain how to achieve scalable resource sharing, both at the cluster level and per-RegionServer.

Introductory Glossary

HBase
HBase is a distributed, scalable, big data store modeled after Google's Bigtable. It's part of the Apache Hadoop ecosystem and is designed to provide random, low-latency read/write access to huge datasets.

HBase Tables
Data in HBase is organized and accessed through tables, which consist of lexicographically sorted rows. The rows in each table are partitioned into regions, which are the basic building blocks for scalability and distribution.

HBase Regions
A region is a subset of a table containing all the rows between a start row and an end row. As data grows, regions can be split and merged automatically to maintain system performance. Regions are distributed across the RegionServers in your cluster to distribute the read/write workload for your table.

HBase RegionServers
An HBase RegionServer is the process responsible for handling read and write requests. Each RegionServer is responsible for some number of regions. This post will often refer to an HBase deployment as an “HBase cluster”, meaning a cluster of HBase RegionServers.

Hotspots
An HBase hotspot occurs when a disproportionate amount of traffic is directed to a single RegionServer. This can lead to performance bottlenecks as that server becomes overloaded, resulting in slower response times and reduced reliability. Hotspots often arise from poor data organization, but can also be caused by application layer bugs or general misunderstandings regarding “how much traffic is too much?” When one system at HubSpot generates a hotspot that causes an outage for other users of a given cluster, we refer to it as a “noisy neighbor hotspot.”

HBase Quotas
HBase Quotas are the out-of-the-box solution for managing resource sharing. Quotas come in several forms. For example, there are space quotas, which dictate how large the data of tables, namespaces, etc. may become. This post ignores that type entirely; we don’t use space quotas at HubSpot. Separately, there are throttle quotas, which restrict throughput. For example, one can specify that only 10 writes may be done to a given table per minute, or that only 1000 requests may be executed by a given user per second. Throughput limits can be defined based on request counts, write size, and read IO. Throttles can also be optionally configured on a per-machine basis, meaning that a 1000 request/second/machine limit would restrict its subjects to executing only 1000 requests/second against any given RegionServer (rather than 1k/sec against the whole cluster). In our opinion this is a nice choice, because it allows throttles to scale horizontally with the cluster and to protect against hotspots. This post will primarily focus on per-RegionServer, per-second, user-specific throttles.

Why is Resource Sharing a Problem?

Users don’t always behave. With thousands of distinct systems generating workloads against such critical clusters, we need to assume that some systems will go awry. In this real example you can see the load across every RegionServer drastically increase by approximately 10x in a matter of moments:

Reads per second per RegionServer increased by approximately 10x in an instant, and we saw this change across every server in the cluster.

Digging deeper into our metrics, we can prove that this increased workload originated from a single system:

We see here that the rate of reads per second per user looks normal for everything except this severe outlier, the brown line.


A 10x, 200k RPS increase in read traffic is a drastic example, but it is a real one, and it demonstrates that even a well-distributed schema can be at risk of these “noisy neighbor” abuses, where a single user may saturate the available resources. Incidents like this are surprisingly common when thousands of microservices, workers, and batch analytics jobs are at play. Further, in a strongly consistent database like HBase, even a single overloaded commodity server can be significantly problematic for the application layer. To put it simply, we need a way to eliminate the possibility of runaway users, hot clusters, and even hotspots; this example demonstrates that thoughtful schema design alone is not an adequate guardrail.

In case you’re wondering “why not autoscale?”, I would like to address that idea proactively. First of all, autoscaling cannot react fast enough to guarantee our reliability standards across all systems when unthrottled traffic is in the mix. Traffic can multiply in an instant, but machines take time to bootstrap, acquire balanced traffic, and warm their caches. Further, autoscaling is not a cost-effective solution for database abuses like this. In our experience, the traffic most likely to be problematic is not latency sensitive: our end-users do not generate 200k RPS of latency sensitive traffic at a moment’s notice, but our internal systems, batch analytics, and async processes sometimes do. By throttling these internal, latency-insensitive workloads you can maintain high reliability standards and an efficient bottom line.

Quotas as a Solution

Our solution can be broken down into a few steps:

  1. Enabling quotas
  2. Supplying quota user overrides
  3. Configuring default user quotas
  4. Customizing back off strategies
  5. Improving workload estimation

Each step is important and useful in its own right, but it is the combination of these new features that multiplies the utility of HBase Quotas.

Enabling Quotas

HBase Quotas are the out-of-the-box solution for resource sharing. After several contributions to Apache HBase, we have found them to be a very effective tool at HubSpot. This system is disabled by default, but can be enabled by setting hbase.quota.enabled to true in your HBase configuration. With no other custom configurations, and no quotas defined, enabling quotas should not have any effect on requests.

HBase quotas will periodically “refresh” via a scheduled chore. A quota refresh updates the RegionServers’ understanding of the defined limitations; for example, if you update a quota, then it will take effect on the next refresh. This chore runs on five minute intervals by default (at the time of writing). We decided that five minutes is not quick enough, because we would like to modify system throughputs much more seamlessly in an emergency. We have configured hbase.quota.refresh.period to 30000 (this value is in milliseconds) and have not observed the more frequent refreshes to be a burden on performance, but we have appreciated the improved throughput flexibility.
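For reference, here is what those two settings look like. They belong in the server-side hbase-site.xml; the sketch below uses the Java Configuration API purely to keep the example self-contained:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class QuotaConfigSketch {
  public static Configuration create() {
    Configuration conf = HBaseConfiguration.create();
    // Enable the quota subsystem (it is off by default).
    conf.setBoolean("hbase.quota.enabled", true);
    // Refresh quota state every 30 seconds rather than the five minute default.
    conf.setInt("hbase.quota.refresh.period", 30_000);
    return conf;
  }
}
```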

Supplying Quota User Overrides

What is a Quota User Override?

To put it simply, a quota user override is a new feature in HBase that allows you to specify a custom username (for use only by Quotas) on any given request. This will make more sense as you read through this section. This is also where we needed to make some contributions to Quotas in order to make it work at HubSpot, and where our custom configurations begin.

As a refresher from our glossary definitions, at HubSpot we have focused on per-RegionServer, per-second, user-specific throttles. It seemed clear to us that if we wanted to ensure that no user could monopolize resources, then user throttles were the way to go.

But we had a problem — because we use web services as proxies between our traffic generators and the database itself, our original user was obfuscated. It wouldn’t make sense to throttle the direct caller, our proxy micro-service, because that would limit traffic without any regard for its true source. For example, if the EmailSendingKafkaWorker called the CrmObjectsWebService which called the CrmObjectsHBaseCluster, then the CrmObjectsWebService would erroneously be considered the “user” from the quota’s perspective.

To solve this, we added a couple of new features:

Connection and Request Attributes
Each request to HBase contains at least one Operation (a Get, Put, Delete, …), and often at HubSpot we will send batches of hundreds or thousands of Operations in a single request. Operations already supported the notion of “attributes”: a map of arbitrary key/value metadata. Expanding the payload of each Operation felt quite wasteful when we send millions of operations per second at our peak traffic, so we added support for connection and request attributes, which allow for metadata communication at a much lower frequency than per-Operation.

Quota User Overrides
This feature allows you to specify a key that, when passed in as a request attribute, will override the quota’s “user” value. In other words, this allows you to tell HBase quotas who this request is coming from via a request attribute. Please note that this system is not “secure” in any sense — any HBase client could trivially change their username to bypass your quotas.

Specifying a Quota User Override Key

In order to use quota user overrides, you must specify the request attribute key to be used for this feature. You can do this by configuring hbase.quota.user.override.key to your desired key. For this example, we will assume that you have configured hbase.quota.user.override.key to qu (short for quota user, shortened because this will be going over the network for each request).

Sending the Request Attribute

Let’s now pretend that you’re designing an endpoint which will interact with HBase:
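(A minimal sketch follows; the CrmObjectsEndpoint class and crm-objects table name are illustrative, and TableBuilder request attributes require an HBase 2.6+ client.)

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;

public class CrmObjectsEndpoint {
  private final Connection connection;

  public CrmObjectsEndpoint(Connection connection) {
    this.connection = connection;
  }

  // Fetch a CRM object on behalf of an upstream caller, attributing the work
  // to that caller (rather than to this proxy service) for quota purposes.
  public Result getCrmObject(String callingSystem, byte[] rowKey) throws IOException {
    try (Table table = connection.getTableBuilder(TableName.valueOf("crm-objects"), null)
        .setRequestAttribute("qu", callingSystem.getBytes(StandardCharsets.UTF_8))
        .build()) {
      return table.get(new Get(rowKey));
    }
  }
}
```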


By building your Table instances with the TableBuilder class, as we have in the example above, you can use TableBuilder#setRequestAttribute to customize the quota user.

Circling back to our original example:

the EmailSendingKafkaWorker called the CrmObjectsWebService which called the CrmObjectsHBaseCluster

Let’s assume that the endpoint defined above lives in CrmObjectsWebService, and it is appropriately passing in the request attribute qu=EmailSendingKafkaWorker for requests from said worker. You would now be empowered to create user throttles (via the shell or the Admin interface) for the EmailSendingKafkaWorker in isolation — you could restrict it to, say, 100MB of IO per RegionServer. If you strike a balance between a reasonable throughput for the given system, and a throughput which cannot monopolize any servers, then you’ve established a reliable client/service relationship!
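For instance, here is a sketch of how that 100MB read IO throttle could be created through the Admin interface (throttle quotas are machine-scoped by default, so the limit applies per RegionServer):

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.quotas.QuotaSettingsFactory;
import org.apache.hadoop.hbase.quotas.ThrottleType;

public class ThrottleSketch {
  // Restrict EmailSendingKafkaWorker to 100MB/sec of read IO per RegionServer.
  public static void throttleEmailWorker(Connection connection) throws IOException {
    try (Admin admin = connection.getAdmin()) {
      admin.setQuota(QuotaSettingsFactory.throttleUser(
          "EmailSendingKafkaWorker", ThrottleType.READ_SIZE,
          100L * 1024 * 1024, TimeUnit.SECONDS));
    }
  }
}
```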

Configuring Default User Quotas

What are Default User Quotas?

If you’ve gotten to this step, then you’re already in a much better position. In an emergency, you could manually implement a strict quota and throttle a single user in isolation, despite any obfuscation introduced by a microservice architecture. This is a great start, but it is retroactive and manual operations don’t scale.

To become proactive and scalable, we introduced default user quotas. Default user quotas allow you to specify per-RegionServer, per-second user throttles that will be applied to each user out-of-the-box. 

Default User Configurations

For example, one can configure hbase.quota.default.user.machine.read.size to 524288000. This would ensure that, without any other configuration, workers like the EmailSendingKafkaWorker may only read 500MB/second from any single RegionServer. To be clear, this limits each distinct user to 500MB/second, not the aggregate of all users, so one user’s exhaustion of its quota does not affect other users’ quota availability.

At the time of writing this blogpost, there are several different request types that support default user quotas (see HBase’s QuotaUtil class):

  • hbase.quota.default.user.machine.read.num
  • hbase.quota.default.user.machine.read.size
  • hbase.quota.default.user.machine.request.num
  • hbase.quota.default.user.machine.request.size
  • hbase.quota.default.user.machine.write.num
  • hbase.quota.default.user.machine.write.size

All of these defaults are applied on a per-second and per-RegionServer basis. The size quotas are all in bytes, and the read/write/request number limits are self-explanatory.
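Pulling the settings from the last few sections together, a server-side configuration along the lines of what we have described might look like the following sketch (the values match the examples above):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class DefaultUserQuotaConfigSketch {
  public static Configuration create() {
    Configuration conf = HBaseConfiguration.create();
    conf.setBoolean("hbase.quota.enabled", true);
    conf.setInt("hbase.quota.refresh.period", 30_000);
    // Request attribute key whose value overrides the quota "user".
    conf.set("hbase.quota.user.override.key", "qu");
    // Out-of-the-box, limit each distinct user to 500MB/sec of read IO per
    // RegionServer; explicitly defined user quotas take precedence.
    conf.setLong("hbase.quota.default.user.machine.read.size", 524_288_000L);
    return conf;
  }
}
```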

Customizing Back Off Strategies

At this point, you have clusters with great guardrails against runaway traffic. Perhaps you have modern SSDs in production that are easily capable of 1GB/sec of IO, so you have restricted each original caller to no more than 100MB/sec out-of-the-box. This ensures that, without some sort of coordination across original callers, hotspots and cluster-wide resource saturation become extremely unlikely.

At HubSpot, we found this approach to have one glaring gap: we hadn’t thought much about the relationship between throttling servers and throttled clients.

AverageIntervalRateLimiter’s Insufficiencies

Under the hood, HBase Quotas are powered by RateLimiters. The rate limiters monitor resource usage against the defined limit, and handle refreshing of resource availability as time passes. By default, HBase uses the AverageIntervalRateLimiter and we found this to be inadequate.

The AverageIntervalRateLimiter is designed to refill quota availability in chunks of flexible size. It does this by proactively refilling the proportion of the TimeUnit (at HubSpot, always per-second) that has passed since the last check of the quota. So if you have defined a 1000 request per second limit, and one millisecond has passed since the quota was exhausted, then it would allow a single request to be executed (because 1ms = 1/1000 of a second, and 1/1000 of the 1000 RPS quota = 1 request). The AverageIntervalRateLimiter sounds like a good way to balance a desire for low latency and high quota utilization, while still safely backing off as necessary, but in practice we found it to be far too optimistic.

The AverageIntervalRateLimiter’s problematic optimism is showcased when your average request size is far smaller than your overall quota. For example, let’s say that a RegionServer is serving 10,000 reads per second, and each read is fetching one 64KB block from disk; this workload would require 625MB/sec of IO. If you put a 500MB/sec throttle in place, then the throttle would quickly be exhausted and the server would begin throwing RpcThrottlingExceptions. RpcThrottlingExceptions are the client’s clue to back off before retrying: they contain a recommended back off time (or “wait interval”) that is the server’s estimation of when appropriate resources will be available to serve the given request, and the HBase client implicitly sleeps for the RpcThrottlingException’s recommended back off.

The RpcThrottlingException back off time is calculated by determining how much time would need to pass, in a vacuum, for the quota to be adequately refilled for the given request. So, circling back to our original example, let’s say that our 500MB/sec quota has been exhausted, but single block requests keep coming in to read 64KB. The quota would calculate that a 64KB request is only 0.0125% of the entire 500MB/sec (512,000KB/sec) limit, so it would estimate a back off time of 0.0125% of one second, or 0.125ms (rounded down to 0ms!).
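A simplified sketch of the shape of this calculation (not HBase’s exact code) makes the truncation to zero easy to see:

```java
import java.util.concurrent.TimeUnit;

public class BackOffSketch {
  public static void main(String[] args) {
    long limitBytesPerSec = 500L * 1024 * 1024; // 500MB/sec quota
    long requestBytes = 64L * 1024;             // one 64KB block read
    // Time, in a vacuum, until the quota refills enough for this request.
    long waitMs = TimeUnit.SECONDS.toMillis(1) * requestBytes / limitBytesPerSec;
    System.out.println(waitMs); // prints 0: the true 0.125ms truncates to 0ms
  }
}
```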

A zero millisecond back off is problematic for several reasons:

  1. It’s too optimistic. Sure, it makes sense in a vacuum, but the reality is that quotas virtually never become exhausted in production due to a single client thread. So the retry is unlikely to actually succeed in a multithreaded environment.
  2. It’s wasteful. If the request would actually succeed in 0.1ms, then it is unnecessary to incur the latency of throwing an exception to the client and executing another RPC.
  3. It’s difficult to use. Configuring the HBase client optimally is already difficult — it becomes virtually impossible if we are prone to exhausting the entire retry count in a matter of milliseconds once quotas become exhausted.
  4. It’s dangerous. Zero millisecond back offs can actually cause hotspots by effectively DOSing a server’s RPC layer as client threads fall into instant retry loops.

Here’s an example of a real throttle-induced DOS in a HubSpot cluster:

As quotas became exhausted, machines began throwing RpcThrottlingExceptions with 0ms back offs. These immediate back offs resulted in persistent quota exhaustion and a multiplication of RPC traffic.


FixedIntervalRateLimiter: A Suitable Alternative

AverageIntervalRateLimiter may not be the best solution for your HBase setup, but what’s our alternative? HBase offers a FixedIntervalRateLimiter which is a much simpler design: it simply refills your quota to its limit on the given TimeUnit (again, at HubSpot, always per-second). So if you implement a 1000 request/second quota and you exhaust it within 1 millisecond, then you will need to wait 999ms to execute your next request (or 1000 requests, if you’d like).
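To make the contrast concrete, here is a simplified sketch of each limiter’s refill behavior for a 1000 request/second quota (again, not HBase’s exact code):

```java
public class RefillSketch {
  static final long LIMIT = 1000; // requests per second

  // AverageIntervalRateLimiter: proactively refill the proportion of the
  // interval that has elapsed since the quota was last checked.
  static long averageRefill(long elapsedMs) {
    return LIMIT * elapsedMs / 1000; // 1ms elapsed -> 1 request available
  }

  // FixedIntervalRateLimiter: nothing refills until the full interval has
  // passed, at which point the quota refills to its limit all at once.
  static long fixedRefill(long elapsedMs) {
    return elapsedMs >= 1000 ? LIMIT : 0; // 1ms elapsed -> 0 available
  }
}
```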

The FixedIntervalRateLimiter has its own drawbacks: in a latency sensitive production environment, you don’t want to needlessly wait around for up to a second at a time. Long wait intervals also create a poor developer experience: with so many wasted cycles it is difficult to consistently utilize a meaningful proportion of your quota, and that makes it difficult to reason about ideal limits. You’re often left wondering why you’re being throttled despite throughput metrics suggesting you’re doing an acceptable amount of work.

Making FixedIntervalRateLimiter Better

To combat the wait interval pessimism of the FixedIntervalRateLimiter, we added a new configuration to HBase (HBASE-28453). The new configuration, hbase.quota.rate.limiter.refill.interval.ms, defines the interval in milliseconds on which a FixedIntervalRateLimiter should refill proportions of its quota (rather than always waiting out the quota’s full TimeUnit). We refer to this interval as the quota’s “refill interval.” The refill interval allows you to avoid pessimistic back offs, while also guarding against frivolous retries.
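Adopting this setup looks something like the following sketch; the hbase.quota.rate.limiter setting selects the RateLimiter implementation, and the refill interval requires HBASE-28453:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class RateLimiterConfigSketch {
  public static Configuration create() {
    Configuration conf = HBaseConfiguration.create();
    // Swap the default AverageIntervalRateLimiter for the fixed variant.
    conf.set("hbase.quota.rate.limiter",
        "org.apache.hadoop.hbase.quotas.FixedIntervalRateLimiter");
    // Refill quotas in 50ms slices rather than once per full TimeUnit.
    conf.setInt("hbase.quota.rate.limiter.refill.interval.ms", 50);
    return conf;
  }
}
```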

After a few different trials, we landed on hbase.quota.rate.limiter.refill.interval.ms set to 50ms across all clusters at HubSpot, and it has worked well enough that we have not been motivated to continue experimentation. This means that if we define a 1000 request/second throttle, then it will refill 50 requests every 50ms. Here is how one such trial, a 100ms refill interval, changed our ability to consistently utilize a 25MB/sec quota’s full throughput:

Early in this chart we do not have hbase.quota.rate.limiter.refill.interval.ms configured; as a result, we struggle to meaningfully utilize our 25MB/s limit. Later in the chart we have restarted RegionServers to pick up a custom refill interval of 100ms. This allows us to consistently utilize a much larger proportion of the limit.

And here is how the 100ms refill interval affected RpcThrottlingExceptions’ wait intervals as we rolled out the new configuration across our QA environment:

With default configurations, we see a pattern of presumably quick quota exhaustion followed by long back offs. As RegionServers picked up a 100ms refill interval, these back offs settled into a much more normal distribution, approaching instant retries on one end and one second back offs on the other.

Improving Workload Estimation

Throughout our work here, we also realized that our Quotas setup was only as good as HBase’s workload estimation. Without accurate workload estimation, expensive requests may slip through an already-saturated throttle. Further, workload estimates are used to determine wait interval recommendations, so poor estimation could result in tiny back offs for expensive requests, and we have already discussed why that can be so problematic.

To improve quota estimation, we would recommend upgrading to HBase 2.6 or backporting the following to your fork of HBase: 

  1. HBASE-27687. This change drastically improved the size estimates of Gets and Scans. We achieved this by using block IO to measure read size throttle usage, as opposed to the legacy approach based on result sizes. Result sizes were ignorant to the true cost of the request — for example, a heavily filtered scan could easily read GBs from disk while only returning tiny results, and quotas previously had no teeth to combat this type of request.
  2. HBASE-28385. This change further improved the size estimates for Scans. As discussed above, reasonable back off intervals are a critical component of an effective throttling system. Wait intervals are determined by estimating the cost of a request, and then calculating the time required for the quota to service the estimated cost. Previous Quotas implementations were incredibly naive here, lowballing reality: HBase would always estimate a Scan’s IO cost as a single block!
    In reality, Scans — particularly long running Scans — will often do far larger workloads than a single block. Scans can read as much as hbase.server.scanner.max.result.size in a single Scanner#next call, and this value defaults to 100MB in HBase 2.6. Further complicating things, scans may read dramatically different sizes across multiple calls depending on quota availability — a nearly exhausted quota could plausibly limit a single Scanner#next call to only one block of IO, even if it read 100MB on the previous call.

    If you’re curious about the implementation details then I’d recommend reading the PR, but to put it simply: we’ve improved Scan cost estimations so that wait intervals will be more reasonable.

With these changes in place your workload estimation, and consequently your Quotas performance, should be greatly improved.

Quotas at HubSpot Today

To provide a concrete example, we can look back at a MapReduce job in January 2024, before we implemented any default throttling. This job, without any server-side guardrails, would generate approximately 150MB/s of IO from the average RegionServer, and as much as 400MB/s from the busiest.


This particular job generates as much as 400MB/s from a single RegionServer, meanwhile its average workload is less significant.

This is a very precarious situation for the cluster and its other users because this client, and any others like it, have the freedom to request resource-saturating workloads from our RegionServers. It’s especially disappointing because MapReduce jobs exist to do batch analytics, making them inherently latency insensitive — customers are not waiting on this work to be done as quickly as possible. But the hotspots that this job produces can degrade performance for customer-facing traffic and too easily cause meaningful outages for our product.

In times past, we would wait until a job like this caused an outage, and then we would require that the relevant product developers put more thought into their schema, traffic distribution, and client-side rate limiting. In other words, we were solving for outages retroactively and forcing our product developers to spend time focused on details that are irrelevant to their feature set — neither aspect being aligned with HubSpot’s customer focused philosophy.

Fast forward to today and look at the same job’s workload, bearing in mind that we now have a 100MB/s default user throttle in place on this cluster:

This job’s traffic remains “hotspotty” in nature, with its maximum and average IO workloads differing by about 2x. But by capping the max workload at a predetermined, safe ceiling, we have mitigated the risk posed by this application.

On typical runs we will now see this job generate 50MB/s of IO from the average RegionServer in this cluster, and 100MB/s in the maximum case. This default throttling setup mitigates would-be hotspots in real-time and without any manual intervention required.

By leaning into default user throttles for all applications, we have made HBase at HubSpot easier than ever to both manage and use. We are virtually always throwing some volume of RpcThrottlingExceptions across our clusters, and we have found this to be a healthy, well-utilized, and predictable state for both operators and users.

Conclusion

Whether you are new to the HBase ecosystem or you have been working in it for a decade, you have likely run into resource sharing issues. It’s also likely that you considered HBase Quotas as a potential solution, but found it to be insufficient.

At HubSpot, we have successfully isolated bad users to a precise degree via the Quotas setup described in this post. Even beyond bad user isolation, it is a huge win to allow the application layer to pass the responsibility of rate limiting up to the RegionServers — by proactively solving the complex choices required to run a reliable application at scale, we’ve been able to drive up developer productivity across product teams at HubSpot. 

With an upgrade to 2.6 and a little bit of customization, Quotas can now be a very powerful and scalable guardrail, even for clusters with thousands of distinct tenants.
