Blog - HubSpot Product Team

Leads API Lessons Learned

Written by Stephen Huenneke | Jan 14, 2011

In October 2009 we planned and started a migration of our entire leads storage system from MS-SQL to MySQL.  In the process we were going to build a RESTful leads API and build an entirely new user-facing web application to help customers manage and nurture their leads better than ever.  The goals were lofty, but we felt we were going to be successful because we had a good architecture and developers.

A year later, after many long nights and continued optimizations, the Leads API is up and running: the user-facing front end is happily serving our 4000+ customers, and we serve over 1 million API calls per day.  I'm happy with what we built, but of course I see plenty of ways we could have done better.  As I've started working on some new APIs for HubSpot, I've taken a look back at the things we did right and wrong with the Leads API, and thought I'd share some of the things I found particularly useful here.

What we did right:

  • Mandate internal usage to force real usage, quickly.  We mandated that all apps that weren't deemed part of the lead management suite would have to use the API to interact with leads data.  This forced us to scale to meet our internal customers' demands immediately.  Within a week of launching the first version and migrating the first few applications to use the API instead of direct database connections, we were serving over 20k API calls per day.  It was awesome to see our work come to fruition so quickly, and the success built a lot of confidence in our strategy.  We were able to see scale issues and find bugs in near real time as we combed through logs of errors caused by real-world production leads data.  This reporting was much faster than even the most comprehensive QA or unit testing suites.
  • Iterate quickly to solve problems.  We iterated quickly on the problems we found, fixing them and shipping within hours or even minutes of finding a problem.  We set up our error logging to email the leads dev team whenever major errors occurred, and we shipped bug fixes and improvements several times a day.  The code in production was almost always HEAD or close to it, and we built an infrastructure so that no one had any qualms about shipping mid-day.  We were responding to our internal users' needs at breakneck speed.
  • Analyze usage and find ways to scale horizontally.  When we realized that we were hitting over 20k API calls per hour during peak usage times, and serving hundreds of MB of data per hour, it became clear we were going to need more than two small EC2 instances to serve the data.  We were hesitant to upgrade the servers, but we realized the trade-off was lopsided: dozens of developer hours spent squeezing every last bit of performance out of the code, versus a couple hundred dollars a month for equivalent gains.  The scales were tipped pretty heavily in favor of throwing hardware at the problem.

What we did not get right:

  • Decouple from resources used by other applications.  We noticed a funny thing after a few months.  During peak usage hours, which happened to be during the middle of the night when our internal customers were running scheduled jobs to process big chunks of lead information, we were slowing down our shared MySQL read-only replica.  We started receiving complaints from our international customers about the front end being slow to respond during these same hours.  We had been crippling the shared replica, causing even the most trivial of queries to slow to a crawl for the user-facing lead management tool.  We created separate replicas for the two services and solved the issue easily enough, but it was a big oversight on our part.
  • Construct an allowlist of response fields.  We chose to blocklist the data fields we explicitly didn't want to serve and filtered those out of our returned JSON responses.  At first this was great: we could add fields simply by adding them to our objects, while reflection and our JSON serialization did the rest.  But then we realized that new fields were causing deserialization failures in deployed applications with older versions of the objects.  That was easy enough to fix; we changed our deserialization to ignore unknown fields.  So we fixed the outright bugs, but we've caused long-running pain and uncertainty about what data is being served and when.  If we added a field on a certain date, deployed to production later, then deployed dependent projects later still, who knows which version of the object went out where and when?  At any moment someone could add new data to the returned JSON, and this undermines confidence in the API, since responses change for internal and external customers alike.  We're taking steps to fix this in the Leads API, and we built the Blog API from the beginning with an eye on allowlisted elements.  Consistent responses and results increase users' confidence, and that confidence helps them build better software faster.
  • Iterate quickly to solve problems.  Yeah, this was also something we did wrong.  The other side of the fast-iteration coin is that you get caught up in the momentum of your work.  It's hard to take a step back and think about what your one-line fix might mean to the dozens or hundreds of consumers of an API.  What about documentation?  What about unit tests?  These are the tools we use to ensure we're building high-quality, reliable software.  Iterate too fast and too loosely, and you find yourself unable to keep up with your own software: a seemingly endless cycle of bug fixing, deploying, bug finding, bug fixing, deploying...
  • Struts 2 REST plugin.  The Struts REST Plugin is a great tool for building out a RESTful interface for a complex hierarchy of objects.  It handles JSON and XML output by default, requires only a small amount of configuration, and is generally well-written, well-tested code.  It's also a terrible fit for the API we wanted to build.  It has too many features we didn't need (e.g., XML isn't used by our API consumers, internal or external), and it made tasks that would be totally mundane in a normal Struts stack exceedingly complex (e.g., streaming a response instead of compiling one giant response string in memory and causing heap-space errors).
  • Don't just build features for specialized internal users.  Internal users wanted more data, more flexibility, faster responses, and complex queries on the leads data.  Building those features was a challenge, and all the time spent concentrating on internal features was time we could have spent making the API faster, more reliable, more scalable, and easier to maintain.  Those gains would have benefited internal and external users alike.  The rapid iteration on internal-only features certainly helped the API gain traction and usage, but it came at the cost of improvements for all consumers of the API.  We have a Google Group for HubSpot API consumers now, and it's helping us make better decisions about external developers' needs.
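The allowlist point above can be sketched in plain Java. This is a hypothetical illustration, not our actual serialization code: the field names and the map-based filtering are invented for the example, and a real implementation would hook the allowlist into the JSON serializer itself rather than filter by hand.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class AllowlistFilter {
    // Only fields named here ever appear in an API response. Adding a new
    // field to the lead object does NOT change the response until the field
    // is deliberately added to this set -- the opposite of a blocklist,
    // where every new field leaks into responses by default.
    private static final Set<String> ALLOWED =
            Set.of("firstName", "lastName", "email");

    public static Map<String, Object> filter(Map<String, Object> lead) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : lead.entrySet()) {
            if (ALLOWED.contains(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> lead = new LinkedHashMap<>();
        lead.put("firstName", "Jane");
        lead.put("email", "jane@example.com");
        lead.put("internalScore", 42); // never exposed: not on the allowlist
        System.out.println(filter(lead));
        // prints {firstName=Jane, email=jane@example.com}
    }
}
```

The design win is that the response contract becomes an explicit, reviewable list: a deploy can add fields to objects without silently changing what every consumer, internal or external, deserializes.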
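The streaming concern from the Struts bullet can also be sketched in plain Java. This illustrates the general technique, not the Struts plugin's internals: the JSON fragments and method names are hypothetical, and in a real servlet the `OutputStream` would be the HTTP response stream. The point is that each element is written and flushed as it is produced, so memory use stays flat instead of growing with the size of the result set.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.List;

public class StreamingResponse {
    // Write a JSON array of leads one element at a time, flushing as we go,
    // instead of concatenating one giant response String in memory.
    public static void writeLeads(OutputStream out, Iterator<String> leadsJson) {
        try {
            Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
            w.write("[");
            boolean first = true;
            while (leadsJson.hasNext()) {
                if (!first) w.write(",");
                w.write(leadsJson.next());
                first = false;
                w.flush(); // push each element to the client as it is produced
            }
            w.write("]");
            w.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeLeads(buf, List.of("{\"id\":1}", "{\"id\":2}").iterator());
        System.out.println(buf.toString(StandardCharsets.UTF_8));
        // prints [{"id":1},{"id":2}]
    }
}
```

With a framework that insists on rendering the whole response object graph to a single string first, a large query result means a proportionally large heap allocation; streaming trades that for constant memory per request.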

As I look back on the last year of work, I see a lot of mistakes we made that I would make again in a heartbeat.  The API was essentially a proof of concept in the beginning.  It gained traction internally and externally, and we responded accordingly.  We never quite took the time to step back, look at the API as we'd built it, and ask if it was truly production ready.  That's hard to do when you're fending off bugs left and right, just trying to keep your head above water.  What I've taken away is to be leaner about features, keep iterations fast, and be more thoughtful about the repercussions your choices might have.  Prioritizing "internal use only" features above public ones should require a hard sell from the parties requesting them: you're asking for time the developers could spend improving the entire app, possibly negating the need for your internal feature entirely.  By no means is the book closed on this; we have short-term plans for better performance, higher reliability, and greater scalability for this project, and I look forward to all the lessons I'll learn from those improvements.