While we can run local profiles to measure these changes ourselves, we often find that our internal tests aren’t necessarily reflective of the impact changes will have in production. In one case, we saw a local improvement of nearly 750 milliseconds for a change that ended up improving production performance by less than 100 milliseconds. RUM lets us test these changes with real customers, and because they’re behind feature flags we can revert instantly if we find an experiment has a negative impact.
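As a rough sketch of how that pattern fits together (the flag name, flag client, and reporter below are placeholders rather than our actual APIs), an optimization can be gated behind a flag and the RUM measurement tagged with the variant, so the two paths can be compared in production and the flag flipped off without a deploy:

```typescript
// Sketch only: gate an optimization behind a feature flag and tag the RUM
// measurement with the variant. The flag name, flag client, and reporter are
// placeholders, not our actual APIs.

// Placeholder flag client; a real one would read remotely configured flags.
const flags = new Map<string, boolean>([["dashboard-prefetch-experiment", true]]);
const isFlagEnabled = (name: string): boolean => flags.get(name) ?? false;

// Placeholder RUM reporter; a real client would batch and beacon measurements.
function reportToRum(metric: string, value: number, tags: Record<string, string>): void {
  navigator.sendBeacon("/rum", JSON.stringify({ metric, value, tags }));
}

export async function loadDashboard(
  fetchLegacy: () => Promise<unknown>,
  fetchOptimized: () => Promise<unknown>,
): Promise<unknown> {
  const optimized = isFlagEnabled("dashboard-prefetch-experiment");
  const start = performance.now();

  const data = optimized ? await fetchOptimized() : await fetchLegacy();

  // Tagging by variant lets us compare the two paths in production RUM data;
  // if the experiment regresses, the flag is turned off instantly.
  reportToRum("dashboard_load_ms", performance.now() - start, {
    variant: optimized ? "optimized" : "control",
  });

  return data;
}
```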
Goal Setting
Once we had a system in place to track performance data, the next step in creating a delightfully fast experience for our customers was to define what “fast” actually means. While that may sound as simple as choosing a target number, as we started to define these SLAs we realized there were many other variables to pin down. Do we want to measure international users, or just start with users in the US? Should we be measuring background tabs? What constitutes a successful page load, and how do we know when a page is done loading?
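As an illustration of the “done loading” question only (the names below are hypothetical, not our instrumentation), one way to make load completion explicit is to have each page mark the moment its critical content has rendered rather than relying on the browser’s load event:

```typescript
// Illustrative sketch: define "done loading" as the moment a page's critical
// content has rendered, rather than the browser's load event. The mark name
// and reporter interface are hypothetical.

interface RumReporter {
  report(page: string, durationMs: number): void;
}

// Called by each page once its critical data has rendered.
export function markPageUsable(page: string, rum: RumReporter): void {
  const mark = performance.mark(`${page}:usable`);
  // Mark timestamps are relative to navigation start, so for a full page load
  // startTime doubles as "time from navigation until the page was usable".
  rum.report(page, mark.startTime);
}
```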
Isolate Inconsistent Variables
When we first started measuring our applications’ performance, we ran into plenty of inconsistent measurements, both within a single application’s numbers and across applications. As mentioned above, variables like foreground versus background tabs made it difficult to lay out a consistent set of performance guidelines.
We chose to exclude any page visits that loaded in background tabs, as they’re heavily throttled by browsers and aren’t an accurate representation of our performance. We also currently exclude international users from our SLAs, though we plan to change this in the future. We don’t yet have data centers outside of the US, but once our backend supports multi-region replication we’ll likely include international data in our measurements. At that point the data will become meaningful, whereas today we’d mostly be measuring our users’ distance from our data center.
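A minimal sketch of the background-tab exclusion, assuming a hypothetical reporting entry point and endpoint (the real logic lives in our RUM client), looks roughly like this:

```typescript
// Sketch of the background-tab exclusion for full page loads. If the tab was
// ever hidden between navigation and reporting, the sample is dropped, since
// browser throttling would skew it. The endpoint is a placeholder.

let hiddenDuringLoad = document.visibilityState === "hidden";
document.addEventListener("visibilitychange", () => {
  if (document.visibilityState === "hidden") {
    hiddenDuringLoad = true;
  }
});

export function reportPageLoad(page: string, durationMs: number): void {
  if (hiddenDuringLoad) {
    return; // Excluded from the SLA: background tabs are heavily throttled.
  }
  navigator.sendBeacon("/rum", JSON.stringify({ page, durationMs }));
}
```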
Set Aggressive Goals
While conventional wisdom suggests choosing smaller, easily attainable goals, we found that without aggressive performance requirements teams weren’t as motivated to meet them; the farther away the goal, the harder teams pushed to reach it. We also intend to keep moving these targets as more teams get into SLA. While these goals are very aggressive, especially for business applications of our size, we feel that providing a consumer-grade experience of a fast, responsive app is non-negotiable in building modern software.
On the frontend, we require that 75% of all page loads for an application finish in less than 2 seconds. Teams are taking a variety of technical approaches to get there, from long-lived asset caches to code splitting and more. On the backend, 75% of API requests must complete in less than 100 milliseconds. In the last year we’ve become keenly aware that frontend performance is strongly correlated with backend performance: without a performant, properly architected backend, there is no way for the frontend to be fast. Additionally, with our new efforts to treat external integrators on our platform as top-level users, API performance has never been more important.
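For concreteness, checking a percentile-based SLA against collected samples looks roughly like the sketch below; the thresholds mirror the targets above, while the function names and data shapes are illustrative:

```typescript
// Rough sketch: evaluate a percentile SLA over collected duration samples.
// Thresholds mirror the targets above; everything else is illustrative.

function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  if (sorted.length === 0) {
    return NaN;
  }
  const index = Math.max(0, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[index];
}

export function meetsSla(durationsMs: number[], thresholdMs: number): boolean {
  return percentile(durationsMs, 75) < thresholdMs;
}

// Frontend: meetsSla(pageLoadDurations, 2000) -> p75 page load under 2 seconds
// Backend:  meetsSla(apiLatencies, 100)       -> p75 API latency under 100 ms
```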
Results
While the technical details will fill several future blog posts, we’ve combined many strategies, including caching expensive work, preventing excessive re-rendering of components, aggregating data fetching via GraphQL or purpose-built aggregator REST endpoints, and aggressive code splitting and JavaScript caching, to drastically improve the performance of our applications. Since the start of this project, the share of user-facing apps within the SLA mentioned above has increased from about 5% to nearly 50%, and the apps that aren’t yet passing are still making notable progress. In one case, the CRM application mentioned earlier, we’ve seen 75th percentile load times fall from around 10 seconds to about 4.5 seconds, and they’re continuing to drop.
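As a small taste of one of those strategies ahead of those future posts, route-level code splitting via dynamic import lets a bundler emit separately fetched, long-term-cacheable chunks; the route paths and modules below are hypothetical, not our actual application structure:

```typescript
// Illustrative sketch of route-level code splitting: each route's module is
// loaded on demand via dynamic import, so the bundler emits separate,
// long-term-cacheable chunks. Route paths and modules are hypothetical.

type PageModule = { render: (root: HTMLElement) => void };

const routes: Record<string, () => Promise<PageModule>> = {
  "/reports": () => import("./pages/reports"), // hypothetical modules
  "/settings": () => import("./pages/settings"),
};

export async function navigate(path: string, root: HTMLElement): Promise<void> {
  const load = routes[path];
  if (!load) {
    throw new Error(`Unknown route: ${path}`);
  }
  const page = await load(); // fetched (and cached) only when first visited
  page.render(root);
}
```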
Though we have a long way to go, our teams have made an impressive amount of progress in just one year while simultaneously shipping new products and providing value to our customers. We’re excited to see our engineers continue to invest in this area and discover new ways to deliver fast, delightful experiences to our users.