Behind HubSpot AI: How Does Prediction Engine Score Millions of CRM Objects Daily?

Written by Leena Bhandari | Nov 15, 2024

Written by Leena Bhandari, Senior Software Engineer, Öznur Alkan, Staff Technical Lead, Andrea Cicalese, Senior Software Engineer, and Nabeel Nauman, Senior Software Engineer at HubSpot.

-----

Predictive Scoring: Why It Matters for CRM

HubSpot provides a customer relationship management (CRM) platform that helps front office teams such as sales, marketing, and customer service grow better. In HubSpot CRM, customers keep records such as contacts, deals, and companies, and track them through stages in the marketing and sales lifecycles. For example, a deal object added to the CRM might move through qualifying stages such as appointment scheduled, qualified to buy, presentation scheduled, and decision-maker brought in before the deal is closed as either won or lost. Throughout these stages, predictive scoring uses machine learning (ML) models to assess the likelihood of a contact converting into a customer or a deal closing successfully.

How does predictive scoring benefit customers?

A sales representative manages hundreds of contacts and dozens of deals at a given time. Their goal is to advance high-potential deals and prioritize their time strategically. Scoring and ranking these deals helps them quickly identify which are more likely to close, allowing them to focus their efforts on high-value opportunities. The score also provides visibility into deals that aren't progressing well, helping to identify actions that could lead to a positive outcome. Predictive scoring saves time and increases the chances of conversion.

Why Do We Have a Platform Solution for Predictive Use Cases?

To fully leverage predictive scoring, as well as other predictive use cases, a robust platform solution is essential. The main motivations for building this platform are a consistent data structure behind the CRM, an inference pipeline that is similar across use cases, and the need to handle high volumes of data across different predictive use cases.

HubSpot CRM is designed with a consistent data structure, where all the objects (like contacts, deals, and companies) have identifiers, properties, and associations with other objects. This structure enables predictive models to utilize the existing properties and associations data within the CRM to calculate relevant features.
At the inference stage, the model needs to be called regularly or in response to specific events to ensure predictions remain up to date. This inference pipeline is similar across different use cases.
CRM objects typically experience high CRUD volume. For instance, there are as many as 11B contact objects in the CRM with about 18M being added daily. They receive around 18K updates per second, depending on the time of the day, with peak loads reaching up to 60K updates per second. Having each team implement the inference pipeline individually would mean re-solving the scalability challenges that other teams have already addressed.

Motivated by these factors, Prediction Engine was built to provide an object agnostic platform solution for predictive use cases within the CRM. This platform standardizes ML inference at scale, addressing latency requirements and improving the time to deliver end-user functionality.

Inside the Prediction Engine: A System Overview

Prediction Engine is responsible for feature computation, online and offline inferencing, sending feedback to the model and publishing predictions for upstream services.

Prediction Engine Components

The diagram above provides a system overview of the Prediction Engine. The components highlighted in yellow are Prediction Engine components, the ones in green indicate the incoming data sources, and the ones in blue refer to the external components. Prediction Engine is built on the following core components:

Update Listeners: listen for updates made to CRM objects.
Debouncer: consolidates updates within a specified timeframe.
Scoring Workers: calculate features and score the objects.
Scheduled Batch Jobs: refresh scores at regular intervals.
Feedback Manager: generates and sends feedback to the model.
Result Publisher: publishes scores and explanations for upstream services.

The Inference Server is an existing component responsible for generating predictions based on previously trained models. It handles inference requests and collects feedback for ML models. To facilitate this feedback process, Prediction Engine stores information about the inferences made in the Inference Database that’s used for feedback.

Modular Framework for Reusable Inference Logic

To handle different use cases and consolidate logic across multiple inference pipelines, Prediction Engine implements modular components. These shared components enable logic reuse, reduce code duplication and simplify the integration of new scoring or predictive use cases. By leveraging this architecture, the system supports extensibility, allowing new predictive models to be incorporated easily. To handle many predictive scenarios, we employ two different inference flows: online and offline. Below, we first detail these reusable components and then discuss the optimizations implemented for both online and offline inferencing flows.

Scoring Workers

Predictions for a scoring use case typically yield a probability value. For some use cases the result may also include SHAP-like explanations, providing additional context for the calculated scores. To generate these predictions, scoring workers employ feature calculators to generate actual model features and send requests to the Inference Server which serves the trained models.

To keep updates relevant, a delta threshold is used: a new score is written only if it differs from the previous score by more than the threshold (|new value - old value| > delta). The reasoning is that not every score change is pertinent to the end user; small changes to the score lack significance. For online inferencing in the deal scoring use case, this delta threshold reduces the volume of updates made to the CRM by approximately 22%.
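As a minimal sketch of the delta check described above, the comparison could look like the following. The function name, the handling of objects with no previous score, and the example threshold of 0.05 are assumptions made for illustration; they are not taken from Prediction Engine.

```python
from typing import Optional


def should_write_score(new_score: float, old_score: Optional[float], delta: float) -> bool:
    """Return True when the new score is worth writing to the CRM.

    Assumption: an object with no previous score is always written.
    """
    if old_score is None:
        return True
    # Write only when |new value - old value| > delta.
    return abs(new_score - old_score) > delta


if __name__ == "__main__":
    # Hypothetical threshold of 0.05 on a 0..1 probability scale.
    print(should_write_score(0.52, 0.50, delta=0.05))  # small change, skipped
    print(should_write_score(0.60, 0.50, delta=0.05))  # larger change, written
```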

In offline inferencing, a significant finding was that for the deal scoring use case, around 84% of object updates fail to meet the required delta threshold. Additionally, for models that provide explanations, generating the explanations was observed to take about 20 times longer than generating the scores alone. To address this latency, Prediction Engine uses a splitting approach, which works as follows:

A request is sent to calculate the score.
The difference between the new score and the previous score of the object is evaluated against the delta threshold.
Explanations are requested only for the scores that pass the delta threshold.

This splitting logic allows Prediction Engine to make explanation calls only for those objects where the result is likely to be saved, thereby speeding up the overall job by reducing the inferencing time for “irrelevant score” requests. For the deal scoring use case, this method reduced the time taken for offline inferencing by 57%.
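The three steps of the splitting approach can be sketched as below. This is an illustration under stated assumptions, not the actual implementation: `score_fn` and `explain_fn` are hypothetical callables standing in for the two kinds of Inference Server requests, and previous scores are passed in as a plain dictionary.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def score_then_explain(
    object_ids: Iterable[str],
    previous_scores: Dict[str, float],
    score_fn: Callable[[str], float],          # hypothetical: cheap score-only request
    explain_fn: Callable[[str], List[str]],    # hypothetical: expensive explanation request
    delta: float,
) -> Dict[str, Tuple[float, List[str]]]:
    """Score every object, but request explanations only for scores that will be saved."""
    results: Dict[str, Tuple[float, List[str]]] = {}
    for object_id in object_ids:
        # Step 1: request the score only.
        new_score = score_fn(object_id)
        # Step 2: compare against the previous score using the delta threshold.
        old_score = previous_scores.get(object_id)
        if old_score is not None and abs(new_score - old_score) <= delta:
            continue  # change too small to save, so skip the explanation call
        # Step 3: request explanations only for scores that passed the threshold.
        results[object_id] = (new_score, explain_fn(object_id))
    return results
```

Because the explanation call sits behind the threshold check, its cost is paid only for the minority of objects whose result would be written.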

Result Publisher
Prediction Engine publishes results to a Kafka topic, making them available for consumption by upstream services. These results are then delivered to end users via the HubSpot app. Additionally, the scores are updated in the CRM, allowing other services such as reporting and Copilot to access and use the data.

Feedback Manager
When an object reaches the end of its life cycle (e.g., when a deal is closed), feedback is sent to the model for all the inferences generated for that object. This feedback is used for analyzing the model performance and for model auto-training. Inference Database is a Vitess/MySQL database used to track all these inferences made on an object.
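To illustrate the bookkeeping this implies, here is a small sketch in which an in-memory dictionary stands in for the Inference Database. The class name, the record fields, and the 0/1 outcome encoding are all hypothetical; the real schema is not described in this post.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class InferenceLog:
    """In-memory stand-in for the Inference Database (hypothetical schema)."""

    def __init__(self) -> None:
        # object id -> list of (inference id, predicted score)
        self._by_object: Dict[str, List[Tuple[str, float]]] = defaultdict(list)

    def record(self, object_id: str, inference_id: str, score: float) -> None:
        """Remember an inference made for an object that is still open."""
        self._by_object[object_id].append((inference_id, score))

    def close_object(self, object_id: str, outcome: int) -> List[dict]:
        """Build one feedback record per stored inference, then forget the object."""
        inferences = self._by_object.pop(object_id, [])
        return [
            {"inference_id": inference_id, "predicted": score, "actual": outcome}
            for inference_id, score in inferences
        ]


if __name__ == "__main__":
    log = InferenceLog()
    log.record("deal-1", "inf-1", 0.40)
    log.record("deal-1", "inf-2", 0.70)
    # Assumed encoding: outcome 1 means the deal closed as won.
    for feedback in log.close_object("deal-1", outcome=1):
        print(feedback)
```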

Optimizing Online and Offline Inferencing in Prediction Engine

Prediction Engine implements both online and offline inferencing flows. Below, we describe the details of each flow and the optimizations implemented to handle the required volume.

Online Inferencing

Online inferencing employs Update Listeners to process real-time update events for two primary use cases:

Object Updates
Portal Upgrades

Object Updates - Predictions are generated using model features derived from object properties. Whenever relevant properties change, the predictions need to be adjusted accordingly. To support this, Prediction Engine performs online inferencing in near real-time by listening to object property updates published on a Kafka topic. To manage the volume of incoming requests, only the most “critical” properties, as identified by ML engineers and product managers, are monitored. This approach has reduced the volume of incoming requests by approximately 41% for the contact scoring use case.
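A hedged sketch of the property filter: an update event triggers re-scoring only if it touches at least one monitored property. The property names below are hypothetical examples, and the event is reduced to a set of changed property names; the actual Kafka message format is not shown in this post.

```python
from typing import FrozenSet, Iterable

# Hypothetical list of "critical" properties for a deal scoring model.
CRITICAL_PROPERTIES: FrozenSet[str] = frozenset({"amount", "dealstage", "closedate"})


def is_relevant_update(
    changed_properties: Iterable[str],
    critical: FrozenSet[str] = CRITICAL_PROPERTIES,
) -> bool:
    """Return True if any changed property is one the model is sensitive to."""
    return not critical.isdisjoint(changed_properties)


if __name__ == "__main__":
    print(is_relevant_update({"notes_last_updated"}))        # ignored
    print(is_relevant_update({"dealstage", "description"}))  # triggers re-scoring
```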

Portal Upgrades - Within the CRM, certain features are available through different HubSpot subscription options. When a portal gains access to an object scoring feature through a subscription, predictions and scores for all objects within that portal must be calculated. Online inferencing for portal upgrades is facilitated by listening for portal upgrade events published to a Kafka topic.

Given that portals can contain anywhere from 10-100 objects to millions, it's crucial to manage processing efficiently. To prevent smaller portals from experiencing longer wait times while larger portals are being processed, the load is segregated into separate queues for large and small portals. This ensures fairness in processing times and optimizes the overall user experience.
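The queue segregation can be sketched as a simple router. The size cutoff and class name are assumptions for illustration, and in-process deques stand in for whatever queueing system is actually used.

```python
from collections import deque
from typing import Deque


class PortalRouter:
    """Route portal upgrade tasks to separate queues by portal size."""

    def __init__(self, large_threshold: int) -> None:
        # Hypothetical cutoff: portals with at least this many objects are "large".
        self.large_threshold = large_threshold
        self.small: Deque[str] = deque()
        self.large: Deque[str] = deque()

    def route(self, portal_id: str, object_count: int) -> str:
        """Enqueue the portal and return the name of the queue it went to."""
        if object_count >= self.large_threshold:
            self.large.append(portal_id)
            return "large"
        self.small.append(portal_id)
        return "small"


if __name__ == "__main__":
    router = PortalRouter(large_threshold=100_000)
    print(router.route("portal-a", 2_000_000))  # goes to the large queue
    print(router.route("portal-b", 50))         # goes to the small queue
```

With separate queues, workers draining the small queue are never blocked behind a multi-million-object portal.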

Debouncer

During periods of high activity, bursts of property updates for the same object can lead to increased loads to the Inference Server and increased writes on the CRM. To mitigate this, a Debouncer is implemented, ensuring that each object is scored at most once within a specified timeframe. The internal architecture for Debouncer varies based on the volume of object updates and the business SLA for the use case. Prediction Engine implements multiple debouncing strategies: one uses an in-house queue built on Amazon SQS and HBase to manage uniqueness checks and long term scheduling, while another approach groups object updates in memory based on the object keys and processes the update only once.

The timeframe is configurable according to the acceptable SLA per use case and can range from minutes to hours. A longer timeframe means that more object tasks or requests are held in memory. For example, our analysis indicated that increasing the timeframe from 1 hour to 6 hours reduced the task requests by 3% but increased memory usage by 135%. Consequently, the incoming volume is a critical factor in determining the timeframe for the debouncer. The Debouncer significantly reduced the load on our systems. For instance, for the contact scoring use case, debouncing contributes to a 44% reduction in the volume of requests received.
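The in-memory strategy can be illustrated with the sketch below, which coalesces updates by object key and releases each key at most once per window. It is a simplified single-process illustration; the class, its methods, and the explicit `now` argument (used here to keep the example deterministic) are assumptions, and the queue-based strategy built on SQS and HBase is not modeled.

```python
from typing import Dict, List


class Debouncer:
    """Coalesce repeated updates so each key is released at most once per window."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        # key -> time the first pending update was seen
        self._pending: Dict[str, float] = {}

    def add(self, key: str, now: float) -> None:
        """Register an update; later updates inside the window are absorbed."""
        self._pending.setdefault(key, now)

    def flush(self, now: float) -> List[str]:
        """Return keys whose window has elapsed and remove them from memory."""
        due = [k for k, first_seen in self._pending.items()
               if now - first_seen >= self.window_seconds]
        for key in due:
            del self._pending[key]
        return due


if __name__ == "__main__":
    debouncer = Debouncer(window_seconds=60)
    debouncer.add("contact-1", now=0)
    debouncer.add("contact-1", now=10)  # absorbed into the pending entry
    print(debouncer.flush(now=59))      # nothing is due yet
    print(debouncer.flush(now=60))      # contact-1 is released once
```

The `_pending` dictionary grows with the window length, which mirrors the memory trade-off described above.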

Offline Inferencing

For use cases where real-time predictions are not necessary, offline inferencing can be run on large batches of objects to generate bulk predictions. Another common scenario for running offline inferencing is when feature values change implicitly, which requires periodic recalculation of predictions to combat feature drift. For instance, the "Days Since Last Update" feature in deal scoring is important in determining the probability of deal progression. As time passes, the value of this feature changes even though no property on the deal has been edited. We use offline inferencing to regenerate predictions for all objects that have not been updated explicitly within a preconfigured timeframe, effectively capturing those missed by the online inferencing pipeline.

To implement offline inferencing in Prediction Engine, scheduled Hadoop jobs scan the entire CRM and re-score all the un-updated objects. To prevent overloading the live HBase database, these jobs use snapshots that are taken daily and stored in S3. An icebox threshold is implemented within the job to avoid processing objects with very low scores. The rationale is that these objects may no longer be significant for the end user and can be excluded from inferencing. This approach eliminates unnecessary inferences and writes for objects with insignificant scores, contributing to an overall reduction of approximately 2% in inference requests.
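The selection logic of such a batch job might be sketched as a per-object predicate. This is an assumption-laden illustration: the function name, the use of epoch seconds, the treatment of never-scored objects, and the threshold values in the example are hypothetical, and the actual Hadoop job is not shown in this post.

```python
from typing import Optional


def needs_offline_rescore(
    last_explicit_update: float,  # epoch seconds of the last explicit object update
    last_score: Optional[float],  # None if the object was never scored (assumed eligible)
    now: float,
    staleness_seconds: float,     # the preconfigured timeframe
    icebox_threshold: float,      # scores below this are considered "iceboxed"
) -> bool:
    """Decide whether the offline batch job should re-score an object."""
    # Icebox: skip objects whose last score is very low.
    if last_score is not None and last_score < icebox_threshold:
        return False
    # Re-score only objects not explicitly updated within the timeframe;
    # recently updated objects are assumed to be covered by online inferencing.
    return (now - last_explicit_update) >= staleness_seconds


if __name__ == "__main__":
    DAY = 86_400
    # Hypothetical settings: one-day staleness window, icebox at 0.01.
    print(needs_offline_rescore(0, 0.40, now=2 * DAY, staleness_seconds=DAY, icebox_threshold=0.01))
    print(needs_offline_rescore(0, 0.001, now=2 * DAY, staleness_seconds=DAY, icebox_threshold=0.01))
```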

Closing Insights: Performance and Impact

In the deal scoring use case, Prediction Engine processes approximately 20K requests per second in the online inference pipeline, generating around 60 updates per second. It scans over 3.5B objects and scores approximately 250M objects in the offline batch job. By utilizing debouncing, feature importance analysis, routing queues, update thresholds, and splitting prediction requests, Prediction Engine reduces the load on upstream services by up to 78%, preventing an overload of updates that could confuse end users with unnecessary score changes.

Designed in an object-agnostic way, Prediction Engine offers a standardized approach for hosting any scoring use case and integrating predictive models within the CRM platform. The system's modular architecture supports extensibility, allowing for the easy integration of new predictive models as business needs evolve.

Deal Scoring feature in HubSpot App
