Written by Mehmet Aydar, Senior Software Engineer II @ HubSpot
We recently made HubSpot Academy’s rich repository of knowledge available to answer questions on any business topic or HubSpot software effortlessly using an OpenAI Large Language Model (LLM). Our approach uses the Retrieval Augmented Generation (RAG) method. Over 7,000 HubSpot Academy videos encompassing 700+ hours of content created by seasoned Academy professors have been indexed in a vector database to augment the LLM, which helps mitigate the problems of knowledge gaps and hallucinations.
We’re sharing the high-level outline of the approach for anyone who wants to follow in our footsteps. Read until the end to hear about the lessons we’ve learned through building this feature.
HubSpot Academy Overview
HubSpot Academy researches, develops, and distributes education with the purpose of educating and inspiring people to transform the way the world does business. The HubSpot Academy team's purpose is to empower a global learning community through business and technology education and credentials to maximize career potential and accelerate organizational success.
HubSpot Academy educates and inspires over 500,000 learners across the globe each year. It offers learning experiences such as comprehensive certifications, courses, lessons, playlists, and short-form content provided in videos and chapters. Education content is provided by professors teaching in six different languages: English, Spanish, German, Japanese, French, and Portuguese. They offer education on business topics like marketing, sales, customer success, and operations, and HubSpot software including but not limited to the marketing hub, sales hub, service hub, and operations hub.
Academy Videos
Professors at HubSpot Academy reach the learners through educational videos. The videos are an important part of the Academy platform. Each video has subtitles either uploaded by the professors or auto-generated using speech-to-text techniques. The subtitles are in WebVTT format, which is a format for displaying timed text tracks. A WebVTT file contains cues, which can be either a single line or multiple lines, as shown below:
We created backend jobs to extract video subtitles and store them in our database in plain-text and WebVTT formats. They are used to augment the LLM and to locate the user’s question inside the video.
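As a rough illustration of the extraction step, the sketch below parses WebVTT text into cues and derives a plain-text version. It is a minimal sketch, not our production job: the names `Cue`, `parse_webvtt`, and `to_plain_text` are hypothetical, the sample captions are made up, and only basic cue timing lines are handled.

```python
import re
from dataclasses import dataclass

# Matches a cue timing line such as "00:00:01.000 --> 00:00:04.000".
# The hours component is optional in WebVTT timestamps.
TIMING = re.compile(
    r"((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})"
)


@dataclass
class Cue:
    start: str
    end: str
    text: str  # one or more payload lines joined with "\n"


def parse_webvtt(vtt: str) -> list:
    """Parse WebVTT text into cues (hypothetical helper, basic cues only)."""
    cues = []
    # Blocks (header, cues) are separated by blank lines.
    for block in re.split(r"\n\s*\n", vtt.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                text = "\n".join(lines[i + 1:]).strip()
                cues.append(Cue(match.group(1), match.group(2), text))
                break
    return cues


def to_plain_text(cues) -> str:
    """Drop the timing information and keep only the spoken text."""
    return "\n".join(cue.text for cue in cues)


# Made-up sample captions for illustration only.
SAMPLE = """WEBVTT

00:00:01.000 --> 00:00:04.000
Welcome to the lesson.

00:00:04.500 --> 00:00:08.000
In this video, we cover
two topics.
"""

cues = parse_webvtt(SAMPLE)
```

Keeping both forms is what makes the later steps possible: the plain text is what gets chunked and embedded, while the timed cues let an answer be mapped back to a position in the video.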
Retrieval Augmented Generation (RAG)
Retrieval Augmented Generation (RAG) is a method that combines the benefits of retrieval-based and generation-based approaches in natural language processing. In this method, a retriever is used to retrieve relevant information from a knowledge base, which is then used by a generator model to produce coherent and informative responses. In simple words, a retriever fetches facts from a data set and feeds the results into a generator, which is an LLM. By leveraging the strengths of both retrieval and generation models, RAG can generate more accurate and contextually relevant responses, making it a powerful tool for tasks such as question answering and dialogue generation.
Approach
The architecture is outlined in the diagram below:
Indexing video captions
Video captions are retrieved, chunked, and indexed in HubSpot’s Vector as a Service (VaaS) system. VaaS is backed by Qdrant and provides vector storage and nearest-neighbor search. Our vector index consists of 5 shards, and it uses cosine similarity as the similarity measure with a vector embedding size of 1536.
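To make the similarity measure concrete, here is a minimal sketch of cosine similarity and a brute-force nearest-neighbor lookup. The function names are hypothetical, and the exhaustive scan is for illustration only; a vector database would typically use an index rather than comparing the query against every stored vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def nearest(query, vectors, top_k=3):
    """Return (score, index) pairs for the top_k most similar vectors."""
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    return sorted(scored, reverse=True)[:top_k]
```

Because cosine similarity ignores vector length, two captions about the same topic score close to 1 regardless of how long each chunk is.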
The video captions are split into small chunks. We use a recursive split-by-character method for chunking. The text splitter is designed for general text and is customizable with a list of separator characters. It attempts to divide the text on these separators in order until the chunks are of a manageable size. We use the separator list ["\n\n", "\n", " ", ""], aiming to preserve the coherence of paragraphs, sentences, and words as much as possible, as they are typically the most semantically connected parts of the text. The maximum number of characters in a chunk is 1250, with a maximum chunk overlap of 25 characters. Assuming an average English word is about 5 characters long, each chunk can contain about 250 words.
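The idea behind recursive splitting can be sketched as follows. This is a simplified illustration under stated assumptions rather than the splitter we run in production: the function name `recursive_split` is hypothetical, and chunk overlap is deliberately left out to keep the example short.

```python
def recursive_split(text, separators=("\n\n", "\n", " ", ""), chunk_size=1250):
    """Split text into chunks of at most chunk_size characters.

    Separators are tried in order, from coarse (paragraphs) to fine
    (single characters). Overlap between chunks is not implemented here.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    # No separator left, or the empty separator: cut at any character.
    if not separators or separators[0] == "":
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging pieces while they fit
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too large: retry with the finer separators.
            chunks.extend(recursive_split(piece, finer, chunk_size))
    if current:
        chunks.append(current)
    return chunks
```

For example, `recursive_split("aaaa bbbb\n\ncccc dddd eeee", chunk_size=10)` keeps the first paragraph whole and only falls back to splitting on spaces for the second, longer paragraph.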
Each chunk is then vectorized using an embedding model. We use the text-embedding-ada-002 embedding model provided by OpenAI. It accepts inputs of up to 8,191 tokens and generates vectors with 1536 dimensions, which matches the settings of our vector index.
While storing the embeddings in our vector index, we also store additional payloads such as the ID of the video, chunk index, chunk content, and language.
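A record stored in the index can be pictured roughly as below. This is a hedged sketch: `ChunkPoint`, `build_points`, and the payload key names are hypothetical, and the embedding call is passed in as a plain function (`embed`) so that no particular client library is assumed.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

EMBEDDING_SIZE = 1536  # output dimension of text-embedding-ada-002


@dataclass
class ChunkPoint:
    vector: list
    payload: dict


def build_points(video_id: int, language: str, chunks: Sequence[str],
                 embed: Callable[[str], Sequence[float]]) -> list:
    """Pair each chunk's embedding with the payload stored next to it."""
    points = []
    for index, content in enumerate(chunks):
        vector = list(embed(content))
        if len(vector) != EMBEDDING_SIZE:
            raise ValueError(
                f"expected {EMBEDDING_SIZE} dimensions, got {len(vector)}"
            )
        points.append(ChunkPoint(
            vector=vector,
            payload={
                "video_id": video_id,      # hypothetical key names
                "chunk_index": index,
                "content": content,
                "language": language,
            },
        ))
    return points
```

Storing the chunk content and identifiers alongside the vector means a search hit already carries everything needed to build the prompt and the reference link, without a second lookup.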
Prompt Generation
In this step, we detect the source language of the user query, run the user query against VaaS, extract chunks that semantically match the query, and finally construct a prompt and run it through the generative AI Engine. The steps are outlined as follows:
- The user query is vectorized and searched through the cache. If there is a cache hit, then the result is returned from the cache. Otherwise, it goes through the steps below.
- The source language is detected using the Google Translate service.
- We run a semantic search using cosine similarity for the user query in VaaS. If it returns results, then the results are returned for prompt construction. Otherwise, the user query is translated into English, and we run the translated query against VaaS, and return the results from VaaS. In the prompt generation step, we instruct the generative AI engine to return an answer in the source language. This way, we can provide answers to the users even if the query is not in one of the languages supported by the Academy.
- The results from VaaS include the content, which comes from the chunks of video captions, and source details, including video identifiers, language, and chunk index. The content is fed to the LLM to answer the user query. The source details will further be used in the response to provide references for the answer.
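The steps above can be sketched as a single function. Every collaborator here (`embed`, `cache_lookup`, `detect_language`, `search`, `translate_to_english`) is a hypothetical callable standing in for the real service, so this shows only the control flow, not our actual implementation.

```python
def retrieve_context(query, embed, cache_lookup, detect_language,
                     search, translate_to_english):
    """Return a cached answer, or the search results for prompt construction."""
    vector = embed(query)

    # 1. Try the cache first.
    cached = cache_lookup(vector)
    if cached is not None:
        return {"cached_answer": cached}

    # 2. Detect the language so the LLM can be told to answer in it.
    source_language = detect_language(query)

    # 3. Semantic search with the original query.
    results = search(vector)
    if not results:
        # 4. Fallback: translate to English and search again.
        english_query = translate_to_english(query)
        results = search(embed(english_query))

    return {"source_language": source_language, "results": results}
```

Note that the detected language is returned even when the fallback search runs on the translated query, which is what allows the prompt to request an answer in the user's own language.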
Below is a sample prompt generated for the query “How can I send an email using HubSpot software?". Some parts of the prompt are cropped for better readability:
LLM Settings
We run the generated prompt on the Generative AI Engine, which uses OpenAI's GPT-3.5-turbo as the LLM. The LLM settings are as follows:
Processing Generative AI Responses
Our prompt includes specific instructions to retrieve the response from the Generative AI Engine in a specific JSON format, which is parseable to a Java object. Below is an example response returned as the response to the prompt above. Some parts of the response are cropped for readability purposes:
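Our service parses the response into a Java object; the sketch below shows the same idea in Python for brevity. Apart from `lineOrderOfTheStartLine`, the JSON field names (`answer`, `sources`, `videoId`, `chunkIndex`) and the class names are assumptions made for illustration, not the actual response schema.

```python
import json
from dataclasses import dataclass


@dataclass
class Source:
    video_id: str
    chunk_index: int
    line_order_of_start_line: int


@dataclass
class Answer:
    text: str
    sources: list


def parse_llm_response(raw: str):
    """Return an Answer, or None when the model output is not usable JSON."""
    try:
        data = json.loads(raw)
        sources = [
            Source(
                video_id=str(item["videoId"]),          # hypothetical field
                chunk_index=int(item["chunkIndex"]),    # hypothetical field
                line_order_of_start_line=int(item["lineOrderOfTheStartLine"]),
            )
            for item in data.get("sources", [])
        ]
        return Answer(text=str(data["answer"]), sources=sources)
    except (ValueError, KeyError, TypeError, AttributeError):
        # LLM output is not guaranteed to follow the requested format.
        return None
```

Treating a malformed response as `None` rather than raising lets the caller decide whether to retry the prompt or return a fallback message.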
As we have the identifier information for the sources, we can generate reference links back to the specific Academy videos that led the LLM to generate the answer.
The response object contains a field named “lineOrderOfTheStartLine”, which indicates the line where the answer is located in the chunk. We use this field to calculate the exact timing (called “vts”) of the answer within the video, by utilizing the regex and timestamp information we originally had within the WebVTT captions. The timing value is then concatenated with the video source link. This is an example of a video link with timing. Academy Frontend lets the video start at the indicated timing value once the user clicks on the video reference link.
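One way to picture the timing lookup is shown below. It is a sketch under assumptions: the helper names are hypothetical, the line order is treated as zero-based, and the chunk line is assumed to appear verbatim as a caption line in the WebVTT text. The actual matching logic may differ.

```python
import re

# Captures the start timestamp (without milliseconds) of a cue timing line.
TIMING = re.compile(r"((?:\d{2,}:)?\d{2}:\d{2})\.(\d{3})\s+-->")


def timestamp_to_seconds(stamp: str) -> int:
    """Convert "HH:MM:SS" or "MM:SS" to whole seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def start_time_for_line(vtt: str, chunk: str, line_order: int):
    """Find the start time, in seconds, of the cue containing a chunk line."""
    lines = chunk.splitlines()
    if not 0 <= line_order < len(lines):
        return None
    target = lines[line_order].strip()
    if not target:
        return None

    current_start = None
    for line in vtt.splitlines():
        stripped = line.strip()
        match = TIMING.match(stripped)
        if match:
            current_start = timestamp_to_seconds(match.group(1))
        elif current_start is not None and stripped == target:
            return current_start
    return None
```

The returned number of seconds is the kind of value that can then be appended to the video link so playback starts at the relevant moment.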
Caching Answers
LLMs are slow: on average, it takes about 3-4 seconds for the service to generate a proper response. Running frequent queries against the LLM can also be costly. This led us to integrate a caching mechanism.
We cache the results once we have a proper response. The key for the cache is the user query, along with some other properties to determine the visibility of the content. We make use of a separate index in VaaS for caching. The use of VaaS allows us to match results with a similarity threshold. For instance, the following queries are treated as the same: “How can I commit my code?”, “How can you commit your Code.”. By default, we use 95% semantic similarity for retrieving results from the cache.
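A similarity-based cache can be sketched in a few lines. This in-memory version is only an illustration: the class name `SemanticCache` and the `visibility` argument are hypothetical, and the 95% threshold is interpreted here as a cosine similarity score of at least 0.95, which is an assumption about how the threshold is applied.

```python
import math


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Cache keyed by query embedding instead of exact query text."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self._entries = []  # (vector, visibility, answer)

    def put(self, vector, visibility, answer):
        self._entries.append((list(vector), visibility, answer))

    def get(self, vector, visibility):
        """Return the best cached answer above the threshold, else None."""
        best_score, best_answer = -1.0, None
        for cached_vector, cached_visibility, answer in self._entries:
            if cached_visibility != visibility:
                continue  # never serve content the user should not see
            score = _cosine(vector, cached_vector)
            if score >= self.threshold and score > best_score:
                best_score, best_answer = score, answer
        return best_answer
```

Matching on embeddings is what lets differently worded versions of the same question share one cached answer, while the visibility check keeps the cache from leaking restricted content.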
Integrations
The AcademyAI service is currently integrated with ChatSpot and HubSpot’s in-app help widget. In addition, HubSpot's new Pricing & Packaging chatbot has integrated with AcademyAI to help field in-depth feature questions.
ChatSpot Integration
ChatSpot is an AI-powered sales and marketing assistant designed to help businesses grow. ChatSpot combines the power of ChatGPT with dozens of unique data sources, most notably the HubSpot CRM. With this integration, we have made the Academy one of the knowledge sources for ChatSpot.
Using the ChatSpot Academy Template, users can ask Academy-related questions. For example, ChatSpot returned the below response using AcademyAI for the query “Academy: Explain Inbound Marketing Strategies".
For more information, you can visit the blog post here.
In-app Help Widget
The Growth In-app Help Service assists users with their help needs inside the HubSpot app through a smooth self-service experience. It maintains a list of commonly asked questions along with a curated list of answers. The AcademyAI integration makes it possible to show Academy videos alongside the In-app Help answer.
For instance, it shows the video below as a complement to its answer to the user query “How do I edit properties?”
As of today, the In-App Help only uses the VaaS endpoint of AcademyAI. In the future, we plan to integrate the Generative AI endpoint.
Pricing & Packaging Chatbot
The Pricing & Packaging Chatbot answers questions regarding the products and features that HubSpot offers. The team is benefiting from integration with AcademyAI when it comes to answering more in-depth feature-related questions.
For instance, it shows the below answer along with a reference link to HubSpot Academy for the user query “How can I use email tools in HubSpot?”
Future Integration Plans
We plan to have the AcademyAI as part of the Academy app experience. This will help users ask anything to the Academy rather than just do a keyword search. A concept design is illustrated below:
When users pick one of the options, such as “How do I..?”, they will be shown a set of suggested questions powered by autocomplete functionality. “How-to” questions will be sent to AcademyAI, while keyword-like queries will still be sent to the regular Academy search powered by Elasticsearch.
The Academy Frontend team is also working on the “Academy Assistant” project powered by AcademyAI. The goal is to make the integration of AcademyAI and Academy components easier for other teams and to leverage AcademyAI to generate content for users wherever they need help in the HubSpot app. A concept design for Academy Assistant is illustrated below:
Lessons Learned
Generative AI has revolutionized AI adoption. Users can achieve magic with a simple chat command. We believe that launching fast and failing fast is imperative for AI projects. Therefore, starting small and adding new features iteratively would help advance an AI project.
Excitement for a proven new technology usually comes with hype. So, it is important to determine some KPIs for the features you are adding and measure the impact accordingly.
Answers generated from LLMs are usually precise. However, speed is still an issue. Generative AI answers are significantly slower than those of an average traditional machine learning algorithm. Caching usually helps mitigate some of the problems associated with speed.
Another challenge with LLMs is the token size of the prompts. We use a maximum prompt size of around 4K tokens. It is challenging to manage the token size when you need a large prompt. This is usually more evident if you need to include the user’s previous conversation history with the LLM within the prompt.
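A common way to stay within a token budget is to keep only the most recent conversation turns that fit. The sketch below illustrates that idea and is not something we describe having built: `trim_history` and `rough_token_count` are hypothetical names, and the four-characters-per-token estimate is a crude heuristic for English text, so a real tokenizer should be plugged in via `count_tokens` for accurate budgeting.

```python
def rough_token_count(text: str) -> int:
    # Crude heuristic: roughly four characters per token for English text.
    return max(1, len(text) // 4)


def trim_history(messages, budget, count_tokens=rough_token_count):
    """Keep the most recent messages whose combined size fits the budget."""
    kept, used = [], 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

The budget passed in would be whatever remains of the prompt limit after the instructions and the retrieved caption chunks have been accounted for.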
Another important topic is the feedback mechanism. We are planning to have users rate the LLM's answers and to feed those ratings back into the LLM. We need a more sophisticated token size management solution to achieve this.
Videos are a powerful feature of HubSpot Academy. We have seen that having the videos in the query response has a significant impact on click-through rates. The transformation of taking the Academy video library and outputting it from a generative AI query enabled us to produce genuine AI answers. We want to thank the HubSpot Academy professor team for creating the rich repository of video content that makes this feature possible.
Are you an engineer ready to make an impact? Check out our careers page for your next opportunity! And to learn more about our culture, follow us on Instagram @HubSpotLife.