A RAG Chatbot Powered by Claude 3 Haiku and MongoDB Atlas in Node.js

Fateh Ali Aamir
5 min read · Apr 4, 2024


What Are We Making?

We’re making a RAG chatbot that can help us communicate with our data. The data can exist in any form as long as we can create vector embeddings for it. Once we have those, we can store them in any vector database of our choice; I’m using MongoDB Atlas for this example. And to make things more interesting, I decided to use Anthropic’s Claude 3 Haiku model, because recent benchmark scores suggest Anthropic has caught up with, and in some cases overtaken, OpenAI in the LLM world. For the embedding model, we’ll still use OpenAI’s Text Embedding 3 Small model. And finally, to try something a little different, I decided to build this entire thing in Node.js, just to show that it’s possible.

What Are We Using?

  1. Anthropic’s Claude 3 Haiku — claude-3-haiku-20240307
  2. OpenAI’s Text Embedding 3 Small Model — text-embedding-3-small
  3. MongoDB Atlas Vector Search
  4. Node.js (version 20) with Express
  5. LangChain

Let’s Make the Necessary Imports

const { OpenAIEmbeddings } = require("@langchain/openai");
const { MongoDBAtlasVectorSearch } = require("@langchain/mongodb");
const { MongoClient } = require("mongodb");
const { ChatAnthropic } = require("@langchain/anthropic");
const { createStuffDocumentsChain } = require("langchain/chains/combine_documents");
const { PromptTemplate } = require("@langchain/core/prompts");
const { StringOutputParser } = require("@langchain/core/output_parsers");
const { Document } = require("langchain/document");

Creating the Embedding

First off, we will initialize our OpenAI embedding model. The constructor takes an API key and the model name. Text Embedding 3 Small will give you 1536 dimensions for your vector embeddings by default.

const embedding_model = new OpenAIEmbeddings({
  openAIApiKey: "", // add your OpenAI API key
  modelName: "text-embedding-3-small", // you can leave this unchanged
});
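If you want to sanity-check this step, you can embed a sample string and confirm the vector length. This quick check is my own addition, not part of the original flow:

// Optional sanity check (not in the original post): embed a sample string
// and confirm the resulting vector has 1536 dimensions.
const sampleVector = await embedding_model.embedQuery("Hello, world!");
console.log(sampleVector.length); // 1536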

Storing the Embedding

Now we will make the necessary initializations for MongoDB. First, the client: the MongoClient constructor takes your connection string as a parameter. After that, you specify your database and collection. dbConfig is an important parameter for the vector store, so make sure it’s accurate. You also have to create a Vector Search index from the MongoDB Atlas dashboard; we’ll look at that in the next step.

const client = new MongoClient(""); // add your MongoDB Atlas connection string
const database = client.db(""); // add your MongoDB Atlas database name
const collection = database.collection(""); // add your MongoDB Atlas collection name

const dbConfig = {
  collection: collection, // add collection here
  indexName: "vector_index", // add your Vector Search index name
  textKey: "text", // you can leave this unchanged
  embeddingKey: "embedding", // you can leave this unchanged
};
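Before going further, it can help to confirm the connection string actually works. Here’s an optional sanity check (my addition, not in the original post) using the driver’s ping command; the driver otherwise connects lazily on first use:

// Optional: explicitly connect and ping to verify the connection string.
await client.connect();
await client.db("admin").command({ ping: 1 });
console.log("Successfully connected to MongoDB Atlas");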

We need to set up the Vector Search index in MongoDB Atlas. For that, navigate to your specific collection and create a Search Index, choosing the option at the bottom that says Atlas Vector Search with the JSON Editor. Once you’re in, add the following JSON. Remember that numDimensions is specific to the embedding model we are using; for example, OpenAI’s text-embedding-3-large produces 3072-dimensional vectors by default, so this value can change depending on your model. Be mindful of that.

{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1536,
      "similarity": "cosine"
    }
  ]
}

Next up, we need to set up our vector store. The vector store takes an array of documents, an embedding model, and the dbConfig parameter we defined above. The documents need to be in a certain format, which is why the first two statements are there. We can also attach any metadata we want.

let docs = []; // initialize an empty array for the documents
docs.push(new Document({ pageContent: JSON.stringify(data), metadata: {} })); // add the incoming data to the array
const vectorStore = await MongoDBAtlasVectorSearch.fromDocuments(docs, embedding_model, dbConfig); // vector store initialization
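Note that data here is whatever payload you want to index; the post doesn’t show its shape, so here’s a purely hypothetical example of what it could look like:

// Hypothetical example payload (not from the original post).
// Any JSON-serializable object works, since we stringify it into pageContent.
const data = {
  product: "Widget A",
  unitsSold: 1200,
  region: "EMEA",
};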

Query Processor

Here we initialize the famous Claude 3 Haiku. Currently making waves in the LLM world, Haiku is the fastest model from Anthropic. The ChatAnthropic constructor from LangChain takes the model name, an optional maximum token count, and your Anthropic API key.

const llm = new ChatAnthropic({ // initializing the LLM
  modelName: "claude-3-haiku-20240307", // leave this unchanged
  maxTokens: 1024, // max tokens in the response
  anthropicApiKey: "", // add your Anthropic API key here
});
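To confirm the API key works before wiring everything together, you can invoke the model directly. This quick test is my addition, not part of the original walkthrough:

// Optional sanity check: call the model directly and print its reply.
const response = await llm.invoke("Say hello in one sentence.");
console.log(response.content);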

Next, we will initialize the MongoDB client and the OpenAI embedding model again, just like we did before. Nothing new here.

const client = new MongoClient(""); // add your MongoDB Atlas connection string
const database = client.db(""); // add your MongoDB Atlas database name
const collection = database.collection(""); // add your MongoDB Atlas collection name

const dbConfig = {
  collection: collection, // add collection here
  indexName: "vector_index", // add your Vector Search index name
  textKey: "text", // you can leave this unchanged
  embeddingKey: "embedding", // you can leave this unchanged
};

const embeddings = new OpenAIEmbeddings({ // initializing the embedding model
  openAIApiKey: "", // add your OpenAI API key here
  modelName: "text-embedding-3-small", // leave this unchanged
});

Now we have to run a semantic search to fetch the relevant documents. After we initialize the vector store (this time without the documents variable), we can then run the similarity search function, which takes your query and an integer k. This number determines how many of the closest documents are returned after the cosine similarity has been computed.

const vectorStore = new MongoDBAtlasVectorSearch(embeddings, dbConfig); // initializing the vector store
const vectorResult = await vectorStore.similaritySearch(query, 5); // running similarity search for the query
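The search returns an array of Document objects. If you want to see what actually came back before handing it to the LLM, here’s an optional way to inspect it (my addition):

// Optional: inspect the retrieved documents before passing them to the LLM.
vectorResult.forEach((doc, i) => {
  console.log(`Match ${i + 1}:`, doc.pageContent.slice(0, 100)); // first 100 characters
});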

This is where the magic of LangChain comes in. First, we add our prompt. The prompt tells the LLM what it is and how it has to function. We add two placeholders, {context} and {question}, that will be populated at runtime. Then we initialize createStuffDocumentsChain, which takes our LLM, the prompt, and an output parser as parameters. After that, we only have to call the invoke method to get a beautifully crafted response from the LLM.

const prompt = PromptTemplate.fromTemplate( // tweak your prompt here to optimize performance
  "You are an agent that will analyse and give statistical responses for the data. CONTEXT: {context} USER QUESTION: {question}"
);

const ragChain = await createStuffDocumentsChain({ //rag chain to emulate chatbot behaviour
llm,
prompt,
outputParser: new StringOutputParser(),
});

const result = await ragChain.invoke({ //invoke the chain to return response
question: query,
context: vectorResult,
});
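Since we said up front that this is built with Express, here’s a minimal sketch of how the query processor could be exposed as an HTTP endpoint. The /chat route and the { query } request body shape are my assumptions for illustration, not from the original code:

// Minimal Express wiring sketch. The route path and request shape
// are assumptions, not from the original post.
const express = require("express");
const app = express();
app.use(express.json());

app.post("/chat", async (req, res) => {
  try {
    const query = req.body.query; // assumed body: { "query": "..." }
    const vectorResult = await vectorStore.similaritySearch(query, 5);
    const result = await ragChain.invoke({ question: query, context: vectorResult });
    res.json({ answer: result });
  } catch (err) {
    res.status(500).json({ error: err.message });
  }
});

app.listen(3000, () => console.log("RAG chatbot listening on port 3000"));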

Conclusion

Let’s go over the power of Claude 3 Haiku and the other models from Anthropic. The published benchmark numbers say a lot: while the Opus model has challenged GPT-4 at the top of the leaderboards, the Haiku model has created a huge disruption in the industry because it provides exceptional speed, strong performance, and an affordable pricing model.

The model has different costs for input and output tokens: for a million input tokens you’ll be charged USD 0.25, and for a million output tokens you’ll be charged USD 1.25. That’s a significant difference from GPT-4, considering how good the performance is. Beyond Anthropic, Mistral is another strong player on the LLM leaderboards. Similarly, there are a lot of vector databases and embedding models out there; we just have to choose what works best for us.
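To put that in perspective, here’s a quick back-of-the-envelope estimate. The token counts below are made-up assumptions, just for illustration:

// Illustrative cost estimate for a single RAG query to Haiku.
// The token counts are assumptions, not measured values.
const inputTokens = 2000;  // prompt + retrieved context
const outputTokens = 500;  // model response
const costUsd = (inputTokens / 1e6) * 0.25 + (outputTokens / 1e6) * 1.25;
console.log(costUsd); // 0.001125 -> roughly a tenth of a cent per query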

Thanks for reading!
