LangServe: How to Deploy Your Chains

Fateh Ali Aamir
3 min read · Jun 6, 2024


What is LangServe?

LangServe helps developers deploy LangChain runnables and chains as a REST API. The library is integrated with FastAPI and uses Pydantic for data validation. It is a very powerful tool for quick and efficient deployment, and it offers a playground as well, which is a really valuable feature.
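To follow along, you can install the library with `pip install "langserve[all]"`, which pulls in both the server and client components.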

main.py

from fastapi import FastAPI, HTTPException
from langserve import add_routes
import uvicorn
from langchain_openai import ChatOpenAI
from app.services.agents.sales_agent import sales_chain
from app.services.agents.support_agent import support_chain

Here we are using the standard fastapi and uvicorn libraries to set up our server. We’re also using the add_routes function from the langserve library and importing ChatOpenAI, along with the sales_chain and the support_chain that we will deploy using LangServe.

app = FastAPI(
    title="Langchain Server",
    version="1.0",
    description="A simple API Server"
)

add_routes(
    app,
    ChatOpenAI(),
    path="/openai"
)

add_routes(
    app,
    sales_chain,
    path="/sales"
)

add_routes(
    app,
    support_chain,
    path="/support"
)

if __name__ == "__main__":
    uvicorn.run(app, host="localhost", port=8000)

The main.py file is fairly simple. We start our app with the FastAPI initialiser. We then add our routes. The add_routes function takes in three parameters: the FastAPI app, the chain or runnable, and the path you want to deploy it on. Pretty simple, right? Running the file starts the server on http://localhost:8000.
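Once the server is running, each route exposes invoke, batch, and stream endpoints, and LangServe also ships a Python client. Here is a minimal sketch of calling the deployed chain from another process, assuming the server is up on localhost:8000 (the example input is a placeholder):

from langserve import RemoteRunnable

# Connect to the deployed chain; the URL matches the path given to add_routes
sales = RemoteRunnable("http://localhost:8000/sales/")

# A RemoteRunnable behaves like any other LangChain runnable
print(sales.invoke("Please make this email more concise: ..."))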

sales_agent.py

import os

from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_pinecone import PineconeVectorStore
from pinecone import Pinecone
from langchain_core.runnables import RunnablePassthrough, RunnableParallel
from langchain_core.output_parsers import StrOutputParser
from langchain.prompts import ChatPromptTemplate

# Set up Pinecone index name and API keys (empty strings are placeholders)
index = "bae360"
os.environ["OPENAI_API_KEY"] = ""
os.environ["PINECONE_API_KEY"] = ""
pinecone = Pinecone(api_key="")

The sales_agent.py file is where we will develop our chain. I’ll only be giving an example of one chain here, since the other one is a copy of this apart from the prompt. We are using various libraries, most notably langchain, langchain_core, langchain_openai and pinecone. After that, we set up our API keys.
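The empty strings above are placeholders for your own keys. In practice you may prefer not to hardcode them at all; here is a minimal sketch that reads them from the environment instead, assuming OPENAI_API_KEY and PINECONE_API_KEY are already exported in your shell (or loaded from a .env file):

import os
from pinecone import Pinecone

# Keys are read from the environment rather than written into the source
index = "bae360"
pinecone = Pinecone(api_key=os.environ["PINECONE_API_KEY"])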

chat_llm = ChatOpenAI(
    openai_api_key="",
    model="gpt-3.5-turbo",
    temperature=0,
    verbose=True,
)

llm = ChatOpenAI(model="gpt-3.5-turbo")

# Set up OpenAI Embeddings model
embedding_model = OpenAIEmbeddings(model="text-embedding-3-small", dimensions=1536)

# Load Pinecone index and create vector store
vector_store = PineconeVectorStore(index_name=index, embedding=embedding_model)

Here we are setting up our LLM, embedding model and vector store. ChatOpenAI takes the usual parameters: the model name and the API key (both of which can also fall back to defaults and environment variables), plus optional ones like temperature and verbose. The embedding model requires you to specify the model, while the dimensions parameter is optional. Finally, we spin up our Pinecone vector store instance by giving it the index_name and the embedding model object.
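Note that this assumes the "bae360" index already contains your documents. If you still need to populate it, a hypothetical one-time indexing step might look like this, where docs is a list of LangChain Document objects you have prepared:

from langchain_pinecone import PineconeVectorStore

# Hypothetical one-time step: embed and upsert `docs` into the index
vector_store = PineconeVectorStore.from_documents(
    documents=docs,
    embedding=embedding_model,
    index_name=index,
)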

# Define template for prompts
prompt_template = """You are an email improvement bot. You will be given emails and you must output grammatically correct and concise emails.

CONTEXT:
{context}

QUESTION:
{question}
"""

rag_prompt = ChatPromptTemplate.from_template(prompt_template)

# Use the vector store as a retriever for similarity search
retriever = vector_store.as_retriever()

entry_point_chain = RunnableParallel(
    {"context": retriever, "question": RunnablePassthrough()}
)

sales_chain = entry_point_chain | rag_prompt | llm | StrOutputParser()

In the later part of our flow, we add our prompt_template to ChatPromptTemplate to get the rag_prompt. After that, we turn our vector store into a retriever using the as_retriever() function. Next, we create our entry_point_chain, which handles the input: the context is fetched by the retriever, while the question is passed through unchanged by RunnablePassthrough(). Finally, we assemble our chain using LCEL (LangChain Expression Language): we pipe together our entry_point_chain, rag_prompt, llm and StrOutputParser(), and our chain is ready!
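Before wiring the chain into LangServe, it’s worth a quick local sanity check. A minimal sketch, assuming the index already holds some emails for the retriever to find (the example input is a placeholder):

# The string input is sent to the retriever (as the search query) and to
# RunnablePassthrough() (as the question) at the same time
result = sales_chain.invoke("Please rewrite this email to sound more professional: ...")
print(result)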

The Playground

So the playground is a really interesting part of LangServe. Hop on to http://localhost:8000/sales/playground/ (replace `sales` with your own path) and enjoy this built-in feature where you can easily test your endpoints. It even shows you the intermediate steps taken by the chain.
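If you’d rather hit the endpoint directly instead of using the playground, each route also exposes a REST endpoint at /invoke. A minimal sketch using the requests library (the example input is a placeholder):

import requests

# LangServe expects the chain input under the "input" key
response = requests.post(
    "http://localhost:8000/sales/invoke",
    json={"input": "Please shorten this email: ..."},
)

# The chain's answer comes back under the "output" key
print(response.json()["output"])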

These are amazing developments from LangChain and there are only good things to follow. With the speed at which development is growing in the Generative AI space, it is only a matter of time before even greater things are out there to leave us all standing in awe. And a special thanks to AI Makerspace for their highly insightful YouTube channel that helped me implement the RAG-based approach to this solution.
