How to Set up a Free (Local) Generative AI Application

Fateh Ali Aamir
3 min read · Jun 3, 2024

Today, we’re going to create a completely free Generative AI application that runs locally on your computer, fully offline. We will do that using Ollama, LangChain, Unstructured, and Qdrant.

Ollama

Ollama is an open-source platform that lets users run large language models (LLMs) on their local machine. It’s a lightweight framework that simplifies the process of downloading, installing, and interacting with LLMs, without requiring technical expertise or cloud-based platforms.

Head over to https://ollama.com to install Ollama, or run the following command:

curl -fsSL https://ollama.com/install.sh | sh

Once you’ve installed Ollama, you can head over to the Ollama model page to choose any model to work with. I suggest Llama 3 as the LLM and Nomic (nomic-embed-text) as the embedding model. You can pull both with the commands below:

ollama pull llama3
ollama pull nomic-embed-text
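
To make sure everything is wired up, you can talk to the model straight from the terminal:

ollama run llama3 "Say hello in one sentence."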

Once you’ve got these models up and running, we can start moving towards the implementation.

LangChain, Unstructured and Qdrant

LangChain is a framework for developing applications powered by large language models (LLMs). The Unstructured library is an open-source tool that helps process and structure unstructured text documents for machine learning tasks. Qdrant is an open-source vector database and vector similarity search engine written in Rust. It provides a production-ready service with an API for storing, searching, and managing high-dimensional points, or vector embeddings, along with metadata called payloads.
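
Before we write any code, make sure the Python packages used below are installed. Exact package names can shift between LangChain releases, so treat this as a sketch:

pip install langchain-community langchain-text-splitters qdrant-client
pip install "unstructured[pdf]"  # the [pdf] extra pulls in the PDF parsing dependencies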

Let’s get into the chat inference code. First, we will make the necessary imports:

from langchain_community.vectorstores import Qdrant
from langchain_community.document_loaders import UnstructuredFileLoader
from langchain_text_splitters import CharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.chat_models import ChatOllama

Next, we’re initializing our models:

embeddings = OllamaEmbeddings(
    model="nomic-embed-text",
)

llm = ChatOllama(model="llama3")

We’re setting up our Nomic Embedding Model here and also initializing Llama 3 using LangChain’s ChatOllama interface.
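
As a quick smoke test (optional, and my own addition), you can embed a string and send a one-off prompt to confirm both models respond:

vector = embeddings.embed_query("hello world")
print(len(vector))  # nomic-embed-text should produce 768-dimensional vectors

print(llm.invoke("Say hello in one sentence.").content)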

doc_loader = UnstructuredFileLoader("documents/Chapter 644 u Brain Abscess.pdf")
documents = doc_loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

qdrant = Qdrant.from_documents(
    docs,
    embeddings,
    path="/tmp/local_qdrant",
    collection_name="my_documents",
)

retriever = qdrant.as_retriever(search_type="mmr", search_kwargs={"k": 10})

Here we are using the UnstructuredFileLoader to load our file; Unstructured can extract content from many document types, including PDFs that mix text and images. After that, we’re using the CharacterTextSplitter to chunk the text. Then we create our Qdrant instance from the documents, the embedding model, a local storage path, and a collection name. Finally, we initialize Qdrant as our retriever, which will use Maximal Marginal Relevance (MMR) search to retrieve our documents; the LangChain documentation covers MMR search in more detail.
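
If you’re curious about what the splitter and retriever actually produce, a quick inspection like the one below can help. The test query here is just an illustrative example:

print(f"{len(docs)} chunks created from {len(documents)} document(s)")

sample = retriever.invoke("What causes a brain abscess?")
print(sample[0].page_content[:200])  # preview the first retrieved chunk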

query = "Create flash cards for Brain Abscess?"

found_docs = retriever.invoke(query)

context = found_docs
question = query

Here, we have our query. First, we retrieve the relevant documents from Qdrant, and then we assign those values to context and question.
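
One thing worth knowing: found_docs is a list of LangChain Document objects, so interpolating it into an f-string dumps their full repr, metadata and all. If you prefer a cleaner prompt, you can join just the text of each chunk (an optional tweak, not part of the original code):

context = "\n\n".join(doc.page_content for doc in found_docs)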

print("Getting response...")
response = llm.invoke(f"Use the following context: {context}, Question: {question} Your response should be in the following JSON format: {{ \"flashcards\": [ {{ \"card\": \"card\" }} ] }}")

print(response)

Finally, we get our response using the llm.invoke() function, which takes in our prompt with the retrieved context and the question interpolated into it. After that, we can use the response however we wish.
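
Since we asked for JSON, we can try parsing the reply. LLMs don’t always return valid JSON, so it’s worth guarding the parse; this snippet is my own addition, not part of the original flow:

import json

try:
    flashcards = json.loads(response.content)["flashcards"]
    for card in flashcards:
        print(card["card"])
except (json.JSONDecodeError, KeyError):
    # The model sometimes wraps the JSON in extra prose; fall back to the raw text
    print(response.content)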

Conclusion

The best part about this application is that it is free and can run offline whenever you want. All you’ll have to do is update your Ollama models from time to time, but other than that, it is perfect for personal use. And if you want to take this up a notch, you can host it on a server and access it from anywhere in the world. The possibilities are endless as Generative AI and LLMs become accessible to everyone. And a funny thing about Ollama: you can also find uncensored models on its model page for a bit of fun 😜

Happy Coding!
