Creating a RAG Chatbot with Langflow and Astra DB

A step-by-step guide to creating a RAG chatbot using Langflow’s intuitive interface, integrating LLMs with vector databases for context-driven responses.

Photo by Igor Omilaev on Unsplash

Retrieval-Augmented Generation, or RAG, is a natural language processing technique that combines traditional retrieval methods with LLMs to generate more accurate and relevant text, grounding the model's output in the context provided by the retrieved documents. It has recently become widely used in chatbots, giving companies the opportunity to enhance their automated customer communications with advanced LLMs customized to their own data.

Langflow is a graphical user interface for Langchain, a centralized development environment for LLMs. Released in October 2022, Langchain had become one of the most used open-source projects on GitHub by June 2023. It took the AI community by storm, especially thanks to its framework for creating and customizing LLM applications: integrations with the most relevant text generation and embedding models, the ability to chain LLM calls together, prompt management, support for vector databases to accelerate computations, and smooth delivery of results to external APIs and task flows.

This article presents an end-to-end RAG chatbot created with Langflow using the famous Titanic dataset. First, registration needs to be done on the Langflow platform, here. To start a new project, some handy pre-built flows can be quickly customized based on the user's needs. To create a RAG chatbot, the best option is the Vector Store RAG template. Figure 1 shows the original flow:

Figure 1 — Langflow Vector Store RAG Template Flows. Source: The Author.

The template has OpenAI pre-selected for the embeddings and text generation, and those are the options used in this article, but others such as Ollama, NVIDIA, and Amazon Bedrock are available and easy to integrate by simply setting the API key. Before using an LLM provider integration, it is important to check that the chosen integration is active in the configuration, as in Figure 2 below. Global variables such as API keys and model names can also be defined to simplify filling in the flow components.

Figure 2 — OpenAI Active Integrations and Overview. Source: The author.

There are two different flows in the Vector Store RAG template. The flow below is the retrieval part of the RAG, where the context is provided by uploading a document, splitting it, embedding it, and then storing it in a vector database on Astra DB that can be easily created from the flow interface. Currently, the Astra DB component retrieves the Astra DB application token by default, so there is no need to collect it manually. Finally, the collection that will store the embedded vectors in the vector DB needs to be created. The collection dimension must match that of the embedding model, which is available in its documentation, for the embedding results to be stored correctly. So if the chosen embedding model is OpenAI's text-embedding-3-small, the collection dimension should be 1536. Figure 3 below shows the full retrieval flow.
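As a quick sanity check outside Langflow, the minimal sketch below (assuming an OPENAI_API_KEY environment variable and the official openai Python package) embeds a sample string with text-embedding-3-small and prints the vector length that the Astra DB collection dimension has to match:

```python
from openai import OpenAI

# Assumes OPENAI_API_KEY is set in the environment.
client = OpenAI()

# Embed a sample passenger record with the same model used in the flow.
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Braund, Mr. Owen Harris; male; 22; ticket A/5 21171",
)

vector = response.data[0].embedding
# The Astra DB collection must be created with this exact dimension.
print(len(vector))  # -> 1536
```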

Figure 3 — Retrieval flow for the Titanic dataset. Source: The author.

The dataset used to enhance the chatbot context was the Titanic dataset (CC0 license). At the end of the RAG process, the chatbot should be able to provide specific details and answer complex questions about the passengers. But first, we upload the file to a generic file loader component and then split it using the global variable “separator”, because the original format is CSV. The chunk overlap and chunk size were also set to 0, because each chunk will be one passenger delimited by the separator, as sketched below. If the input file were plain text instead, the chunk overlap and chunk size would need to be set properly for the embeddings to be meaningful. To complete the flow, the vectors are stored in the titanic_vector_db collection on the demo_assistant database.
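To make the splitting step concrete, here is a rough sketch of the same idea in plain Python (the file name titanic.csv and the newline separator are assumptions; Langflow's split component handles this through its own settings): each CSV row becomes its own chunk, so no chunk size or overlap is needed.

```python
# Minimal illustration of per-passenger chunking for a CSV file.
# Assumes a local file named "titanic.csv"; adjust the path as needed.
with open("titanic.csv", encoding="utf-8") as f:
    lines = f.read().splitlines()

header, rows = lines[0], lines[1:]

# One chunk per passenger: keep the header with each row so every
# chunk stays self-describing when embedded on its own.
chunks = [f"{header}\n{row}" for row in rows if row.strip()]

print(f"Created {len(chunks)} chunks, one per passenger.")
print(chunks[0])
```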

Figure 4 — Complete generation flow. Source: The author.

Moving to the RAG generation flow, shown in Figure 4, it is triggered by the user input in the chat, which is then searched against the database to provide context for the prompt. So if the user asks something related to the name “Owen”, the search runs through the vector DB collection looking for “Owen”-related vectors, fetches them, and passes them through the parser to convert them into text, producing the context needed for the prompt. Figure 5 shows the results of the search.

Figure 5 — Result of the query performed in the Vector DB to obtain context. Source: The author.

Going back a step, it is also crucial to connect the embedding model component to the vector DB here using the same model as in the retrieval flow; otherwise the query would always come back empty, because the embeddings used in the retrieval and generation flows would not live in the same vector space. This step also highlights the performance benefit of using a vector DB in a RAG pipeline, where the context needs to be retrieved quickly and passed to the prompt before any answer can be generated for the user.
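The sketch below illustrates what this lookup amounts to, using a small in-memory list of chunk embeddings as a stand-in for Astra DB's own query API: the question is embedded with the same model that built the collection, and chunks are ranked by cosine similarity to it.

```python
import math
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set


def embed(text: str) -> list[float]:
    # Must be the same model used to build the collection.
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Stand-in for the stored collection: (chunk_text, chunk_embedding) pairs.
chunks = [
    "Braund, Mr. Owen Harris; male; 22; did not survive",
    "Cumings, Mrs. John Bradley; female; 38; survived",
]
indexed = [(c, embed(c)) for c in chunks]

query_vec = embed("Was there a passenger called Owen?")
ranked = sorted(indexed, key=lambda item: cosine(query_vec, item[1]), reverse=True)

# The most similar chunks become the context passed to the prompt.
print(ranked[0][0])
```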

In the prompt shown in Figure 6, the context comes from the parser already converted to text, and the question comes from the original user input. The figure below shows how the prompt can be structured to integrate the context with the question.

Figure 6 — Prompt passed to the AI model. Source: The author.

Now that the prompt is written, it's time for the text generation model. In this flow, we've chosen the GPT-4 model with a temperature of 0.5, a recommended standard for chatbots. The temperature controls the randomness of predictions made by an LLM: a lower temperature generates more deterministic and straightforward answers, resulting in more predictable text, while a higher temperature generates more creative outcomes, although if set too high the model can easily hallucinate and produce incoherent text. Finally, set the API key using the global variable that holds OpenAI's API key, and that's it. Then it's time to run the flows and check the results in the playground.
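For reference, this is roughly what the prompt-plus-generation step does behind the scenes; a minimal sketch assuming the openai package, an OPENAI_API_KEY environment variable, and a hypothetical retrieved_context string (the prompt wording is illustrative, not Langflow's exact template):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

retrieved_context = "Braund, Mr. Owen Harris; male; 22; did not survive"  # hypothetical
question = "How old was Owen Braund and did he survive?"

# Illustrative prompt structure: context first, then the user question.
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{retrieved_context}\n\n"
    f"Question: {question}"
)

response = client.chat.completions.create(
    model="gpt-4",
    temperature=0.5,  # the temperature used in the flow
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)
```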

Figure 7 — Playground with the result of the RAG chatbot. Source: The author.

The conversation in Figure 7 clearly shows that the chatbot correctly obtained the context and answered detailed questions about the passengers. And while it may be disappointing to find out that there was no Rose or Jack on the Titanic, unfortunately it is true. And that's it. The RAG chatbot has been created and can of course be refined to improve conversational performance and cover potential misinterpretations, but this article shows how easy Langflow makes it to customize and personalize LLMs.

Finally, there are multiple options for deploying the flow. HuggingFace Spaces is an easy way to deploy the RAG chatbot with scalable hardware infrastructure and native Langflow support, requiring no installation. Langflow can also be installed and used via a Kubernetes cluster, a Docker container, or directly in GCP using a VM and Google Cloud Shell. Check out the documentation for more information on deployment.

New times are dawning and low-code solutions are starting to set the tone for how AI will be developed in the real world in the near future. This article described how Langflow is revolutionizing AI by centralizing multiple integrations with an intuitive UI and templates. Today, anyone with basic AI knowledge can build a complex application that would have required a huge amount of coding and deep learning framework expertise at the beginning of the decade.

