Turning AI into a Trustworthy Assistant: How I Used RAG to Reduce Hallucinations

A practical walkthrough of building accurate, source-backed AI systems for legal, support, and internal knowledge applications.


About

I built an AI tool that answers questions from uploaded documents, making it easy for businesses to get accurate, source-backed information fast.

Anyone who uses AI regularly has noticed that it sometimes makes things up, and delivers the fabrication in such a confident tone that it's hard to doubt without prior knowledge or fact-checking.

But in production, we want AI to actually automate a process or two without us checking it for errors constantly, and without the embarrassment of serving hallucinated responses to customers or clients.

So how do we do that? How do we get mister AI agent, mister AI chatbot, or any other AI app to give more accurate answers?

You give it less data.

You read that correctly. If your prompt carries a whole document for every answer your chatbot provides, that's not just unnecessary AI usage inflating your bill; it's data noise that leads to vague, unspecific answers.

So here's my take on more accurate AI in a simple way and in only a few dozen lines of code. Let's get some RAGs.

Retrieval Augmented Generation (RAG) is a friend you should want to make, and here's why:

I'll show how it works on a private website I made.

First, let's upload a file to get answers from. I used the PDF version of The Psychology of Money:

 

[Image: PDF upload page for RAG]

Once it's uploaded, some backend processing prepares the file, and then the following page appears:

[Image: Testing RAG in a website]

The best way to explain RAG is to show how it works. In the screenshot above is a form that allows you to ask questions about the uploaded file.

In the "Query" text box, you can ask questions and click the blue button to get answers from the document. Nothing special so far, right?

However, the answer comes in three parts. First, the response, which is quite literally the answer to the question.

Then the sources, showing which document and which page the answer was drawn from.

And finally the context: the actual passages from the document that were used to answer the question. It sits behind a spoiler; here's how it looks:

 

[Image: Context used for the response]

Without RAG, the AI answers from the knowledge base you provide it plus everything else in its training data, down to the variance of fish weight in the Atlantic Ocean it picked up in the last OpenAI update.

That mix of relevant and irrelevant data sources doesn't work well for accuracy.

It gets even trickier when you have a chatbot on a real estate agency's website and a bored teenager comes asking about explosive household materials.

You want your chatbot to say "I don't know" instead of "Here's a list of household explosive materials, how else can I help?"

Kind of like this:

 

[Image: RAG finds no answer]

Now let's talk a bit about how it works. Here's the flowchart:

 

[Image: RAG system flowchart]
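The flowchart boils down to three steps: embed and index the document chunks, retrieve the chunks most similar to the query, and stuff them into the prompt. Here's a toy, dependency-free sketch of those steps. The bag-of-words "embedding" and cosine similarity are stand-ins for OpenAI embeddings, just to make the retrieval step visible; the chunk texts are made up for illustration:

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding" -- a stand-in for a real embedding model
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Similarity between two vectors: dot product over the product of norms
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# 1. Index: embed every chunk and keep the vectors (the "vector store")
chunks = [
    "Saving money is about your savings rate, not your income.",
    "Compounding rewards patience above raw skill.",
    "The stock market is volatile in the short term.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# 2. Retrieve: embed the query, rank chunks by similarity, keep the top k
query = "Why does the savings rate matter more than income?"
query_vec = embed(query)
ranked = sorted(index, key=lambda pair: cosine(query_vec, pair[1]), reverse=True)
top_k = [chunk for chunk, _vec in ranked[:2]]

# 3. Augment: build the prompt the LLM actually receives
prompt = (
    "Answer the question using only this context:\n\n"
    + "\n---\n".join(top_k)
    + f"\n\nQuestion: {query}"
)
print(top_k[0])  # the savings-rate chunk ranks first
```

Swap the toy `embed` for a real embedding model and the list for a vector database, and you have the system below.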

Below is the code for it all. If you're not into the technical part, you can stop reading here, because what follows may be a headache. Or leave your questions in the comments.

The secret is a database that stores the information from the uploaded file as long lists of numbers that capture the meaning of its words and phrases; together, these vectors form a vector space.

These numbers are called embeddings, and I used OpenAI's API to generate them. They are stored in a Chroma database using the following code:

# Imports used by these snippets (exact module paths vary across LangChain versions)
import os
import shutil

from langchain.schema import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings


def split_text(self, documents: list[Document]):
    # Initialize a text splitter with specific chunk size and overlap settings
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=300,         # Maximum size of each chunk in characters
        chunk_overlap=100,      # Overlap between chunks to maintain context
        length_function=len,    # Function used to measure text length
        add_start_index=True,   # Include the start index of each chunk in metadata
    )
    
    # Split the documents into smaller chunks
    self.chunks = text_splitter.split_documents(documents)
    
    # Print the number of documents processed and the total number of chunks created
    print(f"Split {len(documents)} documents into {len(self.chunks)} chunks.")

    # Select one chunk for inspection (index 10, i.e. the 11th chunk)
    document = self.chunks[10]
    
    # Print the content of the selected chunk
    print(document.page_content)
    
    # Print metadata (e.g., source, page number) of the selected chunk
    print(document.metadata)

    # Return the list of chunks
    return self.chunks

def save_to_chroma(self, chunks: list[Document]):
    # Clear out the existing Chroma database if it exists
    if os.path.exists(self.CHROMA_PATH):
        shutil.rmtree(self.CHROMA_PATH)  # Remove the existing database directory

    # Create a new Chroma database from the provided chunks and embeddings
    db = Chroma.from_documents(
        chunks,                     # List of document chunks to be saved
        OpenAIEmbeddings(),         # Embedding model used for vectorization
        persist_directory=self.CHROMA_PATH  # Directory where the database is stored
    )

    # Print a confirmation message showing the number of saved chunks and the directory used
    print(f"Saved {len(chunks)} chunks to {self.CHROMA_PATH}.")
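For intuition, here's what `chunk_size=300` with `chunk_overlap=100` means in practice. This is a simplified stand-in, not what `RecursiveCharacterTextSplitter` actually does internally (it prefers splitting on paragraph and sentence boundaries), but the window arithmetic is the same:

```python
def split_with_overlap(text: str, chunk_size: int = 300, chunk_overlap: int = 100):
    # Fixed-size character windows that step forward by
    # (chunk_size - chunk_overlap), so neighbouring chunks share
    # 100 characters of context
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        # Keep the start offset too, like add_start_index=True does
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break
    return chunks

text = "x" * 1000
chunks = split_with_overlap(text)
print(len(chunks))                                # 5 chunks cover 1000 characters
print(chunks[0][1][-100:] == chunks[1][1][:100])  # True: 100 shared characters
```

The overlap is what keeps a sentence that straddles a chunk boundary from being lost to both chunks.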

Then, to query the data stored in the Chroma database, here is the code that completes the loop and sends a dict with the results to my Flask server:

# Imports used by this snippet (exact module paths vary across LangChain versions)
from langchain.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI, OpenAIEmbeddings


def get_response(self):
    # Initialize the embedding function to convert text into vectors
    embedding_function = OpenAIEmbeddings()
    
    # Load the Chroma database using the specified persist directory and embedding function
    db = Chroma(persist_directory=self.CHROMA_PATH, embedding_function=embedding_function)

    # Search the database for the top 3 most relevant chunks based on the query text
    results = db.similarity_search_with_relevance_scores(self.query_text, k=3)
    
    # Check if there are no results or if the highest relevance score is below 0.7
    if len(results) == 0 or results[0][1] < 0.7:
        print("Unable to find matching results.")  # Print a message if no good match is found
        return  # Exit the function if no suitable results are available

    # Combine the content of the top results into a single context string, separated by "---"
    context_text = "\n\n---\n\n".join([doc.page_content for doc, _score in results])
    
    # Create a chat prompt template using the provided template
    prompt_template = ChatPromptTemplate.from_template(self.PROMPT_TEMPLATE)
    
    # Format the prompt by inserting the context and the user query
    prompt = prompt_template.format(context=context_text, question=self.query_text)
    
    # Print the final formatted prompt (useful for debugging)
    print(prompt)

    # Extract just the content of the top results into a list for reference
    context_texts = [doc.page_content for doc, _score in results]

    # Initialize the OpenAI chat model with the specified model version
    model = ChatOpenAI(model="gpt-4o-mini")
    
    # Get the model's response by invoking it with the formatted prompt
    response_text = model.invoke(prompt)

    # Extract sources with their file path and page number from the metadata
    sources = [
        (doc.metadata.get("source", "unknown"), doc.metadata.get("page", "unknown"))
        for doc, _score in results
    ]
    
    # Format the final response by including the model's response and the source information
    self.formatted_response = f"Response: {response_text.content}\nSources: {sources}"
    
    # Return a dictionary containing the response, sources, prompt, and context text
    return {"response": response_text.content, "sources": sources, "prompt": prompt, "context": context_texts}
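One detail worth flagging: `get_response` returns `None` when nothing scores above the 0.7 relevance threshold, so the caller has to handle that case. That guard boils down to the pattern below (the scores here are made up for illustration):

```python
def answer_or_refuse(results, threshold=0.7):
    # results: (chunk_text, relevance_score) pairs, best score first,
    # shaped like similarity_search_with_relevance_scores output
    if not results or results[0][1] < threshold:
        return None  # signals "I don't know" instead of letting the model guess
    return results[0][0]

print(answer_or_refuse([("Savings rate beats raw income.", 0.83)]))  # answers
print(answer_or_refuse([("Unrelated boilerplate.", 0.41)]))          # None
```

That `None` is exactly what the `TypeError` branch in the Flask code catches to show the "nothing in the document" message.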

These are class methods; I instantiate their classes in my Flask server code below to show the results on the website:

# Imports used by the routes below; app, TEMP_FOLDER, the form classes,
# and the RAG classes are defined elsewhere in my project
import os

from flask import redirect, render_template, url_for
from werkzeug.utils import secure_filename


@app.route('/portfolio/RAG/setup', methods=["POST", "GET"])
def rag_setup():
    # Initialize the form for file upload
    form = RagForm1()
    
    # Check if the form submission is valid (POST request with valid input)
    if form.validate_on_submit():
        uploaded_file = form.file.data  # Get the uploaded file from the form

        # Secure the file name to prevent malicious filenames
        secured_filename = secure_filename(uploaded_file.filename)

        # Save the uploaded file into a temporary folder with a secured name
        temp_dir = os.path.join(TEMP_FOLDER, secured_filename)
        uploaded_file.save(temp_dir)

        # Initialize the RAG database with the temporary folder containing the file
        rag_database = RagDatabase(TEMP_FOLDER)
        rag_database.prepare()  # Prepare the database by processing the uploaded document

        # Redirect the user to the action page after setting up the database
        return redirect(url_for('rag_action'))

    # Render the setup page with the form for file upload
    return render_template("RAG.html", form=form)


@app.route('/portfolio/RAG/results', methods=["GET", "POST"])
def rag_action():
    # Initialize the form for user query input
    form = RagForm2()
    
    # Variables to store response, sources, prompt, and context
    response = None
    sources = None
    prompt = None
    context = None

    # Check if the form submission is valid (POST request with query input)
    if form.validate_on_submit():
        query = form.query.data  # Get the user query from the form
        
        # Initialize the RAG query with the user's query
        rag_query = RagQuery(query)
        
        # Get the response by querying the prepared RAG database
        response_dict = rag_query.get_response()
        
        # Try extracting the response and other details from the result dictionary
        try:
            response = response_dict["response"]  # Model's answer to the query
            sources = response_dict["sources"]    # List of sources used for the answer
            prompt = response_dict["prompt"]      # Full prompt sent to the model
            context = response_dict["context"]    # Context used to generate the response
        except TypeError:
            # Handle the case where no relevant answer is found
            response = "There is nothing in the document that could answer your question."
    
    # Render the results page with the form, response, sources, and context
    return render_template("RAG_results.html", form=form,
                           response=response, sources=sources,
                           prompt=prompt, context=context,
                           enumerate=enumerate)  # Pass enumerate for use in the template

A simple demo of RAG may not be enough to show how useful it actually is, so here are three more use cases:

 

  1. Customer Support Knowledge Bases: RAG can instantly retrieve relevant information from large collections of internal documents, product manuals, or troubleshooting guides. This reduces response times, improves agent efficiency, and enhances customer satisfaction by providing precise answers.
  2. Legal Document Analysis: For legal teams dealing with lengthy contracts, case law, or regulations, RAG helps by quickly surfacing the most relevant sections. This speeds up research and reduces the risk of missing critical information during reviews.
  3. Research and Technical Documentation: In fields like scientific research, engineering, or software development, RAG enables users to query vast archives of papers or technical docs, extracting key insights without manually reading through hundreds of pages.

These use cases showcase how RAG can deliver high ROI by reducing manual effort, speeding up decision-making, and improving accuracy across various industries.

You can try RAG for yourself by clicking the following button:

Try it here


If you haven't seen enough, here's another one for you. I dove into the world of recruitment and found a way to make it faster and fairer: an AI system that quickly screens large batches of resumes to surface the best candidates, saving time and helping companies make better, less biased choices.

Learn More
