Retrieval-Augmented Generation (RAG) System with Ollama
This is a simple project where I implemented a Retrieval-Augmented Generation (RAG) system using Python and the Ollama API for running local language models. The idea is to retrieve the most relevant facts from a document collection using embeddings, and then generate answers grounded in that context with a local LLM (Llama 3.2 1B Instruct).
What the Code Does
- Loads a dataset of short factual lines from a text file (like 'cat-facts.txt').
- Uses an embedding model (bge-base-en-v1.5) to generate an embedding for each line.
- Stores the (text, embedding) pairs in a list called VECTOR_DB.
- When the user asks a question, it:
  - embeds the query,
  - finds the most similar chunks using cosine similarity,
  - builds a prompt from the retrieved chunks, and
  - uses the local LLM to generate a response based only on that context.
- The chatbot response is streamed in real time. (A condensed sketch of this pipeline follows this list.)
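For reference, here is a condensed sketch of the pipeline described above. It is not the full demo.py: the model tags, VECTOR_DB, and cat-facts.txt come from this README, while the helper names (add_chunk_to_database, cosine_similarity, retrieve), the system prompt wording, and the exact shape of the ollama.embed response are illustrative assumptions that may differ from the actual script and across ollama-python versions.

```python
# Condensed sketch of the RAG pipeline described above (not the full demo.py).
import ollama

EMBEDDING_MODEL = 'hf.co/CompendiumLabs/bge-base-en-v1.5-gguf'
LANGUAGE_MODEL = 'hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF'

# Each entry is a (chunk, embedding) pair.
VECTOR_DB = []

def add_chunk_to_database(chunk):
    # Embed one fact and keep it alongside its text.
    embedding = ollama.embed(model=EMBEDDING_MODEL, input=chunk)['embeddings'][0]
    VECTOR_DB.append((chunk, embedding))

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

def retrieve(query, top_n=3):
    # Embed the query and return the top_n most similar facts.
    query_embedding = ollama.embed(model=EMBEDDING_MODEL, input=query)['embeddings'][0]
    scored = [(chunk, cosine_similarity(query_embedding, emb)) for chunk, emb in VECTOR_DB]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]

if __name__ == '__main__':
    # Index the dataset: one fact per line.
    with open('cat-facts.txt', encoding='utf-8') as f:
        for line in f:
            if line.strip():
                add_chunk_to_database(line.strip())

    while True:
        query = input('Ask me a question (or type exit): ')
        if query.strip().lower() == 'exit':
            break

        retrieved = retrieve(query, top_n=3)
        context = '\n'.join(f' - {chunk}' for chunk, _score in retrieved)
        instruction = (
            'Answer the question using only the following context. '
            "If the context is not enough, say you don't know.\n" + context
        )

        # Stream the response token by token.
        stream = ollama.chat(
            model=LANGUAGE_MODEL,
            messages=[
                {'role': 'system', 'content': instruction},
                {'role': 'user', 'content': query},
            ],
            stream=True,
        )
        for chunk in stream:
            print(chunk['message']['content'], end='', flush=True)
        print()
```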
How to Run
Requirements:
- Python 3.8 or higher
- Ollama installed from https://ollama.com
- Download the models by running:
  - ollama pull hf.co/CompendiumLabs/bge-base-en-v1.5-gguf
  - ollama pull hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF
- Install the Python package: pip install ollama
Steps:
- Make sure you have a file called cat-facts.txt in the same folder (each line = one fact).
- Run the script: python demo.py
- Type your questions in the terminal. Type 'exit' to quit.
Task 8: Experimentation & Reflection
Trying different top_n values (top_n is the number of retrieved facts included in the prompt; see the snippet after this list):
- top_n = 1: Very focused, but sometimes misses additional useful context.
- top_n = 3: Default and balanced. Includes diverse and relevant facts.
- top_n = 5: Includes more info, but may be repetitive or slightly off-topic.
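Changing top_n only means changing one argument in the retrieval call. The loop below uses the illustrative retrieve helper from the sketch above and a hypothetical sample question:

```python
# Compare which facts are retrieved for different top_n values.
for top_n in (1, 3, 5):
    results = retrieve("How long do cats sleep?", top_n=top_n)
    print(f"top_n={top_n}:")
    for chunk, score in results:
        print(f"  {score:.3f}  {chunk}")
```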
What limitations did you observe?
- Retrieval quality depends heavily on how the facts and the questions are phrased; small wording differences can change which facts are retrieved.
- With a small dataset, answers are often repetitive or overly simple.
- The pipeline cannot combine several facts or do multi-step reasoning well.
- The model sometimes still makes up information when no relevant context is retrieved.
What could be improved with a larger dataset or better models?
- Using larger, paragraph-based datasets would give more meaningful answers.
- A stronger LLM would generate more accurate and fluent responses.
- Combining sparse (BM25) and dense retrieval could help improve result quality (a rough sketch of this idea follows this list).
- Adding metadata or chunking full documents might give better grounding and traceability.
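The sketch below is one rough way the hybrid-retrieval idea could look. It is not part of this project: it assumes the rank_bm25 package (pip install rank_bm25) and reuses VECTOR_DB, EMBEDDING_MODEL, and cosine_similarity from the sketch earlier in this README; the min-max normalization and the alpha weight are arbitrary choices, not a definitive design.

```python
# Illustrative hybrid retrieval: fuse BM25 (sparse) and embedding (dense) scores.
# Requires: pip install rank_bm25 (not part of this project's requirements).
import ollama
from rank_bm25 import BM25Okapi

def hybrid_retrieve(query, top_n=3, alpha=0.5):
    chunks = [chunk for chunk, _emb in VECTOR_DB]

    # Sparse scores: BM25 over whitespace-tokenized facts.
    bm25 = BM25Okapi([c.lower().split() for c in chunks])
    sparse = list(bm25.get_scores(query.lower().split()))

    # Dense scores: cosine similarity against the stored embeddings.
    query_emb = ollama.embed(model=EMBEDDING_MODEL, input=query)['embeddings'][0]
    dense = [cosine_similarity(query_emb, emb) for _chunk, emb in VECTOR_DB]

    # Normalize each score list to [0, 1] before mixing, so neither dominates.
    def normalize(scores):
        lo, hi = min(scores), max(scores)
        return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]

    sparse, dense = normalize(sparse), normalize(dense)
    fused = [alpha * s + (1 - alpha) * d for s, d in zip(sparse, dense)]

    ranked = sorted(zip(chunks, fused), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]
```

Fusing normalized scores can keep exact-keyword matches (BM25) from being drowned out by the embedding scores, which may help on short factual queries where wording matters.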
Files Included
- demo.py : The full working script
- cat-facts.txt : Sample dataset (can be replaced with your own)
- README.md : This documentation