A project demonstrating RAG capabilities using Ollama and Wikipedia as the knowledge base. This application shows how you can "fine-tune" any AI to your actual needs without expensive model training: simply feed the AI your own knowledge base, whether that is company documents, internal wikis, product specifications, or any other domain-specific content.
The best part? This architecture carries over easily to other LLMs (GPT, Claude, etc.) and works with your personal data sources beyond Wikipedia. The AI answers questions using your actual data, with full citation and source transparency.
Traditional AI models are trained on general knowledge and may not have access to your company's specific information, internal processes, or proprietary data. Retrieval-Augmented Generation (RAG) solves this by:
- No Model Training Required - Use any pre-trained LLM (Llama, GPT, Claude, etc.) without expensive fine-tuning
- Real-Time Knowledge Updates - Add new information instantly, without retraining models
- Source Transparency - Every answer includes citations, so you know exactly where the information came from
- Domain-Specific Expertise - Transform a generic AI into a specialist for your company's needs
- Cost-Effective - Significantly cheaper than training custom models, with similar results
See the dramatic difference RAG makes when answering questions. The comparison below shows how context transforms generic AI responses into accurate, cited answers.
Without RAG, the AI relies solely on its training data, which can lead to:
- Generic or outdated information
- No access to company-specific processes or documents
- Inability to cite sources
- Potential hallucinations when asked about proprietary information
With RAG, the AI:
- Retrieves relevant chunks from your knowledge base using semantic search
- Synthesizes answers using both the retrieved context and its general knowledge
- Provides citations linking back to the original sources
- Stays current with your latest documents and information
- Answers accurately about your company's specific content
The difference is clear: RAG-powered responses are grounded in your actual data, making the AI a reliable assistant for your organization's needs.
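In code, that retrieve-then-generate loop is small. Below is a minimal sketch using the qdrant-client, sentence-transformers, and ollama Python packages; the payload fields (`title`, `url`, `text`) and the helper's shape are illustrative assumptions, while the project's real logic lives in `backend/app/services/`:

```python
# Minimal retrieve-then-generate sketch (illustrative; the app's real logic
# lives in backend/app/services/). Assumes the qdrant-client,
# sentence-transformers, and ollama packages, and that each stored chunk
# carries "title", "url", and "text" payload fields.
import ollama
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("BAAI/bge-small-en-v1.5")  # 384-dim vectors
qdrant = QdrantClient(url="http://localhost:6333")

def answer(question: str) -> str:
    # 1. Retrieve: embed the question, fetch the most similar chunks
    hits = qdrant.search(
        collection_name="wiki_rag",
        query_vector=embedder.encode(question).tolist(),
        limit=4,
        score_threshold=0.5,
    )
    # 2. Ground: pack retrieved chunks (with their sources) into the prompt
    context = "\n\n".join(
        f"Source: {h.payload['title']} ({h.payload['url']})\n{h.payload['text']}"
        for h in hits
    )
    # 3. Generate: let the local Ollama model answer from that context
    reply = ollama.chat(
        model="llama3.2:3b",
        messages=[
            {"role": "system",
             "content": "Answer using only the provided context and cite your sources."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return reply["message"]["content"]
```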
Build and manage your knowledge base with ease. The interface below shows both ingestion and management capabilities side by side.
| Populate Your Knowledge Base | Manage Your References |
| --- | --- |
Easily add content to your knowledge base through two methods:
- Search by Topic - Enter topics (one per line), and the system automatically fetches up to five relevant Wikipedia pages for each topic. Perfect for quickly building a knowledge base around specific subjects.
- Ingest Specific Articles - Paste full Wikipedia URLs to embed exact articles. Ideal for curated reading lists or when you need precise control over the content.
The ingestion process:
- Fetches content from Wikipedia
- Splits documents into semantic chunks
- Generates vector embeddings for each chunk
- Stores everything in Qdrant vector database for fast retrieval
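A hedged sketch of those four steps, using the wikipedia, sentence-transformers, and qdrant-client packages; the fixed-size splitter and payload fields here are simplifications of the app's semantic chunking, not its exact schema:

```python
# Hedged sketch of the four ingestion steps above. Assumes the wikipedia,
# sentence-transformers, and qdrant-client packages; the naive fixed-size
# splitter stands in for the app's semantic chunking.
import uuid

import wikipedia
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("BAAI/bge-small-en-v1.5")
qdrant = QdrantClient(url="http://localhost:6333")

# Create the collection once; 384 dims matches bge-small-en-v1.5
if not qdrant.collection_exists("wiki_rag"):
    qdrant.create_collection(
        collection_name="wiki_rag",
        vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    )

def ingest(title: str, chunk_size: int = 1000) -> int:
    page = wikipedia.page(title)  # 1. fetch the article from Wikipedia
    text = page.content
    # 2. split into chunks (fixed-size here, semantic in the app)
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    vectors = embedder.encode(chunks)  # 3. one embedding per chunk
    qdrant.upsert(  # 4. store vectors + payload for retrieval
        collection_name="wiki_rag",
        points=[
            PointStruct(
                id=str(uuid.uuid4()),
                vector=vec.tolist(),
                payload={"title": page.title, "url": page.url, "text": chunk},
            )
            for chunk, vec in zip(chunks, vectors)
        ],
    )
    return len(chunks)
```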
Full transparency and control over your knowledge base:
- View All Ingested Content - See every article, document, or reference in your knowledge base
- Monitor Chunk Counts - Understand how your content is structured and indexed
- Remove Outdated Content - Delete references that are no longer relevant
- Track Sources - Each reference shows its title, topic, URL, and chunk count
This management interface ensures your knowledge base stays clean, relevant, and up-to-date with your company's evolving needs.
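For illustration, both chunk counts and per-reference deletion can be served straight from Qdrant's filter API. The snippet below is a sketch, not the app's actual router code, and it assumes each chunk carries a `url` payload field:

```python
# Hedged sketch of per-reference management against Qdrant (the app's
# actual endpoints live in backend/app/routers/). The "url" payload
# field is an assumption.
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, FilterSelector, MatchValue

qdrant = QdrantClient(url="http://localhost:6333")

def url_filter(url: str) -> Filter:
    # Match every chunk whose payload points at one source article
    return Filter(must=[FieldCondition(key="url", match=MatchValue(value=url))])

def chunk_count(url: str) -> int:
    # Count how many chunks a single reference contributed
    return qdrant.count(collection_name="wiki_rag", count_filter=url_filter(url)).count

def delete_reference(url: str) -> None:
    # Remove every chunk belonging to that source article
    qdrant.delete(
        collection_name="wiki_rag",
        points_selector=FilterSelector(filter=url_filter(url)),
    )
```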
- Semantic Search - Find relevant information using vector similarity search
- Multi-Session Chat - Manage multiple conversation threads
- Source Citations - Every answer includes clickable source links
- Wikipedia Integration - Easy ingestion of Wikipedia content (easily extensible to other sources)
- Real-Time Updates - Add or remove knowledge without downtime
- Modern UI - Clean, responsive React interface
- RESTful API - Full FastAPI backend with OpenAPI documentation
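Because the backend is FastAPI, it serves its OpenAPI schema at `/openapi.json` by default, so you can enumerate the available routes programmatically once the server is up:

```python
# List every backend route from the auto-generated OpenAPI schema.
# FastAPI serves /openapi.json by default; no project-specific code needed.
import requests

schema = requests.get("http://localhost:8000/openapi.json").json()
for path, ops in schema["paths"].items():
    methods = [m.upper() for m in ops if m in ("get", "post", "put", "patch", "delete")]
    print(" ".join(methods), path)
```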
Before you begin, ensure you have the following installed:
- Python 3.11+ (with pip)
- Node.js 18+ and npm
- Docker Desktop (for Qdrant and MongoDB)
- Ollama (for running local LLM models)
The easiest way to get started is using our automated startup scripts:
```powershell
.\scripts\start.ps1
```

or

```bat
scripts\start.bat
```

These scripts will:
- Check if Docker is running
- Start the Qdrant vector database container (port 6333)
- Start the MongoDB container (port 27017)
- Launch the FastAPI backend server (port 8000)
- Launch the React frontend dev server (port 5173)
To stop all services:
```powershell
.\scripts\stop.ps1
```

or

```bat
scripts\stop.bat
```

If you prefer to set up manually or need more control:
```bash
# Navigate to backend directory
cd backend

# Create and activate virtual environment (using conda as an example)
conda create -n RAG python=3.11
conda activate RAG

# Install Python dependencies
pip install -r requirements.txt
```

```bash
# Navigate to web directory
cd web

# Install Node.js dependencies
npm install
```

Start Qdrant and MongoDB using Docker:
```bash
# Start Qdrant vector database
docker run -d \
  --name qdrant-rag \
  -p 6333:6333 \
  -p 6334:6334 \
  -v $(pwd)/qdrant_storage:/qdrant/storage \
  qdrant/qdrant

# Start MongoDB
docker run -d \
  --name mongo-rag \
  -p 27017:27017 \
  -v $(pwd)/mongo_data:/data/db \
  mongo:latest
```

Create a `.env` file in the `backend/` directory (optional, defaults are provided):
```env
# MongoDB Configuration
MONGODB_URI=mongodb://localhost:27017
MONGODB_DATABASE=rag_portfolio

# Qdrant Configuration
QDRANT_URL=http://localhost:6333
COLLECTION_NAME=wiki_rag

# Embedding Model
EMBED_MODEL=sentence-transformers/bge-small-en-v1.5
VECTOR_SIZE=384

# Ollama Configuration
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=llama3.2:3b

# Optional: Retriever threshold
RETRIEVER_SCORE_THRESHOLD=0.5
```
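For reference, here is one way these variables could map to typed settings with pydantic-settings; the project's real configuration lives in `backend/app/core/`, so the field names below are assumptions:

```python
# Hedged sketch of a typed settings object for the .env file above.
# Field names match the env keys case-insensitively (pydantic-settings
# default); the project's actual settings live in backend/app/core/.
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    mongodb_uri: str = "mongodb://localhost:27017"
    mongodb_database: str = "rag_portfolio"
    qdrant_url: str = "http://localhost:6333"
    collection_name: str = "wiki_rag"
    embed_model: str = "sentence-transformers/bge-small-en-v1.5"
    vector_size: int = 384
    ollama_host: str = "http://localhost:11434"
    ollama_model: str = "llama3.2:3b"
    retriever_score_threshold: float = 0.5

settings = Settings()  # values in backend/.env override the defaults
```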
Make sure Ollama is installed and running with your chosen model:

```bash
# Install Ollama from https://ollama.ai
# Pull the model
ollama pull llama3.2:3b

# Start Ollama server (usually runs automatically)
```

Backend:
```bash
cd backend
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

Frontend:
```bash
cd web
npm run dev
```

Once everything is running, you can access:
- Frontend UI: http://localhost:5173
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
- Qdrant Dashboard: http://localhost:6333/dashboard
- MongoDB: mongodb://localhost:27017
```
Retrieval-Augmented-Generation/
├── backend/                 # FastAPI backend
│   ├── app/
│   │   ├── core/            # Settings and configuration
│   │   ├── db/              # Database clients (MongoDB, Qdrant)
│   │   ├── embeddings/      # Embedding model management
│   │   ├── models/          # Pydantic models
│   │   ├── routers/         # API route handlers
│   │   └── services/        # Business logic services
│   └── requirements.txt     # Python dependencies
├── web/                     # React frontend
│   ├── src/
│   │   ├── api/             # API client
│   │   ├── components/      # React components
│   │   └── ...
│   └── package.json         # Node.js dependencies
├── scripts/                 # Startup/shutdown scripts
├── demo/                    # Demo screenshots
└── README.md                # This file
```



