A production-grade, tool-augmented conversational AI system built on LangGraph, featuring persistent memory, PDF RAG, real-time web search, and full observability via LangSmith.
ARA is an agentic research assistant that demonstrates how modern LLM applications move beyond simple prompt→response patterns into stateful, tool-calling workflows. Unlike basic chatbots, ARA maintains conversation context across sessions, dynamically invokes external tools (web search, news, semantic research), and retrieves information from uploaded PDF documents using vector similarity search.
This project exists to solve a specific architectural challenge: how do you build an AI assistant that remembers, reasons, acts, and explains—while remaining observable and debuggable? The answer lies in treating the LLM as a decision-maker within a graph-based state machine, rather than a monolithic black box.
ARA follows a LangGraph-native architecture where the LLM operates as a node within a directed graph. The system boundaries are:
| Layer | Responsibility |
|---|---|
| Frontend | Streamlit chat UI with streaming responses |
| Orchestration | LangGraph StateGraph with conditional tool routing |
| LLM Backend | Groq (Llama 3.3 70B) with automatic fallback |
| Tools | Web search (Serper), News (NewsAPI), Research (Tavily), RAG |
| Persistence | SQLite checkpointer + ChromaDB vector store |
| Observability | LangSmith traces with thread grouping |
The key insight is that the LLM never directly "calls" tools—it emits structured tool requests that the graph runtime intercepts, executes, and feeds back into the conversation state.
ARA implements a chat → tool → chat feedback loop using LangGraph's StateGraph:
```
┌─────────┐
│  START  │
└────┬────┘
     │
     ▼
┌─────────┐     needs_tool?     ┌─────────┐
│  chat   │────────────────────▶│  tools  │
│  node   │◀────────────────────│  node   │
└────┬────┘                     └─────────┘
     │  no tools needed
     ▼
┌─────────┐
│   END   │
└─────────┘
```
```python
class ChatState(TypedDict):
    messages: Annotated[list[BaseMessage], add_messages]
```

The `add_messages` reducer ensures that tool outputs and LLM responses accumulate correctly without overwriting history.
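To see what the reducer buys you, here is a tiny standalone example (not project code): `add_messages` appends node output to the running message list instead of replacing it.

```python
from langchain_core.messages import AIMessage, HumanMessage
from langgraph.graph.message import add_messages

history = [HumanMessage(content="Summarize the uploaded PDF.")]
update = [AIMessage(content="Here is a summary...")]

# The reducer merges the update into existing state rather than overwriting it.
merged = add_messages(history, update)
assert [m.content for m in merged] == ["Summarize the uploaded PDF.", "Here is a summary..."]
```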
```python
graph.add_conditional_edges(
    "chat",
    tools_condition,  # Built-in LangGraph router
    {"tools": "tools", END: END},
)
```

When the LLM emits a `tool_call`, `tools_condition` routes to the `ToolNode`. When it emits plain text, the graph terminates.
ARA treats tools as first-class citizens using LangGraph's ToolNode pattern:
| Tool | Provider | Use Case |
|---|---|---|
| `web_search` | Serper API | General factual queries |
| `news_search` | NewsAPI | Time-sensitive current events |
| `tavily_search` | Tavily | Deep semantic research |
| `rag_search` | ChromaDB | PDF document retrieval |
The system prompt includes temporal triggers that force tool usage:
```python
TEMPORAL = ["current", "today", "news", "latest", "recent", "price", "market"]
```

If the user query contains these keywords, the LLM is instructed to always invoke a search tool before responding, preventing hallucinated dates, prices, or events.
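One way this trigger can be wired, shown as a sketch rather than the project's exact prompt logic: check the query against the keyword list and strengthen the tool-use instruction when it matches.

```python
TEMPORAL = ["current", "today", "news", "latest", "recent", "price", "market"]

def needs_live_data(query: str) -> bool:
    """Heuristic: does the query mention anything time-sensitive?"""
    q = query.lower()
    return any(keyword in q for keyword in TEMPORAL)

def build_system_prompt(query: str) -> str:
    # Hypothetical wording; the real instruction lives in the project's system prompt.
    base = "You are ARA, a research assistant with web, news, and document tools."
    if needs_live_data(query):
        base += " This query is time-sensitive: call a search tool before answering."
    return base
```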
Unlike traditional RAG architectures where retrieval happens before LLM invocation, ARA exposes RAG as a callable tool:
```python
@tool
def rag_search(query: str, thread_id: str) -> str:
    """Search uploaded PDF documents for relevant excerpts."""
    ...
```

This means the LLM decides when document context is needed, rather than blindly injecting context on every turn.
```python
conn = sqlite3.connect(SQLITE_DB_PATH, check_same_thread=False)
checkpointer = SqliteSaver(conn)
chatbot = graph.compile(checkpointer=checkpointer)
```

Every message exchange is persisted to SQLite. When a user returns to a previous thread, the full conversation state is restored, including tool call history.
Each conversation receives a unique thread_id:
config = {"configurable": {"thread_id": "abc-123"}}
chatbot.stream(state, config)This enables:
- Per-thread conversation memory
- Per-thread vector store collections
- LangSmith trace grouping
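Restoring a returning user's thread is then a checkpoint read rather than a replay; a sketch, where `render` stands in for whatever UI code draws a message:

```python
config = {"configurable": {"thread_id": thread_id}}
snapshot = chatbot.get_state(config)  # StateSnapshot rebuilt from the SQLite checkpoint

for message in snapshot.values.get("messages", []):
    # Skip tool messages when rebuilding the visible chat history.
    if message.type in ("human", "ai") and message.content:
        render(message.type, message.content)
```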
PDF embeddings are stored in ChromaDB with thread-scoped collections:
```python
Chroma.from_documents(
    chunks,
    embedding=embeddings,
    collection_name=f"thread_{thread_id}",
    persist_directory=CHROMA_PERSIST_DIR,
)
```
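For context, the `chunks` passed in above come from a PDF load-and-split step; a minimal sketch (loader choice and chunk sizes are assumptions, the project's actual values live in `rag.py`):

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

pages = PyPDFLoader(pdf_path).load()

# Illustrative chunking parameters; tune to the embedding model's context size.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
chunks = splitter.split_documents(pages)
```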
LLM applications are notoriously difficult to debug. A user asks "What's the stock price of Apple?" and receives incorrect output. Was it:
- A hallucination?
- A tool failure?
- A parsing error?
- Rate limiting?
ARA addresses this with comprehensive tracing:
```python
@traceable(name="chat_node")
def chat_node(state, config):
    ...
```

Every component (LLM calls, tool invocations, embedding requests) is traced with:
- Latency metrics
- Token counts
- Input/output payloads
- Thread grouping for multi-turn analysis
Run: "chat_turn"
├── chat_node (LLM invocation)
│ └── bind_tools
├── tools (ToolNode)
│ └── web_search
│ └── HTTP request
└── chat_node (final response)
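Tracing itself is switched on through environment configuration; a minimal sketch of what `config.py` might set (the project name shown is an assumption):

```python
import os

# Standard LangSmith environment variables; the API key itself is read from .env.
os.environ.setdefault("LANGCHAIN_TRACING_V2", "true")
os.environ.setdefault("LANGCHAIN_PROJECT", "agentic-research-assistant")
```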
Traditional request-response patterns create dead air while the LLM thinks. ARA uses LangGraph's native streaming:
```python
for event in chatbot.stream(state, config, stream_mode="values"):
    if "messages" in event:
        yield event["messages"][-1].content
```

This enables:
- Token-by-token response rendering
- Visible tool execution feedback
- Perception of faster response times
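On the frontend, `app.py` can hand a generator like the one above straight to Streamlit; a sketch assuming Streamlit's `st.chat_message` and `st.write_stream`:

```python
import streamlit as st

def response_generator(state, config):
    # Yield only the newest message content as the graph streams state updates.
    for event in chatbot.stream(state, config, stream_mode="values"):
        if "messages" in event:
            yield event["messages"][-1].content

with st.chat_message("assistant"):
    st.write_stream(response_generator(state, config))
```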
| Component | Technology |
|---|---|
| LLM | Groq (Llama 3.3 70B Versatile) |
| Orchestration | LangGraph |
| Embeddings | HuggingFace Inference API (BGE-small) |
| Vector Store | ChromaDB |
| Checkpointing | SQLite via SqliteSaver |
| Web Search | Serper API |
| News Search | NewsAPI |
| Research Search | Tavily |
| Observability | LangSmith |
| Frontend | Streamlit |
| Language | Python 3.11+ |
- HuggingFace Inference Timeouts: The hosted inference API returns 504s under load. Solved with batching (16 texts per batch), retry logic (3 attempts), and exponential backoff; see the sketch after this list.
- Groq Tool Call Format: Groq's function calling occasionally outputs malformed JSON. Lowered the temperature to 0.25 and simplified tool signatures.
- Thread State Restoration: Rebuilding the conversation UI from checkpoint data required deduplication logic to handle reducer accumulation.
- Embedding Shape Variance: HuggingFace returns inconsistent tensor shapes (`[[[vec]]]` vs `[[vec]]`). Built a normalization layer to handle all cases.
- Chroma Collection Naming: Thread UUIDs contain hyphens, which Chroma rejects. Implemented sanitization: `thread_abc-123` → `thread_abc_123`.
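The batching-plus-backoff fix from the first item, sketched against the standard LangChain embeddings interface (the real implementation lives in `embeddings.py`):

```python
import time

def embed_with_retry(embedder, texts, batch_size=16, attempts=3):
    """Embed texts in batches, retrying transient failures with exponential backoff."""
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        for attempt in range(attempts):
            try:
                vectors.extend(embedder.embed_documents(batch))
                break
            except Exception:
                if attempt == attempts - 1:
                    raise                    # give up after the final attempt
                time.sleep(2 ** attempt)     # 1s, then 2s, between retries
    return vectors
```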
- No multi-modal support: Images and audio are not processed.
- Single-user design: No authentication or multi-tenancy.
- Cold start latency: First embedding request after idle can take 5-10 seconds.
- Context window limits: Very long PDFs may exceed chunk capacity.
- No citation linking: Tool results are summarized but sources aren't hyperlinked in responses.
- Agent memory abstraction: Replace per-thread isolation with semantic memory (remembering across threads).
- Parallel tool execution: Run web_search and news_search concurrently for faster responses.
- Multi-agent delegation: Spawn specialist sub-agents for research, writing, analysis.
- Self-correction: Implement reflection loops where the agent critiques and revises its own output.
- Deployment: Containerize for production with PostgresSaver and Redis caching.
This project demonstrates competency in:
- LangGraph — State machines, conditional routing, tool binding, checkpointers
- LangChain — Tool definitions, message types, embedding interfaces
- LLMOps — LangSmith tracing, latency optimization, error handling
- RAG Engineering — Chunking strategies, vector stores, retrieval-as-tool pattern
- Production Python — Async patterns, retry logic, configuration management
- System Design — Separating orchestration from execution, thread-scoped state
- Python 3.11+
- API keys: Groq, HuggingFace, Serper, NewsAPI, Tavily (optional), LangSmith (optional)
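These keys are read from `.env`; a hypothetical sketch of its contents (every variable name here is an assumption based on each provider's usual convention; `.env.example` is the authoritative list):

```
GROQ_API_KEY=...
HUGGINGFACEHUB_API_TOKEN=...
SERPER_API_KEY=...
NEWSAPI_API_KEY=...
TAVILY_API_KEY=...
LANGCHAIN_API_KEY=...
```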
```bash
git clone https://github.com/ayushsyntax/Agentic-Research-Assistant.git
cd Agentic-Research-Assistant
python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate
pip install -r requirements.txt
```

Create `.env` from the example:

```bash
cp .env.example .env
# Edit .env with your API keys
```

Run the app:

```bash
streamlit run src/app.py
```

Project layout:

```
ara/
├── src/
│   ├── app.py          # Streamlit frontend
│   ├── config.py       # Environment + LangSmith setup
│   ├── database.py     # SQLite checkpointer + thread names
│   ├── embeddings.py   # HuggingFace embedding client
│   ├── graph.py        # LangGraph workflow
│   ├── llm.py          # Groq LLM factory
│   ├── rag.py          # PDF ingestion + retrieval tool
│   └── tools.py        # Web/news/research tools
├── tests/
│   └── test_*.py       # Unit tests
├── data/
│   ├── sqlite/         # Checkpoint persistence
│   └── chroma/         # Vector store
├── .github/
│   └── workflows/
│       └── ci.yml      # GitHub Actions CI
├── requirements.txt
├── .env.example
└── README.md
```
Built with curiosity and caffeine.



