Automated "Code-to-Insight" Data Science Report Generator
Repo2Report is an AI-powered tool designed to bridge the gap between technical code repositories and business stakeholders. It ingests raw GitHub repositories (Jupyter Notebooks, Python scripts, Markdown), analyzes the code structure and outputs, and generates a professional "Industry 2026" Standard Data Science Report using Large Language Models (LLMs).
- Automated Ingestion: Clones and parses a GitHub repository directly from its URL.
- Notebook Intelligence: Extracts executed outputs from .ipynb files to validate results without re-running code.
- Obsidian-Ready Output: Generates reports in clean Markdown, ready for your knowledge base.
- Privacy-First: Filters out PII and sensitive data before processing.
- Context-Aware: Uses specific "Industry 2026" templates for Executive Summaries, Methodology, and Ethics.
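The Notebook Intelligence feature depends on the fact that an .ipynb file is JSON that already stores each code cell's saved outputs, so results can be read without executing anything. The following is a minimal sketch of that idea, not Repo2Report's actual code. It uses the standard json module instead of nbformat (which the project uses for parsing), and the function names load_notebook and extract_outputs are hypothetical. Only text representations are kept; image outputs are skipped.

```python
import json


def load_notebook(path):
    """Read an .ipynb file; on disk it is plain JSON (nbformat v4 schema)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def _as_text(value):
    # nbformat may store multiline strings as a list of lines; None becomes "".
    return "".join(value) if isinstance(value, list) else (value or "")


def extract_outputs(nb):
    """Return (cell_index, text) pairs for the saved outputs of code cells."""
    results = []
    for idx, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        for out in cell.get("outputs", []):
            kind = out.get("output_type")
            if kind == "stream":
                text = _as_text(out.get("text"))
            elif kind in ("execute_result", "display_data"):
                # Keep only the plain-text form; e.g. image/png data is ignored.
                text = _as_text(out.get("data", {}).get("text/plain"))
            elif kind == "error":
                text = f"{out.get('ename', '')}: {out.get('evalue', '')}"
            else:
                continue
            if text:
                results.append((idx, text))
    return results
```

Typical use would be extract_outputs(load_notebook("analysis.ipynb")), where the file name is only an example.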
- Python 3.10+
- Google Gemini API key (or an OpenAI equivalent)
- Clone the repository:
  git clone https://github.com/yourusername/repo2report.git
  cd repo2report
- Install the dependencies:
  pip install -r requirements.txt
- Set up environment variables. Create a .env file in the root directory containing:
  GOOGLE_API_KEY=your_api_key_here
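Once the .env file exists, the app has to read GOOGLE_API_KEY at startup. Projects often use the python-dotenv package for this; whether Repo2Report does is not stated here. The standard-library sketch below uses hypothetical helper names (parse_dotenv, load_env_file, get_api_key) and assumes simple KEY=VALUE lines with no variable expansion or multiline values.

```python
import os


def parse_dotenv(text):
    """Parse KEY=VALUE lines; blank lines and '#' comments are ignored."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        # Remove one layer of matching single or double quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_file(path=".env"):
    """Copy values from a .env file into os.environ without overwriting."""
    try:
        with open(path, encoding="utf-8") as fh:
            values = parse_dotenv(fh.read())
    except FileNotFoundError:
        return {}
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values


def get_api_key():
    """Return GOOGLE_API_KEY, failing early with a clear message if missing."""
    load_env_file()
    key = os.environ.get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set; add it to .env")
    return key
```

Using setdefault means a key already exported in the shell takes precedence over the .env file, which is a common convention but only an assumption here.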
- Run the Streamlit application:
  streamlit run app.py
- Open the local URL provided (usually http://localhost:8501).
- Paste a target GitHub repository URL.
- Click Generate Report.
- Download the .md file.
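Before cloning, the pasted URL has to become something git can fetch. The sketch below is an assumption about how that step could work, not the tool's documented behaviour: parse_github_url and shallow_clone are hypothetical names, and only github.com HTTP(S) URLs are accepted. The clone uses GitPython's Repo.clone_from, which the architecture notes name for cloning; extra keyword arguments are passed to git as flags, so depth=1 becomes --depth=1 and only the latest commit is downloaded.

```python
import re
from urllib.parse import urlparse

# Conservative pattern for GitHub owner and repository names.
_NAME = re.compile(r"^[A-Za-z0-9_.-]+$")


def parse_github_url(url):
    """Return (owner, repo) for URLs like https://github.com/owner/repo(.git)."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https") or parsed.netloc.lower() not in (
        "github.com",
        "www.github.com",
    ):
        raise ValueError(f"not a GitHub URL: {url!r}")
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"missing owner/repo in {url!r}")
    owner, repo = parts[0], parts[1]  # ignores trailing /tree/<branch>/...
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    if not (_NAME.match(owner) and _NAME.match(repo)):
        raise ValueError(f"invalid owner/repo in {url!r}")
    return owner, repo


def shallow_clone(url, dest):
    """Clone only the latest commit of the repository into dest."""
    from git import Repo  # GitPython; imported lazily so parsing works without it

    owner, repo = parse_github_url(url)
    clone_url = f"https://github.com/{owner}/{repo}.git"
    return Repo.clone_from(clone_url, dest, depth=1)
```

A shallow clone is usually enough if the report does not need commit history; dropping depth=1 fetches the full history at the cost of a slower download.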
The system follows a Retrieve-Read-Report pipeline:
- Ingestion Agent: GitPython handles cloning; custom logic filters out non-text files.
- Parsing Layer: nbformat flattens notebooks; AST analysis identifies the libraries used.
- LLM Critic: Generates section-specific insights (Methodology, Risks, Results).
- UI/UX: Streamlit provides the frontend interface.
Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.
This project is licensed under the MIT License - see the LICENSE file for details.
- Vision Support: Parsing charts from notebook outputs using multimodal LLMs.
- PDF Export: Direct conversion from Markdown to PDF.