Pragma

Historical GitLab MR Database with Semantic Search API

Pragma indexes your GitLab merge requests and provides a REST API and an MCP server that AI assistants can query for historical context. Rather than performing code review itself, it acts as a searchable knowledge base that external tools (Claude Code, Gemini, etc.) can query for institutional knowledge from past code changes, discussions, and decisions.

Connect Pragma to your AI assistant via MCP and ask questions like:

Search pragma discussions for "authentication strategy" and summarize what
the team decided in past MRs.

Features

  • Semantic Search: Find similar historical MRs using vector similarity
  • Rich Context: Indexes titles, descriptions, diffs, and discussions
  • REST API: Simple HTTP endpoints for external tools
  • Secure by Default: Localhost-only binding (127.0.0.1)
  • GitLab Integration: Fetches MRs via python-gitlab API
  • ChromaDB: Local vector database for fast retrieval

Quick Start

# 1. Install dependencies
uv sync

# 2. Initialize configuration
uv run pragma init

# 3. Edit config.yaml with your GitLab repository details

# 4. Test connection
uv run pragma test-connection

# 5. Index historical MRs
uv run pragma index

# 6. Start API server
uv run pragma serve

Visit http://localhost:8000/docs for interactive API documentation.
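Once the server is running, you can sanity-check it from Python. A minimal sketch using only the standard library; it assumes the default host/port and the /health endpoint:

```python
import urllib.request

def is_healthy(base_url: str = "http://localhost:8000") -> bool:
    """Return True if GET /health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url + "/health", timeout=2) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused, timeout, DNS failure, etc.
        return False

print("pragma up:", is_healthy())
```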

Usage

CLI Commands

# Initialize and configure
uv run pragma init

# Test GitLab connection
uv run pragma test-connection

# Index merge requests
uv run pragma index

# Start API server (localhost only, secure)
uv run pragma serve

# Start with custom port
uv run pragma serve --port 8080

# Development mode with auto-reload
uv run pragma serve --reload

API Endpoints

  • POST /search - Semantic search for similar MRs
  • GET /mrs/{mr_id} - Get full details of specific MR
  • GET /mrs - List all indexed MRs
  • GET /stats - Database statistics
  • GET /health - Health check
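For quick scripting against the GET endpoints, a thin client sketch (standard library only; `api_url` and `get_json` are illustrative helpers, and the base URL assumes the default `pragma serve` binding):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # pragma serve default

def api_url(path: str, **params) -> str:
    """Build a full URL, filling placeholders like {mr_id}."""
    return BASE_URL + path.format(**params)

def get_json(path: str, **params) -> dict:
    """GET one of the endpoints above and decode the JSON body."""
    with urllib.request.urlopen(api_url(path, **params)) as resp:
        return json.load(resp)

# e.g. get_json("/mrs/{mr_id}", mr_id=42), get_json("/stats")
```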

Example: Search for Similar MRs

import requests

response = requests.post("http://localhost:8000/search", json={
    "code_diff": "your code diff here",
    "top_k": 5,
    "min_score": 0.5
})

similar_mrs = response.json()
for mr in similar_mrs:
    print(f"!{mr['mr_id']}: {mr['mr_title']} (score: {mr['similarity_score']:.4f})")

MCP Server Integration

Pragma includes an MCP server (src/mcp_server.py) exposing three tools:

Tool                   Description
mcp__pragma__search    Semantic search over historical MRs
mcp__pragma__get_mr    Get full details of a specific MR
mcp__pragma__list_mrs  List all indexed MRs

Setup

Add to your MCP client configuration (e.g. Claude Code ~/.claude.json):

{
  "mcpServers": {
    "pragma": {
      "command": "uv",
      "args": [
        "run",
        "--project", "/path/to/pragma",
        "python",
        "/path/to/pragma/src/mcp_server.py"
      ],
      "env": {
        "PRAGMA_API_URL": "http://localhost:8000"
      }
    }
  }
}

The --project flag ensures pragma's virtual environment is used regardless of the current working directory.

Example Prompts

Find team decisions about a topic:

Search pragma discussions for "API rate limiting strategy" and summarize
what the team decided in past MRs.

Review a change with historical context:

Before reviewing my change, search pragma for:
1. Discussions about similar changes to this component
2. Diffs with similar code patterns

Use both to provide context-aware feedback.

The two-query pattern: for the richest context, always make two calls to search:

  • content_type: "discussion" + natural language query → finds why decisions were made
  • content_type: "diff" + the actual code diff → finds how similar changes were implemented
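The pattern can be wrapped in a small helper. This is a sketch only: `content_type` matches the bullets above, but the exact /search field names (`query` in particular) are assumptions based on the earlier example, so check your server's /docs page for the real request schema:

```python
def two_query_payloads(question: str, diff: str, top_k: int = 5) -> list[dict]:
    """Build both /search payloads for the two-query pattern."""
    return [
        # Why: a natural-language question against past discussions.
        {"content_type": "discussion", "query": question, "top_k": top_k},
        # How: the actual code diff against past changes.
        {"content_type": "diff", "code_diff": diff, "top_k": top_k},
    ]
```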

See CLAUDE.md for full parameter reference and advanced usage.

Architecture

GitLab API → Index MRs → Embedding Model → ChromaDB
                                               ↓
External AI Tool → HTTP API → Vector Search → Historical Context
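The indexing flow above can be sketched in a few lines (illustrative only; `index_mr` and the record shape are not Pragma's actual internals):

```python
from dataclasses import dataclass

@dataclass
class MRRecord:
    mr_id: int
    title: str
    embedding: list[float]  # vector stored in ChromaDB

def index_mr(mr_id: int, title: str, description: str, diff: str, embed) -> MRRecord:
    """Combine an MR's text, embed it, and return a record for the vector store."""
    text = "\n\n".join([title, description, diff])
    return MRRecord(mr_id=mr_id, title=title, embedding=embed(text))
```

Search applies the same embed step to the query, then does a nearest-neighbour lookup in ChromaDB.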

Configuration

Set credentials as environment variables (never stored in files):

export GEMINI_API_KEY=your_gemini_api_key
export GITLAB_PRIVATE_TOKEN=your_gitlab_token

Edit config.yaml for non-sensitive settings:

gitlab:
  base_url: https://gitlab.cee.redhat.com  # Optional, defaults to gitlab.com

repository:
  type: gitlab
  owner: product-security/pdm  # Can be nested group
  name: pdm-db

vector_store:
  type: chromadb
  path: ./data/chroma_db

# Embedding model configuration
# Both indexing and search must use the same provider.
# Switching provider requires re-indexing all MRs.
embeddings:
  provider: gemini        # "gemini" (default) or "local"
  # model: BAAI/bge-large-en-v1.5  # local only, this is the default

Gemini provider (default): requires GEMINI_API_KEY in the environment.

Local provider: runs on-device using HuggingFace sentence-transformers, no API key or internet access needed at query time. The model is downloaded once on first use and cached locally.

embeddings:
  provider: local
  model: BAAI/bge-large-en-v1.5  # ~1.3GB, high quality
  # model: all-mpnet-base-v2     # ~420MB, good quality
  # model: all-MiniLM-L6-v2      # ~80MB, fast but lower quality

When switching providers, clear the existing index and re-index:

uv run pragma clear-index --yes
uv run pragma index

Security

  • No authentication - the API itself has no auth layer
  • Localhost only - binds to 127.0.0.1 by default
  • Sensitive data - indexed MR diffs may contain proprietary code
  • Never expose to public networks without a firewall or VPN

Development

# Install pre-commit hooks
uv run pre-commit install

# Run linting manually
uv run pre-commit run --all-files

# Install new dependency
uv add <package-name>

Systemd User Service (Auto-start on Login)

Run Pragma as a systemd user service that starts automatically on login.

1. Create the service file at ~/.config/systemd/user/pragma.service:

[Unit]
Description=Pragma Historical MR Search API
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
WorkingDirectory=/home/YOUR_USERNAME/repositories/github/pragma
Environment="PATH=/home/YOUR_USERNAME/.local/bin:/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=%h/.config/pragma/env
ExecStart=/usr/bin/env uv run pragma serve --host 127.0.0.1 --port 8000
Restart=on-failure
RestartSec=10
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=default.target

2. Create the environment file at ~/.config/pragma/env:

GEMINI_API_KEY=your_gemini_api_key
GITLAB_PRIVATE_TOKEN=your_gitlab_token

3. Enable and start:

systemctl --user daemon-reload
systemctl --user enable pragma.service
systemctl --user start pragma.service
systemctl --user status pragma.service

Management commands:

# View logs
journalctl --user -u pragma.service -f

# Restart
systemctl --user restart pragma.service

# Stop
systemctl --user stop pragma.service

Requirements

  • Python 3.10-3.12
  • GitLab private token with API read access
  • Gemini API key (only when using the gemini embedding provider)
  • uv package manager

License

MIT

Contributing

Contributions welcome! Please ensure:

  • Pre-commit hooks pass
  • API endpoints are documented
  • Security best practices followed
