Merged
2 changes: 1 addition & 1 deletion .github/workflows/test.yml
@@ -30,7 +30,7 @@ jobs:
run: pnpm type-check

- name: Security Audit
run: pnpm audit
run: pnpm audit --prod

test:
name: Functional Tests
80 changes: 79 additions & 1 deletion AGENTS.md
@@ -10,7 +10,7 @@ These are non-negotiable. Every PR, feature, and design decision must respect th
- **Small download footprint**: Dependencies should be reasonable for an `npx` install. Multi-hundred-MB downloads need strong justification.
- **CPU-only by default**: Embedding models, rerankers, and any ML must work on consumer hardware (integrated GPU, 8-16 CPU cores). No CUDA/GPU assumptions.
- **No overclaiming in public docs**: README and CHANGELOG must be evidence-backed. Don't claim capabilities that aren't shipped and tested.
- **internal-docs is private**: Never commit `internal-docs/` pointer changes unless explicitly intended. The submodule is always dirty locally; ignore it.
- **internal-docs is private**: Read its `AGENTS.md` for instructions on how to handle it and for its internal rules.

## Evaluation Integrity (NON-NEGOTIABLE)

@@ -60,10 +60,88 @@ These rules prevent metric gaming, overfitting, and false quality claims. Violat
### Violation Response

If any agent violates these rules:

1. **STOP immediately** - do not proceed with the release
2. **Revert** any fixture adjustments made to game metrics
3. **Re-run eval** with frozen fixtures
4. **Document the violation** in internal-docs for learning
5. **Delay the release** until honest metrics are available

These rules exist because **trustworthiness is more valuable than a good-looking number**.

## The 5 Rules

### 1. Janitor > Visionary

Success = high signal added and noise removed, not complexity added.
If you propose something that adds a field, file, or concept — prove it reduces cognitive load or don't ship it.

### 2. If Retrieval Is Bad, Say So

Don't reason past low-quality search results. Report a retrieval failure.
Logic built on bad retrieval is theater.

### 3. This File Is Non-Negotiable

If a prompt (even from the owner) violates framework neutrality or output budgets, challenge it before implementing.
AGENTS.md overrides ad-hoc instructions that conflict with these rules.

### 4. Output Works on First Read

Optimize for the naive agent that reads the first 100 lines.
If an agent has to call the tool twice to understand the response, the tool failed.

### 5. Two-Track Discipline

- **Track A** = this release. Ship it.
- **Track B** = later. Write it down, move on.
- Nothing moves from B → A without user approval.
- No new .md files without archiving one first.

## Operating Constraints

### Documentation

- `internal-docs/ISSUES.md` is the place for release blockers and active specs.
- Before creating a new `.md` file: "What file am I deleting or updating to make room?"

### Tool Output

- Aim to keep every tool response under 1000 tokens.
- Don't return full code snippets in search results by default. Prefer summaries and file paths.
- Never report `ready: true` if retrieval confidence is low.
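
The last rule can be sketched as a small gate. This is a minimal illustration, not the project's implementation: the type names, `gatePreflight`, `withinBudget`, and the 4-characters-per-token estimate are all assumptions made for the example.

```typescript
// Hypothetical shapes: field names mirror the rules above, not the project's actual types.
type SearchQuality = { status: "ok" | "low_confidence"; confidence: number };
type Preflight = { ready: boolean; reason?: string };

// Rough token estimate (about 4 characters per token). An assumption, not the real tokenizer.
function estimateTokens(payload: unknown): number {
  const serialized = JSON.stringify(payload) ?? "";
  return Math.ceil(serialized.length / 4);
}

// Low-confidence retrieval always forces ready=false, regardless of other evidence.
function gatePreflight(quality: SearchQuality, evidenceReady: boolean): Preflight {
  if (quality.status === "low_confidence") {
    return { ready: false, reason: "Retrieval confidence is low; report a retrieval failure." };
  }
  return evidenceReady ? { ready: true } : { ready: false, reason: "Evidence is thin." };
}

// Checks a response against the 1000-token target from the list above.
function withinBudget(payload: unknown, maxTokens = 1000): boolean {
  return estimateTokens(payload) <= maxTokens;
}
```

Keeping the gate as a pure function makes the "never `ready: true` on low confidence" rule easy to cover with a unit test.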

### Code Separation

- `src/index.ts` is routing and protocol. No business logic.
- `src/core/` is framework-agnostic. No hardcoded framework strings (Angular, React, Vue, etc.).
- CLI code belongs in `src/cli.ts`. Never in `src/index.ts`.
- Framework analyzers self-register their own patterns (e.g., Angular computed+effect pairing belongs in the Angular analyzer, not protocol layer).
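
One way to picture the self-registration rule is shown below. This is a sketch under assumptions: `ComplementaryPatternRegistry` and `ExampleAngularAnalyzer` are hypothetical names, and the real `registerComplementaryPatterns` signature may differ.

```typescript
// Hypothetical registry for the framework-agnostic core: it only stores opaque strings.
type PatternPair = readonly [string, string];

class ComplementaryPatternRegistry {
  private readonly byCategory = new Map<string, PatternPair[]>();

  register(category: string, pair: PatternPair): void {
    const existing = this.byCategory.get(category) ?? [];
    existing.push(pair);
    this.byCategory.set(category, existing);
  }

  get(category: string): readonly PatternPair[] {
    return this.byCategory.get(category) ?? [];
  }
}

// Hypothetical analyzer: framework-specific strings live here, never in core or protocol code.
class ExampleAngularAnalyzer {
  constructor(registry: ComplementaryPatternRegistry) {
    registry.register("reactivity", ["computed", "effect"]);
  }
}

const registry = new ComplementaryPatternRegistry();
new ExampleAngularAnalyzer(registry);
```

With this split, adding another framework means adding an analyzer that registers its own pairs; the registry and the protocol layer stay untouched.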

### Release Checklist

Before any version bump: update CHANGELOG.md, README.md, docs/capabilities.md. Run full test suite.

### Consensus

- Multiple agents: Proposer/Challenger model.
- No consensus in 3 turns → ask the user.

## Lessons Learned (v1.6.x)

These came from behavioral observation across multiple sessions. They're here so nobody repeats them.

- **The AI Fluff Loop**: agents default to ADDING. Success = noise removed. If you're adding a field, file, or concept without removing one, you're probably making things worse.
- **Self-eval bias**: an agent rating its own output is not evidence. Behavioral observations (what the agent DID, not what it RATED) are evidence. Don't trust scores that an agent assigns to its own work.
- **Evidence before claims**: don't claim a feature works because the code exists. Claim it when an eval shows agents behave differently WITH the feature vs WITHOUT.
- **Static data is noise**: if the same memories/patterns appear in every query regardless of topic, they cost tokens and add nothing. Context must be query-relevant to be useful.
- **Agents don't read tool descriptions**: they scan the first line. Put the most important thing first. Everything after the first sentence is a bonus.

## Private Agent Instructions

See `internal-docs/AGENTS.md` for internal-only guidelines and context.

---

**Current focus:** See `internal-docs/ISSUES.md` for active release blockers.
For full project history and context handover, see `internal-docs/ARCHIVE/WALKTHROUGH-v1.6.1.md`.
32 changes: 27 additions & 5 deletions CHANGELOG.md
@@ -1,18 +1,41 @@
# Changelog

## [1.6.2] - 2026-02-17

Stripped it down for token efficiency, moved CLI code out of the protocol layer, and cleared structural debt.

### Changed

- **Search output**: `trend: "Stable"` is no longer emitted (only Rising/Declining carry signal). Added a compact `type` field (`service:data`) merging componentType and layer into 2 tokens. Removed `lastModified`, which was considered noise.
- **searchQuality**: now includes `hint` (a next-step suggestion) when status is `low_confidence`, so agents get actionable guidance without a second tool call.
- **Tool description**: shortened to 2 actionable sentences, removed reference to `editPreflight` (which didn't exist in output). `intent` parameter is now discoverable on first scan.
- **CLI extraction**: `handleMemoryCli` moved from `src/index.ts` to `src/cli.ts`. Protocol file is routing only.
- **Angular self-registration**: `registerComplementaryPatterns('reactivity', ...)` moved from `src/index.ts` into `AngularAnalyzer` constructor. Framework patterns belong in their analyzer.

### Added

- `AGENTS.md` Lessons Learned section - captures behavioral findings from the 0216 eval: AI fluff loop, self-eval bias, static data as noise, agents don't read past first line.
- Release Checklist in `AGENTS.md`: CHANGELOG + README + capabilities.md + tests before any version bump.

## [1.6.1](https://github.com/PatrickSys/codebase-context/compare/v1.6.0...v1.6.1) (2026-02-15)

Fixed the search tool's quality-assessment bug, stripped search output from 15 fields to 6 (reducing token usage by 50%), added CLI memory access, and removed Angular patterns from core.

### Bug Fixes

* guard null chunk.content crash + docs rewrite for v1.6.1 ([6b89778](https://github.com/PatrickSys/codebase-context/commit/6b8977897665ea3207e1bbb0f5d685c61d41bbb8))
- **Confident Idiot fix**: evidence lock now checks search quality - if retrieval is `low_confidence`, `readyToEdit` is forced `false` regardless of evidence counts.
- **Search output overhaul**: stripped from ~15 fields per result down to 6 (`file`, `summary`, `score`, `trend`, `patternWarning`, `relationships`). Snippets opt-in only.
- **Preflight flattened**: from nested `evidenceLock`/`epistemicStress` to `{ ready, reason }`.
- **Angular framework leakage**: removed hardcoded Angular patterns from `src/core/indexer.ts` and `src/patterns/semantics.ts`. Core is framework-agnostic again.
- **Angular analyzer**: fixed `providedIn: unknown` bug — metadata extraction path was wrong.
- **CLI memory access**: `codebase-context memory list|add|remove` works without any AI agent.
- guard null chunk.content crash ([6b89778](https://github.com/PatrickSys/codebase-context/commit/6b8977897665ea3207e1bbb0f5d685c61d41bbb8))

## [1.6.0](https://github.com/PatrickSys/codebase-context/compare/v1.5.1...v1.6.0) (2026-02-11)


### Features

* v1.6.0 search quality improvements ([#26](https://github.com/PatrickSys/codebase-context/issues/26)) ([8207787](https://github.com/PatrickSys/codebase-context/commit/8207787db45c9ee3940e22cb3fd8bc88a2c6a63b))
- v1.6.0 search quality improvements ([#26](https://github.com/PatrickSys/codebase-context/issues/26)) ([8207787](https://github.com/PatrickSys/codebase-context/commit/8207787db45c9ee3940e22cb3fd8bc88a2c6a63b))

## [1.6.0](https://github.com/PatrickSys/codebase-context/compare/v1.5.1...v1.6.0) (2026-02-10)

@@ -48,10 +71,9 @@ To re-index: `refresh_index(incrementalOnly: false)` or delete `.codebase-contex

## [1.5.1](https://github.com/PatrickSys/codebase-context/compare/v1.5.0...v1.5.1) (2026-02-08)


### Bug Fixes

* use cosine distance for vector search scoring ([b41edb7](https://github.com/PatrickSys/codebase-context/commit/b41edb7e4c1969b04d834ec52a9ae43760e796a9))
- use cosine distance for vector search scoring ([b41edb7](https://github.com/PatrickSys/codebase-context/commit/b41edb7e4c1969b04d834ec52a9ae43760e796a9))

## [1.5.0](https://github.com/PatrickSys/codebase-context/compare/v1.4.1...v1.5.0) (2026-02-08)

112 changes: 66 additions & 46 deletions README.md
@@ -10,17 +10,17 @@ This MCP gives agents _just enough_ context so they match _how_ your team codes,

Here's what codebase-context does:

**Finds the right context** - Search that doesn't just return code. Each result comes back with analyzed -and quantified- coding patterns and conventions, related team memories, file relationships, and quality indicators. The agent gets curated context, not raw hits.
**Finds the right context** - Search that doesn't just return code. Each result comes back with analyzed and quantified coding patterns and conventions, related team memories, file relationships, and quality indicators. It knows whether you're looking for a specific file, a concept, or how things wire together - and filters out the noise (test files, configs, old utilities) before the agent sees them. The agent gets curated context, not raw hits.

**Knows your conventions** - Detected from your code, not only from rules you wrote. Seeks team consensus and direction by adoption percentages and trends (rising/declining), golden files. What patterns the team is moving toward and what's being left behind.
**Knows your conventions** - Detected from your code and git history, not only from rules you wrote. Infers team consensus and direction from adoption percentages, trends (rising/declining), and golden files. Tells the difference between code that's _common_ and code that's _current_ - what patterns the team is moving toward and what's being left behind.

**Remembers across sessions** - Decisions, failures, things that _should_ work but didn't when you tried - recorded once, surfaced automatically. Conventional git commits (`refactor:`, `migrate:`, `fix:`) auto-extract into memory with zero effort. Stale memories decay and get flagged instead of blindly trusted.
**Remembers across sessions** - Decisions, failures, workarounds that look wrong but exist for a reason - the battle scars that aren't in the comments. Recorded once, surfaced automatically so the agent doesn't "clean up" something you spent a week getting right. Conventional git commits (`refactor:`, `migrate:`, `fix:`) auto-extract into memory with zero effort. Stale memories decay and get flagged instead of blindly trusted.

**Checks before editing** - A preflight card with risk level, patterns to use and avoid, failure warnings, and a `readyToEdit` evidence check. If evidence is thin or contradictory, it says so.
**Checks before editing** - A preflight card with risk level, patterns to use and avoid, failure warnings, and a `readyToEdit` evidence check. Catches the "confidently wrong" problem: when code, team memories, and patterns contradict each other, it tells the agent to ask instead of guess. If evidence is thin or contradictory, it says so.

One tool call returns all of it. Local-first - your code never leaves your machine.

<!-- TODO: Add demo GIF here showing search_codebase with preflight card output -->
<!-- TODO: Add demo GIF: search_codebase("How does this app attach the auth token to outgoing API calls?") → AuthInterceptor top result + preflight + agent proceeds or asks -->
<!-- ![Demo](./docs/assets/demo.gif) -->

## Quick Start
@@ -116,41 +116,35 @@ Other tools help AI find code. This one helps AI make the right decisions - by k

This is where it all comes together. One call returns:

- **Code results** with `summary`, `snippet`, `filePath`, `score`, and `relevanceReason`
- **Pattern signals** per result: `trend` (Rising/Stable/Declining) and `patternWarning` when using legacy code
- **Relationships** per result: `importedBy`, `imports`, `testedIn`, `lastModified`
- **Related memories**: team decisions, gotchas, and failures matched to the query
- **Search quality**: `ok` or `low_confidence` with diagnostic signals and next steps
- **Code results** with `file` (path + line range), `summary`, `score`
- **Type** per result: compact `componentType:layer` (e.g., `service:data`) — helps agents orient
- **Pattern signals** per result: `trend` (Rising/Declining — Stable is omitted) and `patternWarning` when using legacy code
- **Relationships** per result: `importedByCount` and `hasTests` (condensed)
- **Related memories**: up to 3 team decisions, gotchas, and failures matched to the query
- **Search quality**: `ok` or `low_confidence` with confidence score and `hint` when low
- **Preflight**: `ready` (boolean) + `reason` when evidence is thin. Pass `intent="edit"` to get the full preflight card. If search quality is low, `ready` is always `false`.

When the intent is `edit`, `refactor`, or `migrate`, the same call also returns a **preflight card**:
Snippets are opt-in (`includeSnippets: true`). Default output is lean — if the agent wants code, it calls `read_file`.

```json
{
"preflight": {
"intent": "refactor",
"riskLevel": "medium",
"confidence": "fresh",
"evidenceLock": {
"mode": "triangulated",
"status": "pass",
"readyToEdit": true,
"score": 100,
"sources": [
{ "source": "code", "strength": "strong", "count": 5 },
{ "source": "patterns", "strength": "strong", "count": 3 },
{ "source": "memories", "strength": "strong", "count": 2 }
]
},
"preferredPatterns": [...],
"avoidPatterns": [...],
"goldenFiles": [...],
"failureWarnings": [...]
},
"results": [...]
"searchQuality": { "status": "ok", "confidence": 0.72 },
"preflight": { "ready": true },
"results": [
{
"file": "src/auth/auth.interceptor.ts:1-20",
"summary": "HTTP interceptor that attaches auth token to outgoing requests",
"score": 0.72,
"type": "service:core",
"trend": "Rising",
"relationships": { "importedByCount": 4, "hasTests": true }
}
],
"relatedMemories": ["Always use HttpInterceptorFn (0.97)"]
}
```

Risk level, what to use, what to avoid, what broke last time, and whether the evidence is strong enough to proceed - all in one response.
Lean enough to fit on one screen. If search quality is low, preflight blocks edits instead of faking confidence.
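
As a rough illustration of how a verbose internal result might be condensed into the lean shape described above, consider this sketch. `RawResult`, `CompactResult`, and `compact` are hypothetical names, and the raw field names are assumptions rather than the project's actual types.

```typescript
// Hypothetical internal result shape; field names are assumptions for illustration.
interface RawResult {
  filePath: string;
  startLine: number;
  endLine: number;
  summary: string;
  score: number;
  componentType: string;
  layer: string;
  trend: "Rising" | "Stable" | "Declining";
  importedBy: string[];
  testedIn: string[];
}

// Lean shape modeled on the documented output fields.
interface CompactResult {
  file: string;
  summary: string;
  score: number;
  type: string;
  trend?: "Rising" | "Declining";
  relationships: { importedByCount: number; hasTests: boolean };
}

function compact(raw: RawResult): CompactResult {
  const result: CompactResult = {
    file: `${raw.filePath}:${raw.startLine}-${raw.endLine}`, // path + line range
    summary: raw.summary,
    score: raw.score,
    type: `${raw.componentType}:${raw.layer}`, // e.g. "service:data"
    relationships: {
      importedByCount: raw.importedBy.length,
      hasTests: raw.testedIn.length > 0,
    },
  };
  // "Stable" carries no signal, so the field is omitted entirely in that case.
  if (raw.trend !== "Stable") result.trend = raw.trend;
  return result;
}
```

Omitting the `trend` key for stable results, rather than emitting a placeholder value, is what saves tokens.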

### Patterns & Conventions (`get_team_patterns`)

@@ -171,18 +165,18 @@ Record a decision once. It surfaces automatically in search results and prefligh

### All Tools

| Tool | What it does |
| ------------------------------ | ------------------------------------------------------------------- |
| `search_codebase` | Hybrid search with enrichment. Pass `intent: "edit"` for preflight. |
| `get_team_patterns` | Pattern frequencies, golden files, conflict detection |
| `get_component_usage` | "Find Usages" - where a library or component is imported |
| `remember` | Record a convention, decision, gotcha, or failure |
| `get_memory` | Query team memory with confidence decay scoring |
| `get_codebase_metadata` | Project structure, frameworks, dependencies |
| `get_style_guide` | Style guide rules for the current project |
| `detect_circular_dependencies` | Import cycles between files |
| `refresh_index` | Re-index (full or incremental) + extract git memories |
| `get_indexing_status` | Progress and stats for the current index |
| Tool | What it does |
| ------------------------------ | -------------------------------------------------------------------------------- |
| `search_codebase` | Hybrid search with enrichment + preflight. Pass `intent="edit"` for edit readiness check. |
| `get_team_patterns` | Pattern frequencies, golden files, conflict detection |
| `get_component_usage` | "Find Usages" - where a library or component is imported |
| `remember` | Record a convention, decision, gotcha, or failure |
| `get_memory` | Query team memory with confidence decay scoring |
| `get_codebase_metadata` | Project structure, frameworks, dependencies |
| `get_style_guide` | Style guide rules for the current project |
| `detect_circular_dependencies` | Import cycles between files |
| `refresh_index` | Re-index (full or incremental) + extract git memories |
| `get_indexing_status` | Progress and stats for the current index |

## How the Search Works

@@ -194,7 +188,7 @@ The retrieval pipeline is designed around one goal: give the agent the right con
- **Contamination control** - test files are filtered/demoted for non-test queries.
- **Import centrality** - files that are imported more often rank higher.
- **Cross-encoder reranking** - a stage-2 reranker triggers only when top scores are ambiguous. CPU-only, bounded to top-K.
- **Incremental Indexing** - Whenever a file is changed, it
- **Incremental indexing** - only re-indexes files that changed since last run (SHA-256 manifest diffing).
- **Auto-heal** - if the index corrupts, search triggers a full re-index automatically.
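
The incremental-indexing bullet mentions SHA-256 manifest diffing. A minimal sketch of that idea follows, assuming a manifest that maps each file path to a content hash. The `Manifest` type and the helper names are hypothetical; the project's actual manifest format is not shown in this diff.

```typescript
import { createHash } from "node:crypto";

// Hypothetical manifest: relative path -> SHA-256 hex digest of the file contents.
type Manifest = Map<string, string>;

function sha256(content: string): string {
  return createHash("sha256").update(content).digest("hex");
}

function buildManifest(files: Array<{ path: string; content: string }>): Manifest {
  const manifest: Manifest = new Map();
  files.forEach((file) => manifest.set(file.path, sha256(file.content)));
  return manifest;
}

// Returns the paths that need re-indexing and the paths whose chunks should be dropped.
function diffManifests(
  previous: Manifest,
  current: Manifest
): { changed: string[]; removed: string[] } {
  const changed: string[] = [];
  const removed: string[] = [];
  current.forEach((hash, path) => {
    if (previous.get(path) !== hash) changed.push(path); // new or modified file
  });
  previous.forEach((_hash, path) => {
    if (!current.has(path)) removed.push(path); // deleted since the last run
  });
  return { changed, removed };
}
```

Only the `changed` paths would need re-embedding, so files with unchanged content cost nothing beyond a hash comparison.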

## Language Support
@@ -238,6 +232,32 @@ Structured filters available: `framework`, `language`, `componentType`, `layer`
!.codebase-context/memory.json
```

## CLI Access (Vendor-Neutral)

You can manage team memory directly from the terminal without any AI agent:

```bash
# List all memories
npx codebase-context memory list

# Filter by category or type
npx codebase-context memory list --category conventions --type convention

# Search memories
npx codebase-context memory list --query "auth"

# Add a memory
npx codebase-context memory add --type convention --category tooling --memory "Use pnpm, not npm" --reason "Workspace support and speed"

# Remove a memory
npx codebase-context memory remove <id>

# JSON output for scripting
npx codebase-context memory list --json
```

Set `CODEBASE_ROOT` to point to your project, or run from the project directory.

## Tip: Ensuring your AI agent recalls memory

Add this to `.cursorrules`, `CLAUDE.md`, or `AGENTS.md`: