feat(retrieve): add provenance metadata to search results by mvanhorn · Pull Request #852 · volcengine/OpenViking

mvanhorn · 2026-03-21T21:17:58Z

Description

Adds an opt-in include_provenance parameter to the search/find API endpoints. When enabled, the response includes a provenance array showing which directories were traversed, which tier (L0/L1/L2) each result came from, match reasons, and the full thinking trace.

The README states "Visualized Retrieval Trajectory" as a core feature, and the internal data structures already collect this information (MatchedContext.level, QueryResult.thinking_trace, QueryResult.searched_directories). This change surfaces it through the API.

Evidence:

| Source | Evidence |
| --- | --- |
| README | "Visualized Retrieval Trajectory - Supports visualization of directory retrieval trajectories" |
| #274 | Code retrieval optimization - retrieval quality is a priority |
| #350 | Decoupling ingestion - 3 thumbsup, community wants pipeline visibility |

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Refactoring (no functional changes)
Performance improvement
Test update

Changes Made

Added include_provenance: bool = False to FindRequest and SearchRequest in openviking/server/routers/search.py
Extended FindResult.to_dict() to accept include_provenance and conditionally include query_results with thinking trace
Added _query_result_to_dict() to serialize query results with tier labels (L0/L1/L2)
Passed include_provenance through the search/find endpoint handlers
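
The changes listed above can be sketched roughly as follows. This is a minimal, self-contained illustration built on plain dataclasses, not the code in this PR. The names `FindResult`, `_query_result_to_dict`, `MatchedContext.level`, `QueryResult.thinking_trace`, and `QueryResult.searched_directories` come from the description. Everything else is an assumption for illustration: the `resources` field, the `match_reason` attribute name, the integer-to-tier mapping, and the choice to omit the `provenance` key when there are no query results.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


def _tier_label(level: int) -> str:
    # Assumption: levels are stored as integers 0/1/2 and shown as "L0"/"L1"/"L2".
    return f"L{level}"


@dataclass
class MatchedContext:
    uri: str
    level: int
    score: float
    match_reason: str = ""  # hypothetical attribute name


@dataclass
class QueryResult:
    query: str
    matched_contexts: List[MatchedContext] = field(default_factory=list)
    searched_directories: List[str] = field(default_factory=list)
    thinking_trace: Optional[Dict[str, Any]] = None


def _query_result_to_dict(qr: QueryResult) -> Dict[str, Any]:
    """Serialize one query result, labelling each match with its tier."""
    return {
        "query": qr.query,
        "searched_directories": list(qr.searched_directories),
        "matched_contexts": [
            {
                "uri": m.uri,
                "tier": _tier_label(m.level),
                "score": m.score,
                "match_reason": m.match_reason,
            }
            for m in qr.matched_contexts
        ],
        "thinking_trace": qr.thinking_trace,
    }


@dataclass
class FindResult:
    resources: List[Dict[str, Any]] = field(default_factory=list)  # hypothetical
    query_results: Optional[List[QueryResult]] = None

    def to_dict(self, include_provenance: bool = False) -> Dict[str, Any]:
        # Existing fields are always emitted unchanged.
        out: Dict[str, Any] = {"resources": list(self.resources)}
        # Provenance is added only on request, and only if data was collected.
        if include_provenance and self.query_results:
            out["provenance"] = [
                _query_result_to_dict(q) for q in self.query_results
            ]
        return out
```

Because the flag defaults to `False`, a caller that never passes it gets the same dictionary as before, which is the backward-compatibility property the PR relies on.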

Testing

I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I have tested this on the following platforms:
- Linux
- macOS
- Windows

5 unit tests added in tests/retrieve/test_provenance.py covering:

Default behavior (no provenance)
Provenance enabled with full data
Provenance enabled without query_results (no crash)
Backward compatibility (existing fields unchanged)
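
For readers who want a feel for the cases listed above, here is a hedged sketch of what such tests might look like. It does not reproduce tests/retrieve/test_provenance.py. The `to_dict` function below is a hypothetical stand-in for `FindResult.to_dict()`, and the `resources` field and the "omit the key when there is no data" behavior are assumptions.

```python
from typing import Any, Dict, List, Optional


def to_dict(
    resources: List[Dict[str, Any]],
    query_results: Optional[List[Dict[str, Any]]] = None,
    include_provenance: bool = False,
) -> Dict[str, Any]:
    """Hypothetical stand-in for FindResult.to_dict(); not the project's code."""
    out: Dict[str, Any] = {"resources": list(resources)}
    if include_provenance and query_results:
        out["provenance"] = [dict(q) for q in query_results]
    return out


RESOURCES = [{"uri": "viking://resources/docs/architecture.md"}]
QUERY_RESULTS = [{"query": "architecture design patterns", "matched_contexts": []}]


def test_default_has_no_provenance() -> None:
    assert "provenance" not in to_dict(RESOURCES, QUERY_RESULTS)


def test_provenance_enabled_with_full_data() -> None:
    result = to_dict(RESOURCES, QUERY_RESULTS, include_provenance=True)
    assert result["provenance"][0]["query"] == "architecture design patterns"


def test_provenance_enabled_without_query_results() -> None:
    # Must not raise when no query results were collected.
    result = to_dict(RESOURCES, None, include_provenance=True)
    assert "provenance" not in result


def test_existing_fields_unchanged() -> None:
    plain = to_dict(RESOURCES, QUERY_RESULTS)
    with_prov = to_dict(RESOURCES, QUERY_RESULTS, include_provenance=True)
    assert plain["resources"] == with_prov["resources"]
```

The functions follow pytest's `test_*` naming convention, so a file like this could be collected by `pytest` without extra configuration.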

Screenshots

Provenance output (installed from modified branch, tested with realistic retrieval data):

The provenance section shows: which directories were searched, which tier (L0/L1/L2) each result came from, match reasons, and thinking trace statistics. All opt-in via include_provenance: true - existing clients see no change.

Example JSON response with provenance:

{
  "provenance": [{
    "query": "architecture design patterns",
    "searched_directories": ["resources/", "resources/docs/", "user/default/memories/"],
    "matched_contexts": [
      {"uri": "viking://resources/docs/architecture.md", "tier": "L2", "score": 0.87, "match_reason": "semantic_match"},
      {"uri": "viking://user/.../meeting-notes", "tier": "L1", "score": 0.62, "match_reason": "directory_match"}
    ],
    "thinking_trace": {"statistics": {"directories_searched": 2, "candidates_collected": 1}, "events": [...]}
  }]
}
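
On the client side, a response shaped like the example above could be consumed along these lines. This is only a sketch: the helper name `uris_by_tier` is hypothetical, and the sample payload is a trimmed copy of the example with `thinking_trace` left out.

```python
import json
from collections import defaultdict
from typing import Any, Dict, List

# Trimmed copy of the example response; thinking_trace is omitted here.
RESPONSE: Dict[str, Any] = json.loads("""
{
  "provenance": [{
    "query": "architecture design patterns",
    "searched_directories": ["resources/", "resources/docs/", "user/default/memories/"],
    "matched_contexts": [
      {"uri": "viking://resources/docs/architecture.md", "tier": "L2",
       "score": 0.87, "match_reason": "semantic_match"},
      {"uri": "viking://user/.../meeting-notes", "tier": "L1",
       "score": 0.62, "match_reason": "directory_match"}
    ]
  }]
}
""")


def uris_by_tier(response: Dict[str, Any]) -> Dict[str, List[str]]:
    """Group matched URIs by the tier they were retrieved from."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    # A response without provenance (the default) yields an empty mapping.
    for entry in response.get("provenance", []):
        for ctx in entry.get("matched_contexts", []):
            grouped[ctx["tier"]].append(ctx["uri"])
    return dict(grouped)


print(uris_by_tier(RESPONSE))
```

Using `response.get("provenance", [])` keeps the helper safe for clients that did not opt in, since the key is absent unless `include_provenance` is set to true.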

Checklist

My code follows the project's coding style
I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
My changes generate no new warnings
Any dependent changes have been merged and published

Additional Notes

This contribution was developed with AI assistance (Claude Code). The feature was proposed after dogfooding OpenViking's search API and noticing the provenance data was collected internally but not exposed to clients.

Adds an opt-in `include_provenance` parameter to the search/find API endpoints. When enabled, the response includes a `provenance` array with per-query retrieval details: which directories were traversed, which tier (L0/L1/L2) each result came from, match reasons, and the full thinking trace.

The internal data was already being collected in MatchedContext.level, MatchedContext.context_type, and QueryResult.thinking_trace. This change surfaces it through the API for retrieval observability, which the README lists as a core design goal ("Visualized Retrieval Trajectory").

Backward compatible: defaults to false, existing clients see no change.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-03-21T21:18:42Z

Failed to generate code suggestions for PR


github-project-automation bot added this to OpenViking project Mar 21, 2026

github-project-automation bot moved this to Backlog in OpenViking project Mar 21, 2026

docs: add provenance feature screenshot

1a57f81

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mvanhorn wants to merge 2 commits into volcengine:main from mvanhorn:osc/feat-search-provenance-metadata
