
feat: remote embedding backend (OpenAI-compatible HTTP) #97

Open
zm2231 wants to merge 4 commits into bartolli:main from zm2231:feat/remote-embedding

Conversation


@zm2231 zm2231 commented Mar 21, 2026

Summary

Adds support for an external embedding server as an alternative to the bundled local fastembed models. Drop-in replacement — existing local-only setups are unaffected.

Configuration

# settings.toml
[semantic_search]
remote_url = "http://host:8100"
remote_model = "bge-large-en-v1.5"
remote_dim = 1024

Or via env vars (these take precedence over the config file):

CODANNA_EMBED_URL=http://host:8100
CODANNA_EMBED_MODEL=bge-large-en-v1.5
CODANNA_EMBED_DIM=1024

Compatible with Infinity, OpenAI, vLLM, and any server implementing POST /v1/embeddings.

Key design decisions

  • EmbeddingBackend enum wraps both EmbeddingPool (local fastembed) and RemoteEmbedder (HTTP). All indexing paths use this unified type.
  • model field is Option in SimpleSemanticSearch (None in remote mode). The query embedding is generated externally via the backend and passed to search_with_embedding.
  • EmbeddingBackendKind in metadata — explicit Local/Remote field (serde default = Local) so load() delegates to load_remote() reliably without heuristic model-name parsing.
  • run_async — safe async-from-sync helper that works in multi-thread Tokio, current-thread Tokio, and no-runtime contexts.
  • Backend-kind change warnings — if backend type changes between runs but dimensions match, a clear warning is emitted at load time (re-index with --force to fix).
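The unified-backend idea in the first bullet can be sketched as follows. This is a minimal illustration, not the PR's actual code: the real variants wrap fastembed's EmbeddingPool and the HTTP RemoteEmbedder, and the real method signatures (async, error handling) will differ; the stand-in structs here only exist to make the dispatch pattern self-contained.

```rust
// Stand-ins for the real EmbeddingPool (local fastembed) and the
// HTTP client; only the shared `dim` matters for this sketch.
struct LocalPool { dim: usize }
struct RemoteEmbedder { dim: usize }

enum EmbeddingBackend {
    Local(LocalPool),
    Remote(RemoteEmbedder),
}

impl EmbeddingBackend {
    // Both variants expose the same interface, so indexing code can
    // hold one type regardless of where embeddings come from.
    fn dimensions(&self) -> usize {
        match self {
            EmbeddingBackend::Local(p) => p.dim,
            EmbeddingBackend::Remote(r) => r.dim,
        }
    }

    fn embed_one(&self, text: &str) -> Vec<f32> {
        // Placeholder embedding: a zero vector of the right width.
        let _ = text;
        vec![0.0; self.dimensions()]
    }
}

fn main() {
    let backend = EmbeddingBackend::Remote(RemoteEmbedder { dim: 1024 });
    assert_eq!(backend.dimensions(), 1024);
    assert_eq!(backend.embed_one("fn main() {}").len(), 1024);
    println!("dispatch ok");
}
```

The payoff is that the indexing pipeline never branches on local vs. remote; it calls the same methods either way.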

What's validated

  • Dimension checked at load time (hard error if mismatch)
  • Backend kind checked at load time (warning if changed with same dim)
  • Remote response: count, contiguous indices, per-vector dim
  • Unicode-safe text truncation (chars not bytes)
  • CODANNA_EMBED_DIM rejects non-integer and zero values
  • store_embeddings warns when embeddings are dropped due to dim mismatch
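Two of the checks above can be sketched in isolation: parsing CODANNA_EMBED_DIM (rejecting non-integers and zero) and validating a remote response's count, contiguous indices, and per-vector dimension. The function names and the `(index, vector)` pair representation are illustrative, not the PR's actual types.

```rust
// Hypothetical helper: parse CODANNA_EMBED_DIM, rejecting zero and
// non-integer values as listed in the validation summary.
fn parse_embed_dim(raw: &str) -> Result<usize, String> {
    match raw.trim().parse::<usize>() {
        Ok(0) => Err("CODANNA_EMBED_DIM must be non-zero".into()),
        Ok(n) => Ok(n),
        Err(_) => Err(format!("CODANNA_EMBED_DIM is not an integer: {raw}")),
    }
}

// Hypothetical helper: validate the response's `data` array, modeled
// here as (index, vector) pairs.
fn validate_response(
    items: &[(usize, Vec<f32>)],
    expected_count: usize,
    expected_dim: usize,
) -> Result<(), String> {
    if items.len() != expected_count {
        return Err(format!(
            "expected {expected_count} vectors, got {}", items.len()
        ));
    }
    for (pos, (idx, vec)) in items.iter().enumerate() {
        if *idx != pos {
            return Err(format!("non-contiguous index {idx} at position {pos}"));
        }
        if vec.len() != expected_dim {
            return Err(format!(
                "vector {pos} has dim {}, expected {expected_dim}", vec.len()
            ));
        }
    }
    Ok(())
}

fn main() {
    assert!(parse_embed_dim("1024").is_ok());
    assert!(parse_embed_dim("0").is_err());
    assert!(parse_embed_dim("big").is_err());

    let ok = vec![(0, vec![0.0; 4]), (1, vec![0.0; 4])];
    assert!(validate_response(&ok, 2, 4).is_ok());
    let bad = vec![(1, vec![0.0; 4])]; // first index should be 0
    assert!(validate_response(&bad, 1, 4).is_err());
    println!("validation ok");
}
```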

zm2231 added 4 commits March 21, 2026 01:18
Add support for an external embedding server as an alternative to the
bundled local fastembed models. Configured via env vars or settings.toml.

## Configuration

  # settings.toml
  [semantic_search]
  remote_url = "http://host:8100"   # enables remote mode
  remote_model = "bge-large-en-v1.5"
  remote_dim = 1024

  # Or via env vars (take precedence over config)
  CODANNA_EMBED_URL=http://host:8100
  CODANNA_EMBED_MODEL=bge-large-en-v1.5
  CODANNA_EMBED_DIM=1024

Compatible with Infinity, OpenAI, vLLM, and any server that serves
POST /v1/embeddings with the standard request/response schema.

## Changes

src/semantic/remote.rs (new)
- RemoteEmbedder: async HTTP client, 64-text batches, 30s timeout
- Response validation: count, contiguous indices, per-vector dim check
- Unicode-safe truncation (chars not bytes)
- run_async helper: works in multi-thread Tokio, current-thread, no runtime
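The unicode-safe truncation mentioned above can be sketched with std alone: cut on char boundaries, never raw byte offsets, so multi-byte characters are never split. The function name is illustrative; the PR's actual helper may differ in shape.

```rust
// Truncate to at most `max_chars` characters. Slicing by byte offset
// instead (e.g. &"héllo"[..2]) panics when the cut lands inside a
// multi-byte character.
fn truncate_chars(text: &str, max_chars: usize) -> &str {
    match text.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => &text[..byte_idx], // byte_idx is a char boundary
        None => text,                             // already short enough
    }
}

fn main() {
    assert_eq!(truncate_chars("héllo wörld", 5), "héllo");
    assert_eq!(truncate_chars("ok", 10), "ok");
    println!("truncation ok");
}
```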

src/semantic/pool.rs
- EmbeddingBackend enum wrapping EmbeddingPool (local) or RemoteEmbedder
- Shared interface: dimensions(), embed_one(), embed_parallel(), log_usage_stats()

src/semantic/simple.rs
- model field is now Option<Mutex<TextEmbedding>> — None in remote mode
- new_empty(dim, model_name): create index without local model
- load_remote(): load stored vectors without initialising fastembed
- search_with_embedding(): search with a pre-computed query vector
- search_with_embedding_and_language(): same with language pre-filter
- search_with_embedding_threshold(): same with similarity threshold
- has_local_model(), is_remote_index(), dimensions() accessors
- store_embeddings warns on dropped embeddings (dim mismatch)
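The core of search_with_embedding is that the caller supplies a pre-computed query vector (obtained from the remote backend), so no local model is needed at query time. A minimal sketch under assumed shapes, scoring with plain cosine similarity over an in-memory list (the real index, thresholds, and language filters are out of scope):

```rust
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

// Return indices of stored vectors, best match first. The query vector
// arrives pre-computed; nothing here touches an embedding model.
fn search_with_embedding(stored: &[Vec<f32>], query: &[f32]) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = stored
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine(v, query)))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.into_iter().map(|(i, _)| i).collect()
}

fn main() {
    let stored = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let hits = search_with_embedding(&stored, &[0.9, 0.1]);
    assert_eq!(hits[0], 0); // closest stored vector first
    println!("search ok");
}
```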

src/semantic/metadata.rs
- EmbeddingBackendKind enum (Local/Remote) with serde default=Local
- SemanticMetadata.backend field (backward compat: old metadata = Local)
- SemanticMetadata::new_remote() constructor
- is_remote() helper
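The backward-compat behaviour of the backend field can be shown with a plain Default impl standing in for the serde default: metadata written before this field existed deserializes as Local, so existing indexes keep working. This sketch omits the serde derives the real type uses.

```rust
// In the real code this would carry serde derives with
// `#[serde(default)]` on the metadata field; plain Default models
// the same fallback here.
#[derive(PartialEq, Default)]
enum EmbeddingBackendKind {
    #[default]
    Local,
    Remote,
}

struct SemanticMetadata {
    backend: EmbeddingBackendKind,
}

impl SemanticMetadata {
    fn is_remote(&self) -> bool {
        self.backend == EmbeddingBackendKind::Remote
    }
}

fn main() {
    // A metadata file with no `backend` field falls back to Local.
    let old = SemanticMetadata { backend: EmbeddingBackendKind::default() };
    assert!(!old.is_remote());
    println!("metadata default ok");
}
```

The explicit field is what lets load() pick load_remote() deterministically instead of guessing from the model name.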

src/config.rs
- SemanticSearchConfig: remote_url, remote_model, remote_dim fields

src/indexing/facade.rs
- build_embedding_backend(): resolves local/remote from config + env
- resolve_remote_model_name(): consistent env-first model name resolution
- enable_semantic_search(): uses new_empty in remote mode
- load_semantic_search(): uses load_remote in remote mode, restores backend,
  validates dimension and warns on backend-kind change after load
- semantic_search_docs_with_language: dispatches via has_local_model()

src/indexing/pipeline/{mod,stages/semantic_embed}.rs
- EmbeddingPool replaced by EmbeddingBackend throughout

src/cli/commands/index_parallel.rs
- create_semantic_search returns (semantic, backend) pair
- Validates dimension and backend-kind on load, exits on mismatch

text[last.1..start] panics when start < last.1 (overlapping ranges).
Check for overlap first and skip the slice in that case.
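The guard this commit describes can be sketched as follows; the function name and Option-based shape are illustrative, not the actual patch.

```rust
// Slice the gap between the end of the previous range and the start of
// the current one. When ranges overlap (start < last_end), the slice
// expression would panic, so return None instead.
fn gap_between(text: &str, last_end: usize, start: usize) -> Option<&str> {
    if start < last_end {
        return None; // overlapping ranges: there is no gap to slice
    }
    Some(&text[last_end..start])
}

fn main() {
    let text = "abcdef";
    assert_eq!(gap_between(text, 2, 4), Some("cd"));
    assert_eq!(gap_between(text, 4, 2), None); // would have panicked before
    println!("guard ok");
}
```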

persistence.rs and hot_reload.rs were swallowing DimensionMismatch as
plain warnings, hiding the re-index requirement from callers.

persistence: now returns Err so the facade startup fails with a clear
message instead of continuing with a broken semantic index.

hot_reload: cannot exit during a watcher tick — logs a specific warning
and disables semantic search until the user re-indexes with --force.

Add semantic_incompatible: bool to IndexFacade, set on both DimensionMismatch
exits in load_semantic_search(). Expose is_semantic_incompatible() so callers
can avoid retrying a known-incompatible index.

persistence.rs: log DimensionMismatch at error level, continue text-only rather
than returning Err (which would discard a valid Tantivy index in all startup paths).

hot_reload.rs: guard the semantic reload retry with is_semantic_incompatible()
so a known-bad index does not produce duplicate warnings on every reload cycle.
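The retry guard from this commit amounts to a sticky flag: once a DimensionMismatch marks the index incompatible, later watcher ticks skip the semantic reload (and its duplicate warnings) until a --force re-index. A minimal sketch; the reload body is faked here and the real facade does far more.

```rust
struct IndexFacade {
    semantic_incompatible: bool,
}

impl IndexFacade {
    fn is_semantic_incompatible(&self) -> bool {
        self.semantic_incompatible
    }

    // Called on a watcher tick; returns whether a reload was attempted.
    fn try_semantic_reload(&mut self) -> bool {
        if self.is_semantic_incompatible() {
            return false; // known-bad index: no retry, no duplicate warning
        }
        // A real reload would run here; this sketch pretends it hits a
        // dimension mismatch and marks the index incompatible.
        self.semantic_incompatible = true;
        true
    }
}

fn main() {
    let mut facade = IndexFacade { semantic_incompatible: false };
    assert!(facade.try_semantic_reload());  // first attempt runs
    assert!(!facade.try_semantic_reload()); // later ticks are skipped
    println!("guard flag ok");
}
```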
