
feat: remote embedding backend (OpenAI-compatible HTTP) #97

Open
zm2231 wants to merge 4 commits into bartolli:main from zm2231:feat/remote-embedding

Conversation


@zm2231 zm2231 commented Mar 21, 2026

Summary

Adds support for an external embedding server as an alternative to the bundled local fastembed models. Drop-in replacement — existing local-only setups are unaffected.

Configuration

# settings.toml
[semantic_search]
remote_url = "http://host:8100"
remote_model = "bge-large-en-v1.5"
remote_dim = 1024

Or via env vars (these take precedence over the config file):

CODANNA_EMBED_URL=http://host:8100
CODANNA_EMBED_MODEL=bge-large-en-v1.5
CODANNA_EMBED_DIM=1024

Compatible with Infinity, OpenAI, vLLM, and any server implementing POST /v1/embeddings.

Key design decisions

  • EmbeddingBackend enum wraps both EmbeddingPool (local fastembed) and RemoteEmbedder (HTTP). All indexing paths use this unified type.
  • model field is Option in SimpleSemanticSearch (None in remote mode). The query embedding is generated externally via the backend and passed to search_with_embedding.
  • EmbeddingBackendKind in metadata — explicit Local/Remote field (serde default = Local) so load() delegates to load_remote() reliably without heuristic model-name parsing.
  • run_async — safe async-from-sync helper that works in multi-thread Tokio, current-thread Tokio, and no-runtime contexts.
  • Backend-kind change warnings — if backend type changes between runs but dimensions match, a clear warning is emitted at load time (re-index with --force to fix).
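The unified-backend idea in the first bullet can be sketched as follows. This is a minimal illustration, not the PR's actual code: the real variants wrap fastembed's EmbeddingPool and the HTTP RemoteEmbedder, and the real method signatures (async, error handling) will differ; the stand-in structs here only exist to make the dispatch pattern self-contained.

```rust
// Stand-ins for the real EmbeddingPool (local fastembed) and the
// HTTP client; only the shared `dim` matters for this sketch.
struct LocalPool { dim: usize }
struct RemoteEmbedder { dim: usize }

enum EmbeddingBackend {
    Local(LocalPool),
    Remote(RemoteEmbedder),
}

impl EmbeddingBackend {
    // Both variants expose the same interface, so indexing code can
    // hold one type regardless of where embeddings come from.
    fn dimensions(&self) -> usize {
        match self {
            EmbeddingBackend::Local(p) => p.dim,
            EmbeddingBackend::Remote(r) => r.dim,
        }
    }

    fn embed_one(&self, text: &str) -> Vec<f32> {
        // Placeholder embedding: a zero vector of the right width.
        let _ = text;
        vec![0.0; self.dimensions()]
    }
}

fn main() {
    let backend = EmbeddingBackend::Remote(RemoteEmbedder { dim: 1024 });
    assert_eq!(backend.dimensions(), 1024);
    assert_eq!(backend.embed_one("fn main() {}").len(), 1024);
    println!("dispatch ok");
}
```

The payoff is that the indexing pipeline never branches on local vs. remote; it calls the same methods either way.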

What's validated

  • Dimension checked at load time (hard error if mismatch)
  • Backend kind checked at load time (warning if changed with same dim)
  • Remote response: count, contiguous indices, per-vector dim
  • Unicode-safe text truncation (chars not bytes)
  • CODANNA_EMBED_DIM rejects non-integer and zero values
  • store_embeddings warns when embeddings are dropped due to dim mismatch
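Two of the checks above can be sketched in isolation: parsing CODANNA_EMBED_DIM (rejecting non-integers and zero) and validating a remote response's count, contiguous indices, and per-vector dimension. The function names and the `(index, vector)` pair representation are illustrative, not the PR's actual types.

```rust
// Hypothetical helper: parse CODANNA_EMBED_DIM, rejecting zero and
// non-integer values as listed in the validation summary.
fn parse_embed_dim(raw: &str) -> Result<usize, String> {
    match raw.trim().parse::<usize>() {
        Ok(0) => Err("CODANNA_EMBED_DIM must be non-zero".into()),
        Ok(n) => Ok(n),
        Err(_) => Err(format!("CODANNA_EMBED_DIM is not an integer: {raw}")),
    }
}

// Hypothetical helper: validate the response's `data` array, modeled
// here as (index, vector) pairs.
fn validate_response(
    items: &[(usize, Vec<f32>)],
    expected_count: usize,
    expected_dim: usize,
) -> Result<(), String> {
    if items.len() != expected_count {
        return Err(format!(
            "expected {expected_count} vectors, got {}", items.len()
        ));
    }
    for (pos, (idx, vec)) in items.iter().enumerate() {
        if *idx != pos {
            return Err(format!("non-contiguous index {idx} at position {pos}"));
        }
        if vec.len() != expected_dim {
            return Err(format!(
                "vector {pos} has dim {}, expected {expected_dim}", vec.len()
            ));
        }
    }
    Ok(())
}

fn main() {
    assert!(parse_embed_dim("1024").is_ok());
    assert!(parse_embed_dim("0").is_err());
    assert!(parse_embed_dim("big").is_err());

    let ok = vec![(0, vec![0.0; 4]), (1, vec![0.0; 4])];
    assert!(validate_response(&ok, 2, 4).is_ok());
    let bad = vec![(1, vec![0.0; 4])]; // first index should be 0
    assert!(validate_response(&bad, 1, 4).is_err());
    println!("validation ok");
}
```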

zm2231 added 4 commits March 21, 2026 01:18
Add support for an external embedding server as an alternative to the
bundled local fastembed models. Configured via env vars or settings.toml.

## Configuration

  # settings.toml
  [semantic_search]
  remote_url = "http://host:8100"   # enables remote mode
  remote_model = "bge-large-en-v1.5"
  remote_dim = 1024

  # Or via env vars (take precedence over config)
  CODANNA_EMBED_URL=http://host:8100
  CODANNA_EMBED_MODEL=bge-large-en-v1.5
  CODANNA_EMBED_DIM=1024

Compatible with Infinity, OpenAI, vLLM, and any server that serves
POST /v1/embeddings with the standard request/response schema.

## Changes

src/semantic/remote.rs (new)
- RemoteEmbedder: async HTTP client, 64-text batches, 30s timeout
- Response validation: count, contiguous indices, per-vector dim check
- Unicode-safe truncation (chars not bytes)
- run_async helper: works in multi-thread Tokio, current-thread, no runtime
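The unicode-safe truncation mentioned above can be sketched with std alone: cut on char boundaries, never raw byte offsets, so multi-byte characters are never split. The function name is illustrative; the PR's actual helper may differ in shape.

```rust
// Truncate to at most `max_chars` characters. Slicing by byte offset
// instead (e.g. &"héllo"[..2]) panics when the cut lands inside a
// multi-byte character.
fn truncate_chars(text: &str, max_chars: usize) -> &str {
    match text.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => &text[..byte_idx], // byte_idx is a char boundary
        None => text,                             // already short enough
    }
}

fn main() {
    assert_eq!(truncate_chars("héllo wörld", 5), "héllo");
    assert_eq!(truncate_chars("ok", 10), "ok");
    println!("truncation ok");
}
```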

src/semantic/pool.rs
- EmbeddingBackend enum wrapping EmbeddingPool (local) or RemoteEmbedder
- Shared interface: dimensions(), embed_one(), embed_parallel(), log_usage_stats()

src/semantic/simple.rs
- model field is now Option<Mutex<TextEmbedding>> — None in remote mode
- new_empty(dim, model_name): create index without local model
- load_remote(): load stored vectors without initialising fastembed
- search_with_embedding(): search with a pre-computed query vector
- search_with_embedding_and_language(): same with language pre-filter
- search_with_embedding_threshold(): same with similarity threshold
- has_local_model(), is_remote_index(), dimensions() accessors
- store_embeddings warns on dropped embeddings (dim mismatch)
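The core of search_with_embedding is that the caller supplies a pre-computed query vector (obtained from the remote backend), so no local model is needed at query time. A minimal sketch under assumed shapes, scoring with plain cosine similarity over an in-memory list (the real index, thresholds, and language filters are out of scope):

```rust
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

// Return indices of stored vectors, best match first. The query vector
// arrives pre-computed; nothing here touches an embedding model.
fn search_with_embedding(stored: &[Vec<f32>], query: &[f32]) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = stored
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine(v, query)))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.into_iter().map(|(i, _)| i).collect()
}

fn main() {
    let stored = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let hits = search_with_embedding(&stored, &[0.9, 0.1]);
    assert_eq!(hits[0], 0); // closest stored vector first
    println!("search ok");
}
```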

src/semantic/metadata.rs
- EmbeddingBackendKind enum (Local/Remote) with serde default=Local
- SemanticMetadata.backend field (backward compat: old metadata = Local)
- SemanticMetadata::new_remote() constructor
- is_remote() helper
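The backward-compat behaviour of the backend field can be shown with a plain Default impl standing in for the serde default: metadata written before this field existed deserializes as Local, so existing indexes keep working. This sketch omits the serde derives the real type uses.

```rust
// In the real code this would carry serde derives with
// `#[serde(default)]` on the metadata field; plain Default models
// the same fallback here.
#[derive(PartialEq, Default)]
enum EmbeddingBackendKind {
    #[default]
    Local,
    Remote,
}

struct SemanticMetadata {
    backend: EmbeddingBackendKind,
}

impl SemanticMetadata {
    fn is_remote(&self) -> bool {
        self.backend == EmbeddingBackendKind::Remote
    }
}

fn main() {
    // A metadata file with no `backend` field falls back to Local.
    let old = SemanticMetadata { backend: EmbeddingBackendKind::default() };
    assert!(!old.is_remote());
    println!("metadata default ok");
}
```

The explicit field is what lets load() pick load_remote() deterministically instead of guessing from the model name.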

src/config.rs
- SemanticSearchConfig: remote_url, remote_model, remote_dim fields

src/indexing/facade.rs
- build_embedding_backend(): resolves local/remote from config + env
- resolve_remote_model_name(): consistent env-first model name resolution
- enable_semantic_search(): uses new_empty in remote mode
- load_semantic_search(): uses load_remote in remote mode, restores backend,
  validates dimension and warns on backend-kind change after load
- semantic_search_docs_with_language: dispatches via has_local_model()

src/indexing/pipeline/{mod,stages/semantic_embed}.rs
- EmbeddingPool replaced by EmbeddingBackend throughout

src/cli/commands/index_parallel.rs
- create_semantic_search returns (semantic, backend) pair
- Validates dimension and backend-kind on load, exits on mismatch

text[last.1..start] panics when start < last.1 (overlapping ranges).
Check for overlap first and skip the slice in that case.
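The guard this commit describes can be sketched as follows; the function name and Option-based shape are illustrative, not the actual patch.

```rust
// Slice the gap between the end of the previous range and the start of
// the current one. When ranges overlap (start < last_end), the slice
// expression would panic, so return None instead.
fn gap_between(text: &str, last_end: usize, start: usize) -> Option<&str> {
    if start < last_end {
        return None; // overlapping ranges: there is no gap to slice
    }
    Some(&text[last_end..start])
}

fn main() {
    let text = "abcdef";
    assert_eq!(gap_between(text, 2, 4), Some("cd"));
    assert_eq!(gap_between(text, 4, 2), None); // would have panicked before
    println!("guard ok");
}
```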

persistence.rs and hot_reload.rs were swallowing DimensionMismatch as
plain warnings, hiding the re-index requirement from callers.

persistence: now returns Err so the facade startup fails with a clear
message instead of continuing with a broken semantic index.

hot_reload: cannot exit during a watcher tick — logs a specific warning
and disables semantic search until the user re-indexes with --force.

Add semantic_incompatible: bool to IndexFacade, set on both DimensionMismatch
exits in load_semantic_search(). Expose is_semantic_incompatible() so callers
can avoid retrying a known-incompatible index.

persistence.rs: log DimensionMismatch at error level, continue text-only rather
than returning Err (which would discard a valid Tantivy index in all startup paths).

hot_reload.rs: guard the semantic reload retry with is_semantic_incompatible()
so a known-bad index does not produce duplicate warnings on every reload cycle.
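The retry guard from this commit amounts to a sticky flag: once a DimensionMismatch marks the index incompatible, later watcher ticks skip the semantic reload (and its duplicate warnings) until a --force re-index. A minimal sketch; the reload body is faked here and the real facade does far more.

```rust
struct IndexFacade {
    semantic_incompatible: bool,
}

impl IndexFacade {
    fn is_semantic_incompatible(&self) -> bool {
        self.semantic_incompatible
    }

    // Called on a watcher tick; returns whether a reload was attempted.
    fn try_semantic_reload(&mut self) -> bool {
        if self.is_semantic_incompatible() {
            return false; // known-bad index: no retry, no duplicate warning
        }
        // A real reload would run here; this sketch pretends it hits a
        // dimension mismatch and marks the index incompatible.
        self.semantic_incompatible = true;
        true
    }
}

fn main() {
    let mut facade = IndexFacade { semantic_incompatible: false };
    assert!(facade.try_semantic_reload());  // first attempt runs
    assert!(!facade.try_semantic_reload()); // later ticks are skipped
    println!("guard flag ok");
}
```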
