feat: remote embedding backend (OpenAI-compatible HTTP)#97
Open
zm2231 wants to merge 4 commits into bartolli:main
Add support for an external embedding server as an alternative to the
bundled local fastembed models. Configured via env vars or settings.toml.
## Configuration
```toml
# settings.toml
[semantic_search]
remote_url = "http://host:8100"    # enables remote mode
remote_model = "bge-large-en-v1.5"
remote_dim = 1024
```

Or via env vars (these take precedence over the config file):

```shell
CODANNA_EMBED_URL=http://host:8100
CODANNA_EMBED_MODEL=bge-large-en-v1.5
CODANNA_EMBED_DIM=1024
```
Compatible with Infinity, OpenAI, vLLM, and any server that serves
POST /v1/embeddings with the standard request/response schema.
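For reference, the request/response shape these servers share looks roughly like this (abridged; exact field sets vary slightly by server):

```json
{ "model": "bge-large-en-v1.5", "input": ["first chunk", "second chunk"] }
```

with a response along the lines of:

```json
{
  "data": [
    { "index": 0, "embedding": [0.012, -0.034] },
    { "index": 1, "embedding": [0.056, 0.078] }
  ],
  "model": "bge-large-en-v1.5"
}
```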
## Changes
src/semantic/remote.rs (new)
- RemoteEmbedder: async HTTP client, 64-text batches, 30s timeout
- Response validation: count, contiguous indices, per-vector dim check
- Unicode-safe truncation (chars not bytes)
- run_async helper: works in multi-thread Tokio, current-thread, no runtime
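The response-validation step can be sketched roughly like this (illustrative types and error messages, not the PR's actual code):

```rust
/// Illustrative item shape from a /v1/embeddings response.
struct EmbeddingItem {
    index: usize,
    embedding: Vec<f32>,
}

/// Validate a batch response: item count matches the request, indices are
/// contiguous from 0, and every vector has the expected dimension.
fn validate_response(items: &[EmbeddingItem], requested: usize, dim: usize) -> Result<(), String> {
    if items.len() != requested {
        return Err(format!("expected {requested} embeddings, got {}", items.len()));
    }
    for (i, item) in items.iter().enumerate() {
        if item.index != i {
            return Err(format!("non-contiguous index: expected {i}, got {}", item.index));
        }
        if item.embedding.len() != dim {
            return Err(format!("vector {i} has dim {}, expected {dim}", item.embedding.len()));
        }
    }
    Ok(())
}
```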
src/semantic/pool.rs
- EmbeddingBackend enum wrapping EmbeddingPool (local) or RemoteEmbedder
- Shared interface: dimensions(), embed_one(), embed_parallel(), log_usage_stats()
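A rough sketch of the enum-dispatch shape, with stub payloads standing in for the real EmbeddingPool and RemoteEmbedder:

```rust
/// Sketch of the unified backend type; the stub fields stand in for the
/// real EmbeddingPool (local) and RemoteEmbedder (HTTP) payloads.
enum EmbeddingBackend {
    Local { dim: usize },
    Remote { dim: usize },
}

impl EmbeddingBackend {
    /// Shared interface: callers never care which variant they hold.
    fn dimensions(&self) -> usize {
        match self {
            EmbeddingBackend::Local { dim } | EmbeddingBackend::Remote { dim } => *dim,
        }
    }

    fn is_remote(&self) -> bool {
        matches!(self, EmbeddingBackend::Remote { .. })
    }
}
```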
src/semantic/simple.rs
- model field is now Option<Mutex<TextEmbedding>> — None in remote mode
- new_empty(dim, model_name): create index without local model
- load_remote(): load stored vectors without initialising fastembed
- search_with_embedding(): search with a pre-computed query vector
- search_with_embedding_and_language(): same with language pre-filter
- search_with_embedding_threshold(): same with similarity threshold
- has_local_model(), is_remote_index(), dimensions() accessors
- store_embeddings warns on dropped embeddings (dim mismatch)
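The dim-mismatch drop could look roughly like this (hypothetical free function; in the PR the check lives inside store_embeddings):

```rust
/// Keep only vectors matching the index dimension; report how many were dropped.
fn filter_by_dim(vectors: Vec<Vec<f32>>, dim: usize) -> (Vec<Vec<f32>>, usize) {
    let total = vectors.len();
    let kept: Vec<Vec<f32>> = vectors.into_iter().filter(|v| v.len() == dim).collect();
    let dropped = total - kept.len();
    if dropped > 0 {
        // Warn instead of silently discarding, so dim mismatches are visible.
        eprintln!("warning: dropped {dropped} embedding(s) with dim != {dim}");
    }
    (kept, dropped)
}
```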
src/semantic/metadata.rs
- EmbeddingBackendKind enum (Local/Remote) with serde default=Local
- SemanticMetadata.backend field (backward compat: old metadata = Local)
- SemanticMetadata::new_remote() constructor
- is_remote() helper
src/config.rs
- SemanticSearchConfig: remote_url, remote_model, remote_dim fields
src/indexing/facade.rs
- build_embedding_backend(): resolves local/remote from config + env
- resolve_remote_model_name(): consistent env-first model name resolution
- enable_semantic_search(): uses new_empty in remote mode
- load_semantic_search(): uses load_remote in remote mode, restores backend,
validates dimension and warns on backend-kind change after load
- semantic_search_docs_with_language: dispatches via has_local_model()
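The env-first resolution can be sketched as a pure helper (signature is illustrative; the real build_embedding_backend reads the environment itself and the caller would pass `std::env::var("CODANNA_EMBED_URL").ok()`):

```rust
/// Env-first resolution: the env value wins over settings.toml when set
/// and non-empty; otherwise fall back to the config value.
fn resolve_remote_url(env_url: Option<&str>, config_url: Option<&str>) -> Option<String> {
    env_url
        .filter(|v| !v.is_empty())
        .map(str::to_string)
        .or_else(|| config_url.map(str::to_string))
}
```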
src/indexing/pipeline/{mod,stages/semantic_embed}.rs
- EmbeddingPool replaced by EmbeddingBackend throughout
src/cli/commands/index_parallel.rs
- create_semantic_search returns (semantic, backend) pair
- Validates dimension and backend-kind on load, exits on mismatch
`text[last.1..start]` panics when `start < last.1` (overlapping ranges). Check for overlap first and skip the slice in that case.
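A sketch of the suggested guard (hypothetical helper name):

```rust
/// Return the gap between the previous range's end and the next range's
/// start, or None when the ranges overlap (instead of panicking on an
/// inverted slice like text[last_end..start]).
fn gap_slice(text: &str, last_end: usize, start: usize) -> Option<&str> {
    if start < last_end {
        None // overlapping ranges: skip the slice
    } else {
        Some(&text[last_end..start])
    }
}
```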
persistence.rs and hot_reload.rs were swallowing DimensionMismatch as plain warnings, hiding the re-index requirement from callers.
- persistence: now returns Err, so facade startup fails with a clear message instead of continuing with a broken semantic index.
- hot_reload: cannot exit during a watcher tick, so it logs a specific warning and disables semantic search until the user re-indexes with --force.
Add semantic_incompatible: bool to IndexFacade, set on both DimensionMismatch exits in load_semantic_search(), and expose is_semantic_incompatible() so callers can avoid retrying a known-incompatible index.
- persistence.rs: log DimensionMismatch at error level and continue text-only rather than returning Err, which would discard a valid Tantivy index in all startup paths.
- hot_reload.rs: guard the semantic reload retry with is_semantic_incompatible() so a known-bad index does not produce duplicate warnings on every reload cycle.
## Key design decisions

Existing local-only setups are unaffected; remote mode is opt-in via config or env.

- `EmbeddingBackend` enum wraps both `EmbeddingPool` (local fastembed) and `RemoteEmbedder` (HTTP). All indexing paths use this unified type.
- `model` field is `Option` in `SimpleSemanticSearch`: `None` in remote mode. The query embedding is generated externally via the backend and passed to `search_with_embedding`.
- `EmbeddingBackendKind` in metadata: an explicit Local/Remote field (serde default = Local), so `load()` delegates to `load_remote()` reliably without heuristic model-name parsing.
- `run_async`: a safe async-from-sync helper that works in multi-thread Tokio, current-thread Tokio, and no-runtime contexts.
- On dimension or backend-kind mismatch at load, the index is treated as incompatible (re-index with `--force` to fix).

## What's validated

- `CODANNA_EMBED_DIM` rejects non-integer and zero values
- `store_embeddings` warns when embeddings are dropped due to dim mismatch
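A minimal sketch of the dim check (hypothetical helper; error messages are illustrative, not the PR's exact wording):

```rust
/// Parse a CODANNA_EMBED_DIM value, rejecting non-integer and zero inputs.
fn parse_embed_dim(raw: &str) -> Result<usize, String> {
    match raw.trim().parse::<usize>() {
        Ok(0) => Err("CODANNA_EMBED_DIM must be greater than zero".to_string()),
        Ok(n) => Ok(n),
        Err(_) => Err(format!("CODANNA_EMBED_DIM is not an integer: {raw:?}")),
    }
}
```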