Bug: LiteLLM + Ollama Integration Returns 404 Error
Description
When OpenViking is configured to use a local Ollama model as the VLM, memory extraction fails with a 404 page not found error, even though the Ollama API works correctly when tested directly.
Environment
- OpenViking version: 0.2.9
- Python: 3.12.13
- OS: macOS (Darwin 25.3.0 arm64)
- Ollama version: 0.6.x
- Model: qwen3-vl:2b (local)
Configuration
{
"vlm": {
"provider": "litellm",
"model": "ollama/qwen3-vl:2b",
"api_base": "http://localhost:11434/v1",
"api_key": "EMPTY",
"temperature": 0.3,
"max_tokens": 512,
"max_retries": 2,
"max_concurrent": 10
}
}
Symptoms
- Memory extraction fails with 404 error
- extractZeroCount increases rapidly
- VLM observer shows "No token usage data available"
Logs
2026-03-23 14:21:58,052 - openviking.session.memory_extractor - ERROR - Memory extraction failed: litellm.APIConnectionError: OllamaException - 404 page not found
2026-03-23 14:21:58,053 - uvicorn.access - INFO - 127.0.0.1:65106 - "POST /api/v1/sessions/fe282187-4e2e-439a-80e9-73fc6c69ff03/extract HTTP/1.1" 200
Verification
Direct Ollama API calls work correctly:
# OpenAI-compatible endpoint - works
curl -s -X POST http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "qwen3-vl:2b", "messages": [{"role": "user", "content": "hello"}]}'
# Returns: {"id":"chatcmpl-668","object":"chat.completion",...}
# Native Ollama API - works
curl -s -X POST http://localhost:11434/api/chat \
-H "Content-Type: application/json" \
-d '{"model": "qwen3-vl:2b", "messages": [{"role": "user", "content": "hello"}]}'
# Returns streaming response correctly
# Model list - works
curl -s http://localhost:11434/v1/models
# Returns: {"object":"list","data":[{"id":"qwen3-vl:2b",...}]}
Root Cause Hypothesis
The issue appears to be in how LiteLLM constructs the request to Ollama. Possible causes:
- Model name format: LiteLLM may be using ollama/qwen3-vl:2b literally instead of stripping the ollama/ prefix
- Endpoint path mismatch: LiteLLM may be calling a different endpoint than /v1/chat/completions
- Request format: LiteLLM may be sending parameters that Ollama doesn't recognize
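To make the endpoint-path hypothesis concrete, here is a small sketch. It is pure string logic, not LiteLLM's actual routing code, and it assumes (unverified) that the native ollama/ provider appends its own path such as /api/chat to the configured api_base. Under that assumption, a base that already ends in /v1 would produce a route Ollama does not serve, which would explain the 404:

```python
# Hypothetical illustration only: join_endpoint is not a LiteLLM function,
# and the /api/chat path is an assumption about what the native ollama/
# provider targets.

def join_endpoint(api_base: str, path: str) -> str:
    """Naively append a provider-chosen path to a configured base URL."""
    return api_base.rstrip("/") + path

openai_style_base = "http://localhost:11434/v1"  # value from the config above
native_base = "http://localhost:11434"           # same server, no /v1 suffix

# With the /v1 base, a native-API path lands on a route Ollama 404s on:
print(join_endpoint(openai_style_base, "/api/chat"))
# -> http://localhost:11434/v1/api/chat

# With the bare base, the same join yields the valid native endpoint:
print(join_endpoint(native_base, "/api/chat"))
# -> http://localhost:11434/api/chat
```

This would also be consistent with the verification above: the curl tests hit /v1/chat/completions and /api/chat directly, both of which exist, while the doubled /v1/api/chat path does not.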
Workaround
Switching to a cloud VLM provider (e.g., DashScope qwen3-vl-flash) resolves the issue, but defeats the purpose of using local models for privacy/cost reasons.
Expected Behavior
LiteLLM should successfully call Ollama's OpenAI-compatible API at http://localhost:11434/v1/chat/completions with the model name qwen3-vl:2b (without the ollama/ prefix if that's the convention).
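If the path mismatch above is the cause, one of the following configuration variants might avoid it. Both are untested assumptions based on LiteLLM's documented conventions: the ollama/ prefix is said to route to Ollama's native API (so api_base should not include /v1), while an openai/ prefix forces the generic OpenAI-compatible route (so api_base should include /v1):

```json
{
  "vlm": {
    "provider": "litellm",
    "model": "ollama/qwen3-vl:2b",
    "api_base": "http://localhost:11434"
  }
}
```

or, forcing the OpenAI-compatible endpoint:

```json
{
  "vlm": {
    "provider": "litellm",
    "model": "openai/qwen3-vl:2b",
    "api_base": "http://localhost:11434/v1",
    "api_key": "EMPTY"
  }
}
```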
Additional Context
This issue affects users who want to use local Ollama models as VLM for privacy, cost, or offline scenarios. The integration should work seamlessly since Ollama provides an OpenAI-compatible API.