feat: Add Ultralytics SAM3 backend with enhanced features #5

Rajkisan wants to merge 5 commits into agfianf:main from
Conversation
- Implement api-inference-yolo backend using Ultralytics SAM3SemanticPredictor
- Add FP16 inference support for faster processing on CUDA
- Enable batch processing mode for both text and bbox prompts
- Add auto-apply mode for automatic image processing
- Implement per-label text prompt memory using localStorage
- Add bbox exemplar-based segmentation for finding similar objects
- Create setup and run scripts for both Ultralytics and HuggingFace backends
- Update .gitignore to exclude venv and model weights
- Enhance README with new features and expanded roadmap
- Fix modal rendering issues in BboxPromptPanel

Features:
- Text prompt segmentation with semantic understanding
- Bounding box exemplar-based segmentation
- Single, Auto-Apply, and Batch processing modes
- Smart prompt memory per label class
- Cross-platform setup scripts (Linux/Mac/Windows)
Review Summary by Qodo

Add Ultralytics SAM3 backend with enhanced segmentation features and dual-backend support
Walkthrough

Description
- Implement complete Ultralytics SAM3 backend with FastAPI REST API for semantic segmentation
- Add text prompt and bounding box-based inference endpoints with batch processing support
- Implement mask-to-polygon conversion with Douglas-Peucker simplification and visualization utilities
- Add per-label text prompt memory using localStorage for improved user experience
- Implement auto-apply and batch processing modes in BboxPromptPanel component
- Create cross-platform setup and run scripts for both Ultralytics and HuggingFace backends (Linux/Mac/Windows)
- Add comprehensive documentation for SAM3 model usage, API endpoints, and deployment
- Configure Docker support with NVIDIA CUDA 12.1 for GPU acceleration
- Implement standardized JSON response formatting and comprehensive error handling
- Add application configuration management with environment variable support

Diagram

```mermaid
flowchart LR
    A["User Interface<br/>TextPromptPanel<br/>BboxPromptPanel"] -->|"text/bbox prompts"| B["FastAPI Router<br/>sam3.py"]
    B -->|"inference requests"| C["SAM3 Inference<br/>inference.py"]
    C -->|"model prediction"| D["Ultralytics<br/>SAM3SemanticPredictor"]
    C -->|"mask processing"| E["Mask Utils<br/>mask_utils.py"]
    E -->|"polygon coords"| F["Visualizer<br/>visualizer.py"]
    F -->|"visualization"| B
    B -->|"JSON response"| A
    G["Config<br/>config.py"] -->|"settings"| C
    H["Schemas<br/>schemas.py"] -->|"validation"| B
```
File Changes

1. apps/api-inference-yolo/src/app/integrations/sam3/inference.py
Code Review by Qodo
```python
        yield {
            "sam3_inference": sam3_inference,
        }

        logger.info("Application startup complete")

    except Exception as e:
        logger.error(f"Failed to initialize application: {e}")
        raise

    yield
```
1. lifespan() yields twice 📘 Rule violation ⛯ Reliability

- The FastAPI lifespan() context manager yields twice, which can break the startup/shutdown flow and make startup state unreliable.
- Route handlers assume sam3_inference exists on request.state, but the lifespan code does not explicitly attach it there, risking a runtime AttributeError and 500 responses instead of graceful degradation.
- This violates the requirement to handle failure points and edge cases explicitly, especially around dependency initialization.
Agent Prompt
## Issue description
`lifespan()` currently yields twice and the app’s `sam3_inference` instance is accessed via `request.state` without a guaranteed assignment path. This can lead to startup/shutdown issues and runtime `AttributeError` in request handlers.
## Issue Context
The SAM3 model is intended to be loaded once at startup and reused. Handlers should either access `request.app.state.sam3_inference` (if set during lifespan) or a dependency injection pattern should be used.
## Fix Focus Areas
- apps/api-inference-yolo/src/app/main.py[17-57]
- apps/api-inference-yolo/src/app/routers/sam3.py[54-57]
- apps/api-inference-yolo/src/app/routers/sam3.py[136-138]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
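A minimal sketch of the suggested fix: a single-yield lifespan that attaches the model to app state and clears it on shutdown. `FakeModel` and the `SimpleNamespace` app are stand-ins for illustration only; the real code would use FastAPI's `lifespan(app: FastAPI)` and this PR's `SAM3Inference` class.

```python
import asyncio
from contextlib import asynccontextmanager
from types import SimpleNamespace


class FakeModel:
    """Hypothetical stand-in for the PR's SAM3Inference."""

    def predict(self, text):
        return f"mask:{text}"


@asynccontextmanager
async def lifespan(app):
    # Exactly one yield: everything before runs at startup,
    # everything after runs at shutdown.
    app.state.sam3_inference = FakeModel()
    try:
        yield
    finally:
        app.state.sam3_inference = None  # released on shutdown


async def demo():
    app = SimpleNamespace(state=SimpleNamespace())
    async with lifespan(app):
        # Handlers can now rely on request.app.state.sam3_inference.
        assert app.state.sam3_inference is not None
    return app.state.sam3_inference


result = asyncio.run(demo())
```

Attaching to `app.state` (rather than yielding a dict) gives handlers one guaranteed access path and avoids the double-yield entirely.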
```python
    except ValueError as e:
        logger.error(f"Validation error in text inference: {e}")
        raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(e))
    except Exception as e:
        logger.error(f"Error in text inference: {e}")
        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, detail="Inference failed")
```
3. detail=str(e) exposed 📘 Rule violation ⛨ Security

- API handlers return HTTPException(..., detail=str(e)) for ValueError, directly exposing internal exception messages to clients.
- Exception strings can unintentionally leak implementation details (e.g., file paths, library messages) that should remain internal.
- This violates the requirement that user-facing errors remain generic while detailed information is kept in internal logs.
Agent Prompt
## Issue description
Handlers currently expose raw exception strings to the client through `detail=str(e)`. This can leak internal details.
## Issue Context
You already log the exception server-side. Client responses should be generic (or constrained to a safe, curated message) while logs keep full detail.
## Fix Focus Areas
- apps/api-inference-yolo/src/app/routers/sam3.py[78-83]
- apps/api-inference-yolo/src/app/routers/sam3.py[160-166]
- apps/api-inference-yolo/src/app/main.py[119-145]
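One way to keep client responses generic, sketched here with a hypothetical `safe_detail` helper (not part of this PR): known, client-caused validation messages map to curated strings, and everything else falls back to a generic detail while the raw exception stays in server logs.

```python
# Hypothetical allowlist: only these exception messages reach clients verbatim.
_SAFE_MESSAGES = {
    "Invalid JSON format for bounding_boxes": "bounding_boxes must be valid JSON",
}


def safe_detail(exc: Exception) -> str:
    # Curated message for known validation errors; generic for everything
    # else, so internal paths and library messages never leak.
    return _SAFE_MESSAGES.get(str(exc), "Invalid request")


curated = safe_detail(ValueError("Invalid JSON format for bounding_boxes"))
generic = safe_detail(ValueError("open('/etc/secrets') failed"))
```

The handler would then use `detail=safe_detail(e)` in place of `detail=str(e)`, with `logger.error` keeping the full exception text.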
```python
    try:
        # Parse bounding boxes JSON
        try:
            boxes_data = json.loads(bounding_boxes)
        except json.JSONDecodeError:
            raise ValueError("Invalid JSON format for bounding_boxes")

        # Extract boxes and labels
        boxes = [[b[0], b[1], b[2], b[3]] for b in boxes_data]
        labels = [b[4] if len(b) > 4 else 1 for b in boxes_data]  # Default to positive
```
4. bounding_boxes lacks validation 📘 Rule violation ⛨ Security

- bounding_boxes is parsed from user input JSON and then indexed as b[0]..b[3] without validating item length/type, so malformed input can cause IndexError/type errors.
- This turns a client input problem into a server 500 path, rather than a controlled 400/422 with safe messaging.
- This violates the requirement for security-first input validation and explicit handling of null/empty/boundary cases.
Agent Prompt
## Issue description
The API trusts `bounding_boxes` JSON structure and indexes into elements without validating lengths/types. Malformed input can trigger server exceptions and 500 responses.
## Issue Context
This endpoint is externally facing (multipart form). It should strictly validate and reject invalid bbox payloads with a controlled error response.
## Fix Focus Areas
- apps/api-inference-yolo/src/app/routers/sam3.py[125-166]
- apps/api-inference-yolo/src/app/schemas/sam3.py[6-14]
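A sketch of strict parsing for that form field, using a hypothetical `parse_bounding_boxes` helper (the name and exact messages are illustrative, not from the PR). Every structural assumption the endpoint currently makes implicitly becomes an explicit check that raises `ValueError`, which the existing handler already maps to a 400.

```python
import json


def parse_bounding_boxes(raw: str) -> tuple[list[list[float]], list[int]]:
    """Validate the bounding_boxes form field before indexing into it."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        raise ValueError("bounding_boxes must be valid JSON")
    if not isinstance(data, list) or not data:
        raise ValueError("bounding_boxes must be a non-empty list")

    boxes, labels = [], []
    for i, item in enumerate(data):
        # Each entry: [x1, y1, x2, y2] plus an optional label flag.
        if not isinstance(item, (list, tuple)) or len(item) not in (4, 5):
            raise ValueError(f"box {i} must have 4 coords plus an optional label")
        coords = item[:4]
        if not all(isinstance(v, (int, float)) for v in coords):
            raise ValueError(f"box {i} coordinates must be numeric")
        x1, y1, x2, y2 = coords
        if x2 <= x1 or y2 <= y1:
            raise ValueError(f"box {i} must satisfy x1 < x2 and y1 < y2")
        boxes.append([float(v) for v in coords])
        labels.append(int(item[4]) if len(item) == 5 else 1)  # default positive
    return boxes, labels


boxes, labels = parse_bounding_boxes("[[0, 0, 10, 10], [1, 2, 3, 4, 0]]")
```

Alternatively, the same constraints could live in a Pydantic model in `schemas/sam3.py`, which would yield 422 responses automatically.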
```python
import io
import time
import base64

import cv2
import numpy as np
import torch
from fastapi import UploadFile
from PIL import Image
from ultralytics.models.sam import SAM3SemanticPredictor

from app.config import settings
from app.helpers.logger import logger
from app.integrations.sam3.mask_utils import masks_to_polygon_data
from app.integrations.sam3.visualizer import Sam3Visualizer


class SAM3Inference:
    """SAM3 inference implementation using Ultralytics."""

    def __init__(self):
        """Initialize SAM3 model configuration."""
        import os

        # Support both local file paths and model names
```
5. Ruff F401 unused imports 📘 Rule violation ✓ Correctness

- The backend code introduces unused imports (e.g., `cv2`, `FastAPIResponse`, and a local `import os`), which will trigger Ruff `F401` violations under the configured lint selection.
- This breaks the requirement that backend changes be Ruff lint-clean.
Agent Prompt
## Issue description
Backend files include unused imports that will fail Ruff linting (F401).
## Issue Context
`pyproject.toml` enables Ruff linting with `select = ["E", "F", "I", "N", "W", "UP"]`, so unused imports are not allowed.
## Fix Focus Areas
- apps/api-inference-yolo/src/app/integrations/sam3/inference.py[1-30]
- apps/api-inference-yolo/src/app/routers/sam3.py[1-12]
```dockerfile
# Copy application code
COPY src/ ./src/

EXPOSE 8000

# Set Python path
ENV PYTHONPATH=/code/src

# 7. Run application
# We use 'uv run' which ensures the correct python environment is used
CMD ["uv", "run", "app/main.py"]
```
6. Docker CMD wrong path 🐞 Bug ⛯ Reliability

- The api-inference-yolo image copies code under /code/src but runs `uv run app/main.py`, which does not exist at that path.
- This will cause the backend container to fail immediately on startup, blocking the recommended backend option.
Agent Prompt
### Issue description
The `apps/api-inference-yolo` Docker image will not start because the `CMD` points at `app/main.py`, but the file is located at `src/app/main.py` inside the container.
### Issue Context
The Dockerfile copies `src/` to `/code/src` and sets `PYTHONPATH=/code/src`, so Python imports should use `app.*`.
### Fix Focus Areas
- apps/api-inference-yolo/Dockerfile[35-45]
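A possible corrected tail for the Dockerfile, assuming the `COPY src/ ./src/` and `PYTHONPATH=/code/src` lines shown above stay in place; either variant resolves the path mismatch:

```dockerfile
ENV PYTHONPATH=/code/src

# Run the app as a module so the import path is resolved via PYTHONPATH...
CMD ["uv", "run", "python", "-m", "app.main"]
# ...or point at the file where it was actually copied:
# CMD ["uv", "run", "src/app/main.py"]
```

The module form is the safer choice here, since `PYTHONPATH=/code/src` already guarantees that `app.*` imports resolve regardless of the working directory.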
```python
        # Update predictor confidence if specified
        if conf_threshold is not None and hasattr(self.predictor, 'args'):
            self.predictor.args.conf = conf_threshold
            logger.info(f"Set confidence threshold to {conf_threshold}")

        # Set image (like predictor.set_image() in test script)
        self.predictor.set_image(image_np)

        # Run prediction (like predictor(text=[...]) in test script)
        results = self.predictor(**kwargs)
```
7. Predictor shared mutable state 🐞 Bug ⛯ Reliability

- A single SAM3SemanticPredictor instance is stored globally and mutated per request (args.conf, set_image).
- Concurrent requests can overwrite each other's threshold/image, producing incorrect masks/boxes and non-deterministic behavior.
Agent Prompt
### Issue description
A shared Ultralytics predictor is mutated per request (`args.conf`, `set_image`), which can cause cross-request contamination under concurrency.
### Issue Context
The predictor is loaded once in app lifespan and reused across requests.
### Fix Focus Areas
- apps/api-inference-yolo/src/app/integrations/sam3/inference.py[96-116]
- apps/api-inference-yolo/src/app/integrations/sam3/inference.py[137-156]
- apps/api-inference-yolo/src/app/main.py[34-43]
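One way to address the contamination is to serialize access to the shared predictor with an `asyncio.Lock`, so `set_image` and the prediction call run atomically per request. This is a sketch under assumed names: `_FakePredictor` stands in for `SAM3SemanticPredictor`, and `LockedSAM3Inference` is a hypothetical wrapper, not code from this PR.

```python
import asyncio


class _FakePredictor:
    """Hypothetical stand-in for SAM3SemanticPredictor: remembers the
    last image it was given, so interleaving would be observable."""

    def __init__(self):
        self.image = None

    def set_image(self, image):
        self.image = image

    def __call__(self, text):
        return (self.image, text)


class LockedSAM3Inference:
    """Wrap the shared predictor so set_image + predict run atomically."""

    def __init__(self, predictor):
        self._predictor = predictor
        self._lock = asyncio.Lock()

    async def predict_text(self, image, prompts):
        async with self._lock:  # one request mutates the predictor at a time
            self._predictor.set_image(image)
            await asyncio.sleep(0)  # yield control: unlocked code would interleave here
            return self._predictor(text=prompts)


async def demo():
    inf = LockedSAM3Inference(_FakePredictor())
    # Two concurrent requests; each still sees its own image.
    return await asyncio.gather(
        inf.predict_text("img_a", ["cat"]),
        inf.predict_text("img_b", ["dog"]),
    )


results = asyncio.run(demo())
```

The trade-off is that inference becomes fully serialized; an alternative is a per-request predictor pool if GPU memory allows.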
Force-pushed 0cc6350 to ab8050e
…tence per label
- Install missing react-router-dom package for routing support
- Implement per-label text prompt persistence in localStorage
- Auto-load saved prompts when switching labels
- Save prompts after successful inference runs
- Prompts persist across image navigation within session
Force-pushed b4974f4 to e7b63e7
Force-pushed ce0c2ab to ed8ce79
Features: