
Upgrade deploy example with production-ready Helm chart #396

Open

aliev wants to merge 1 commit into main from feat/deploy-example-helm-chart

Conversation


@aliev aliev commented Mar 5, 2026

Summary

Turns the 07_deploy_example into a production-ready Helm chart that works on any Kubernetes cluster — GKE, EKS, AKS, Minikube, Orbstack, Docker Desktop.

Helm chart additions

  • Service (ClusterIP) and Ingress templates
  • Bundled Redis via Bitnami subchart — auto-wires REDIS_URL env var
  • Health probes — liveness on /health, readiness on /ready
  • ServiceMonitor — Prometheus scraping with configurable additionalLabels
  • Grafana dashboard — auto-provisioned ConfigMap with 8 panels (sessions, latency, tokens, CPU/memory, TTFT, turn duration)
  • Removed hardcoded Nebius GPU nodeSelector — now cloud-agnostic

Application changes

  • Redis SessionRegistry support (conditional on REDIS_URL env var)
  • Prometheus /metrics endpoint with 9 custom gauges
  • Pin getstream>=2.0.0,<3.0.0 (v3 has breaking ChannelMember import)
  • Add prometheus-client and vision-agents[redis] dependencies
  • Fix Dockerfile: serve --host 0.0.0.0 (required for k8s health probes)
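The gauge update described above (per-session metrics rolled up into fleet-wide averages and sums) can be sketched in plain Python. The session dicts below are hypothetical stand-ins for what the registry returns; the `/ 1000` mirrors the ms-to-seconds conversion visible in the review diff later in this thread.

```python
# Sketch of the per-session metric aggregation. The session/metric
# shapes are hypothetical stand-ins; the real code reads them from
# the agent's SessionRegistry.

def aggregate_avg(sessions: list[dict], key: str) -> float:
    """Average a per-session metric (in ms) across sessions, in seconds."""
    values = []
    for metrics in sessions:
        v = metrics.get(key)
        if v is not None:
            values.append(v)
    # Mirrors: gauge.set(sum(values) / len(values) / 1000 if values else 0)
    return sum(values) / len(values) / 1000 if values else 0.0

def aggregate_sum(sessions: list[dict], key: str) -> int:
    """Sum a per-session counter metric (e.g. token counts)."""
    return sum(int(m[key]) for m in sessions if m.get(key) is not None)

sessions = [
    {"llm_latency_ms": 120, "tokens": 300},
    {"llm_latency_ms": 80, "tokens": 500},
    {"tokens": 200},  # session with no latency sample yet
]
print(aggregate_avg(sessions, "llm_latency_ms"))  # 0.1 (100 ms -> seconds)
print(aggregate_sum(sessions, "tokens"))          # 1000
```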

Configuration highlights (values.yaml)

| Setting | Default | Description |
|---|---|---|
| `redis.deploy.enabled` | `true` | Bundle Redis for session storage |
| `redis.url` | `""` | External Redis URL (when bundled is off) |
| `metrics.enabled` | `true` | Create ServiceMonitor for Prometheus |
| `grafana.enabled` | `true` | Deploy Grafana dashboard ConfigMap |
| `ingress.enabled` | `true` | Create Ingress resource |
| `gpu.enabled` | `false` | GPU resources and tolerations |

Documentation

Step-by-step deployment guide: GetStream/Agents-Docs#71

Test plan

  • Built and deployed on local Orbstack k8s cluster
  • Verified health probes, session creation, Redis storage
  • Verified Prometheus scraping and Grafana dashboard with live data
  • Tested full cleanup and re-deployment from scratch
  • Test on GKE/EKS cluster

Summary by CodeRabbit

  • New Features

    • Added Prometheus metrics collection for monitoring application performance (sessions, latencies, tokens).
    • Added Grafana dashboard for visualizing agent metrics and system performance.
    • Added Kubernetes Ingress support for external traffic routing.
    • Added optional Redis session registry support.
  • Chores

    • Updated project dependencies for Redis and Prometheus monitoring integration.
    • Enhanced Kubernetes Helm deployment templates and configurations.


coderabbitai bot commented Mar 5, 2026

📝 Walkthrough

The deployment example is enhanced with observability infrastructure (Prometheus metrics, Grafana dashboard), Kubernetes resource templates (Service, Ingress, ServiceMonitor), Redis-based session management, and application health checks. The application now collects and exposes metrics, with corresponding Helm configurations and updated dependencies.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Application Core** `deploy_example.py`, `Dockerfile`, `pyproject.toml` | Added Prometheus metrics gauges (session count, latencies, token counts), metrics collection aggregation logic, Redis-based session registry support, and a metrics HTTP endpoint. Updated CMD to serve instead of verbose execution. Updated dependencies to include redis, getstream, and prometheus-client. |
| **Helm Chart Configuration** `Chart.yaml`, `values.yaml` | Added Redis as a Helm dependency (~25.3.0). Introduced comprehensive Helm values for containerPort, service, ingress, redis, metrics, grafana, and API secrets. |
| **Helm Deployment Template** `deployment.yaml` | Added container ports specification, conditional Redis environment variable, liveness and readiness probes for HTTP health checks, GPU-aware resource allocation, and streamlined nodeSelector and tolerations handling. |
| **Helm Kubernetes Resources** `service.yaml`, `ingress.yaml`, `servicemonitor.yaml`, `grafana-dashboard.yaml` | Added four new templates: Service routing traffic to the deployment, Ingress for external HTTP access, ServiceMonitor for Prometheus scraping at the /metrics endpoint, and a Grafana dashboard ConfigMap displaying nine metric panels (sessions, latencies, tokens, CPU/memory usage). |

Sequence Diagram

```mermaid
sequenceDiagram
    participant App as deploy_example.py
    participant Metrics as Prometheus Client
    participant Redis as Redis Registry
    participant Endpoint as /metrics Endpoint
    participant Prometheus as Prometheus
    participant Grafana as Grafana

    App->>Redis: Register session (if REDIS_URL set)
    App->>App: Collect session metrics from registry
    App->>Metrics: Update Gauge metrics (latencies, tokens, etc.)
    Prometheus->>Endpoint: Scrape /metrics (via ServiceMonitor)
    Endpoint->>Metrics: generate_latest()
    Endpoint->>Prometheus: Return metric exposition format
    Prometheus->>Prometheus: Store time-series data
    Grafana->>Prometheus: Query metrics (PromQL)
    Grafana->>Grafana: Render dashboard panels
```

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

Bell jar descends on metrics bright—
each gauge a breath, a measured blight,
Redis pools the sessions deep,
while Prometheus counts what agents reap.
Grafana's dashboard pierces through,
observing all we dare not do. 📊

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title accurately summarizes the main objective of the PR: converting the deploy example into a production-ready Helm chart with infrastructure enhancements. |


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

🧹 Nitpick comments (2)
examples/07_deploy_example/deploy_example.py (2)

115-119: Move Redis session-registry imports to module scope.

Line 116 introduces local imports in runtime flow; keep imports at the top of the file.

🔧 Proposed fix

```diff
@@
 from fastapi import Response
 from prometheus_client import Gauge, generate_latest, CONTENT_TYPE_LATEST
 from vision_agents.core import Agent, AgentLauncher, Runner, User
+from vision_agents.core.agents.session_registry import RedisSessionKVStore, SessionRegistry
@@
 if redis_url:
-    from vision_agents.core.agents.session_registry import (
-        RedisSessionKVStore,
-        SessionRegistry,
-    )
-
     store = RedisSessionKVStore(url=redis_url)
     registry = SessionRegistry(store=store)
```

As per coding guidelines, "Do not use local imports; import all modules at the top of the file."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/07_deploy_example/deploy_example.py` around lines 115 - 119, The
conditional imports for RedisSessionKVStore and SessionRegistry are done inside
runtime flow (if redis_url:) which violates the guideline to keep imports at
module scope; move the import statements for RedisSessionKVStore and
SessionRegistry to the top-level imports of the module (module scope) so they
are imported on module load rather than inside the conditional, and keep the
existing runtime conditional logic that instantiates or uses
RedisSessionKVStore/SessionRegistry unchanged.
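A module-scope import can still tolerate the dependency being optional by guarding it with `try`/`ImportError`, which satisfies the guideline without forcing the package to be installed. A minimal sketch, assuming the import path from the review diff; `HAS_REDIS_REGISTRY` and `build_registry` are illustrative names, not part of the codebase:

```python
# Sketch: keep an optional dependency's import at module scope while
# degrading gracefully when it is absent. Names are hypothetical.
import os

try:
    from vision_agents.core.agents.session_registry import (  # noqa: F401
        RedisSessionKVStore,
        SessionRegistry,
    )
    HAS_REDIS_REGISTRY = True
except ImportError:
    HAS_REDIS_REGISTRY = False

def build_registry():
    """Return a Redis-backed registry when configured, else None."""
    redis_url = os.environ.get("REDIS_URL", "")
    if redis_url and HAS_REDIS_REGISTRY:
        return SessionRegistry(store=RedisSessionKVStore(url=redis_url))
    return None  # fall back to the default in-memory registry
```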

26-27: Remove decorative section-divider comments to match repo Python style.

Line 26, Line 79, and Line 110 use divider-style section comments; please replace with plain comments or remove them.

As per coding guidelines, "Do not use section comments like # -- some section --."

Also applies to: 79-80, 110-111

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/07_deploy_example/deploy_example.py` around lines 26 - 27, Remove
the decorative section-divider comments (e.g., the lines like "# ── Prometheus
metrics ──────────────────────────────────────────────") to conform to repo
Python style; replace them with plain single-line comments or delete them where
found (the Prometheus metrics header and the two other divider comments later in
the file) so only normal comments remain and no section-divider art is used.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9bc5d131-7d74-4466-9caa-9975a4010837

📥 Commits

Reviewing files that changed from the base of the PR and between 57c892b and 2ea816a.

⛔ Files ignored due to path filters (1)
  • examples/07_deploy_example/helm/Chart.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
  • examples/07_deploy_example/Dockerfile
  • examples/07_deploy_example/deploy_example.py
  • examples/07_deploy_example/helm/Chart.yaml
  • examples/07_deploy_example/helm/templates/deployment.yaml
  • examples/07_deploy_example/helm/templates/grafana-dashboard.yaml
  • examples/07_deploy_example/helm/templates/ingress.yaml
  • examples/07_deploy_example/helm/templates/service.yaml
  • examples/07_deploy_example/helm/templates/servicemonitor.yaml
  • examples/07_deploy_example/helm/values.yaml
  • examples/07_deploy_example/pyproject.toml

Comment on lines +59 to +65
```python
            try:
                v = s.agent.metrics.to_dict().get(key)
                if v is not None:
                    values.append(v)
            except Exception:
                continue
        gauge.set(sum(values) / len(values) / 1000 if values else 0)
```

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

```shell
cat -n examples/07_deploy_example/deploy_example.py | head -100
```

Repository: GetStream/Vision-Agents

Length of output: 4262


Catch specific exceptions in metrics collection instead of blanket Exception handling.

Lines 63 and 74 catch all exceptions without logging, which hides real defects in the metrics collection path and violates coding guidelines. Replace except Exception: with specific exception types (AttributeError, TypeError, ValueError) and log the error using logger.exception() for observability.

Proposed fix

```diff
     for key, gauge in _AVG_METRICS:
         values = []
         for s in sessions.values():
             try:
                 v = s.agent.metrics.to_dict().get(key)
                 if v is not None:
                     values.append(v)
-            except Exception:
+            except (AttributeError, TypeError, ValueError):
+                logger.exception("Failed to collect average metric for key '%s'", key)
                 continue
         gauge.set(sum(values) / len(values) / 1000 if values else 0)

     for key, gauge in _SUM_METRICS:
         total = 0
         for s in sessions.values():
             try:
                 v = s.agent.metrics.to_dict().get(key)
                 if v is not None:
                     total += int(v)
-            except Exception:
+            except (AttributeError, TypeError, ValueError):
+                logger.exception("Failed to collect sum metric for key '%s'", key)
                 continue
         gauge.set(total)
```
Per coding guidelines: "Never write except Exception as e; catch specific exceptions instead" and "Use logger.exception() when logging an error with a traceback."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/07_deploy_example/deploy_example.py` around lines 59 - 65, Replace
the blanket "except Exception:" catch blocks around the metrics collection logic
(the try that calls s.agent.metrics.to_dict().get(key) and the similar block
later) with a narrow exception tuple (AttributeError, TypeError, ValueError) and
log the caught exception with logger.exception(...) so tracebacks are recorded;
specifically, change the catch for the s.agent.metrics.to_dict() access and the
other metrics parsing try/except to "except (AttributeError, TypeError,
ValueError) as e:" and call logger.exception("... contextual message ...",
exc_info=e) before continuing, leaving the gauge.set(...) logic unchanged.


```diff
 # Install deps at startup (faster push, slower cold start)
-CMD ["sh", "-c", "uv sync --frozen -v && uv run deploy_example.py run"]
+CMD ["sh", "-c", "uv sync --frozen && uv run deploy_example.py serve --host 0.0.0.0 --port 8080"]
```

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

```shell
cat -n examples/07_deploy_example/Dockerfile
```

Repository: GetStream/Vision-Agents

Length of output: 650


🌐 Web query:

Docker exec signal handling PID 1 graceful shutdown SIGTERM

💡 Result:

  • Only PID 1 in the container gets the “stop” signal. docker stop <ctr> sends SIGTERM to the container’s main process (PID 1), waits a grace period, then sends SIGKILL if it didn’t exit. You can change the first signal (STOPSIGNAL / --stop-signal) and the grace period (--timeout / --stop-timeout). [1]
  • docker exec is not a shutdown mechanism. It starts an additional process; signals you send to that exec’d process don’t automatically reach PID 1. For graceful shutdown, signal PID 1 (usually via docker stop, or docker kill --signal=SIGTERM <ctr>). [1]
  • Common reason SIGTERM “doesn’t work”: PID 1 is a shell wrapper. If you use the shell form of ENTRYPOINT/CMD, Docker runs it under /bin/sh -c, which does not pass signals, so your real app won’t receive SIGTERM from docker stop. Use exec form so your app is PID 1. [2]
  • Linux PID 1 is special: a process running as PID 1 won’t terminate on SIGINT/SIGTERM unless it’s coded to do so (i.e., it must handle the signal or you must have an init/signal-forwarder). [3]
  • Fix options:
    1. Use exec-form ENTRYPOINT/CMD (and in any entrypoint script, end with exec "$@") so the app becomes PID 1 and can receive SIGTERM. [2]
    2. Add an init to forward signals/reap zombies: docker run --init … (uses Docker’s docker-init, backed by tini). [3]
    3. Configure stop signal if your app expects something else: STOPSIGNAL <signal> (default is SIGTERM). [4]

Sources: Docker stop behavior and timeouts [1]; Dockerfile shell-form ENTRYPOINT doesn’t pass signals [2]; PID 1 signal caveat + --init/tini [3]; STOPSIGNAL default and meaning [4].


Use exec in CMD so SIGTERM reaches the app process.

Line 18 starts the server through a shell that remains PID 1. When the container receives SIGTERM (from docker stop or Kubernetes shutdown), the signal goes to the shell, not the application—preventing graceful shutdown.

🔧 Proposed fix

```diff
-CMD ["sh", "-c", "uv sync --frozen && uv run deploy_example.py serve --host 0.0.0.0 --port 8080"]
+CMD ["sh", "-c", "uv sync --frozen && exec uv run deploy_example.py serve --host 0.0.0.0 --port 8080"]
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/07_deploy_example/Dockerfile` at line 18, Change the Docker CMD so
the final server process replaces PID 1 (so SIGTERM reaches it): either add an
explicit exec before the server command in the existing shell invocation (so the
uv run process is exec'd), or replace the CMD with a small entrypoint script
that runs "uv sync --frozen" then execs "uv run deploy_example.py serve --host
0.0.0.0 --port 8080"; update the CMD line that currently wraps "uv sync" and "uv
run" in "sh -c" to use one of these approaches so the uv run process receives
signals directly.
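On the application side, graceful shutdown only works if the Python process actually receives the SIGTERM that `exec` routes to PID 1. A minimal sketch of what a handler looks like; the flag and handler names are illustrative, not taken from deploy_example.py:

```python
# Sketch: why SIGTERM delivery matters. When the server process is
# exec'd into PID 1, the kubelet's SIGTERM lands here and the handler
# can drain sessions before exit. Names are illustrative.
import os
import signal

shutting_down = False

def handle_sigterm(signum, frame):
    global shutting_down
    shutting_down = True  # real code would stop accepting new sessions

signal.signal(signal.SIGTERM, handle_sigterm)

# Simulate the signal `docker stop` / Kubernetes sends to PID 1:
os.kill(os.getpid(), signal.SIGTERM)
print(shutting_down)  # True
```

Without `exec`, the shell wrapper absorbs this signal and the server is killed ungracefully after the grace period.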

Comment on lines +82 to +97
```json
            "expr": "rate(container_cpu_usage_seconds_total{pod=~\".*vision-agent.*\", container!=\"POD\", container!=\"\"}[5m])",
            "legendFormat": "{{`{{`}} pod {{`}}`}}",
            "refId": "A"
          }]
        },
        {
          "title": "Pod Memory Usage",
          "type": "timeseries",
          "gridPos": { "h": 8, "w": 12, "x": 12, "y": 16 },
          "fieldConfig": {
            "defaults": { "color": { "mode": "palette-classic" }, "unit": "bytes" },
            "overrides": []
          },
          "targets": [{
            "expr": "container_memory_working_set_bytes{pod=~\".*vision-agent.*\", container!=\"POD\", container!=\"\"}",
            "legendFormat": "{{`{{`}} pod {{`}}`}}",
```

⚠️ Potential issue | 🟠 Major

Scope pod PromQL to the current release, not .*vision-agent.*.

Line 82 and Line 96 currently match pods by a hardcoded substring, which can pull metrics from other releases and can miss pods when naming is overridden.

🔧 Suggested fix

```diff
-            "expr": "rate(container_cpu_usage_seconds_total{pod=~\".*vision-agent.*\", container!=\"POD\", container!=\"\"}[5m])",
+            "expr": "rate(container_cpu_usage_seconds_total{namespace=\"{{ .Release.Namespace }}\",pod=~\".*{{ include "vision-agent.fullname" . }}.*\", container!=\"POD\", container!=\"\"}[5m])",
...
-            "expr": "container_memory_working_set_bytes{pod=~\".*vision-agent.*\", container!=\"POD\", container!=\"\"}",
+            "expr": "container_memory_working_set_bytes{namespace=\"{{ .Release.Namespace }}\",pod=~\".*{{ include "vision-agent.fullname" . }}.*\", container!=\"POD\", container!=\"\"}",
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/07_deploy_example/helm/templates/grafana-dashboard.yaml` around
lines 82 - 97, Replace the hardcoded pod regex ".*vision-agent.*" with a
Helm-scoped regex that includes the current release name; update the two PromQL
targets (the CPU target expression in the timeseries panel with refId "A" and
the "Pod Memory Usage" target expression) to use the release variable (e.g.
pod=~"{{ .Release.Name }}.*vision-agent.*" or similar templated pattern) so the
queries only match pods from the current release instead of all pods matching
"vision-agent".

```json
      "templating": { "list": [] },
      "time": { "from": "now-1h", "to": "now" },
      "title": "Vision Agent Overview",
      "uid": "vision-agent-overview"
```

⚠️ Potential issue | 🟠 Major

Avoid a globally fixed Grafana UID.

Line 135 uses a static UID, which can collide across namespaces/releases in shared Grafana environments.

🔧 Suggested fix

```diff
-      "uid": "vision-agent-overview"
+      "uid": "{{ printf "%s-%s" .Release.Namespace .Release.Name | trunc 40 }}"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/07_deploy_example/helm/templates/grafana-dashboard.yaml` at line
135, The dashboard uses a fixed Grafana UID ("uid": "vision-agent-overview")
which can collide across releases; change it to a release-unique value (e.g.,
derive from .Release.Name and/or .Release.Namespace, or append a short random
suffix) or remove the UID so Grafana generates one. Update the template that
emits the "uid" field (the entry with "uid": "vision-agent-overview") to
interpolate a unique identifier using Helm template helpers like .Release.Name,
.Release.Namespace, or a generated token (or omit the field entirely).

```diff
@@ -0,0 +1,19 @@
+{{- if .Values.metrics.enabled }}
```

⚠️ Potential issue | 🟠 Major

Guard ServiceMonitor creation on CRD availability.

Line 1 only checks values; installs will fail on clusters that do not have monitoring.coreos.com/v1 CRDs.

🔧 Proposed fix

```diff
-{{- if .Values.metrics.enabled }}
+{{- if and .Values.metrics.enabled (.Capabilities.APIVersions.Has "monitoring.coreos.com/v1/ServiceMonitor") }}
```
🧰 Tools
🪛 YAMLlint (1.38.0)

[error] 1-1: syntax error: expected the node content, but found '-'

(syntax)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/07_deploy_example/helm/templates/servicemonitor.yaml` at line 1, The
template only checks .Values.metrics.enabled and should also verify the
ServiceMonitor CRD exists to avoid install failures on clusters without
monitoring.coreos.com/v1; update the conditional around the ServiceMonitor
manifest in servicemonitor.yaml to require both .Values.metrics.enabled and a
successful lookup call (e.g.
lookup("apiextensions.k8s.io/v1","CustomResourceDefinition","","servicemonitors.monitoring.coreos.com")
) so the resource is rendered only when the CRD is present.

Comment on lines +45 to +47
```yaml
  enabled: true
  className: ""   # nginx, traefik, etc.
  host: ""        # e.g. agent.example.com
```

⚠️ Potential issue | 🟠 Major

Ingress is enabled by default with an empty host (wildcard exposure).

With Line 45 enabled and Line 47 empty, this commonly creates a catch-all ingress rule. Safer default is to keep ingress off until host/TLS are explicitly set.

🔧 Proposed fix

```diff
 ingress:
-  enabled: true
+  enabled: false
   className: ""   # nginx, traefik, etc.
   host: ""        # e.g. agent.example.com
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/07_deploy_example/helm/values.yaml` around lines 45 - 47, The
Ingress is enabled by default with an empty host causing a catch-all rule;
update the Helm values to set enabled: false by default (change the values.yaml
key "enabled" to false) and add validation in the chart templates/notes to only
render/create ingress when .Values.ingress.enabled is true and
.Values.ingress.host is non-empty (check .Values.ingress.host in the template or
use a required() helper) so ingress is only created when host/TLS are explicitly
provided; reference the "enabled", "className", and "host" keys when making
these changes.

Comment on lines +58 to +66
```yaml
    enabled: true
  # External Redis URL (for production — use managed Redis)
  # When redis.deploy.enabled=true and url is empty, auto-connects to the bundled Redis
  # When redis.deploy.enabled=false, set this to your managed Redis URL
  url: ""   # e.g. redis://redis:6379/0 or rediss://:auth@host:6380/0
  # Bitnami Redis chart overrides (only when redis.deploy.enabled=true)
  auth:
    enabled: false
  architecture: standalone
```

⚠️ Potential issue | 🟠 Major

Bundled Redis is unauthenticated by default.

Line 58 enables in-cluster Redis by default, while Line 65 disables auth. That weakens session-store security in shared clusters.

🔧 Proposed fix

```diff
 redis:
   deploy:
     enabled: true
@@
   auth:
-    enabled: false
+    enabled: true
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/07_deploy_example/helm/values.yaml` around lines 58 - 66, The
bundled Redis is enabled by default and has auth disabled (redis.deploy.enabled
/ auth.enabled), which is insecure; update the Helm values to require
authentication by default by setting auth.enabled to true (and provide guidance
to set a secure password or use a managed URL via the url field when disabling
redis.deploy.enabled), ensuring the redis.auth.enabled and
redis.deploy.enabled/url settings are consistent so in-cluster Redis is not left
unauthenticated.
