Upgrade deploy example with production-ready Helm chart #396
Conversation
Turn the deploy example into a production-ready Helm chart that works on any Kubernetes cluster (GKE, EKS, AKS, local).

Helm chart additions:
- Service (ClusterIP) and Ingress templates
- Bundled Redis via Bitnami subchart (optional, for session storage)
- Auto-wiring: `REDIS_URL` env var set automatically when Redis is deployed
- Health probes (liveness on `/health`, readiness on `/ready`)
- ServiceMonitor for Prometheus scraping with configurable labels
- Grafana dashboard ConfigMap (8 panels: sessions, latency, tokens, CPU/memory)
- Removed hardcoded Nebius GPU `nodeSelector`; now fully cloud-agnostic

Application changes:
- Add Redis `SessionRegistry` support (conditional on `REDIS_URL` env var)
- Add Prometheus `/metrics` endpoint with 9 custom gauges
- Pin `getstream<3.0.0` (v3 has breaking changes)
- Add `prometheus-client` and `vision-agents[redis]` dependencies
- Fix Dockerfile: serve mode with `--host 0.0.0.0` (required for k8s probes)
📝 Walkthrough

The deployment example is enhanced with observability infrastructure (Prometheus metrics, Grafana dashboard), Kubernetes resource templates (Service, Ingress, ServiceMonitor), Redis-based session management, and application health checks. The application now collects and exposes metrics, with corresponding Helm configurations and updated dependencies.
Sequence Diagram

```mermaid
sequenceDiagram
    participant App as deploy_example.py
    participant Metrics as Prometheus Client
    participant Redis as Redis Registry
    participant Endpoint as /metrics Endpoint
    participant Prometheus as Prometheus
    participant Grafana as Grafana
    App->>Redis: Register session (if REDIS_URL set)
    App->>App: Collect session metrics from registry
    App->>Metrics: Update Gauge metrics (latencies, tokens, etc.)
    Prometheus->>Endpoint: Scrape /metrics (via ServiceMonitor)
    Endpoint->>Metrics: generate_latest()
    Endpoint->>Prometheus: Return metric exposition format
    Prometheus->>Prometheus: Store time-series data
    Grafana->>Prometheus: Query metrics (PromQL)
    Grafana->>Grafana: Render dashboard panels
```
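For reference, the `generate_latest()` step in the diagram returns the Prometheus text exposition format. The helper below is a minimal illustrative sketch of that format for gauges only; it is not part of `prometheus_client`, which also emits `# HELP` lines and handles labels and escaping.

```python
def render_exposition(gauges: dict) -> str:
    # Illustrative only: prometheus_client's generate_latest() produces this
    # "# TYPE <name> gauge" / "<name> <value>" layout for registered gauges.
    lines = []
    for name, value in gauges.items():
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"

print(render_exposition({"vision_agent_active_sessions": 3}))
```

This is the payload Prometheus parses when it scrapes `/metrics` through the ServiceMonitor.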
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: 2 passed, 1 failed (warning)
Actionable comments posted: 7
🧹 Nitpick comments (2)
examples/07_deploy_example/deploy_example.py (2)
115-119: Move Redis session-registry imports to module scope.

Line 116 introduces local imports in the runtime flow; keep imports at the top of the file. As per coding guidelines, "Do not use local imports; import all modules at the top of the file."

🔧 Proposed fix

```diff
@@
 from fastapi import Response
 from prometheus_client import Gauge, generate_latest, CONTENT_TYPE_LATEST
 from vision_agents.core import Agent, AgentLauncher, Runner, User
+from vision_agents.core.agents.session_registry import RedisSessionKVStore, SessionRegistry
@@
 if redis_url:
-    from vision_agents.core.agents.session_registry import (
-        RedisSessionKVStore,
-        SessionRegistry,
-    )
-
     store = RedisSessionKVStore(url=redis_url)
     registry = SessionRegistry(store=store)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/07_deploy_example/deploy_example.py` around lines 115 - 119, The conditional imports for RedisSessionKVStore and SessionRegistry are done inside runtime flow (if redis_url:) which violates the guideline to keep imports at module scope; move the import statements for RedisSessionKVStore and SessionRegistry to the top-level imports of the module (module scope) so they are imported on module load rather than inside the conditional, and keep the existing runtime conditional logic that instantiates or uses RedisSessionKVStore/SessionRegistry unchanged.
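With the imports at module scope, the conditional wiring might look like the sketch below. The two classes here are local stand-ins, since the real `vision_agents.core.agents.session_registry` module is not available outside the repo; only the shape of the conditional is the point.

```python
import os

class RedisSessionKVStore:  # stand-in for vision_agents' Redis-backed store
    def __init__(self, url: str):
        self.url = url

class SessionRegistry:  # stand-in for vision_agents' session registry
    def __init__(self, store=None):
        self.store = store

def build_registry(env=os.environ):
    """Use Redis-backed storage only when REDIS_URL is set; in-memory otherwise."""
    url = env.get("REDIS_URL")
    if url:
        return SessionRegistry(store=RedisSessionKVStore(url=url))
    return SessionRegistry()
```

The imports stay at the top; only the instantiation is conditional on the environment.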
26-27: Remove decorative section-divider comments to match repo Python style.

Line 26, Line 79, and Line 110 use divider-style section comments; please replace with plain comments or remove them. As per coding guidelines, "Do not use section comments like `# -- some section --`."

Also applies to: 79-80, 110-111
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/07_deploy_example/deploy_example.py` around lines 26 - 27, Remove the decorative section-divider comments (e.g., the lines like "# ── Prometheus metrics ──────────────────────────────────────────────") to conform to repo Python style; replace them with plain single-line comments or delete them where found (the Prometheus metrics header and the two other divider comments later in the file) so only normal comments remain and no section-divider art is used.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/07_deploy_example/deploy_example.py`:
- Around line 59-65: Replace the blanket "except Exception:" catch blocks around
the metrics collection logic (the try that calls
s.agent.metrics.to_dict().get(key) and the similar block later) with a narrow
exception tuple (AttributeError, TypeError, ValueError) and log the caught
exception with logger.exception(...) so tracebacks are recorded; specifically,
change the catch for the s.agent.metrics.to_dict() access and the other metrics
parsing try/except to "except (AttributeError, TypeError, ValueError) as e:" and
call logger.exception("... contextual message ...", exc_info=e) before
continuing, leaving the gauge.set(...) logic unchanged.
In `@examples/07_deploy_example/Dockerfile`:
- Line 18: Change the Docker CMD so the final server process replaces PID 1 (so
SIGTERM reaches it): either add an explicit exec before the server command in
the existing shell invocation (so the uv run process is exec'd), or replace the
CMD with a small entrypoint script that runs "uv sync --frozen" then execs "uv
run deploy_example.py serve --host 0.0.0.0 --port 8080"; update the CMD line
that currently wraps "uv sync" and "uv run" in "sh -c" to use one of these
approaches so the uv run process receives signals directly.
In `@examples/07_deploy_example/helm/templates/grafana-dashboard.yaml`:
- Line 135: The dashboard uses a fixed Grafana UID ("uid":
"vision-agent-overview") which can collide across releases; change it to a
release-unique value (e.g., derive from .Release.Name and/or .Release.Namespace,
or append a short random suffix) or remove the UID so Grafana generates one.
Update the template that emits the "uid" field (the entry with "uid":
"vision-agent-overview") to interpolate a unique identifier using Helm template
helpers like .Release.Name, .Release.Namespace, or a generated token (or omit
the field entirely).
- Around line 82-97: Replace the hardcoded pod regex ".*vision-agent.*" with a
Helm-scoped regex that includes the current release name; update the two PromQL
targets (the CPU target expression in the timeseries panel with refId "A" and
the "Pod Memory Usage" target expression) to use the release variable (e.g.
pod=~"{{ .Release.Name }}.*vision-agent.*" or similar templated pattern) so the
queries only match pods from the current release instead of all pods matching
"vision-agent".
In `@examples/07_deploy_example/helm/templates/servicemonitor.yaml`:
- Line 1: The template only checks .Values.metrics.enabled and should also
verify the ServiceMonitor CRD exists to avoid install failures on clusters
without monitoring.coreos.com/v1; update the conditional around the
ServiceMonitor manifest in servicemonitor.yaml to require both
.Values.metrics.enabled and a successful lookup call (e.g.
lookup("apiextensions.k8s.io/v1","CustomResourceDefinition","","servicemonitors.monitoring.coreos.com")
) so the resource is rendered only when the CRD is present.
In `@examples/07_deploy_example/helm/values.yaml`:
- Around line 45-47: The Ingress is enabled by default with an empty host
causing a catch-all rule; update the Helm values to set enabled: false by
default (change the values.yaml key "enabled" to false) and add validation in
the chart templates/notes to only render/create ingress when
.Values.ingress.enabled is true and .Values.ingress.host is non-empty (check
.Values.ingress.host in the template or use a required() helper) so ingress is
only created when host/TLS are explicitly provided; reference the "enabled",
"className", and "host" keys when making these changes.
- Around line 58-66: The bundled Redis is enabled by default and has auth
disabled (redis.deploy.enabled / auth.enabled), which is insecure; update the
Helm values to require authentication by default by setting auth.enabled to true
(and provide guidance to set a secure password or use a managed URL via the url
field when disabling redis.deploy.enabled), ensuring the redis.auth.enabled and
redis.deploy.enabled/url settings are consistent so in-cluster Redis is not left
unauthenticated.
---
Nitpick comments:
In `@examples/07_deploy_example/deploy_example.py`:
- Around line 115-119: The conditional imports for RedisSessionKVStore and
SessionRegistry are done inside runtime flow (if redis_url:) which violates the
guideline to keep imports at module scope; move the import statements for
RedisSessionKVStore and SessionRegistry to the top-level imports of the module
(module scope) so they are imported on module load rather than inside the
conditional, and keep the existing runtime conditional logic that instantiates
or uses RedisSessionKVStore/SessionRegistry unchanged.
- Around line 26-27: Remove the decorative section-divider comments (e.g., the
lines like "# ── Prometheus metrics
──────────────────────────────────────────────") to conform to repo Python
style; replace them with plain single-line comments or delete them where found
(the Prometheus metrics header and the two other divider comments later in the
file) so only normal comments remain and no section-divider art is used.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9bc5d131-7d74-4466-9caa-9975a4010837
⛔ Files ignored due to path filters (1)
examples/07_deploy_example/helm/Chart.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
- examples/07_deploy_example/Dockerfile
- examples/07_deploy_example/deploy_example.py
- examples/07_deploy_example/helm/Chart.yaml
- examples/07_deploy_example/helm/templates/deployment.yaml
- examples/07_deploy_example/helm/templates/grafana-dashboard.yaml
- examples/07_deploy_example/helm/templates/ingress.yaml
- examples/07_deploy_example/helm/templates/service.yaml
- examples/07_deploy_example/helm/templates/servicemonitor.yaml
- examples/07_deploy_example/helm/values.yaml
- examples/07_deploy_example/pyproject.toml
```python
try:
    v = s.agent.metrics.to_dict().get(key)
    if v is not None:
        values.append(v)
except Exception:
    continue
gauge.set(sum(values) / len(values) / 1000 if values else 0)
```
Catch specific exceptions in metrics collection instead of blanket Exception handling.
Lines 63 and 74 catch all exceptions without logging, which hides real defects in the metrics collection path and violates coding guidelines. Replace except Exception: with specific exception types (AttributeError, TypeError, ValueError) and log the error using logger.exception() for observability.
Proposed fix

```diff
 for key, gauge in _AVG_METRICS:
     values = []
     for s in sessions.values():
         try:
             v = s.agent.metrics.to_dict().get(key)
             if v is not None:
                 values.append(v)
-        except Exception:
+        except (AttributeError, TypeError, ValueError):
+            logger.exception("Failed to collect average metric for key '%s'", key)
             continue
     gauge.set(sum(values) / len(values) / 1000 if values else 0)
 for key, gauge in _SUM_METRICS:
     total = 0
     for s in sessions.values():
         try:
             v = s.agent.metrics.to_dict().get(key)
             if v is not None:
                 total += int(v)
-        except Exception:
+        except (AttributeError, TypeError, ValueError):
+            logger.exception("Failed to collect sum metric for key '%s'", key)
             continue
     gauge.set(total)
```

Per coding guidelines: "Never write `except Exception as e`; catch specific exceptions instead" and "Use `logger.exception()` when logging an error with a traceback."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/07_deploy_example/deploy_example.py` around lines 59 - 65, Replace
the blanket "except Exception:" catch blocks around the metrics collection logic
(the try that calls s.agent.metrics.to_dict().get(key) and the similar block
later) with a narrow exception tuple (AttributeError, TypeError, ValueError) and
log the caught exception with logger.exception(...) so tracebacks are recorded;
specifically, change the catch for the s.agent.metrics.to_dict() access and the
other metrics parsing try/except to "except (AttributeError, TypeError,
ValueError) as e:" and call logger.exception("... contextual message ...",
exc_info=e) before continuing, leaving the gauge.set(...) logic unchanged.
```diff
 # Install deps at startup (faster push, slower cold start)
-CMD ["sh", "-c", "uv sync --frozen -v && uv run deploy_example.py run"]
+CMD ["sh", "-c", "uv sync --frozen && uv run deploy_example.py serve --host 0.0.0.0 --port 8080"]
```
🌐 Web query: Docker exec signal handling, PID 1, graceful shutdown, SIGTERM

💡 Result:
- Only PID 1 in the container gets the "stop" signal. `docker stop <ctr>` sends SIGTERM to the container's main process (PID 1), waits a grace period, then sends SIGKILL if it didn't exit. You can change the first signal (`STOPSIGNAL`/`--stop-signal`) and the grace period (`--timeout`/`--stop-timeout`). [1]
- `docker exec` is not a shutdown mechanism. It starts an additional process; signals you send to that exec'd process don't automatically reach PID 1. For graceful shutdown, signal PID 1 (usually via `docker stop`, or `docker kill --signal=SIGTERM <ctr>`). [1]
- Common reason SIGTERM "doesn't work": PID 1 is a shell wrapper. If you use the shell form of `ENTRYPOINT`/`CMD`, Docker runs it under `/bin/sh -c`, which does not pass signals, so your real app won't receive SIGTERM from `docker stop`. Use exec form so your app is PID 1. [2]
- Linux PID 1 is special: a process running as PID 1 won't terminate on SIGINT/SIGTERM unless it's coded to do so (i.e., it must handle the signal or you must have an init/signal-forwarder). [3]
- Fix options:
  - Use exec-form `ENTRYPOINT`/`CMD` (and in any entrypoint script, end with `exec "$@"`) so the app becomes PID 1 and can receive SIGTERM. [2]
  - Add an init to forward signals/reap zombies: `docker run --init ...` (uses Docker's `docker-init`, backed by tini). [3]
  - Configure the stop signal if your app expects something else: `STOPSIGNAL <signal>` (default is SIGTERM). [4]

Sources: Docker stop behavior and timeouts [1]; Dockerfile shell-form ENTRYPOINT doesn't pass signals [2]; PID 1 signal caveat + --init/tini [3]; STOPSIGNAL default and meaning [4].
Use exec in CMD so SIGTERM reaches the app process.
Line 18 starts the server through a shell that remains PID 1. When the container receives SIGTERM (from docker stop or Kubernetes shutdown), the signal goes to the shell, not the application—preventing graceful shutdown.
🔧 Proposed fix

```diff
-CMD ["sh", "-c", "uv sync --frozen && uv run deploy_example.py serve --host 0.0.0.0 --port 8080"]
+CMD ["sh", "-c", "uv sync --frozen && exec uv run deploy_example.py serve --host 0.0.0.0 --port 8080"]
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/07_deploy_example/Dockerfile` at line 18, Change the Docker CMD so
the final server process replaces PID 1 (so SIGTERM reaches it): either add an
explicit exec before the server command in the existing shell invocation (so the
uv run process is exec'd), or replace the CMD with a small entrypoint script
that runs "uv sync --frozen" then execs "uv run deploy_example.py serve --host
0.0.0.0 --port 8080"; update the CMD line that currently wraps "uv sync" and "uv
run" in "sh -c" to use one of these approaches so the uv run process receives
signals directly.
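The effect of `exec` can be demonstrated outside Docker: `exec` replaces the process image without changing the PID, so signals addressed to the original PID reach the new program. A small sketch using `os.execv` as the analogue of the shell's `exec` builtin:

```python
import subprocess
import sys

# The child prints its PID, then execs a fresh Python that prints its PID again.
# Because exec replaces the process image in place, both PIDs are identical --
# which is why `exec uv run ...` lets Kubernetes' SIGTERM reach the server at PID 1.
child = (
    "import os, sys; "
    "print(os.getpid(), flush=True); "
    "os.execv(sys.executable, [sys.executable, '-c', 'import os; print(os.getpid())'])"
)
out = subprocess.run([sys.executable, "-c", child], capture_output=True, text=True)
before, after = out.stdout.split()
assert before == after  # same PID before and after exec
```

Without `exec`, the shell stays at PID 1 and the server runs as a child, so the stop signal never reaches it.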
| "expr": "rate(container_cpu_usage_seconds_total{pod=~\".*vision-agent.*\", container!=\"POD\", container!=\"\"}[5m])", | ||
| "legendFormat": "{{`{{`}} pod {{`}}`}}", | ||
| "refId": "A" | ||
| }] | ||
| }, | ||
| { | ||
| "title": "Pod Memory Usage", | ||
| "type": "timeseries", | ||
| "gridPos": { "h": 8, "w": 12, "x": 12, "y": 16 }, | ||
| "fieldConfig": { | ||
| "defaults": { "color": { "mode": "palette-classic" }, "unit": "bytes" }, | ||
| "overrides": [] | ||
| }, | ||
| "targets": [{ | ||
| "expr": "container_memory_working_set_bytes{pod=~\".*vision-agent.*\", container!=\"POD\", container!=\"\"}", | ||
| "legendFormat": "{{`{{`}} pod {{`}}`}}", |
Scope pod PromQL to the current release, not .*vision-agent.*.
Line 82 and Line 96 currently match pods by a hardcoded substring, which can pull metrics from other releases and can miss pods when naming is overridden.
🔧 Suggested fix

```diff
-      "expr": "rate(container_cpu_usage_seconds_total{pod=~\".*vision-agent.*\", container!=\"POD\", container!=\"\"}[5m])",
+      "expr": "rate(container_cpu_usage_seconds_total{namespace=\"{{ .Release.Namespace }}\",pod=~\".*{{ include "vision-agent.fullname" . }}.*\", container!=\"POD\", container!=\"\"}[5m])",
 ...
-      "expr": "container_memory_working_set_bytes{pod=~\".*vision-agent.*\", container!=\"POD\", container!=\"\"}",
+      "expr": "container_memory_working_set_bytes{namespace=\"{{ .Release.Namespace }}\",pod=~\".*{{ include "vision-agent.fullname" . }}.*\", container!=\"POD\", container!=\"\"}",
```

🤖 Prompt for AI Agents
+ "expr": "container_memory_working_set_bytes{namespace=\"{{ .Release.Namespace }}\",pod=~\".*{{ include "vision-agent.fullname" . }}.*\", container!=\"POD\", container!=\"\"}",🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/07_deploy_example/helm/templates/grafana-dashboard.yaml` around
lines 82 - 97, Replace the hardcoded pod regex ".*vision-agent.*" with a
Helm-scoped regex that includes the current release name; update the two PromQL
targets (the CPU target expression in the timeseries panel with refId "A" and
the "Pod Memory Usage" target expression) to use the release variable (e.g.
pod=~"{{ .Release.Name }}.*vision-agent.*" or similar templated pattern) so the
queries only match pods from the current release instead of all pods matching
"vision-agent".
| "templating": { "list": [] }, | ||
| "time": { "from": "now-1h", "to": "now" }, | ||
| "title": "Vision Agent Overview", | ||
| "uid": "vision-agent-overview" |
Avoid a globally fixed Grafana UID.
Line 135 uses a static UID, which can collide across namespaces/releases in shared Grafana environments.
🔧 Suggested fix

```diff
-  "uid": "vision-agent-overview"
+  "uid": "{{ printf "%s-%s" .Release.Namespace .Release.Name | trunc 40 }}"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/07_deploy_example/helm/templates/grafana-dashboard.yaml` at line
135, The dashboard uses a fixed Grafana UID ("uid": "vision-agent-overview")
which can collide across releases; change it to a release-unique value (e.g.,
derive from .Release.Name and/or .Release.Namespace, or append a short random
suffix) or remove the UID so Grafana generates one. Update the template that
emits the "uid" field (the entry with "uid": "vision-agent-overview") to
interpolate a unique identifier using Helm template helpers like .Release.Name,
.Release.Namespace, or a generated token (or omit the field entirely).
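One wrinkle with templated UIDs: Grafana limits dashboard UIDs to 40 characters, so a long namespace/release pair can be truncated into a colliding prefix. The hypothetical helper below shows the idea in plain Python for illustration; in the chart itself this would be expressed with Helm's `printf`, `trunc`, and `sha256sum` template functions.

```python
import hashlib

def dashboard_uid(namespace: str, release: str, max_len: int = 40) -> str:
    """Stable, release-unique Grafana UID that fits the 40-char limit."""
    base = f"{namespace}-{release}"
    if len(base) <= max_len:
        return base
    # Too long: keep a readable prefix and append a short, stable digest
    # so truncation alone cannot make two releases collide.
    digest = hashlib.sha256(base.encode()).hexdigest()[:8]
    return f"{base[:max_len - 9]}-{digest}"
```

The digest is derived from the full name, so the UID stays deterministic across `helm upgrade` runs.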
```diff
@@ -0,0 +1,19 @@
+{{- if .Values.metrics.enabled }}
```
Guard ServiceMonitor creation on CRD availability.
Line 1 only checks values; installs will fail on clusters that do not have monitoring.coreos.com/v1 CRDs.
🔧 Proposed fix

```diff
-{{- if .Values.metrics.enabled }}
+{{- if and .Values.metrics.enabled (.Capabilities.APIVersions.Has "monitoring.coreos.com/v1/ServiceMonitor") }}
```
🧰 Tools
🪛 YAMLlint (1.38.0)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/07_deploy_example/helm/templates/servicemonitor.yaml` at line 1, The
template only checks .Values.metrics.enabled and should also verify the
ServiceMonitor CRD exists to avoid install failures on clusters without
monitoring.coreos.com/v1; update the conditional around the ServiceMonitor
manifest in servicemonitor.yaml to require both .Values.metrics.enabled and a
successful lookup call (e.g.
lookup("apiextensions.k8s.io/v1","CustomResourceDefinition","","servicemonitors.monitoring.coreos.com")
) so the resource is rendered only when the CRD is present.
```yaml
  enabled: true
  className: ""  # nginx, traefik, etc.
  host: ""       # e.g. agent.example.com
```
Ingress is enabled by default with an empty host (wildcard exposure).
With Line 45 enabled and Line 47 empty, this commonly creates a catch-all ingress rule. Safer default is to keep ingress off until host/TLS are explicitly set.
🔧 Proposed fix

```diff
 ingress:
-  enabled: true
+  enabled: false
   className: ""  # nginx, traefik, etc.
   host: ""       # e.g. agent.example.com
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/07_deploy_example/helm/values.yaml` around lines 45 - 47, The
Ingress is enabled by default with an empty host causing a catch-all rule;
update the Helm values to set enabled: false by default (change the values.yaml
key "enabled" to false) and add validation in the chart templates/notes to only
render/create ingress when .Values.ingress.enabled is true and
.Values.ingress.host is non-empty (check .Values.ingress.host in the template or
use a required() helper) so ingress is only created when host/TLS are explicitly
provided; reference the "enabled", "className", and "host" keys when making
these changes.
```yaml
  enabled: true
  # External Redis URL (for production — use managed Redis)
  # When redis.deploy.enabled=true and url is empty, auto-connects to the bundled Redis
  # When redis.deploy.enabled=false, set this to your managed Redis URL
  url: ""  # e.g. redis://redis:6379/0 or rediss://:auth@host:6380/0
  # Bitnami Redis chart overrides (only when redis.deploy.enabled=true)
  auth:
    enabled: false
  architecture: standalone
```
Bundled Redis is unauthenticated by default.
Line 58 enables in-cluster Redis by default, while Line 65 disables auth. That weakens session-store security in shared clusters.
🔧 Proposed fix

```diff
 redis:
   deploy:
     enabled: true
 @@
   auth:
-    enabled: false
+    enabled: true
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/07_deploy_example/helm/values.yaml` around lines 58 - 66, The
bundled Redis is enabled by default and has auth disabled (redis.deploy.enabled
/ auth.enabled), which is insecure; update the Helm values to require
authentication by default by setting auth.enabled to true (and provide guidance
to set a secure password or use a managed URL via the url field when disabling
redis.deploy.enabled), ensuring the redis.auth.enabled and
redis.deploy.enabled/url settings are consistent so in-cluster Redis is not left
unauthenticated.
Summary

Turns the `07_deploy_example` into a production-ready Helm chart that works on any Kubernetes cluster: GKE, EKS, AKS, Minikube, Orbstack, Docker Desktop.

Helm chart additions
- Service (ClusterIP) and Ingress templates
- Bundled Redis via Bitnami subchart, auto-wired through the `REDIS_URL` env var
- Health probes: liveness on `/health`, readiness on `/ready`
- ServiceMonitor for Prometheus scraping with configurable `additionalLabels`
- Grafana dashboard ConfigMap (8 panels: sessions, latency, tokens, CPU/memory)
- Removed the hardcoded Nebius GPU `nodeSelector`; now cloud-agnostic

Application changes
- Redis `SessionRegistry` support (conditional on the `REDIS_URL` env var)
- Prometheus `/metrics` endpoint with 9 custom gauges
- Pin `getstream>=2.0.0,<3.0.0` (v3 has a breaking `ChannelMember` import)
- Add `prometheus-client` and `vision-agents[redis]` dependencies
- Fix Dockerfile: `serve --host 0.0.0.0` (required for k8s health probes)

Configuration highlights (`values.yaml`)

| Key | Default |
| --- | --- |
| `redis.deploy.enabled` | `true` |
| `redis.url` | `""` |
| `metrics.enabled` | `true` |
| `grafana.enabled` | `true` |
| `ingress.enabled` | `true` |
| `gpu.enabled` | `false` |

Documentation
Step-by-step deployment guide: GetStream/Agents-Docs#71
Test plan
Summary by CodeRabbit
New Features
Chores