Skip to content

Add debug-mcp-deploy skill#205

Closed
RafaelPo wants to merge 1 commit intomainfrom
add/debug-mcp-deploy-skill
Closed

Add debug-mcp-deploy skill#205
RafaelPo wants to merge 1 commit intomainfrom
add/debug-mcp-deploy-skill

Conversation

@RafaelPo
Copy link
Contributor

Summary

  • Adds a Claude Code skill for debugging MCP server deployments (staging and production)
  • Covers Helm values layering, SOPS secrets inspection, auth/JWKS flow debugging, and common failure modes
  • Born from diagnosing the JWKS staging verification bug (see Fix JWKS verification failure on staging MCP server #204)

Test plan

  • Verify skill is auto-discovered by Claude Code when working in the repo
  • Test trigger phrases: "debug deploy", "staging error", "auth failure", "JWKS"

🤖 Generated with Claude Code

@RafaelPo RafaelPo force-pushed the add/debug-mcp-deploy-skill branch 2 times, most recently from b3716fb to 0cd0bc0 Compare February 24, 2026 18:52
@RafaelPo RafaelPo requested review from hnykda and rgambee February 24, 2026 18:54
@RafaelPo RafaelPo force-pushed the add/debug-mcp-deploy-skill branch from 0cd0bc0 to d18cd0f Compare February 24, 2026 18:54
@RafaelPo
Copy link
Contributor Author

@claude code review

@github-actions
Copy link
Contributor

github-actions bot commented Feb 24, 2026

Claude finished @RafaelPo's task in 1m 32s —— View job


Code Review

  • Read repo conventions
  • Read changed files
  • Review SKILL.md content
  • Review workflow change
  • Post review feedback

.claude/skills/debug-mcp-deploy/SKILL.md

Well-structured debugging runbook. The content is thorough, clearly born from real incident debugging (#204). A few observations:

Looks good:

  • Clean frontmatter with useful trigger keywords for auto-discovery
  • Logical progression from "check pods" → "inspect env" → "compare secrets" → "fix"
  • The "Common Issues" section documents real failure modes with clear symptom/cause/fix structure
  • Gotchas section captures non-obvious operational knowledge (shared API, Redis DB isolation, SOPS KMS access)

Minor suggestions:

  1. Line 8 — The instruction to resolve <staging-namespace> from values.staging.yaml is helpful, but Claude might struggle to find the actual namespace on its own. Consider adding a concrete example like everyrow-mcp-staging (or whatever it is) alongside the placeholder, so Claude has a fallback if it can't locate the file.

  2. Lines 92-98 / 103-112 / 154-159 — The kubectl exec ... -- python3 -c commands use multi-line Python strings. These work but can be fragile when pasted across different shells. This is fine for a skill doc (Claude will adapt), just noting it.

  3. Line 87app.py:_http_client_factory is a useful cross-reference. If this function moves or gets renamed, this reference will silently go stale. Not a blocker, but worth noting for future maintenance.


.github/workflows/skill-version-check.yaml

-      - "**/skills/**"
+      - "skills/**"

This narrows the trigger from "any skills/ directory at any depth" to only skills/ at the repo root. The effect: changes to .claude/skills/ (where the new skill lives) no longer trigger the version-bump check, while changes to skills/everyrow-sdk/ (the plugin skill) still do.

This makes sense — Claude Code native skills in .claude/skills/ are repo-local and don't participate in the plugin versioning system, so requiring a plugin.json version bump for them would be incorrect. Just confirming: this is intentional, right? If so, LGTM.


Summary

Clean PR. The skill content is comprehensive and well-organized, covering the full debugging lifecycle from pod inspection to fix deployment. The workflow change is logical. No blocking issues found.

pull_request:
paths:
- "**/skills/**"
- "skills/**"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The workflow path for skill-version-check.yaml is being narrowed to skills/**, which will prevent it from triggering for changes in the .claude/skills/ directory.
Severity: MEDIUM

Suggested Fix

Revert the path filter in skill-version-check.yaml back to **/skills/** to ensure it monitors all skill directories. Alternatively, explicitly add both paths, such as ['skills/**', '.claude/skills/**'], to ensure the version check workflow runs for changes in both locations.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: .github/workflows/skill-version-check.yaml#L6

Potential issue: The `skill-version-check.yaml` workflow path is being changed from
`**/skills/**` to `skills/**`. This narrows the scope to only the top-level `skills/`
directory. However, this pull request also adds a new skill in the `.claude/skills/`
directory. Because of the path change, the workflow will no longer trigger for changes
in `.claude/skills/`, defeating its purpose of enforcing version bumps for all skill
modifications. This allows the new skill and any future changes to skills in that
directory to bypass the required version check.

Did we get this right? 👍 / 👎 to inform future reviews.

pull_request:
paths:
- "**/skills/**"
- "skills/**"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The workflow trigger path was changed from **/skills/** to skills/**, which excludes the new skill location at .claude/skills/ from the required version bump check.
Severity: HIGH

Suggested Fix

Revert the workflow path pattern from skills/** back to **/skills/**. This will ensure the version check workflow triggers for skill changes in any subdirectory, including both the root skills/ directory and the new .claude/skills/ directory.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: .github/workflows/skill-version-check.yaml#L6

Potential issue: The workflow path pattern in `skill-version-check.yaml` was changed
from `**/skills/**` to `skills/**`. This change means the workflow, which is intended to
enforce version bumps in `.claude-plugin/plugin.json` when skills are modified, will no
longer trigger for changes in the `.claude/skills/` directory. Since a new skill is
being added in this PR at `.claude/skills/debug-mcp-deploy/SKILL.md`, changes to it will
bypass the version enforcement check. This breaks the intended governance that ensures
the plugin version stays in sync with all its constituent skill changes.

Did we get this right? 👍 / 👎 to inform future reviews.

### Step 4: Update a secret value

```bash
sops --set '["secrets"]["data"]["KEY_NAME"] "new-value"' everyrow-mcp/deploy/chart/secrets.staging.enc.yaml
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The sops --set command in the skill documentation has incorrect syntax, combining the path and value into a single argument, which will cause the command to fail.
Severity: MEDIUM

Suggested Fix

Correct the sops command syntax by separating the path and value arguments. The command should be changed to sops --set '["secrets"]["data"]["KEY_NAME"]' 'new-value' everyrow-mcp/deploy/chart/secrets.staging.enc.yaml to match the tool's expected format.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: .claude/skills/debug-mcp-deploy/SKILL.md#L74

Potential issue: A `sops` command within the `debug-mcp-deploy` skill documentation is
syntactically incorrect. The command `sops --set '["secrets"]["data"]["KEY_NAME"]
"new-value"' ...` incorrectly combines the JSON path and the new value into a single
string argument. According to `sops` documentation, the path and value should be
separate arguments. An operator following this guide to update a secret during a
production or staging issue will encounter a command failure, hindering their ability to
resolve the problem.

Did we get this right? 👍 / 👎 to inform future reviews.

@RafaelPo RafaelPo requested a review from straeter February 26, 2026 16:59
Comment on lines +36 to +39
docker compose \
-f docker-compose.yaml \
-f docker-compose.local.yaml \
up -d --build

This comment was marked as outdated.

@RafaelPo RafaelPo force-pushed the add/debug-mcp-deploy-skill branch from 9c30a73 to e894ff2 Compare February 27, 2026 23:28
- deploy-mcp: staging/production GKE deployment commands and monitoring
- run-mcp-local: local Docker Compose setup with Cloudflare tunnel

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@RafaelPo RafaelPo force-pushed the add/debug-mcp-deploy-skill branch from e894ff2 to c8f349f Compare February 27, 2026 23:29
@RafaelPo RafaelPo closed this Feb 27, 2026
@RafaelPo RafaelPo deleted the add/debug-mcp-deploy-skill branch February 27, 2026 23:31
Comment on lines +67 to +68
ln -s /Users/rafaelpoyiadzi/Documents/git/everyrow-sdk/everyrow-mcp/deploy/.env \
<worktree-path>/everyrow-mcp/deploy/.env
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: A command in the SKILL.md documentation contains a hardcoded, user-specific file path, which will fail for other developers.
Severity: HIGH

Suggested Fix

Replace the hardcoded path /Users/rafaelpoyiadzi/Documents/git/everyrow-sdk/ with a generic placeholder like <main>/ or <path-to-your-main-worktree>/ to match the pattern used elsewhere in the document and make the command universally usable.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: .claude/skills/run-mcp-local/SKILL.md#L67-L68

Potential issue: The `ln -s` command on lines 67-68 of
`.claude/skills/run-mcp-local/SKILL.md` uses the hardcoded path
`/Users/rafaelpoyiadzi/...`. This path is specific to the author's machine. Any other
developer attempting to run this command as part of the local setup will encounter an
error because the source path for the symbolic link does not exist on their system. This
prevents them from successfully completing the documented worktree setup. The same file
correctly uses a placeholder like `<main>/` for a similar command in the troubleshooting
section.

Comment on lines +50 to +52
| `ENABLE_SHEETS_TOOLS` | `false` | Register Google Sheets tools |
| `TRUST_PROXY_HEADERS` | `false` | Trust X-Forwarded-For (required behind tunnel) |
| `EXTRA_ALLOWED_HOSTS` | (empty) | Extra hostnames for DNS rebinding allowlist |
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Documentation references several configuration files, scripts, and environment variables that do not exist in the codebase.
Severity: HIGH

Suggested Fix

Remove the sections of the documentation that refer to non-existent files and unimplemented environment variables (docker-compose.local.yaml, run-no-auth.sh, ENABLE_SHEETS_TOOLS, EXTRA_ALLOWED_HOSTS). Alternatively, implement the corresponding features and add the missing files to the repository so the documentation is accurate.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: .claude/skills/run-mcp-local/SKILL.md#L50-L52

Potential issue: The documentation in `.claude/skills/run-mcp-local/SKILL.md` instructs
developers to use configurations and files that are not present in the codebase.
Specifically, it references the file `docker-compose.local.yaml`, the script
`scripts/run-no-auth.sh`, and the environment variables `ENABLE_SHEETS_TOOLS` and
`EXTRA_ALLOWED_HOSTS`. None of these exist or are implemented in the application.
Developers following these instructions will encounter file-not-found errors and will be
unable to configure the system as described, leading to a non-functional local setup.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant