Merged
36 changes: 31 additions & 5 deletions pkg/tools/mcp/mcp.go
@@ -10,6 +10,7 @@ import (
"io"
"iter"
"log/slog"
"net"
"net/url"
"strings"
"sync"
@@ -454,12 +455,13 @@ func (ts *Toolset) callTool(ctx context.Context, toolCall tools.ToolCall) (*tool

resp, err := ts.mcpClient.CallTool(ctx, request)

-// If the server lost our session (e.g. it restarted), force a
-// reconnection and retry the call once.
-if errors.Is(err, mcp.ErrSessionMissing) {
-slog.Warn("MCP session missing, forcing reconnect and retrying", "tool", toolCall.Function.Name, "server", ts.logID)
+// If the call failed with a connection or session error (e.g. the

🟡 MEDIUM: Comment claims to retry only connection/session errors but code retries all non-canceled errors

The comment states "If the call failed with a connection or session error (e.g. the server restarted), trigger or wait for a reconnection" but the actual code checks if err != nil && !errors.Is(err, context.Canceled) && ctx.Err() == nil, which retries on ANY error.

Impact: This mismatch between comment and implementation can mislead future maintainers about the actual behavior of the code.

Recommendation: Update the comment to accurately reflect the implementation, or update the implementation to match the comment's intent. For example:

// If the call failed with any error (except context cancellation),
// trigger or wait for a reconnection and retry the call once.

Or implement selective error checking as suggested in the HIGH severity finding above.

+// server restarted), trigger or wait for a reconnection and retry
+// the call once.
+if err != nil && isConnectionError(err) && ctx.Err() == nil {
+slog.Warn("MCP call failed, forcing reconnect and retrying", "tool", toolCall.Function.Name, "server", ts.logID, "error", err)
if waitErr := ts.forceReconnectAndWait(ctx); waitErr != nil {
-return nil, fmt.Errorf("failed to reconnect after session loss: %w", waitErr)
+return nil, fmt.Errorf("failed to reconnect after call failure: %w", waitErr)
}
resp, err = ts.mcpClient.CallTool(ctx, request)
}
@@ -690,3 +692,27 @@ func (ts *Toolset) GetPrompt(ctx context.Context, name string, arguments map[str
slog.Debug("Retrieved MCP prompt", "prompt", name, "messages_count", len(result.Messages))
return result, nil
}

// isConnectionError reports whether err is a connection or session error
// that warrants a reconnect-and-retry (as opposed to an application-level
// error that would fail again even after reconnecting).
func isConnectionError(err error) bool {
if errors.Is(err, mcp.ErrSessionMissing) || errors.Is(err, io.EOF) {
return true
}
var netErr net.Error
if errors.As(err, &netErr) {

🟡 MEDIUM: All net.Error types treated as retriable connection errors

The code treats any net.Error as a retriable connection error (lines 704-706):

if errors.As(err, &netErr) {
    return true
}

However, some net.Error implementations represent permanent failures that won't be fixed by retrying:

  • DNS lookup failures for non-existent hosts (no such host)
  • Invalid address formats
  • Network unreachable errors (may indicate configuration issues)

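Recommendation: Filter for specific retriable error types (for example, timeouts via netErr.Timeout(), or resets and refusals via errors.Is with syscall.ECONNRESET / syscall.ECONNREFUSED) to avoid unnecessary retry attempts on permanent errors. Avoid netErr.Temporary() for this: it has been deprecated since Go 1.18 because "temporary" errors are not well defined.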

return true
}
// The MCP SDK wraps transport failures (e.g. connection reset, EOF from
// client.Do) with its internal ErrRejected sentinel using %v, which
// drops the original error from the chain. Detect these by checking
// the error message for common transport-failure substrings.
if msg := err.Error(); strings.Contains(msg, "connection reset") ||

🟡 MEDIUM: String-based error detection is brittle and locale-dependent

The string-matching logic (lines 711-714) checks for 'connection reset', 'connection refused', 'broken pipe', and 'EOF'. While the comment explains this is needed because the MCP SDK wraps errors with %v, this approach has several risks:

  1. Not stable: Error messages are not part of Go's compatibility guarantee and can change across versions or operating systems
  2. Locale-dependent: Non-English systems may have different error messages
  3. Phrasing variations: Different libraries might phrase errors differently (e.g., 'reset by peer' vs 'connection reset')

Recommendation: While this is a documented workaround for upstream SDK limitations, consider:

  • Filing an issue with the MCP SDK to preserve error chains
  • Adding test coverage for the specific error messages you expect
  • Documenting the known limitation in code comments (already partially done)

strings.Contains(msg, "connection refused") ||
strings.Contains(msg, "broken pipe") ||
strings.Contains(msg, "EOF") {

🟡 MEDIUM: Redundant EOF string matching could cause false positives

The isConnectionError function checks for EOF twice:

  • Line 709: errors.Is(err, io.EOF) (proper error chain check)
  • Line 714: strings.Contains(msg, "EOF") (string matching)

If an error is io.EOF or wraps it with %w, line 709 already catches it; the string check only adds coverage for errors whose chain was broken, such as the SDK's %v-wrapped transport failures described in the code comment. More critically, the substring check could match unrelated errors that mention 'EOF' in their message (e.g., 'invalid EOF character in JSON', 'unexpected EOF in response body'), causing false-positive reconnection attempts.

Recommendation: Remove "EOF" from the string matching on line 714 and rely on line 709 for io.EOF errors whose chain is intact, or narrow the match if %v-wrapped EOFs from the SDK must still be detected.

return true
}
return false
}