fix: retry MCP callTool on any connection error, not just ErrSessionMissing #2215
    @@ -10,6 +10,7 @@ import (
         "io"
         "iter"
         "log/slog"
    +    "net"
         "net/url"
         "strings"
         "sync"

    @@ -454,12 +455,13 @@ func (ts *Toolset) callTool(ctx context.Context, toolCall tools.ToolCall) (*tool
     
         resp, err := ts.mcpClient.CallTool(ctx, request)
     
    -    // If the server lost our session (e.g. it restarted), force a
    -    // reconnection and retry the call once.
    -    if errors.Is(err, mcp.ErrSessionMissing) {
    -        slog.Warn("MCP session missing, forcing reconnect and retrying", "tool", toolCall.Function.Name, "server", ts.logID)
    +    // If the call failed with a connection or session error (e.g. the
    +    // server restarted), trigger or wait for a reconnection and retry
    +    // the call once.
    +    if err != nil && isConnectionError(err) && ctx.Err() == nil {
    +        slog.Warn("MCP call failed, forcing reconnect and retrying", "tool", toolCall.Function.Name, "server", ts.logID, "error", err)
             if waitErr := ts.forceReconnectAndWait(ctx); waitErr != nil {
    -            return nil, fmt.Errorf("failed to reconnect after session loss: %w", waitErr)
    +            return nil, fmt.Errorf("failed to reconnect after call failure: %w", waitErr)
             }
             resp, err = ts.mcpClient.CallTool(ctx, request)
         }

    @@ -690,3 +692,27 @@ func (ts *Toolset) GetPrompt(ctx context.Context, name string, arguments map[str
         slog.Debug("Retrieved MCP prompt", "prompt", name, "messages_count", len(result.Messages))
         return result, nil
     }
    +
    +// isConnectionError reports whether err is a connection or session error
    +// that warrants a reconnect-and-retry (as opposed to an application-level
    +// error that would fail again even after reconnecting).
    +func isConnectionError(err error) bool {
    +    if errors.Is(err, mcp.ErrSessionMissing) || errors.Is(err, io.EOF) {
    +        return true
    +    }
    +    var netErr net.Error
    +    if errors.As(err, &netErr) {
🟡 MEDIUM: All net.Error types treated as retriable connection errors

The code treats any error that satisfies this check as a retriable connection error:

    if errors.As(err, &netErr) {
        return true
    }

However, some

Recommendation: Consider checking

    +        return true
    +    }
    +    // The MCP SDK wraps transport failures (e.g. connection reset, EOF from
    +    // client.Do) with its internal ErrRejected sentinel using %v, which
    +    // drops the original error from the chain. Detect these by checking
    +    // the error message for common transport-failure substrings.
    +    if msg := err.Error(); strings.Contains(msg, "connection reset") ||
🟡 MEDIUM: String-based error detection is brittle and locale-dependent

The string-matching logic (lines 711-714) checks for "connection reset", "connection refused", "broken pipe", and "EOF". While the comment explains this is needed because the MCP SDK wraps errors with

Recommendation: While this is a documented workaround for upstream SDK limitations, consider:

    +        strings.Contains(msg, "connection refused") ||
    +        strings.Contains(msg, "broken pipe") ||
    +        strings.Contains(msg, "EOF") {
🟡 MEDIUM: Redundant EOF string matching could cause false positives

The

If an error is

Recommendation: Remove "EOF" from the string matching on line 714, since actual

    +        return true
    +    }
    +    return false
    +}
🟡 MEDIUM: Comment claims to retry only connection/session errors but code retries all non-canceled errors

The comment states "If the call failed with a connection or session error (e.g. the server restarted), trigger or wait for a reconnection", but the actual code checks `if err != nil && !errors.Is(err, context.Canceled) && ctx.Err() == nil`, which retries on ANY error.

Impact: This mismatch between comment and implementation can mislead future maintainers about the actual behavior of the code.

Recommendation: Update the comment to accurately reflect the implementation, or update the implementation to match the comment's intent. For example:

Or implement selective error checking as suggested in the HIGH severity finding above.