Skip to content

chore(api): log reason for failing to connect, resume sandboxes on state change#2145

Merged
matthewlouisbrockman merged 9 commits intomainfrom
pause-resume-failure-logs
Mar 24, 2026
Merged

chore(api): log reason for failing to connect, resume sandboxes on state change#2145
matthewlouisbrockman merged 9 commits intomainfrom
pause-resume-failure-logs

Conversation

@matthewlouisbrockman
Copy link
Contributor

@matthewlouisbrockman matthewlouisbrockman commented Mar 16, 2026

currently, we drop the err from err = a.orchestrator.WaitForStateChange(ctx, teamID, sandboxID). This makes it hard for us to know why requests during pausing errored.

This change logs the dropped err so we can get better confidence in why we're gettign 500 errors during state transitions.

also updates the error logging to use the telemetry.reportCriticalError to maintain Error text while helping with the spans


Note

Low Risk
Low risk: changes are limited to error-path telemetry/logging while keeping existing control flow and responses largely the same, with added visibility into WaitForStateChange failures and snapshot/token errors.

Overview
Improves observability for sandbox connect and resume during state transitions by reporting previously dropped WaitForStateChange errors and standardizing several failure logs to telemetry.ReportCriticalError/ReportErrorByCode with sandbox/team (and where relevant template/build) metadata, making 500s and other failures easier to diagnose without changing the happy-path behavior.

Written by Cursor Bugbot for commit b289483. This will update automatically on new commits. Configure here.

@matthewlouisbrockman matthewlouisbrockman marked this pull request as ready for review March 16, 2026 22:09
Copy link
Contributor

@dobrac dobrac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use the telemetry.ReportError and telemetry.ReportCriticalError instead to also populate the traces correctly. It does logging automatically

@matthewlouisbrockman
Copy link
Contributor Author

you can use the telemetry.ReportError and telemetry.ReportCriticalError instead to also populate the traces correctly. It does logging automatically

shoud we add him to logger.L().Error(ctx, "Error getting last snapshot", logger.WithSandboxID(sandboxID), zap.Error(err)) too to debug those guys? (i think it's a 30s timeout somewhere but can't find it yet)

@matthewlouisbrockman
Copy link
Contributor Author

eh, can worry about getting the reason for the failed db guys later, pretty sure those are all just timing out on the 30 second callback from the redis route

@matthewlouisbrockman matthewlouisbrockman changed the title log reason for failing to connect, resume sandboxes on state change observability: log reason for failing to connect, resume sandboxes on state change Mar 17, 2026
@matthewlouisbrockman matthewlouisbrockman changed the title observability: log reason for failing to connect, resume sandboxes on state change chore[api]: log reason for failing to connect, resume sandboxes on state change Mar 17, 2026
@matthewlouisbrockman matthewlouisbrockman changed the title chore[api]: log reason for failing to connect, resume sandboxes on state change chore(api): log reason for failing to connect, resume sandboxes on state change Mar 17, 2026
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

@ValentaTomas ValentaTomas removed their request for review March 21, 2026 06:41
@matthewlouisbrockman matthewlouisbrockman merged commit f69d8a7 into main Mar 24, 2026
36 checks passed
@matthewlouisbrockman matthewlouisbrockman deleted the pause-resume-failure-logs branch March 24, 2026 21:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants