Graceful failure when extensions fail to start: suppress verbose errors and surface update hints #6832

Open
Copilot wants to merge 10 commits into main from
copilot/improve-extension-error-messages

Conversation

Contributor

Copilot AI commented Feb 20, 2026

When an extension fails to start, azd was dumping the extension's full stdout/stderr to the console—including internal gRPC logs, timestamps, and EOF traces—producing noise irrelevant to the user. There was also no guidance that an outdated extension might be the root cause.

Changes

  • cmd/middleware/extensions.go

    • Suppress verbose invocation errors from console: Extension process stdout/stderr on failure is now routed to log.Printf only, keeping it available for debug inspection (AZD_DEBUG) without polluting normal output.
    • Differentiate timeout vs real startup errors: Introduced extensionStartFailure struct tracking each extension alongside a timedOut bool (set via errors.Is(err, context.DeadlineExceeded)). The failure message now says "failed to initialize" generically; the AZD_EXT_TIMEOUT hint is only shown when at least one failure was actually a deadline exceeded.
    • Add update check on startup failure: After extensions reach a terminal state, checkUpdatesForExtensions consults the already-cached registry data for failed extensions (no network requests in the failure path). Any extension with an available upgrade emits a ux.WarningMessage with actionable upgrade instructions before the generic startup-failure block.
  • pkg/extensions/update_checker.go

    • Updated FormatUpdateWarning to use the correct azd extension upgrade command and to display hints in the "Fix by running:" / "If you don't use these extensions, you can uninstall them:" format, with the specific-extension upgrade listed first followed by --all.
  • pkg/extensions/registry_cache.go

    • Simplified GetExtensionLatestVersion to find the highest version via semver comparison across all listed versions (replacing complex prerelease-fallback logic that incorrectly returned a stable version when the only newer entry was a prerelease).
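The max-version scan described for GetExtensionLatestVersion could look roughly like the sketch below. It is an illustration only, not the code in registry_cache.go: `latestVersion`, `parse`, `less`, and `lessPre` are hypothetical names, and the hand-rolled comparison stands in for whatever semver library the real code uses. It assumes plain `MAJOR.MINOR.PATCH[-prerelease]` strings and ignores build metadata.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version is a simplified semantic version (assumption: no build metadata).
type version struct {
	core [3]int
	pre  string // prerelease tag, empty for a stable release
	raw  string
}

// parse reads "MAJOR.MINOR.PATCH" with an optional "-prerelease" suffix.
func parse(s string) (version, error) {
	v := version{raw: s}
	coreStr, pre, _ := strings.Cut(strings.TrimPrefix(s, "v"), "-")
	v.pre = pre
	parts := strings.Split(coreStr, ".")
	if len(parts) != 3 {
		return v, fmt.Errorf("invalid version %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return v, fmt.Errorf("invalid version %q", s)
		}
		v.core[i] = n
	}
	return v, nil
}

// lessPre compares prerelease tags identifier by identifier: numeric
// identifiers compare numerically and sort below alphanumeric ones.
func lessPre(a, b string) bool {
	as, bs := strings.Split(a, "."), strings.Split(b, ".")
	for i := 0; i < len(as) && i < len(bs); i++ {
		if as[i] == bs[i] {
			continue
		}
		an, aerr := strconv.Atoi(as[i])
		bn, berr := strconv.Atoi(bs[i])
		switch {
		case aerr == nil && berr == nil:
			return an < bn
		case aerr == nil:
			return true
		case berr == nil:
			return false
		default:
			return as[i] < bs[i]
		}
	}
	return len(as) < len(bs)
}

// less reports whether a has lower precedence than b.
func less(a, b version) bool {
	for i := range a.core {
		if a.core[i] != b.core[i] {
			return a.core[i] < b.core[i]
		}
	}
	if a.pre == b.pre || a.pre == "" {
		return false // equal, or a is the stable release of the same core
	}
	return b.pre == "" || lessPre(a.pre, b.pre)
}

// latestVersion scans every listed version and keeps the highest one,
// regardless of list order. Unparseable entries are skipped.
func latestVersion(versions []string) (string, bool) {
	var best version
	found := false
	for _, s := range versions {
		v, err := parse(s)
		if err != nil {
			continue
		}
		if !found || less(best, v) {
			best, found = v, true
		}
	}
	return best.raw, found
}

func main() {
	fmt.Println(latestVersion([]string{"0.1.0", "0.2.0-beta.1", "0.1.1"}))
}
```

With this approach a prerelease that is the only newer entry is reported as the latest version, instead of falling back to an older stable one.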

Before (on extension startup failure):

extension 'spboyer.azd.doctor' run failed: exit code: 1, stdout: Error: failed to register predeploy handler: EOF
, stderr: 2026/02/09 16:41:56 [spboyer.azd.doctor] Registered handler for MessageType=*azdext.InvokeProjectHandler
2026/02/09 16:41:56 [spboyer.azd.doctor] [RequestId=] ERROR: Failed to send fire-and-forget message, ...

WARNING: Extension startup failures detected
The following extensions failed to initialize within the timeout period:
  - azd doctor (spboyer.azd.doctor)

Some features may be unavailable. Increase timeout with AZD_EXT_TIMEOUT=<seconds> if needed.

After — timeout failure with a newer version available:

WARNING: The following extensions are outdated:
  - azd doctor (installed: 0.1.0, latest: 0.1.1)
  • Fix by running:
        azd extension upgrade spboyer.azd.doctor
        azd extension upgrade --all
  • If you don't use these extensions, you can uninstall them:
        azd extension uninstall spboyer.azd.doctor

WARNING: Extension startup failures detected
The following extensions failed to initialize:
  - azd doctor (spboyer.azd.doctor)

Some features may be unavailable. Increase timeout with AZD_EXT_TIMEOUT=<seconds> if needed.

After — real startup error (e.g. invalid binary, permission denied):

WARNING: Extension startup failures detected
The following extensions failed to initialize:
  - azd doctor (spboyer.azd.doctor)

Some features may be unavailable.
Original prompt

This section details the original issue you should resolve

<issue_title>Graceful failure when extensions are not updated</issue_title>
<issue_description>Opportunity to improve extension installation error messages.

Describe the bug
When extensions (including test extensions) aren't updated, the tool currently fails with verbose errors that aren't helpful or relevant particularly when the extension isn't related to the current task. The messaging is noisy and doesn't guide the user to resolution.

It feels like a tidy task that's "in the way" unless it's needed for what I'm trying to accomplish (which in this case with azd up the updates aren't needed).

Image

To Reproduce

  • Attempt to use Azure Developer CLI features when certain extensions are out-of-date (especially test extension).
  • Observe the verbose failure output.

Expected behavior
Instead of verbose errors, the tool should fail gracefully with clear, plain-language guidance. Error messaging should briefly explain that an extension needs updating and provide instructions/link on how to update.

Additional context
Current error output is distracting and not relevant to the user flow. The experience could be improved by minimizing noise and offering actionable steps.
</issue_description>

<agent_instructions>1. Add a check for extensions to see if the user installed version is the latest version. If not, print a warning message like what core azd cli does to ask the user to update the extensions.
2. Check the error call stack as shown in the screenshot. Do not expose a long call stack to users in error message. </agent_instructions>

Comments on the Issue (you are @copilot in this section)


…tensions

Co-authored-by: hemarina <104857065+hemarina@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Improve error messaging for outdated extensions" to "Graceful failure when extensions fail to start: suppress verbose errors and surface update hints" Feb 20, 2026
Copilot AI requested a review from hemarina February 20, 2026 22:59
Member

@spboyer spboyer left a comment

Two concerns:

  1. Non-timeout startup errors are masked as timeout failures (High)
    The fallback warning always says extensions failed to initialize within the timeout period even for real startup errors (invalid binary, permission denied, bad args). WaitUntilReady returns the original failure error, not just timeout, but the message hardcodes timeout wording for all failures. Consider conditionally branching the messaging on context.DeadlineExceeded vs other error types.

  2. Synchronous network work in failure path (Medium)
    On extension startup failure, the middleware now performs update checks (including source refresh via FindExtensions which can issue HTTP requests) before continuing to next(ctx). This adds blocking I/O in a failure path that should be fast. Consider making update checks async with a short timeout, or skip remote refresh entirely during startup failure handling.

@spboyer
Member

spboyer commented Feb 22, 2026

@copilot address feedback

…rk I/O from failure path

Co-authored-by: spboyer <7681382+spboyer@users.noreply.github.com>
Contributor Author

Copilot AI commented Feb 22, 2026

Two concerns:

  1. Non-timeout startup errors are masked as timeout failures (High)...

Both issues addressed in 19b8566:

  1. Timeout vs real startup errors: Added an extensionStartFailure struct that pairs each extension with a timedOut bool (set via errors.Is(err, context.DeadlineExceeded)). The intro line now says "failed to initialize" generically, and the AZD_EXT_TIMEOUT hint is only shown when at least one failure was actually a deadline exceeded.

  2. Network I/O in failure path: checkUpdatesForExtensions no longer refreshes the registry cache — it only consults already-cached data. Extensions whose source cache is expired or missing are skipped entirely, so no HTTP requests are made in the failure path.

Copilot AI requested a review from spboyer February 22, 2026 15:24
@hemarina
Contributor

image

var exitErr *exec.ExitError
if errors.As(err, &exitErr) {
	m.console.Message(ctx, fmt.Sprintf(
		"Extension '%s' failed to start (exit code: %d). Run with --debug for details.",
Contributor

This avoids printing the whole call stack from stderr. What do folks think? Should we print the whole error message instead?

output.WithHighLightFormat("azd extension upgrade %s", result.ExtensionId)),
fmt.Sprintf("To upgrade all: %s",
output.WithHighLightFormat("azd extension upgrade --all")),
fmt.Sprintf("Fix by running:\n\t%s\n\t%s",
Contributor

I have a meeting with @hyoshis to discuss the UI further. Will update this after the meeting.

@hemarina hemarina marked this pull request as ready for review February 25, 2026 23:18
Copilot AI review requested due to automatic review settings February 25, 2026 23:18
Contributor

Copilot AI left a comment

Pull request overview

This PR improves the user experience when extensions fail to start by suppressing verbose error output and surfacing actionable update hints.

Changes:

  • Replaced raw extension stdout/stderr console output with debug-level logging only
  • Added logic to check for available extension updates using cached registry data when startup fails
  • Differentiated timeout vs. real startup errors in failure messaging

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
| File | Description |
| --- | --- |
| cli/azd/cmd/middleware/extensions.go | Routes verbose extension errors to logs; checks for updates on failed extensions; conditionally shows timeout hint |
| cli/azd/pkg/extensions/update_checker.go | Updated warning message to list outdated extensions with upgrade/uninstall commands |
| cli/azd/pkg/extensions/registry_cache.go | Added semver comparison to correctly identify the latest version regardless of registry sort order |
| cli/azd/extensions/microsoft.azd.concurx/go.mod | Updated dependency from eiannone/keyboard to AlecAivazis/survey/v2 |
| cli/azd/extensions/microsoft.azd.concurx/go.sum | Dependency checksum updates reflecting the module changes |
| cli/azd/extensions/azure.ai.agents/go.mod | Moved stretchr/testify from indirect to direct dependency |

@hemarina
Contributor

@copilot Run go mod tidy and update Go-related files

…ommand and prerelease version detection

Co-authored-by: hemarina <104857065+hemarina@users.noreply.github.com>
Contributor Author

Copilot AI commented Feb 26, 2026

@copilot Run go mod tidy and update Go-related files

Done in 9b88ddb. go mod tidy confirmed cli/azd/go.mod/go.sum are already tidy. Additionally fixed three related issues found during tidy:

  • Reverted unrelated microsoft.azd.concurx/go.mod and go.sum changes to match main
  • Fixed wrong command name azd extension update → azd extension upgrade (the update subcommand doesn't exist)
  • Fixed prerelease version detection bug (GetExtensionLatestVersion was incorrectly falling back to the stable version even when the only newer version was a prerelease); replaced with a simple max-version semver scan
  • Updated tests to match the intended "Fix by running:" hint format

Co-authored-by: hemarina <104857065+hemarina@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

Graceful failure when extensions are not updated

5 participants