Add structured error handling to azure.ai.agents and update service error mapping #6901
Pull request overview
This PR adds comprehensive structured error handling to the azure.ai.agents extension and improves telemetry classification for extension service errors. The changes introduce centralized error codes, helper functions for creating structured errors with suggestions, and better telemetry result codes based on operation+errorCode rather than host+statusCode.
Changes:
- Introduces centralized error code constants in `exterrors/codes.go` covering cancellation, validation, dependency, auth, compatibility, and internal error scenarios
- Adds helper functions (`Validation()`, `Dependency()`, `Auth()`, etc.) in `exterrors/local_error.go` for creating structured errors with user-facing suggestions
- Updates telemetry error mapping to use the `ext.service.<operation>.<errorCode>` format instead of `ext.service.<host>.<statusCode>` for better actionability
- Simplifies the extension's `main.go` to use the `azdext.Run()` helper for lifecycle boilerplate
- Updates `normalizeCodeSegment()` to preserve dot-separated error codes while sanitizing each segment
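The last two telemetry bullets can be illustrated with a minimal Go sketch. It is not the code from `cli/azd/internal/cmd/errors.go`: the function names `sanitizeSegment`, `normalizeCode`, and `resultCode`, the sanitizing rules (lowercase, anything outside `[a-z0-9_]` becomes `_`), and the `unknown` fallback are all assumptions made for illustration. The only parts taken from the PR are the `ext.service.` prefix and the idea of keeping the dots between segments.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeSegment lowercases one segment and replaces every character outside
// [a-z0-9_] with '_'. These rules are an assumption made for this sketch.
func sanitizeSegment(segment string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(segment) {
		if (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9') || r == '_' {
			b.WriteRune(r)
		} else {
			b.WriteRune('_')
		}
	}
	return b.String()
}

// normalizeCode keeps the dots that separate segments, sanitizes each segment
// on its own, and drops empty segments.
func normalizeCode(code string) string {
	var parts []string
	for _, segment := range strings.Split(code, ".") {
		if s := sanitizeSegment(strings.TrimSpace(segment)); s != "" {
			parts = append(parts, s)
		}
	}
	return strings.Join(parts, ".")
}

// resultCode builds "ext.service.<operation>.<errorCode>" from an error code
// that already carries its operation prefix, as in the PR's sample trace.
// The "unknown" fallback is hypothetical.
func resultCode(errorCode string) string {
	normalized := normalizeCode(errorCode)
	if normalized == "" {
		normalized = "unknown"
	}
	return "ext.service." + normalized
}

func main() {
	fmt.Println(resultCode("start_container.invalid_payload"))
	fmt.Println(resultCode("Start-Container.Invalid Payload"))
}
```

Both calls in `main` yield `ext.service.start_container.invalid_payload`, which shows why sanitizing per segment matters: replacing the dot along with the other punctuation would collapse the operation and the error code into one opaque token.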
Reviewed changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| cli/azd/internal/cmd/errors.go | Updates ServiceError telemetry mapping to use operation.errorCode format; refactors normalizeCodeSegment to handle dot-separated codes |
| cli/azd/internal/cmd/errors_test.go | Adds comprehensive test cases for new ServiceError telemetry format with various error scenarios |
| cli/azd/extensions/azure.ai.agents/main.go | Simplifies main function to use azdext.Run() helper instead of manual boilerplate |
| cli/azd/extensions/azure.ai.agents/internal/exterrors/codes.go | Defines centralized error code constants organized by category (validation, dependency, auth, etc.) |
| cli/azd/extensions/azure.ai.agents/internal/exterrors/local_error.go | Implements helper functions for creating structured LocalError and ServiceError instances with suggestions |
| cli/azd/extensions/azure.ai.agents/internal/exterrors/azd_host_error.go | Wraps gRPC errors from azd host services into structured LocalError with ErrorInfo reason preservation |
| cli/azd/extensions/azure.ai.agents/internal/project/service_target_agent.go | Updates error handling throughout to use structured errors with appropriate codes and suggestions |
| cli/azd/extensions/azure.ai.agents/internal/cmd/init.go | Adds structured error handling with cancellation detection and appropriate error codes |
| cli/azd/extensions/azure.ai.agents/internal/cmd/init_from_code.go | Implements structured error handling with cancellation detection for init workflow |
| cli/azd/extensions/azure.ai.agents/internal/cmd/init_models.go | Wraps azd host service errors using FromAzdHost helper |
| cli/azd/extensions/azure.ai.agents/go.mod | Updates dependencies including azd, azcore, and various indirect dependencies |
| cli/azd/extensions/azure.ai.agents/go.sum | Corresponding checksum updates for go.mod changes |
| cli/azd/extensions/azure.ai.agents/extension.yaml | Updates requiredAzdVersion to >1.23.6 to match new dependency requirements |
| cli/azd/extensions/azure.ai.agents/cspell.yaml | Adds extension-specific spell check dictionary |
| cli/azd/.vscode/cspell.yaml | Removes extension-specific overrides now handled in extension's own cspell.yaml |
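The "cancellation detection" mentioned for `init.go` and `init_from_code.go` can be pictured roughly as follows. This is a sketch under assumptions, not the PR's implementation: `codedError`, `classify`, `codeOf`, and `demoCancelled` are hypothetical names, and only the `cancelled` and `project_init_failed` code strings come from the PR description. It relies on the standard `errors.Is(err, context.Canceled)` check.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// codeCancelled mirrors the `cancelled` code listed in the PR description.
const codeCancelled = "cancelled"

// codedError is a hypothetical stand-in for the extension's structured error type.
type codedError struct {
	Code string
	Err  error
}

func (e *codedError) Error() string { return fmt.Sprintf("%s: %v", e.Code, e.Err) }
func (e *codedError) Unwrap() error { return e.Err }

// classify tags err with the cancelled code when the error chain contains
// context.Canceled, and with fallbackCode otherwise.
func classify(err error, fallbackCode string) error {
	if err == nil {
		return nil
	}
	if errors.Is(err, context.Canceled) {
		return &codedError{Code: codeCancelled, Err: err}
	}
	return &codedError{Code: fallbackCode, Err: err}
}

// codeOf extracts the code from a classified error, or "" if there is none.
func codeOf(err error) string {
	var ce *codedError
	if errors.As(err, &ce) {
		return ce.Code
	}
	return ""
}

// demoCancelled simulates a prompt that was interrupted by the user.
func demoCancelled() error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return fmt.Errorf("prompting for agent name: %w", ctx.Err())
}

func main() {
	fmt.Println(classify(demoCancelled(), "project_init_failed"))
	fmt.Println(classify(errors.New("disk full"), "project_init_failed"))
}
```

Checking for cancellation before assigning any other code keeps user-initiated aborts out of the failure buckets, so a Ctrl+C during `init` is not counted as `project_init_failed`.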
Resolves #6728
Resolves #6807
This PR adds structured error handling and errors with suggestions to `azure.ai.agents`, and improves telemetry classification for extension service errors.
azure.ai.agents extension
A centralized set of error codes is introduced in `exterrors/codes.go` to classify every error the extension can produce. Each code maps to a specific failure mode for telemetry and user-facing messages:
| Category | Codes |
|---|---|
| Cancellation | `cancelled` |
| Validation | `invalid_agent_manifest`, `invalid_manifest_pointer`, `invalid_project_resource_id`, `invalid_foundry_resource_id`, `invalid_ai_project_id`, `invalid_service_config`, `invalid_agent_request`, `unsupported_host`, `unsupported_agent_kind`, `missing_agent_kind`, `agent_definition_not_found`, `subscription_mismatch`, `location_mismatch`, `missing_published_container_artifact`, `scaffold_template_failed` |
| Dependency | `project_not_found`, `project_init_failed`, `environment_not_found`, `environment_creation_failed`, `environment_values_failed`, `missing_ai_project_endpoint`, `missing_ai_project_id`, `missing_azure_subscription_id`, `missing_agent_env_vars`, `github_download_failed` |
| Auth | `credential_creation_failed`, `tenant_lookup_failed` |
| Compatibility | `incompatible_azd_version` |
| Internal | `azd_client_failed`, `cognitiveservices_client_failed`, `container_start_failed`, `container_start_timeout` |
Several errors now include suggestions, using the new error suggestion UX from #6827.
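To make the pairing of an error code with a suggestion concrete, here is a minimal Go sketch. It does not reproduce the PR's `exterrors` package: the `localError` type, the constant names, the three-string signatures of `Validation` and `Dependency`, the message and suggestion texts, and the `render` format are all assumptions for illustration. Only the code strings and category names are taken from the list above.

```go
package main

import "fmt"

// category names follow the groups in the PR description.
type category string

const (
	categoryValidation category = "validation"
	categoryDependency category = "dependency"
)

// Two of the codes listed in the PR description; the constant names are hypothetical.
const (
	codeInvalidAgentManifest = "invalid_agent_manifest"
	codeProjectNotFound      = "project_not_found"
)

// localError is a hypothetical structured error: a category and code for
// telemetry, plus a message and an optional suggestion for the user.
type localError struct {
	Category   category
	Code       string
	Message    string
	Suggestion string
}

func (e *localError) Error() string { return e.Message }

// Validation builds a validation-category error (assumed signature).
func Validation(code, message, suggestion string) *localError {
	return &localError{Category: categoryValidation, Code: code, Message: message, Suggestion: suggestion}
}

// Dependency builds a dependency-category error (assumed signature).
func Dependency(code, message, suggestion string) *localError {
	return &localError{Category: categoryDependency, Code: code, Message: message, Suggestion: suggestion}
}

// render shows one possible way to print the message with its suggestion.
func render(e *localError) string {
	out := "ERROR: " + e.Message
	if e.Suggestion != "" {
		out += "\nSuggestion: " + e.Suggestion
	}
	return out
}

func main() {
	err := Validation(
		codeInvalidAgentManifest,
		"the agent manifest could not be parsed",
		"check the manifest for syntax errors and try again",
	)
	fmt.Println(render(err))
	fmt.Printf("telemetry: %s.%s\n", err.Category, err.Code)
}
```

Keeping the code and the suggestion on the same value means a call site states the failure mode once, and both the telemetry path and the user-facing output read from it.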
Sample trace

    {
      "Name": "cmd.deploy",
      "Attributes": [
        { "Key": "cmd.entry", "Value": { "Type": "STRING", "Value": "cmd.deploy" } },
        { "Key": "error.service.name", "Value": { "Type": "STRING", "Value": "ai" } },
        { "Key": "error.service.host", "Value": { "Type": "STRING", "Value": "services.ai.azure.com" } },
        { "Key": "error.service.statusCode", "Value": { "Type": "INT64", "Value": 400 } },
        { "Key": "error.service.errorCode", "Value": { "Type": "STRING", "Value": "start_container.invalid_payload" } }
      ],
      "Status": {
        "Code": "Error",
        "Description": "ext.service.start_container.invalid_payload"
      }
    }

Sample output
Telemetry classification improvements
This PR also updates extension service error result codes from `ext.service.<host>.<statusCode>` (e.g. `ext.service.ai.400`) to `ext.service.<operation>.<errorCode>` (e.g. `ext.service.start_container.invalid_payload`), providing actionable classification instead of grouping unrelated failures together. Service name (`ai`), host (`ai.azure.com`), and status code (`400`) remain available as span attributes and can still be queried in Kusto.
Validation
Tested combinations of azd core and extension versions (with and without this change, plus older versions) and verified that errors and traces are rendered and reported correctly.