feat(designer): Agent evaluations tab #8932
Conversation
🤖 AI PR Validation Report

PR Review Results

Thank you for your submission! Here's detailed feedback on your PR title and body compliance:

✅ PR Title
✅ Commit Type
✅ Risk Level
❌ What & Why
✅ Impact of Change
❌ Test Plan
| Section | Status | Recommendation |
|---|---|---|
| Title | ✅ | Keep as-is |
| Commit Type | ✅ | OK |
| Risk Level | ✅ (declared: Medium) | Consider raising to High or add tests to keep Medium. |
| What & Why | ✅ | OK; optionally add backend rollout note |
| Impact of Change | ✅ | OK; mention feature flag or migration if any |
| Test Plan | ❌ | Add unit/E2E tests or a justified manual test plan |
| Contributors | ✅ | Add other contributors if applicable |
| Screenshots/Videos | ❌ | Add screenshots/screencast of UI changes |
Final notes & action items
- This PR cannot pass the PR body/title compliance check because the Test Plan is empty for a significant functional change. Please update the PR body Test Plan to either:
  - Include unit tests + E2E/integration tests (preferred) and check the corresponding boxes, or
  - Provide a clear, concrete manual testing plan and explain why automated tests are not feasible in this PR (and create a follow-up issue with a timeline for adding automated tests).
- Add screenshots/screencast demonstrating the new Evaluate tab and key panels.
- If you want to keep the risk level at Medium, please add the tests above. Otherwise, change the label to `risk:high` and add a short justification for the higher risk (e.g., "no automated tests in this PR").
Please update the PR title/body with the requested Test Plan and screenshots, add the unit/E2E tests (or a clear manual test justification), and then re-submit. Thank you for the thorough feature implementation — once tests and screenshots are added this will be much closer to merge-ready.
Last updated: Tue, 17 Mar 2026 18:17:19 GMT
🤖 AI PR Validation Report

PR Review Results

Thank you for your submission! Here's detailed feedback on your PR title and body compliance:

✅ PR Title
✅ Commit Type
| Section | Status | Recommendation |
|---|---|---|
| Title | ✅ | Keep as-is or slightly expand for clarity |
| Commit Type | ✅ | OK |
| Risk Level | ❌ | Recommend bump to `risk:high` and update label |
| What & Why | ✅ | Good; optionally mention high-level files changed |
| Impact of Change | ❌ | Expand to list system-level impacts and API changes |
| Test Plan | ❌ | Add unit/E2E tests or a detailed manual test plan |
| Contributors | ✅ | OK; add others if applicable |
| Screenshots/Videos | ❌ | Add visual proof for UI changes |
Summary:
This PR introduces a large feature set (new evaluation UI, new redux slice, queries, models, and a new StandardEvaluationService). Because it touches core libraries, the store, and service initialization, and adds network/API interactions, I recommend raising the risk level to High (please update the label) and adding tests or a detailed manual test plan. At present, the PR does NOT pass the PR body checklist because the Test Plan is empty; please add automated tests or a robust manual testing section and address the risk label.
Please update the PR title/body with the following specific items and then re-submit:
- Risk label: change to `risk:high` (comment in the PR explaining why: touches core libs/store/services/API).
- Test Plan: either add test files (unit tests for evaluationSlice, queries, EvaluateView components; an integration/E2E flow that covers create/run evaluation) OR add a detailed manual testing section with step-by-step instructions and expected results.
- Impact of Change: expand to describe system/backend/API impacts (new endpoints, potential runtime/cost), and any migration steps (none seen — if none, explicitly state so).
- Screenshots/Videos: include a screenshot of the Evaluate tab, the create evaluator form, and an evaluation result (or a short demo GIF).
Thank you for the thorough implementation. Once tests/manual test plan and the risk label are addressed, this will be in much better shape for merging.
Helpful file-specific test suggestions:
- libs/designer-v2/src/lib/core/state/evaluation/evaluationSlice.ts -> unit tests for reducer actions and reset behavior.
- libs/designer-v2/src/lib/core/queries/evaluations.ts -> mock EvaluationService and test query keys, enabled/disabled logic, and onSuccess invalidations for mutations.
- libs/logic-apps-shared/src/designer-client-services/lib/standard/evaluation.ts -> unit tests for URL/HTTP calls using a mocked IHttpClient.
- EvaluateView & panels -> component tests for rendering states (empty, loading, error, result) and form submission flows (EvaluatorFormPanel).
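To make the first suggestion concrete, here is a minimal sketch of the kind of reducer unit test suggested for `evaluationSlice.ts`. The state shape, action names, and `evaluationReducer` below are illustrative assumptions standing in for the real `createSlice`-generated reducer, not the actual designer API:

```typescript
// Hypothetical sketch of a reducer test for an evaluation slice.
// All names (EvaluationState, setSelectedEvaluator, resetEvaluation)
// are assumptions; adapt them to the real slice's actions and state.

interface EvaluationState {
  selectedEvaluatorId: string | null;
  results: Record<string, number>;
}

const initialEvaluationState: EvaluationState = {
  selectedEvaluatorId: null,
  results: {},
};

type EvaluationAction =
  | { type: 'evaluation/setSelectedEvaluator'; payload: string }
  | { type: 'evaluation/recordResult'; payload: { id: string; score: number } }
  | { type: 'evaluation/reset' };

const setSelectedEvaluator = (id: string): EvaluationAction => ({
  type: 'evaluation/setSelectedEvaluator',
  payload: id,
});
const resetEvaluation = (): EvaluationAction => ({ type: 'evaluation/reset' });

// Plain reducer standing in for the createSlice-generated one.
function evaluationReducer(
  state: EvaluationState = initialEvaluationState,
  action: EvaluationAction
): EvaluationState {
  switch (action.type) {
    case 'evaluation/setSelectedEvaluator':
      return { ...state, selectedEvaluatorId: action.payload };
    case 'evaluation/recordResult':
      return {
        ...state,
        results: { ...state.results, [action.payload.id]: action.payload.score },
      };
    case 'evaluation/reset':
      return initialEvaluationState;
    default:
      return state;
  }
}

// Assertions: actions update state without mutating it, and reset
// restores the initial state (the "reset behavior" called out above).
const selected = evaluationReducer(
  initialEvaluationState,
  setSelectedEvaluator('semantic-similarity')
);
console.assert(selected.selectedEvaluatorId === 'semantic-similarity');
console.assert(initialEvaluationState.selectedEvaluatorId === null); // no mutation

const afterReset = evaluationReducer(selected, resetEvaluation());
console.assert(afterReset.selectedEvaluatorId === null);
```

In a real test file the same assertions would live in `it(...)` blocks and import the actual slice's reducer and action creators instead of the stand-ins above.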
Please update and ping reviewers when ready. Thank you!
Last updated: Tue, 17 Mar 2026 17:31:38 GMT
📊 Coverage check completed. See workflow run for details.
…vice, update views
Commit Type
Risk Level
What & Why
Add agent evaluations functionality in a new designer tab. Allows users to evaluate A2A/agentic workflow runs using a predefined set of evaluators (tool call trajectory, semantic similarity, custom prompt). All evaluators either use reference runs as ground truth or a separate evaluator model as a judge.
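The evaluator model described above could be sketched as the following TypeScript types. These names and shapes are hypothetical, written only to illustrate the "reference run as ground truth vs. judge model" split; they are not the actual designer types:

```typescript
// Hypothetical model of the evaluators described above; all names are
// illustrative assumptions, not the real designer-client-services API.

type EvaluatorKind = 'tool-call-trajectory' | 'semantic-similarity' | 'custom-prompt';

// Ground truth comes from either a reference run or a separate judge model.
type GroundTruth =
  | { mode: 'reference-run'; referenceRunId: string }
  | { mode: 'judge-model'; modelId: string };

interface EvaluatorConfig {
  kind: EvaluatorKind;
  groundTruth: GroundTruth;
  // Only meaningful for custom-prompt evaluators.
  prompt?: string;
}

// Small helper showing how the discriminated union narrows.
function describeEvaluator(cfg: EvaluatorConfig): string {
  const source =
    cfg.groundTruth.mode === 'reference-run'
      ? `reference run ${cfg.groundTruth.referenceRunId}`
      : `judge model ${cfg.groundTruth.modelId}`;
  return `${cfg.kind} evaluator scored against ${source}`;
}
```

Modeling ground truth as a discriminated union keeps the two configurations mutually exclusive at the type level, so a form panel can switch its fields on `groundTruth.mode`.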
Impact of Change
Test Plan
Contributors
@andrew-eldridge
Screenshots/Videos