Skip to content

More maven parsing fix for experiment detector#1724

Merged
zhenghao104 merged 9 commits intomainfrom
users/zhentan/more-maven-fix-v2
Mar 24, 2026
Merged

More maven parsing fix for experiment detector#1724
zhenghao104 merged 9 commits intomainfrom
users/zhentan/more-maven-fix-v2

Conversation

@zhenghao104
Copy link
Contributor

@zhenghao104 zhenghao104 commented Mar 16, 2026

PR: Maven Detection Improvements - Variable Resolution & Centralized Cleanup

Summary

This PR addresses two critical issues in the Maven detection system:

  1. Variable resolution race conditions causing incorrect version resolution in multi-module Maven projects
  2. Premature file cleanup causing race conditions between MvnCliComponentDetector and MavenWithFallbackDetector

Problem Statement

Issue 1: Variable Resolution Order Dependency

Note: This issue exists in the current detector's implementation—it is not a new bug introduced by this PR. This PR fixes the existing behavior.

Before: The variable resolution logic processed POM files in arbitrary order and used a flat ConcurrentDictionary<string, string> for variable storage. When the same variable (e.g., ${revision}) was defined in multiple POM files (parent and child), the resolution result depended on processing order—whichever file was processed last would overwrite the variable value.

Example scenario:

parent/pom.xml:     <revision>1.0.0</revision>
child/pom.xml:      <revision>2.0.0</revision>  (inherits from parent)

If parent was processed after child, the child's dependency would incorrectly resolve to 1.0.0 instead of 2.0.0 (Maven's rule: child properties take precedence over parent).

Issue 2: Premature bcde.mvndeps File Cleanup

Note: This race condition only occurs during the experimental phase when both MvnCliComponentDetector and MavenWithFallbackDetector run concurrently. Once MavenWithFallbackDetector is promoted from experimental to production and replaces MvnCliComponentDetector, only one Maven detector will run and the centralized cleanup will no longer be necessary (though it remains harmless).

Before: Each detector (MvnCliComponentDetector and MavenWithFallbackDetector) independently tracked and cleaned up bcde.mvndeps files it created. Since both detectors run concurrently and share the same temporary files, one detector could delete files before the other finished reading them.

Technical Changes

1. Hierarchy-Aware Variable Resolution

New approach: Variables are now stored with file path context:

// Old: flat storage (order-dependent)
collectedVariables["revision"] = "1.0.0";

// New: file-scoped storage
collectedVariables["C:\project\parent\pom.xml::revision"] = "1.0.0";
collectedVariables["C:\project\child\pom.xml::revision"] = "2.0.0";

Resolution algorithm:

  1. Build Maven parent-child relationships based on <parent> declarations
  2. When resolving ${variable} for a file, walk up the Maven hierarchy (child → parent → grandparent)
  3. Return the first match found (implements Maven's "closest ancestor wins" rule)

Key methods:

  • FindVariableInMavenHierarchy() - Walks Maven parent chain to find variable definition
  • ResolveVersionWithHierarchyAwareness() - Resolves version templates using hierarchy search
  • ResolveDeferredParentRelationships() - Second pass to resolve parent relationships when POMs are processed out-of-order

2. Three-Pass Processing Model

The static parser now uses three passes:

Pass Purpose
First Collect variables from all POM files, establish parent relationships where possible
Second Resolve deferred parent relationships (for out-of-order processing)
Third Resolve pending components using hierarchy-aware variable lookup

3. Centralized Cleanup in DetectorProcessingService

Before: Cleanup was distributed across detectors

// In MvnCliComponentDetector
private readonly HashSet<string> _createdFiles = new();
// Each detector tracks and deletes its own files

After: Cleanup moved to DetectorProcessingService.CleanupMavenFiles()

// After ALL detectors finish:
if (settings.CleanupCreatedFiles ?? true)
{
    this.CleanupMavenFiles(settings.SourceDirectory, detectors);
}

Benefits:

  • No race condition between detectors
  • Single responsibility for cleanup
  • Respects CleanupCreatedFiles setting
  • Uses symlink-safe directory traversal (same pattern as FastDirectoryWalkerFactory)

4. Shared Constants

Created MavenConstants.cs to keep detector IDs and filenames in sync:

public static class MavenConstants
{
    public const string BcdeMvnDependencyFileName = "bcde.mvndeps";
    public const string MvnCliDetectorId = "MvnCli";
    public const string MavenWithFallbackDetectorId = "MavenWithFallback";
}

5. Telemetry Improvements

  • Added UnresolvedVariableCount telemetry (aggregated count instead of per-variable warnings)
  • Fixed PendingComponentCount to capture count before draining the queue
  • Changed verbose logs to debug level to reduce noise for customers

6. Symlink-Safe File Enumeration

The cleanup logic now uses the same symlink handling pattern as FastDirectoryWalkerFactory:

  • Detects reparse points (symlinks/junctions)
  • Resolves physical paths via IPathUtilityService
  • Tracks visited directories by real path to prevent infinite loops
  • Wraps symlink resolution in try/catch to handle broken symlinks gracefully

Behavioral Changes

Aspect Before After
Variable resolution Order-dependent, flat storage Hierarchy-aware, file-scoped
Multi-module projects Could resolve wrong version Correctly follows Maven inheritance
File cleanup Per-detector, race-prone Centralized, after all detectors complete
Circular symlinks Could cause infinite loop in cleanup Safely handled with visited tracking
Telemetry Per-variable warnings, incorrect pending count Aggregated counts, accurate metrics

Testing

  • All existing unit tests pass (800+ MavenWithFallbackDetector tests, 140+ DetectorProcessingService tests)
  • Verified on multiple repositories locally:
  • Component counts match between old and new implementations
  • No regression in detection accuracy

Files Changed

File Changes
MavenWithFallbackDetector.cs Hierarchy-aware variable resolution, three-pass processing, telemetry fixes
MavenConstants.cs New - shared constants for detector IDs and filename
MavenCliCleanupTelemetryRecord.cs New - telemetry for cleanup operations
DetectorProcessingService.cs Centralized cleanup with symlink-safe enumeration
MavenCommandService.cs Use shared constants
MvnCliComponentDetector.cs Use shared constants, removed file tracking
Test files Updated to use shared constants, removed timing-based assertions

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the experimental MavenWithFallbackDetector to improve static POM parsing behavior by introducing “smart” POM preloading intended to help cross-document property/version resolution during fallback parsing.

Changes:

  • Added per-scan state to support preloading all POM XmlDocuments for variable resolution.
  • Wired preloading into the fallback path and variable-replacement path to enable cross-document lookups.

@zhenghao104 zhenghao104 force-pushed the users/zhentan/more-maven-fix-v2 branch from 981dd8d to 6f27926 Compare March 16, 2026 23:52
@github-actions
Copy link

github-actions bot commented Mar 16, 2026

👋 Hi! It looks like you modified some files in the Detectors folder.
You may need to bump the detector versions if any of the following scenarios apply:

  • The detector detects more or fewer components than before
  • The detector generates different parent/child graph relationships than before
  • The detector generates different devDependencies values than before

If none of the above scenarios apply, feel free to ignore this comment 🙂

Copilot AI review requested due to automatic review settings March 17, 2026 23:48
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the Maven detector flow to avoid .mvndeps file cleanup races by deferring cleanup until detector execution completes, and enhances the experimental Maven fallback detector’s static parsing to better handle version variables.

Changes:

  • Add orchestrator-level cleanup of Maven .mvndeps files after all detectors finish.
  • Refactor Maven detectors to remove per-file reader coordination and introduce a two-pass static POM parsing approach (collect variables, then resolve pending components).
  • Add orchestrator unit tests covering Maven cleanup behavior.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
test/Microsoft.ComponentDetection.Orchestrator.Tests/Services/DetectorProcessingServiceTests.cs Adds tests validating .mvndeps cleanup behavior depending on whether Maven detectors ran.
src/Microsoft.ComponentDetection.Orchestrator/Services/DetectorProcessingService.cs Adds post-processing cleanup routine for Maven .mvndeps files after detector execution completes.
src/Microsoft.ComponentDetection.Detectors/maven/PendingComponent.cs Introduces a record type to queue unresolved Maven components for second-pass resolution.
src/Microsoft.ComponentDetection.Detectors/maven/MvnCliComponentDetector.cs Removes reader registration/unregistration logic and tweaks debug logging.
src/Microsoft.ComponentDetection.Detectors/maven/MavenWithFallbackDetector.cs Implements two-pass variable collection + pending component resolution and updates telemetry/logging.
src/Microsoft.ComponentDetection.Detectors/maven/MavenCommandService.cs Removes file-reader coordination and deletion logic from the command service.
src/Microsoft.ComponentDetection.Detectors/maven/IMavenCommandService.cs Removes file-reader coordination APIs from the service interface.

Copilot AI review requested due to automatic review settings March 18, 2026 01:37
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates Maven detector behavior to avoid Maven CLI dependency-file cleanup races, and improves MavenWithFallback static parsing by deferring variable-based version resolution to a second pass.

Changes:

  • Add orchestrator-level cleanup of Maven CLI *.mvndeps files after all detectors complete.
  • Rework MavenWithFallbackDetector static parsing to collect variables first and resolve variable-based versions in a second pass (via PendingComponent).
  • Remove Maven CLI file-reader registration/unregistration coordination APIs from IMavenCommandService/MavenCommandService, and update detectors accordingly.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
test/Microsoft.ComponentDetection.Orchestrator.Tests/Services/DetectorProcessingServiceTests.cs Adds tests asserting Maven deps files are deleted only when Maven detectors run.
src/Microsoft.ComponentDetection.Orchestrator/Services/DetectorProcessingService.cs Adds post-scan Maven deps file cleanup in the orchestrator.
src/Microsoft.ComponentDetection.Detectors/maven/PendingComponent.cs Introduces a record to hold unresolved Maven components for second-pass processing.
src/Microsoft.ComponentDetection.Detectors/maven/MvnCliComponentDetector.cs Removes per-file reader unregister logic; adds additional debug logging.
src/Microsoft.ComponentDetection.Detectors/maven/MavenWithFallbackDetector.cs Implements two-pass variable collection + resolution for static parsing.
src/Microsoft.ComponentDetection.Detectors/maven/MavenCommandService.cs Removes file-reader reference counting/cleanup logic.
src/Microsoft.ComponentDetection.Detectors/maven/IMavenCommandService.cs Removes file-reader coordination methods from the interface.

Copilot AI review requested due to automatic review settings March 20, 2026 22:52
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR tightens Maven detection behavior by (1) improving MavenWithFallback’s static POM parsing to resolve variables across parent/child hierarchies using multi-pass processing, and (2) moving Maven CLI .mvndeps cleanup to the orchestrator after all detectors complete to avoid cross-detector races.

Changes:

  • Add orchestrator-level cleanup of Maven CLI-generated dependency files after detector execution.
  • Implement multi-pass variable collection + hierarchy-aware resolution in MavenWithFallbackDetector and add a PendingComponent model for deferred registrations.
  • Expand test coverage for variable resolution edge cases and new cleanup behavior.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
test/Microsoft.ComponentDetection.Orchestrator.Tests/Services/DetectorProcessingServiceTests.cs Adds tests validating post-processing cleanup of .mvndeps files depending on which detectors ran.
test/Microsoft.ComponentDetection.Detectors.Tests/MavenWithFallbackDetectorTests.cs Adds many tests covering parent/child/sibling variable resolution, deferred resolution, and traversal robustness.
src/Microsoft.ComponentDetection.Orchestrator/Services/DetectorProcessingService.cs Adds CleanupMavenFilesAsync and invokes it after all detectors finish.
src/Microsoft.ComponentDetection.Detectors/maven/PendingComponent.cs Introduces a record to hold unresolved components for second/third-pass resolution.
src/Microsoft.ComponentDetection.Detectors/maven/MvnCliComponentDetector.cs Removes per-file reader unregister logic and adds debug logging around contribution counts.
src/Microsoft.ComponentDetection.Detectors/maven/MavenWithFallbackDetector.cs Introduces multi-pass parsing with concurrent state structures, hierarchy tracking, and deferred resolution.
src/Microsoft.ComponentDetection.Detectors/maven/MavenCommandService.cs Removes file-reader tracking/cleanup coordination APIs (cleanup is now orchestrator-driven).
src/Microsoft.ComponentDetection.Detectors/maven/IMavenCommandService.cs Removes file-reader tracking/cleanup coordination API from the interface.

Copilot AI review requested due to automatic review settings March 24, 2026 00:08
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

@zhenghao104 zhenghao104 merged commit 62ce022 into main Mar 24, 2026
27 of 29 checks passed
@zhenghao104 zhenghao104 deleted the users/zhentan/more-maven-fix-v2 branch March 24, 2026 18:31
@codecov
Copy link

codecov bot commented Mar 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 0.0%. Comparing base (36fca68) to head (143781c).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@     Coverage Diff      @@
##   main   #1724   +/-   ##
============================
============================

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants