igvmfilegen: add diff subcommand for IGVM reproducibility investigations#3033
igvmfilegen: add diff subcommand for IGVM reproducibility investigations#3033justus-camp-microsoft wants to merge 1 commit intomicrosoft:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Adds an igvmfilegen diff subcommand to help investigate IGVM reproducibility issues by extracting IGVM contents into structured temp directories and invoking diffoscope on the extracted trees.
Changes:
- Add
Diffsubcommand wiring toigvmfilegenCLI. - Implement IGVM extraction + region naming/coalescing +
diffoscopeinvocation in a newdiffmodule. - Document the new tool/subcommand in the Guide and link it from
SUMMARY.md.
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| vm/loader/igvmfilegen/src/main.rs | Adds Diff subcommand to CLI and dispatches to the new diff implementation. |
| vm/loader/igvmfilegen/src/diff.rs | New extraction/coalescing logic and diffoscope runner for comparing IGVM outputs. |
| vm/loader/igvmfilegen/Cargo.toml | Adds tempfile dependency for managing extracted temp directories. |
| Guide/src/dev_guide/dev_tools/igvmfilegen.md | New documentation page describing igvmfilegen diff usage and workflows. |
| Guide/src/SUMMARY.md | Adds the new igvmfilegen doc page to the Guide table of contents. |
| Cargo.lock | Locks tempfile as a dependency due to the new module. |
| let component = lookup_map_name(map, *gpa).unwrap_or("unmapped").to_string(); | ||
| // Deduplicate pages at the same GPA (different compatibility masks | ||
| // produce duplicate PageData entries with identical content). | ||
| if seen_gpas.insert(*gpa) { | ||
| page_data_entries.push(PageDataEntry { |
| /// Parse an IGVM .map file, extracting all layout entries across all isolation sections. | ||
| /// Deduplicates by (start_gpa, end_gpa). | ||
| fn parse_map_file(path: &Path) -> anyhow::Result<Vec<MapEntry>> { | ||
| let content = fs_err::read_to_string(path).context("reading map file")?; | ||
| let mut entries: Vec<MapEntry> = Vec::new(); | ||
| let mut seen: std::collections::HashSet<(u64, u64)> = std::collections::HashSet::new(); | ||
| let mut in_layout = false; | ||
|
|
|
|
||
| //! Implements IGVM file diffing by extracting constituent parts and running diffoscope. | ||
|
|
||
| use anyhow::Context; |
There was a problem hiding this comment.
If this isn't actually using any other parts of igvmfilegen maybe it should be in a separate crate. After all disassembling files is really the opposite of generating them.
There was a problem hiding this comment.
That's fine, was trying to avoid splitting out a new crate but maybe while I'm at it I can move the Dump command to whatever I name the new crate too.
There was a problem hiding this comment.
tbh i think it's fine to leave in here for now.
There was a problem hiding this comment.
Nah I'd prefer Justus's idea, We should separate out igvm-creation and igvm-processing if they're not sharing any code (beyond the igvm crate itself)
This adds a diff command to the igvmfilegen tool to help with current and future investigations into issues with IGVM reproducibility. This tool uses the output IGVM .bin and .map files to deconstruct it into its individual parts and then passes them through diffoscope. The tool assumes you have diffoscope installed.
Disclaimer - this tool was mostly written by copilot but in my experience in using it to diagnose reproducibility issues it's been super useful.