ide: ignore non-DMA commands in enlightened INT13 path#3040
ide: ignore non-DMA commands in enlightened INT13 path#3040juantian8seattle wants to merge 2 commits intomicrosoft:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR fixes a hang in the IDE device’s enlightened INT13 fast path by validating the guest-provided IDE command and rejecting non-DMA commands that would otherwise leave the device in a state where the deferred write never completes.
Changes:
- Add a DMA-command allowlist in
enlightened_hdd_command()and ignore non-DMA commands with a rate-limited warning. - Adjust command handling to use the validated command value throughout the enlightened HDD path.
- Add a regression test that reproduces the prior hang by issuing a PIO command (
READ_SECTORS) via the enlightened port.
Comments suppressed due to low confidence (1)
vm/devices/storage/ide/src/lib.rs:471
WRITE_DMA_FUA_EXTis included in the DMA allowlist, but the 48-bit LBA setup only runs forREAD_DMA_EXT/WRITE_DMA_EXT.WRITE_DMA_FUA_EXTis also a 48-bit command, so skipping the high LBA/sector-count register writes can make the device use stale high bytes and access the wrong LBA. Either handle it in the 48-bit setup path (same register sequence as other *_EXT commands) or remove it from the allowlist if it isn’t supported here.
if cmd == IdeCommand::READ_DMA_EXT || cmd == IdeCommand::WRITE_DMA_EXT {
// 48-bit LBA, high 24 bits of logical block address
self.write_drive_register(
DriveRegister::LbaLow,
(eint13_cmd.lba_low >> 24) as u8,
bus_master_state,
);
self.write_drive_register(
DriveRegister::LbaMid,
eint13_cmd.lba_high as u8,
bus_master_state,
);
self.write_drive_register(
DriveRegister::LbaHigh,
(eint13_cmd.lba_high >> 8) as u8,
bus_master_state,
);
// 48-bit LBA, high 8 bits of sector count
// Write the low-byte of the sector count
self.write_drive_register(
DriveRegister::SectorCount,
(eint13_cmd.block_count >> 8) as u8,
bus_master_state,
);
You can also share your feedback on Copilot code review. Take the survey.
vm/devices/storage/ide/src/lib.rs
Outdated
| for _ in 0..MAX_POLLS { | ||
| let mut cx = std::task::Context::from_waker(std::task::Waker::noop()); |
vm/devices/storage/ide/src/lib.rs
Outdated
| // After fix: non-DMA commands are rejected early | ||
| } | ||
| IoResult::Defer(mut deferred) => { | ||
| // Poll a bounded number of times. With the bug present, | ||
| // this never completes (DRQ stays set after IO finishes). | ||
| const MAX_POLLS: usize = 1024; | ||
| for _ in 0..MAX_POLLS { | ||
| let mut cx = std::task::Context::from_waker(std::task::Waker::noop()); | ||
| ide_device.poll_device(&mut cx); | ||
| if let Poll::Ready(result) = deferred.poll_write(&mut cx) { | ||
| result.unwrap(); | ||
| return; | ||
| } | ||
| } | ||
| panic!( | ||
| "non-DMA command (READ_SECTORS) via enlightened path \ | ||
| didn't complete after {MAX_POLLS} polls -- deferred write stuck" |
chris-oo
left a comment
There was a problem hiding this comment.
LGTM but not sure if you want someone else on the storage side to review.
Was this a real issue we saw in production?
|
Yep, the fuzzer did catch it -- there are 4 OneFuzzer bugs for exactly this crash from fuzz_ide. |
enlightened_hdd_command() does not validate the command byte from guest memory. Non-DMA commands (like READ_SECTORS) produce a PIO buffer that DMA can not drain, so the deferred write completion check never passes and we poll forever. Only allow DMA commands through this path. Anything else gets ignored with a warning, same as the other early-exit cases (drive error, command pending, bad GPA, etc). Add a regression test that reproduces the hang by sending READ_SECTORS via the enlightened port.
277c096 to
2803cc2
Compare
|
@SrinivasShekar, would you take a look? He's the current IDE device expert, so let's get his review before signing off. Thanks! |
There was a problem hiding this comment.
Pull request overview
This PR fixes a hang in the IDE device’s enlightened INT13 fast path by validating the guest-supplied IDE command and rejecting non-DMA commands that would otherwise leave the device stuck with DRQ asserted.
Changes:
- Add a DMA-command allowlist to
enlightened_hdd_command()and early-exit (with a rate-limited warning) for non-DMA commands. - Add a regression test that reproduces the historical hang by issuing
READ_SECTORSvia the enlightened port.
You can also share your feedback on Copilot code review. Take the survey.
| // Now that we know what the IDE command is, disambiguate between | ||
| // 28-bit LBA and 48-bit LBA | ||
| let cmd = eint13_cmd.command; | ||
| if cmd == IdeCommand::READ_DMA_EXT || cmd == IdeCommand::WRITE_DMA_EXT { | ||
| // 48-bit LBA, high 24 bits of logical block address |
vm/devices/storage/ide/src/lib.rs
Outdated
| // After fix: non-DMA commands are rejected early | ||
| } | ||
| IoResult::Defer(mut deferred) => { | ||
| // Poll a bounded number of times. With the bug present, | ||
| // this never completes (DRQ stays set after IO finishes). | ||
| const MAX_POLLS: usize = 1024; | ||
| for _ in 0..MAX_POLLS { | ||
| let mut cx = std::task::Context::from_waker(std::task::Waker::noop()); | ||
| ide_device.poll_device(&mut cx); | ||
| if let Poll::Ready(result) = deferred.poll_write(&mut cx) { | ||
| result.unwrap(); | ||
| return; | ||
| } | ||
| } | ||
| panic!( | ||
| "non-DMA command (READ_SECTORS) via enlightened path \ | ||
| didn't complete after {MAX_POLLS} polls -- deferred write stuck" | ||
| ); | ||
| } | ||
| IoResult::Err(e) => panic!("unexpected error: {e:?}"), |
The test previously accepted both IoResult::Ok and IoResult::Defer. Since the fix rejects non-DMA commands early, assert Ok specifically so a future regression that re-accepts non-DMA commands gets caught. Also removes unused Poll import (clippy unused-qualifications).
|
Addressing review comments: @chris-oo: Found via OneFuzzer (fuzz_ide target), not a production incident. Defense-in-depth fix for a guest-triggerable emulator thread hang. @smalis-msft: Yep, the fuzzer did catch it -- there are 4 OneFuzzer bugs for exactly this crash. WRITE_DMA_FUA_EXT 48-bit path gap (Copilot review): Good catch, but this is a pre-existing bug on Test tightened per review feedback: the test now asserts |
Fixes #3039
enlightened_hdd_command()does not validate the command byte from guest memory. Non-DMA commands (likeREAD_SECTORS) produce a PIO buffer that DMA can not drain, so the deferred write completion check never passes and we poll forever.Only allow DMA commands through this path. Anything else gets ignored with a warning, same as the other early-exit cases (drive error, command pending, bad GPA, etc).
Includes a regression test that reproduces the hang by sending
READ_SECTORSvia the enlightened port.