Skip to content

ide: ignore non-DMA commands in enlightened INT13 path#3040

Open
juantian8seattle wants to merge 2 commits intomicrosoft:mainfrom
juantian8seattle:user/juantian/ide-enlightened-dma-fix
Open

ide: ignore non-DMA commands in enlightened INT13 path#3040
juantian8seattle wants to merge 2 commits intomicrosoft:mainfrom
juantian8seattle:user/juantian/ide-enlightened-dma-fix

Conversation

@juantian8seattle
Copy link
Contributor

Fixes #3039

enlightened_hdd_command() does not validate the command byte from guest memory. Non-DMA commands (like READ_SECTORS) produce a PIO buffer that DMA can not drain, so the deferred write completion check never passes and we poll forever.

Only allow DMA commands through this path. Anything else gets ignored with a warning, same as the other early-exit cases (drive error, command pending, bad GPA, etc).

Includes a regression test that reproduces the hang by sending READ_SECTORS via the enlightened port.

@juantian8seattle juantian8seattle marked this pull request as ready for review March 18, 2026 06:29
@juantian8seattle juantian8seattle requested review from a team as code owners March 18, 2026 06:29
Copilot AI review requested due to automatic review settings March 18, 2026 06:29
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a hang in the IDE device’s enlightened INT13 fast path by validating the guest-provided IDE command and rejecting non-DMA commands that would otherwise leave the device in a state where the deferred write never completes.

Changes:

  • Add a DMA-command allowlist in enlightened_hdd_command() and ignore non-DMA commands with a rate-limited warning.
  • Adjust command handling to use the validated command value throughout the enlightened HDD path.
  • Add a regression test that reproduces the prior hang by issuing a PIO command (READ_SECTORS) via the enlightened port.
Comments suppressed due to low confidence (1)

vm/devices/storage/ide/src/lib.rs:471

  • WRITE_DMA_FUA_EXT is included in the DMA allowlist, but the 48-bit LBA setup only runs for READ_DMA_EXT/WRITE_DMA_EXT. WRITE_DMA_FUA_EXT is also a 48-bit command, so skipping the high LBA/sector-count register writes can make the device use stale high bytes and access the wrong LBA. Either handle it in the 48-bit setup path (same register sequence as other *_EXT commands) or remove it from the allowlist if it isn’t supported here.
        if cmd == IdeCommand::READ_DMA_EXT || cmd == IdeCommand::WRITE_DMA_EXT {
            // 48-bit LBA, high 24 bits of logical block address
            self.write_drive_register(
                DriveRegister::LbaLow,
                (eint13_cmd.lba_low >> 24) as u8,
                bus_master_state,
            );

            self.write_drive_register(
                DriveRegister::LbaMid,
                eint13_cmd.lba_high as u8,
                bus_master_state,
            );

            self.write_drive_register(
                DriveRegister::LbaHigh,
                (eint13_cmd.lba_high >> 8) as u8,
                bus_master_state,
            );

            // 48-bit LBA, high 8 bits of sector count
            // Write the low-byte of the sector count
            self.write_drive_register(
                DriveRegister::SectorCount,
                (eint13_cmd.block_count >> 8) as u8,
                bus_master_state,
            );

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +3018 to +3019
for _ in 0..MAX_POLLS {
let mut cx = std::task::Context::from_waker(std::task::Waker::noop());
Comment on lines +3012 to +3028
// After fix: non-DMA commands are rejected early
}
IoResult::Defer(mut deferred) => {
// Poll a bounded number of times. With the bug present,
// this never completes (DRQ stays set after IO finishes).
const MAX_POLLS: usize = 1024;
for _ in 0..MAX_POLLS {
let mut cx = std::task::Context::from_waker(std::task::Waker::noop());
ide_device.poll_device(&mut cx);
if let Poll::Ready(result) = deferred.poll_write(&mut cx) {
result.unwrap();
return;
}
}
panic!(
"non-DMA command (READ_SECTORS) via enlightened path \
didn't complete after {MAX_POLLS} polls -- deferred write stuck"
@github-actions
Copy link

chris-oo
chris-oo previously approved these changes Mar 18, 2026
Copy link
Member

@chris-oo chris-oo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but not sure if you want someone else on the storage side to review.

Was this a real issue we saw in production?

@smalis-msft
Copy link
Contributor

smalis-msft commented Mar 18, 2026

Yep, the fuzzer did catch it -- there are 4 OneFuzzer bugs for exactly this crash from fuzz_ide.

enlightened_hdd_command() does not validate the command byte from
guest memory. Non-DMA commands (like READ_SECTORS) produce a PIO
buffer that DMA can not drain, so the deferred write completion
check never passes and we poll forever.

Only allow DMA commands through this path. Anything else gets
ignored with a warning, same as the other early-exit cases
(drive error, command pending, bad GPA, etc).

Add a regression test that reproduces the hang by sending
READ_SECTORS via the enlightened port.
Copilot AI review requested due to automatic review settings March 18, 2026 21:44
@juantian8seattle juantian8seattle force-pushed the user/juantian/ide-enlightened-dma-fix branch from 277c096 to 2803cc2 Compare March 18, 2026 21:44
@mattkur
Copy link
Contributor

mattkur commented Mar 18, 2026

@SrinivasShekar, would you take a look? He's the current IDE device expert, so let's get his review before signing off. Thanks!

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a hang in the IDE device’s enlightened INT13 fast path by validating the guest-supplied IDE command and rejecting non-DMA commands that would otherwise leave the device stuck with DRQ asserted.

Changes:

  • Add a DMA-command allowlist to enlightened_hdd_command() and early-exit (with a rate-limited warning) for non-DMA commands.
  • Add a regression test that reproduces the historical hang by issuing READ_SECTORS via the enlightened port.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines 443 to 446
// Now that we know what the IDE command is, disambiguate between
// 28-bit LBA and 48-bit LBA
let cmd = eint13_cmd.command;
if cmd == IdeCommand::READ_DMA_EXT || cmd == IdeCommand::WRITE_DMA_EXT {
// 48-bit LBA, high 24 bits of logical block address
Comment on lines +3012 to +3031
// After fix: non-DMA commands are rejected early
}
IoResult::Defer(mut deferred) => {
// Poll a bounded number of times. With the bug present,
// this never completes (DRQ stays set after IO finishes).
const MAX_POLLS: usize = 1024;
for _ in 0..MAX_POLLS {
let mut cx = std::task::Context::from_waker(std::task::Waker::noop());
ide_device.poll_device(&mut cx);
if let Poll::Ready(result) = deferred.poll_write(&mut cx) {
result.unwrap();
return;
}
}
panic!(
"non-DMA command (READ_SECTORS) via enlightened path \
didn't complete after {MAX_POLLS} polls -- deferred write stuck"
);
}
IoResult::Err(e) => panic!("unexpected error: {e:?}"),
The test previously accepted both IoResult::Ok and IoResult::Defer.
Since the fix rejects non-DMA commands early, assert Ok specifically
so a future regression that re-accepts non-DMA commands gets caught.

Also removes unused Poll import (clippy unused-qualifications).
@juantian8seattle
Copy link
Contributor Author

Addressing review comments:

@chris-oo: Found via OneFuzzer (fuzz_ide target), not a production incident. Defense-in-depth fix for a guest-triggerable emulator thread hang.

@smalis-msft: Yep, the fuzzer did catch it -- there are 4 OneFuzzer bugs for exactly this crash.

WRITE_DMA_FUA_EXT 48-bit path gap (Copilot review): Good catch, but this is a pre-existing bug on main, not introduced by this PR. Filed as #3061 with a separate fix incoming.

Test tightened per review feedback: the test now asserts IoResult::Ok specifically instead of accepting both Ok and Defer. Pushed as a new commit (f3eda27).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ide: enlightened INT13 path hangs forever on non-DMA commands

7 participants