vmm_tests: fix create_io_queue validation in vmm testing #2966
gurasinghMS merged 18 commits into microsoft:main
Conversation
Pull request overview
Fixes the servicing_keepalive_create_io_queue_on_new_cpu VMM test so it reliably exercises and validates NVMe create_io_queue() behavior by ensuring storvsp exposes enough channels for per-CPU IO and by asserting that an IO issuer is created on the targeted CPU.
Changes:
- Configure the VM/VTL2 settings to provide multiple SCSI sub-channels (matching VP count) so IO can be issued on all CPUs.
- Refactor “find CPUs with IO issuers” inspect parsing into a helper and use it to select a target CPU deterministically.
- Add a post-IO assertion that the target CPU gained an IO issuer after pinning IO.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Pull request overview
Updates an OpenHCL servicing regression test to reliably trigger and verify NVMe create_io_queue() after keepalive restore by ensuring per-CPU IO can be driven (via additional storvsp channels) and by asserting the IO-issuer appears for the selected CPU.
Changes:
- Configure the test VM/storage setup to provide enough SCSI sub-channels for per-CPU IO (matching vp_count).
- Refactor CPU-pinned IO and inspect parsing into helper functions, and add an explicit post-IO assertion that the target CPU gained an IO issuer.
Pull request overview
This PR updates an OpenHCL servicing vmm test to actually validate that create_io_queue() runs by forcing I/O on a CPU that initially has no NVMe IO issuer, and adjusts VTL2 SCSI channel settings so the scenario can occur on multi-CPU VMs.
Changes:
- Configure VTL2 SCSI sub-channels to allow I/O across all VPs (enabling per-CPU queue creation coverage).
- Refactor CPU-pinned I/O and NVMe issuer discovery into helper functions and add an assertion that an issuer appears on the targeted CPU.
- Set processor topology (vp_count, vps_per_socket) to make CPU pinning deterministic for this test.
Pull request overview
Improves the reliability of the servicing_keepalive_create_io_queue_on_new_cpu VMM test by ensuring the guest storage stack can create per-CPU IO queues and by explicitly validating that queue/issuer creation occurs after pinned IO.
Changes:
- Configure VTL2 SCSI to use multiple channels (sub-channels) so IO can occur on all vCPUs.
- Refactor the test to (a) pick a CPU without an existing IO issuer via inspect, (b) run CPU-pinned IO, and (c) assert the IO issuer appears afterward.
- Add helper functions to run CPU-pinned IO via cpuset cgroups and to query CPUs with IO issuers via inspect.
```rust
// channel which is always present, so vp_count - 1 yields a total
// of vp_count channels.
v.fixed.as_mut().unwrap().scsi_sub_channels = Some(vp_count - 1);
})
```
This approach of creating the Vtl2FixedSettings struct and defining scsi_sub_channels is hacky, but in the moment it was the only way I was able to get this working. The struct seems to be defined in some proto files, and clicking through took me to the rust_analyzer folder instead of a crate that we own. Maybe I missed something here, though.
Pull request overview
Updates an OpenHCL servicing keepalive VMM test to reliably exercise and validate NVMe create_io_queue() on a CPU that initially lacks an IO issuer, addressing a prior gap where the test didn’t actually validate queue creation due to single-channel storvsp behavior.
Changes:
- Configure the VM topology explicitly (vp_count, vps_per_socket) and adjust the VTL2 SCSI channel configuration to provide one channel per vCPU.
- Refactor CPU-pinned IO into a reusable helper and add a helper to query IO issuer CPUs via inspect.
- Add post-IO validation that the targeted CPU now has an IO issuer (i.e., queue creation occurred).
```rust
// Uses inspect to find and return a vector of issuers in the nvme driver. Proto
// queues are not reported in the returned value.
async fn find_cpus_with_io_issuers(
    vm: &PetriVm<OpenVmmPetriBackend>,
) -> Result<Vec<String>, anyhow::Error> {
    // Query inspect to find which CPUs have IO issuers after boot.
    // Only CPUs with initialized issuers appear in the per_cpu map (unset
    // OnceLock entries are absent). This makes the test deterministic —
    // we guarantee we target a CPU that triggers create_io_queue().
    let devices = vm.inspect_openhcl("vm/nvme/devices", None, None).await?;
    let devices: serde_json::Value = serde_json::from_str(&format!("{}", devices.json()))?;
    let device = devices
        .as_object()
        .expect("devices should be an object")
        .values()
        .next()
        .expect("should have at least one NVMe device");
    let per_cpu = &device["driver"]["driver"]["io_issuers"]["per_cpu"];
    let per_cpu_map = per_cpu.as_object().expect("per_cpu should be an object");
    Ok(per_cpu_map.keys().cloned().collect::<Vec<_>>())
}
```
Any reason to NOT return a Vec<u32>? Vec<u32> seems like a natural container for a list of CPUs (or a set), if one ignores the implementation detail that the numbers come as strings from inspect.
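A minimal sketch of that conversion, with a hypothetical helper name (assuming the inspect keys are decimal CPU indices, which is an assumption rather than anything the inspect API guarantees):

```rust
// Hypothetical helper: convert inspect's string CPU keys into sorted u32s.
// Propagating the parse error surfaces any unexpected non-numeric key
// instead of silently dropping it.
fn parse_cpu_keys(keys: &[&str]) -> Result<Vec<u32>, std::num::ParseIntError> {
    let mut cpus = keys
        .iter()
        .map(|k| k.parse::<u32>())
        .collect::<Result<Vec<_>, _>>()?;
    cpus.sort_unstable();
    Ok(cpus)
}
```

Sorting also makes "pick the first CPU without an issuer" deterministic, which string keys from a JSON map do not guarantee on their own.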
Pull request overview
Updates an OpenHCL servicing vmm test to reliably exercise and validate NVMe create_io_queue() on a CPU that did not previously have an IO issuer, by ensuring storvsp provides enough channels for per-CPU IO and by asserting issuer creation after pinned IO.
Changes:
- Configure processor topology (incl. vps_per_socket: Some(1)) and set scsi_sub_channels to provide one storvsp channel per VP.
- Refactor CPU selection logic into an inspect-based helper and add an assertion that pinned IO results in issuer creation on the targeted CPU.
- Extract cpuset-cgroup pinned IO into a reusable helper used across both servicing cycles.
alandau left a comment
LGTM, thanks! A few nits, mainly the u32 thing. Also, copilot has a lot to say on this PR, feel free to take or ignore its feedback.
Pull request overview
This PR makes the servicing_keepalive_create_io_queue_on_cpu VMM test validate that an IO queue is actually created after pinning IO to a CPU that initially lacks an issuer, and adjusts VTL2/SCSI channel configuration to enable per-CPU IO.
Changes:
- Configure VTL2 SCSI sub-channels so the guest can perform IO on all vCPUs (one channel per CPU).
- Refactor CPU-selection logic into find_cpus_with_io_issuers() and add an assertion that pinning IO results in an issuer on the target CPU.
- Extract cpuset-based pinned IO logic into run_cpu_pinned_io().
Pull request overview
This PR updates the OpenHCL servicing keepalive NVMe vmm test to reliably exercise and validate the NVMe driver’s create_io_queue() path by ensuring StorVSP/SCSI has enough channels to allow IO dispatch across all vCPUs, and by asserting that the targeted CPU gains an IO issuer after pinned IO.
Changes:
- Configure VM topology and VTL2 SCSI settings so the total SCSI channel count matches vp_count (enabling IO on all CPUs).
- Refactor the test logic to use helpers for (a) finding CPUs with NVMe IO issuers via inspect and (b) running cpuset-pinned IO.
- Strengthen validation by asserting inspect reflects IO issuer creation on the targeted CPU after pinned IO.
```rust
// Ensure the agent is responsive after the restart before returning.
agent.ping().await?;
```
A general comment, not for this PR: These 2 lines seem like a worthwhile change for any servicing test. Instead of repeating and synchronizing them between all the tests, do we want to create a mini-framework that handles that and all the other pre- and post- things? E.g. one that either gets a callback that runs the test and does the setup and teardown, or (preferably) something like the function that creates the common config that will do all the other pre- and post- things.
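A rough shape for that idea, as a sketch only: the function name, the generic context, and the plain error type are all placeholders, not the Petri API, and the real version would be async.

```rust
// Hypothetical sketch of the suggested mini-framework: the common pre/post
// steps (e.g. the post-restart agent ping) live in one wrapper, and each
// servicing test supplies only its body as a callback.
fn run_servicing_test<Ctx, E>(
    setup: impl FnOnce() -> Result<Ctx, E>,
    body: impl FnOnce(&mut Ctx) -> Result<(), E>,
    post_checks: impl FnOnce(&mut Ctx) -> Result<(), E>,
) -> Result<(), E> {
    // Shared setup, e.g. building the common VM config.
    let mut ctx = setup()?;
    // The test-specific servicing scenario.
    body(&mut ctx)?;
    // Shared validation/teardown, e.g. agent.ping().await? in the real tests.
    post_checks(&mut ctx)
}
```

The closure-taking variant keeps each test honest about running the shared checks; the alternative mentioned above (a function that builds the common config and registers the pre/post steps) trades that for less nesting at call sites.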
```diff
@@ -1226,7 +1234,7 @@ async fn run_cpu_pinned_io(
     // command didn't complete (the exact production failure scenario).
     let pinned_dd = format!(
         "echo $$ > /sys/fs/cgroup/pin{target_cpu}/cgroup.procs; \
-        dd if={disk_path} of=/dev/null bs=4k count=256 iflag=direct"
+        dd if={disk_path} of=/dev/null bs=4k count=256 iflag=direct status=none"
```
Do we expect the dd to fail? I guess not, but just double checking. If it can fail, status=none would hide the stats dd prints at the end telling how much it was able to copy. IIRC, adding it was copilot's suggestion, so up to you what to do here.
It really should not be failing imo. Unless someone is messing around with the test tbh.
The servicing_keepalive_create_io_queue_on_cpu test does not validate the creation of the IO queue (and previously the queue was not being created, because storvsp was operating on a single channel). This PR adds a total of 4 channels to storvsp, one per CPU, and updates the vmm test to verify queue creation.