-
Notifications
You must be signed in to change notification settings - Fork 31
Description
this is from seeing migrate_running_process flake in https://github.com/oxidecomputer/propolis/pull/1023/checks?check_run_id=61509073090 , where the logs from phd-runner are like:
1996 2026-01-26T18:51:34.848Z INFO phd-runner: [WAIT_FOR_SERIAL_OUTPUT - EVENT] Waiting for output on serial console
file = phd-tests/framework/src/test_vm/mod.rs
line = 804
path = phd_tests::migrate::running_process::migrate_running_process
target = phd_framework::test_vm
timeout_duration = Explicit(15s)
vm = migrate_running_process_source
vm_id = b463172b-ae55-498a-9f94-e772d657dc5a
1997 2026-01-26T18:51:49.849Z INFO phd-runner: [WAIT_FOR_SERIAL_OUTPUT - END]
elapsed_milliseconds = 15001
file = phd-tests/framework/src/test_vm/mod.rs
line = 797
path = phd_tests::migrate::running_process::migrate_running_process
target = phd_framework::test_vm
vm = migrate_running_process_source
vm_id = b463172b-ae55-498a-9f94-e772d657dc5a
1998 2026-01-26T18:51:49.849Z DEBG phd-runner: [TEST - EVENT] Killing Propolis server process
file = phd-tests/framework/src/test_vm/server.rs
line = 140
path = phd_tests::migrate::running_process::migrate_running_process
self.address = 127.0.0.1:9002
target = phd_framework::test_vm::server
1999 2026-01-26T18:51:49.858Z INFO phd-runner: [VM CLEANUP - START]
file = phd-tests/framework/src/test_vm/mod.rs
line = 1054
path = phd_tests::migrate::running_process::migrate_running_process
target = phd_framework::test_vm
vm = migrate_running_process_source
vm_id = b463172b-ae55-498a-9f94-e772d657dc5a
2000 2026-01-26T18:51:49.859Z DEBG phd-runner: [SERIAL CONSOLE TASK - EVENT] serial console command channel was closed
file = phd-tests/framework/src/serial/mod.rs
line = 213
target = phd_framework::serial
2001 2026-01-26T18:51:49.859Z INFO phd-runner: [SERIAL CONSOLE TASK - END]
elapsed_milliseconds = 19426
file = phd-tests/framework/src/serial/mod.rs
line = 200
target = phd_framework::serial
2002 2026-01-26T18:51:49.994Z INFO phd-runner: test phd_tests::migrate::running_process::migrate_running_process ... FAILED: deadline has elapsed
Stack backtrace:
0: anyhow::error::<impl core::convert::From<E> for anyhow::Error>::from
at /home/build/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/anyhow-1.0.100/src/backtrace.rs:27:14
1: <T as core::convert::Into<U>>::into
at /rustc/1159e78c4747b02ef996e55082b704c09b970588/library/core/src/convert/mod.rs:784:9
2: anyhow::kind::Trait::new
at /home/build/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/anyhow-1.0.100/src/kind.rs:95:15
3: phd_framework::test_vm::TestVm::wait_for_serial_output::{{closure}}::{{closure}}
at ./oxidecomputer/propolis/phd-tests/framework/src/test_vm/mod.rs:822:33
4: <tracing::instrument::Instrumented<T> as core::future::future::Future>::poll
at /home/build/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tracing-0.1.40/src/instrument.rs:321:15
5: phd_framework::test_vm::TestVm::wait_for_serial_output::{{closure}}
at ./oxidecomputer/propolis/phd-tests/framework/src/test_vm/mod.rs:797:5
6: phd_framework::test_vm::TestVm::run_command_sequence::{{closure}}
at ./oxidecomputer/propolis/phd-tests/framework/src/test_vm/mod.rs:933:22
7: phd_framework::test_vm::TestVm::run_shell_command::{{closure}}
at ./oxidecomputer/propolis/phd-tests/framework/src/test_vm/mod.rs:894:53
8: phd_tests::migrate::running_process::mk_dirt::{{closure}}
at ./oxidecomputer/propolis/phd-tests/tests/src/migrate.rs:276:10
where we timed out waiting for the echo of this cat.
what we got instead, in migrate_running_process_source.serial.log (skipping lines before which are all normal) is:
localhost:~# cat >dirt.sh <<'EOF'
> #!/usr/bin/env sh
> # a simple script for testing migration of running process memory.
> #
> # this script will write a bunch of "hello"s to an in-memory string, then
> # suspend to wait for a migration to occur. after the migration happens, the
> # script can be foregrounded, and it will then check that the data is still > # there. > dirt="hello" > len=8192 # 8 KiB ought to be enough for anyone > > # store 8k of data in memory
> i=0
> while [ "$i" -lt "$len" ]
> do
> dirt="$dirt
> hello"
> i=$(( i + 1 ))
> done
>
> echo "made dirt"
>
> # suspend this process and wait for it to be resumed before checking that the
> # data still exists.
> #
> # N.B.: posix sh doesn't have a suspend builtin, but we can make our own!
> kill -s TSTP $$
>
> # check that the data is still correct.
> #
> # we do this by writing it out to a file and then looping through the file,
> # because i wasn't sure how to loop over a variable line by line in posix sh
> # without using a file.
> dirtfile="/tmp/dirt.txt"
> echo "$dirt" > "$dirtfile"
> actual_len=$(wc -l "$dirtfile" | cut -d " " -f 1)
> echo "found $actual_len lines of dirt"
one < "$dirtfile"
EOF>
> # pre-check the file's length
> if [ "$actual_len" -lt "$len" ]
> then
> echo "not enough dirt: $actual_len < $len"
> exit 1
> fi
>
> i=0
> while read -r line
> do
> if [ $i -eq 8192 ]; then
> echo "all good"
> exit 0
> fi
> if [ "$line" != "hello" ]
> then
> echo "bad dirt $i: $line"
> exit 1
> fi
> i=$(( i + 1 ))
> done < "$dirtfile"
so at the end we're missing > EOF. but looking a few lines up, it's there and has a partial duplicate of done < "$dirtfile":
> echo "found $actual_len lines of dirt"
one < "$dirtfile"
EOF>#
> # pre-check the file's length
what gives. this is apparently a flake (rerunning the test went fine) and uncommon (I went back through a bunch of old Propolis CI runs and didn't see anything like it). I also don't see anything that looks like a new prompt in the output, but i think given how long we waited I would have expected a trailing localhost:~# ? I'm fuzzy on how bytes get to serial.log and maybe this is more explainable.