Debugging: add builtin gdbstub component. #12771

Open
cfallin wants to merge 3 commits into bytecodealliance:main from cfallin:gdbstub-component

@cfallin cfallin commented Mar 12, 2026

This adds a debug component that makes use of the debug-main world defined in #12756 and serves the gdbstub protocol, with Wasm extensions, compatible with LLDB.

This component is built and included inside the Wasmtime binary, and is loaded using the lower-level `-D debugger=...` debug-main option; the user doesn't need to specify the `.wasm` adapter component. Instead, the user simply runs `wasmtime run -g <PORT> program.wasm ...` and Wasmtime will load and prepare to run `program.wasm` as the debuggee, waiting for a gdbstub connection on the given TCP port before continuing.

The workflow is:

```
$ wasmtime run -g 1234 program.wasm
[ wasmtime starts and waits for connection ]

$ /opt/wasi-sdk/bin/lldb  # use LLDB from wasi-sdk release 32 or later
(lldb) process connect --plugin wasm connect://localhost:1234
Process 1 stopped
* thread #1, stop reason = signal SIGTRAP
    frame #0: 0x40000000000001cc
->  0x40000000000001cc: unreachable
    0x40000000000001cd: end
    0x40000000000001ce: local.get 0
    0x40000000000001d0: call   13
(lldb) si
Process 1 stopped
* thread #1, stop reason = instruction step into
    frame #0: 0x4000000000000184
->  0x4000000000000184: block
    0x4000000000000186: block
    0x4000000000000188: global.get 1
    0x400000000000018e: i32.const 3664
[ ... ]
```

This makes use of the `gdbstub` third-party crate, into which I've upstreamed support for the Wasm extensions in daniel5151/gdbstub#188, daniel5151/gdbstub#189, daniel5151/gdbstub#190, and daniel5151/gdbstub#192. (I'll add vets as part of this PR.)
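
The addresses in the LLDB session above all have bit 62 set. As a rough illustration of how such a 64-bit value could be split apart, here is a minimal sketch that assumes the layout used by LLDB's Wasm plugin (a 2-bit tag in the top bits, a 30-bit module index, and a 32-bit offset). That layout is an assumption, not something stated in this PR, and the names `AddrKind`, `DecodedAddr`, and `decode_addr` are hypothetical; they are not the component's own `WasmAddr` API.

```rust
/// Which address space a raw 64-bit address refers to.
/// ASSUMPTION: tag 0 = linear memory, tag 1 = code; other tags are rejected.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AddrKind {
    Memory,
    Code,
}

/// Hypothetical decoded form of an address exchanged with the debugger.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DecodedAddr {
    pub kind: AddrKind,
    pub module_index: u32,
    pub offset: u32,
}

/// Split a raw address into (tag, module index, offset).
/// ASSUMPTION: bits 63..62 = tag, bits 61..32 = module index, bits 31..0 = offset.
pub fn decode_addr(raw: u64) -> Option<DecodedAddr> {
    let kind = match raw >> 62 {
        0 => AddrKind::Memory,
        1 => AddrKind::Code,
        _ => return None,
    };
    Some(DecodedAddr {
        kind,
        // Keep only the 30 bits that sit between the tag and the offset.
        module_index: ((raw >> 32) & 0x3fff_ffff) as u32,
        // Truncation to the low 32 bits is intentional.
        offset: raw as u32,
    })
}
```

Under that assumed layout, `0x40000000000001cc` from the session would decode to a code address in module 0 at offset `0x1cc`.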

@cfallin cfallin requested review from a team as code owners March 12, 2026 22:45
@cfallin cfallin requested review from dicej and removed request for a team March 12, 2026 22:45

cfallin commented Mar 12, 2026

This is stacked on top of #12756 until that one lands; only the last commit is new.

I haven't added end-to-end tests that spawn/interact with LLDB yet; depending on how that goes I might be able to include that here or might defer to another PR if that's OK.

cfallin added 2 commits March 13, 2026 12:23
@cfallin cfallin force-pushed the gdbstub-component branch from 34e9d51 to c0c1f02 Compare March 13, 2026 19:23

cfallin commented Mar 13, 2026

Rebased out #12756; should be good to review now.

@github-actions github-actions bot added the `wizer` label (issues related to Wizer snapshotting, pre-initialization, and the `wasmtime wizer` subcommand) Mar 13, 2026

authors.workspace = true
edition.workspace = true
license = "Apache-2.0 WITH LLVM-exception"
publish = false
Member

Hm ok here's a sticking point I didn't realize: this makes the wasmtime-cli crate non-publishable because there's a dependency on something that doesn't exist on crates.io. I'm also realizing now that it's not as simple as publishing this crate, since it's fundamentally not publishable: it relies on being part of this workspace to build a sibling crate, which isn't present when built as a dep from crates.io.

The "true fix" for this is https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#artifact-dependencies, an unstable Cargo feature. Unfortunately, I believe we can't rely on that even in a nightly-conditional context: if we tried to use it, Wasmtime would always require nightly.

Some possible ideas:

  • Check in the gdbstub binary and verify it's built in CI. I suspect it's a bit large and will receive many changes, so not my first choice.
  • Download gdbstub from github releases for released wasmtime-cli artifacts. We don't currently have HTTP/network dependencies in the CLI outside of WASI impls, so that'll be hard.
  • Publish this crate to crates.io, but handle failures in the build.rs script. That'd mean that the gdbstub binary would have to be an off-by-default optional feature which we enable for our release builds but would be off-by-default for crates.io.

Well, "some possible ideas" aka I think randomly in real time until something semi-reasonable pops out... Maybe that last one?

Member Author

Oh, hmm, I definitely didn't foresee this one being a problem either!

Question on the last option: what do you mean by "handle failures in build.rs"; in other words, why would such a failure be any different than some other build failure for a crate on crates.io (some of which are e.g. crates that wrap C code and use cc, or do other build-time shenanigans)?

Member

Sort of two things:

  1. I forget what cwd is used for build.rs so I don't know what effect cargo build -p thing will have when something is published to crates.io. If that tries to build other random crates in a workspace, for example, I think that'd be bad.
  2. I think this would ideally have a better error message than "package thing not found" when built from crates.io, aka something like "you can't enable the gdbstub component when wasmtime is built from crates.io" or similar.

Basically, yeah, build-time weirdness is expected, but I'd like to ideally tame it. Another example would be printing a better error if wasm32-wasip2 weren't installed, but rustc does a decent job of this already.
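
To make the second point concrete, here is a minimal sketch of a check a `build.rs` could run before trying to build the sibling crate, so that a crates.io build fails with a readable message instead of "package not found". The function name, the relative path argument, and the wording of the message are all hypothetical; this is not the build script in this PR.

```rust
use std::path::{Path, PathBuf};

/// Look for the sibling crate's sources relative to this crate's manifest
/// directory. `sibling` is a HYPOTHETICAL relative path chosen by the caller.
/// Returns the directory on success, or a human-readable explanation if the
/// sources are missing (as they would be in a crates.io tarball).
pub fn locate_component_source(manifest_dir: &Path, sibling: &str) -> Result<PathBuf, String> {
    let dir = manifest_dir.join(sibling);
    if dir.join("Cargo.toml").is_file() {
        Ok(dir)
    } else {
        Err(format!(
            "the built-in gdbstub component cannot be built here: expected its sources at {} \
             (this feature is only available when building inside the Wasmtime workspace, \
             not from crates.io)",
            dir.display()
        ))
    }
}
```

A build script could call this with the value of the `CARGO_MANIFEST_DIR` environment variable and panic with the returned message, and only then go on to invoke Cargo for the sibling crate.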

Comment on lines +20 to +23
let wasm = out_dir
    .join("wasm32-wasip2")
    .join("release")
    .join("wasmtime_internal_gdbstub_component.wasm");
Member

To gauge, how big is this binary? Are we talking ~30M or something more like ~1M? Given that this is included in the CLI uncompressed it might be reasonable to try to apply simple size optimizations where possible if it's extra large.

[dependencies]
wit-bindgen = { workspace = true, features = ["macros"] }
anyhow = { workspace = true }
structopt = { workspace = true }
Member

Instead of pulling in structopt, can you use clap? They should be pretty similar and easy to transition between, but my impression is that structopt is more-or-less deprecated in favor of clap

bail!("-g/--gdb cannot be combined with -Ddebugger=");
}
self.run.common.debug.debugger = Some("<built-in gdbstub>".into());
self.run.common.debug.arg.push(format!("0.0.0.0:{port}"));
Member

To me defaulting to localhost feels more natural, but at the same time if doing so we should probably have the option to handle both 0.0.0.0 and localhost. Maybe the gdbstub_port option is renamed to just gdbstub, and then internally the wasm does parsing to figure out if it's a port or port-and-address?
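
A minimal sketch of the "port or port-and-address" parsing suggested here, using only the standard library. The function name `parse_gdbstub_addr` and the choice of `127.0.0.1` as the default are assumptions for illustration. Hostnames such as `localhost:1234` are not resolved in this sketch, since `SocketAddr` parsing only accepts literal IP addresses.

```rust
use std::net::{IpAddr, Ipv4Addr, SocketAddr};

/// Parse the argument of a hypothetical `-g`/`--gdbstub` option.
/// A bare port ("1234") binds to loopback only; a full "ip:port" string
/// (for example "0.0.0.0:1234") is used as given.
pub fn parse_gdbstub_addr(arg: &str) -> Result<SocketAddr, String> {
    // First try the short form: just a TCP port.
    if let Ok(port) = arg.parse::<u16>() {
        return Ok(SocketAddr::new(IpAddr::V4(Ipv4Addr::LOCALHOST), port));
    }
    // Otherwise require a literal IP address and port.
    arg.parse::<SocketAddr>()
        .map_err(|e| format!("invalid gdbstub address `{arg}`: {e}"))
}
```

With this shape, `-g 1234` stays the common case and exposes the stub only on the local machine, while `-g 0.0.0.0:1234` remains available for remote debugging.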

}
self.run.common.debug.debugger = Some("<built-in gdbstub>".into());
self.run.common.debug.arg.push(format!("0.0.0.0:{port}"));
Some(gdbstub_component_artifact::GDBSTUB_COMPONENT.to_vec())
Member

To avoid .to_vec() on a big thing, could the field in RunCommand change to &'static [u8]?
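
One way to avoid the copy while still allowing a user-supplied component read from disk is a `Cow<'static, [u8]>`, which is a variation on the `&'static [u8]` suggestion rather than what the comment literally proposes. A minimal sketch follows; the `DebuggerComponent` type, its constructors, and the placeholder bytes are hypothetical and do not reflect the real `RunCommand` field.

```rust
use std::borrow::Cow;

/// Placeholder standing in for the embedded component bytes (HYPOTHETICAL;
/// the real binary would come from an `include_bytes!`-style constant).
static BUILTIN_COMPONENT: &[u8] = b"\0asm";

/// Bytes of the debugger component, either borrowed from the binary itself
/// or owned after being read from a file.
pub struct DebuggerComponent {
    pub bytes: Cow<'static, [u8]>,
}

impl DebuggerComponent {
    /// Built-in component: borrows the static data, no allocation or copy.
    pub fn builtin() -> Self {
        Self { bytes: Cow::Borrowed(BUILTIN_COMPONENT) }
    }

    /// User-supplied component: takes ownership of bytes already in memory.
    pub fn from_file_contents(contents: Vec<u8>) -> Self {
        Self { bytes: Cow::Owned(contents) }
    }
}
```

Either variant dereferences to `&[u8]`, so code that consumes the bytes does not need to know which one it was given.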

@alexcrichton
Member

Also, to clarify, @cfallin what depth would you like me to review the gdbstub component code itself? I'm happy more-or-less not reviewing it at all in the sense that it's well-sequestered, low-risk, and we'll likely iterate a lot on it in-tree. If you'd prefer though I could give it a closer look in any particular areas of interest.

cfallin commented Mar 16, 2026

> Also, to clarify, @cfallin what depth would you like me to review the gdbstub component code itself? I'm happy more-or-less not reviewing it at all in the sense that it's well-sequestered, low-risk, and we'll likely iterate a lot on it in-tree. If you'd prefer though I could give it a closer look in any particular areas of interest.

I guess my default answer is "to whatever extent allows us to fulfill policy and be comfortable having this code in-repo" :-) I agree that since it's sandboxed, the bar could be lower than for core runtime code. I guess the spirit of our code-review policies is still that someone should give it a once-over -- but up to you how deep you take that!

Comment on lines +91 to +94
let listener = TcpListener::bind(&self.options.tcp_address)
    .await
    .expect("Could not bind to TCP port");

Member

I forget what QEMU does, but this might be a reasonable place to print "Hey I'm listening on address A.B.C.D:XXXX for a debugger", both to signal why the process is halted and to serve as a sort of "join point" for synchronization: "if you're a test, now's when you can make your TCP connection".
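
A minimal sketch of that idea using the blocking `std::net::TcpListener` rather than the async listener in the excerpt above. The function name `bind_and_announce` and the message wording are hypothetical. Printing the address reported by `local_addr()` rather than the requested one means that a test asking for port 0 learns which port was actually assigned.

```rust
use std::io;
use std::net::{SocketAddr, TcpListener};

/// Bind the listener, then announce the bound address on stderr so a human
/// (or a test harness watching stderr) knows when and where to connect.
pub fn bind_and_announce(addr: &str) -> io::Result<(TcpListener, SocketAddr)> {
    let listener = TcpListener::bind(addr)?;
    // Ask the OS which address was bound; this matters when port 0 was requested.
    let local = listener.local_addr()?;
    eprintln!("Debugger listening on {local}; waiting for a gdbstub connection...");
    Ok((listener, local))
}
```

Returning an `io::Result` instead of calling `expect` also lets the caller report a bind failure, such as a port already in use, without a panic backtrace.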

Comment on lines +123 to +127
if inner.borrow_conn().flush().await.is_err() {
    // Connection closed or other outbound error.
    break 'mainloop;
}

Member

This should perhaps print something about the error? Otherwise the guest is still running and may be "violently deleted" when dropped.

Comment on lines +317 to +333
impl Connection for Conn {
    type Error = anyhow::Error;

    fn write(&mut self, byte: u8) -> std::result::Result<(), Self::Error> {
        self.buf.push(byte);
        Ok(())
    }

    fn flush(&mut self) -> std::result::Result<(), Self::Error> {
        // We cannot flush synchronously; we leave this to the `async
        // fn flush` method called within the main loop. Fortunately
        // the gdbstub cannot wait for a response before returning to
        // the main loop, so we cannot introduce any deadlocks by
        // failing to flush synchronously here.
        Ok(())
    }
}
Member

Question for you: how come these aren't implemented with blocking I/O? My rough assumption is that blocking I/O is expected while gdbstub is doing stuff and the guest is halted, and the only async-y bit is "wait for I/O on the TCP connection or the guest to hit an event"
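
For comparison, a blocking variant could look roughly like the sketch below. To stay self-contained it defines a local stand-in trait, `ByteConnection`, that mirrors only the two methods shown in the excerpt; it is not the `gdbstub` crate's `Connection` trait, and `BlockingConn` is a hypothetical name. Any `std::io::Write` sink works, including a blocking `std::net::TcpStream`.

```rust
use std::io::{self, BufWriter, Write};

/// Local stand-in with the same two methods as the excerpt above.
/// ASSUMPTION: this is an illustration only, not the real gdbstub trait.
pub trait ByteConnection {
    type Error;
    fn write(&mut self, byte: u8) -> Result<(), Self::Error>;
    fn flush(&mut self) -> Result<(), Self::Error>;
}

/// Connection that buffers outgoing bytes and flushes them with blocking I/O.
pub struct BlockingConn<W: Write> {
    out: BufWriter<W>,
}

impl<W: Write> BlockingConn<W> {
    pub fn new(stream: W) -> Self {
        Self { out: BufWriter::new(stream) }
    }

    /// Flush any buffered bytes and hand back the underlying stream.
    pub fn into_inner(self) -> io::Result<W> {
        self.out.into_inner().map_err(|e| e.into_error())
    }
}

impl<W: Write> ByteConnection for BlockingConn<W> {
    type Error = io::Error;

    fn write(&mut self, byte: u8) -> Result<(), Self::Error> {
        // Buffered, so single-byte writes do not each become a syscall.
        self.out.write_all(&[byte])
    }

    fn flush(&mut self) -> Result<(), Self::Error> {
        // Blocks until the buffered bytes are handed to the underlying stream.
        self.out.flush()
    }
}
```

The trade-off is the one raised in the question: this is simpler, but a stalled peer would block the whole debugger component while the guest is halted.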

Err(TargetError::NonFatal)
}

#[inline(always)]
Member

stray debugging annotation?

Comment on lines +243 to +273
fn add_sw_breakpoint(&mut self, addr: u64, _kind: usize) -> TargetResult<bool, Self> {
    let Some(wasm_addr) = WasmAddr::from_raw(addr) else {
        return Ok(false);
    };
    let debuggee = self.debuggee;
    if let AddrSpaceLookup::Module { module, .. } = self.addr_space.lookup(wasm_addr, debuggee)
    {
        module
            .add_breakpoint(debuggee, wasm_addr.offset())
            .map_err(|_| TargetError::NonFatal)?;
        Ok(true)
    } else {
        Ok(false)
    }
}

fn remove_sw_breakpoint(&mut self, addr: u64, _kind: usize) -> TargetResult<bool, Self> {
    let Some(wasm_addr) = WasmAddr::from_raw(addr) else {
        return Ok(false);
    };
    let debuggee = self.debuggee;
    if let AddrSpaceLookup::Module { module, .. } = self.addr_space.lookup(wasm_addr, debuggee)
    {
        module
            .remove_breakpoint(debuggee, wasm_addr.offset())
            .map_err(|_| TargetError::NonFatal)?;
        Ok(true)
    } else {
        Ok(false)
    }
}
Member

Could this perhaps support dual setting breakpoints on mapped addresses and seeing breakpoints on raw addresses in the code section?

cfallin added a commit to cfallin/wasmtime that referenced this pull request Mar 17, 2026
…g forward to first opcode.

LLDB, when instructed to `break main`, looks at the DWARF metadata for
`main` and finds its PC range, then sets a breakpoint at the first
PC. This is reasonable behavior for native ISAs! That PC better be a
real instruction!

On Wasm, however, (i) toolchains typically emit the PC range as
*including* the *locals count*, a leb128 value that precedes the first
opcode and any types of locals; (ii) our gdbstub component that
bridges LLDB to our debug APIs (bytecodealliance#12771) only supports *exact* PCs for
breakpoints, so when presented with a PC that does not actually point
to an opcode, setting the breakpoint is effectively a no-op. There
will always be a difference of at least 1 byte between the
start-of-function offset and first-opcode offset (for a leb128 of `0`
for no locals), so a breakpoint "on" a function will never work.

I initially prototyped a fix that adds a sequence point at the start
of every function (which, again, is *guaranteed* to be distinct from
the first opcode), and the branch is [here], but I didn't like the
developer experience: this meant that when a breakpoint at a function
start fired, LLDB had a weird interstitial state where no line-number
applied.

The behavior that would be closer in line with "native" debug
expectations is that we add a bit of fuzzy-ish matching: setting a
breakpoint at function start should break at the first opcode, even if
that's a few (or many) bytes later. There are two options here:
special-case function start, or generally change the semantics of our
breakpoint API so that "add breakpoint at `pc`" means "add breakpoint
at next opcode at or after `pc`". I opted for the latter in this PR
because it's more consistent.

The logic is a little subtle because we're effectively defining an
n-to-1 mapping with this "snap-to-next" behavior, so we have to
refcount each breakpoint (consider setting a breakpoint at function
start *and* at the first opcode, then deleting them, one at a time). I
believe the result is self-consistent, even if a little more
complicated now. And, importantly, with bytecodealliance#12771 on top of this change,
it produces the expected behavior for the (very simple!) debug script
"`b main`; `continue`".

[here]: https://github.com/cfallin/wasmtime/tree/breakpoint-at-func-start
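
The "snap-to-next" semantics with refcounting described in this commit message can be sketched as follows. This is a simplified model under stated assumptions, not the code in the referenced commit: the `Breakpoints` type and its methods are hypothetical, the set of valid opcode offsets is supplied up front, and snapping is not limited to the enclosing function.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Simplified model: requested PCs snap forward to the next opcode offset,
/// and each snapped offset carries a count of the requests that landed on it.
pub struct Breakpoints {
    /// Offsets at which an opcode starts (ASSUMED to be known in advance).
    opcode_offsets: BTreeSet<u32>,
    /// Snapped offset -> number of outstanding `add` calls that map to it.
    active: BTreeMap<u32, usize>,
}

impl Breakpoints {
    pub fn new(opcode_offsets: impl IntoIterator<Item = u32>) -> Self {
        Self {
            opcode_offsets: opcode_offsets.into_iter().collect(),
            active: BTreeMap::new(),
        }
    }

    /// First opcode offset at or after `pc`, if any.
    fn snap(&self, pc: u32) -> Option<u32> {
        self.opcode_offsets.range(pc..).next().copied()
    }

    /// Add a breakpoint at the next opcode at or after `pc`.
    /// Returns the offset where the breakpoint actually lives.
    pub fn add(&mut self, pc: u32) -> Option<u32> {
        let target = self.snap(pc)?;
        *self.active.entry(target).or_insert(0) += 1;
        Some(target)
    }

    /// Remove one reference to the breakpoint that `pc` snaps to.
    /// The breakpoint only disappears when its last reference is removed.
    pub fn remove(&mut self, pc: u32) -> bool {
        let Some(target) = self.snap(pc) else {
            return false;
        };
        let Some(count) = self.active.get_mut(&target) else {
            return false;
        };
        *count -= 1;
        if *count == 0 {
            self.active.remove(&target);
        }
        true
    }

    /// Whether a breakpoint is currently set exactly at `offset`.
    pub fn is_set(&self, offset: u32) -> bool {
        self.active.contains_key(&offset)
    }
}
```

In this model, setting breakpoints at both a function start and its first opcode yields one breakpoint with a count of two, so deleting either one leaves the other in effect, which is the case the commit message calls out.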