I have vague recollections of discussing this in the past but I don't see a dedicated issue for it. Currently in the aarch64 backend the JTSequence instruction, the jump table that wasm's br_table lowers to, includes a csdb instruction as a Spectre mitigation to prevent speculation past the bounds check. When this was introduced in #4555 some benchmarks were run and found it to have little impact, but I've been made aware locally that it can have a much larger impact on macOS. IIRC this is macOS specific, but I forget.
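For context, the emitted sequence looks roughly like this (simplified and from memory, so register choices and details are illustrative, not the exact Cranelift output):

```
  ; JTSequence, roughly: bounds-check the index, clamp it so a
  ; mispredicted branch can't speculate with an out-of-range index,
  ; then barrier with csdb before the table load.
  cmp    w1, #JT_SIZE          ; compare index against table size
  b.hs   default_target        ; out-of-range -> default target
  csel   w1, wzr, w1, hs       ; clamp index to 0 under misspeculation
  csdb                         ; speculation barrier (the instruction in question)
  adr    x2, jump_table        ; address of the offset table
  ldrsw  x3, [x2, w1, uxtw #2] ; load 32-bit relative offset
  add    x2, x2, x3
  br     x2                    ; jump to the selected target
```

The csdb is what makes the csel clamp architecturally effective against Spectre-v1-style attacks, and it's the instruction whose cost is at issue here.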
To reproduce this I was using a coremark.wasm (such as this one) and I found that it prints a score of ~15k by default with wasmtime. I commented out the csdb, re-built wasmtime, and the score jumped up to ~38k. Effectively, this instruction definitely has a noticeable cost on at least macOS.
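In other words, the reproduction is roughly (exact build steps and paths are illustrative):

```
$ wasmtime run coremark.wasm        # baseline: score around ~15k
# comment out the csdb emission in the aarch64 JTSequence lowering, then:
$ cargo build --release
$ ./target/release/wasmtime run coremark.wasm   # score jumps to ~38k
```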
Do others remember any historical discussion we've had about this? Is this a macOS "bug" fixed in some future version of macOS or Apple silicon? Is this something fundamental that we stand by? (In comparison, v8 performs over 2x better than Wasmtime on this same benchmark, presumably because it doesn't use csdb, though I can't easily confirm that; if so, that'd be at least one data point.)