Switch the destructors implementation for thread locals on Windows to use FLS #148799

Open
ohadravid wants to merge 5 commits into rust-lang:main from ohadravid:windows-thread-local-dtors-using-fls

@ohadravid ohadravid commented Nov 10, 2025

Summary

Switch the thread local destructors implementation on Windows to use the Fiber Local Storage APIs, which provide native support for setting a callback to be called on thread termination, replacing the current tls_callback symbol-based implementation.

Except for some spellchecking, no LLMs were used to produce code / comments / text in this PR.

Current Implementation

On Windows, in order to support thread locals with destructors,
the standard library uses a special tls_callback symbol that is used to call the destructors::run() hook on thread termination.

This has two downsides:

  1. It is not well documented, and seems to cause some problems (see links 1, 2, 3).
  2. It disallows some synchronization operations, as mentioned in LocalKey's documentation.

As an example of point 2, this code, which calls JoinHandle::join in a thread local's Drop impl, will deadlock on stable:

Join-on-Drop Deadlock Example
use std::thread::JoinHandle;

// Joins the wrapped thread when dropped.
struct JoinOnDrop(Option<JoinHandle<()>>);

impl Drop for JoinOnDrop {
    fn drop(&mut self) {
        self.0.take().unwrap().join().unwrap();
    }
}

thread_local! {
    static HANDLE: JoinOnDrop = {
        let thread = std::thread::spawn(|| {
            println!("Starting...");
            // std::thread::sleep(std::time::Duration::from_secs(3));
            println!("Done");
        });

        JoinOnDrop(Some(thread))
    };
}


fn main() {
    let thread = std::thread::spawn(|| {
        HANDLE.with(|_| {
            println!("Some other thread");
        })
    });

    thread.join().unwrap();

    println!("Done");
}

Proposed Change

We can use the Fls{Alloc,Set,Get,Free} functions (see https://devblogs.microsoft.com/oldnewthing/20191011-00/?p=102989)
to implement the dtor callback needed for thread locals that have a Drop implementation.

We allocate a single key, and use its destructor callback to run all the registered destructors when a thread is shutting down.
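To illustrate the "single key, one callback runs everything" idea, here is a minimal portable sketch. It does not call the Fls functions: `run_all` stands in for the FLS destructor callback, and `DTORS`, `register`, `run_all` and `demo_drain_order` are hypothetical names chosen for illustration, not the identifiers used in this PR.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical model of a per-thread destructor list.
type Dtor = Box<dyn FnOnce()>;

thread_local! {
    static DTORS: RefCell<Vec<Dtor>> = RefCell::new(Vec::new());
}

/// Registers a destructor to run when the current thread shuts down.
fn register(dtor: Dtor) {
    DTORS.with(|list| list.borrow_mut().push(dtor));
}

/// Stand-in for the single FLS callback: drains the list until it is empty,
/// so destructors registered while others are running are executed as well.
fn run_all() {
    loop {
        // Pop first so the RefCell borrow is released before the destructor runs.
        let next = DTORS.with(|list| list.borrow_mut().pop());
        match next {
            Some(dtor) => dtor(),
            None => break,
        }
    }
}

/// Registers two destructors (one of which registers a third while running)
/// and returns the order in which they ran.
fn demo_drain_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let (a, b, c) = (log.clone(), log.clone(), log.clone());
    register(Box::new(move || a.borrow_mut().push("first")));
    register(Box::new(move || {
        b.borrow_mut().push("second");
        // A destructor may register another one while the list is draining.
        register(Box::new(move || c.borrow_mut().push("late")));
    }));
    run_all();
    let order = log.borrow().clone();
    order
}
```

Calling `demo_drain_order()` shows that the most recently registered destructor runs first, and that one registered during the drain is still picked up before the loop ends.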

With this implementation, the above code sample will not deadlock (but it still might not be a good idea to do this!).

Safety and Compatibility

Destructors will only run once: we use the common thread_local + atomic pattern to set the Fls marker value only once. The destructor callback is only called when that value is non-zero, so we are guaranteed that it will only be called once.
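The per-thread half of that pattern can be sketched as follows. This is a portable model under stated assumptions, not the PR's code: `ARMED`, `MARKER_SETS`, `enable` and `demo_arm_once` are hypothetical names, and incrementing a counter stands in for storing a non-zero marker in the Fls slot.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many times the stand-in "set the marker" step ran, across all
// threads. It exists only so the demo can observe the behavior.
static MARKER_SETS: AtomicUsize = AtomicUsize::new(0);

thread_local! {
    // Whether the current thread has already armed its destructor callback.
    static ARMED: Cell<bool> = Cell::new(false);
}

/// Hypothetical `enable`: arms the callback on the first call per thread and
/// does nothing on later calls.
fn enable() {
    ARMED.with(|armed| {
        // `replace` returns the previous value, so only the first call passes.
        if !armed.replace(true) {
            // Assumption: this is where the marker value would be stored.
            MARKER_SETS.fetch_add(1, Ordering::SeqCst);
        }
    });
}

/// Spawns four threads that each call `enable` 100 times and returns how many
/// times the marker was set during this call.
fn demo_arm_once() -> usize {
    let before = MARKER_SETS.load(Ordering::SeqCst);
    let handles: Vec<_> = (0..4)
        .map(|_| {
            std::thread::spawn(|| {
                for _ in 0..100 {
                    enable();
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    MARKER_SETS.load(Ordering::SeqCst) - before
}
```

With four threads, `demo_arm_once()` returns 4: the marker is set once per thread no matter how often `enable` is called.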

Destructors will only run at thread exit: we verify that we are not running in a fiber during the destructors callback. This means that using fibers (which is very rare) will result in thread locals being leaked, unless the fiber is converted back to a thread using ConvertFiberToThread before thread termination. This is not ideal, but it should be OK because destructors are not guaranteed to run. It does need to be documented.

  • To be documented (replaces the current note in the docs about synchronization, and should also be noted in the rt module).

It might be possible for the user to use something like the current tls_callback to observe already-freed thread locals, which is something that can also happen in the current implementation.

Destructors will only run on the correct thread: they are registered to a thread_local list, so fiber movement between threads does not matter.

Users cannot observe different locals because they are using fibers: because we only use an Fls local marker to trigger the destructors callback, we don't change anything about how users interact with "normal" thread locals and fiber locals.

Other Notes

The implementation is based on the key::racy and guard::apple code, because we need a LazyKey-like racy static and an enable function.
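For readers unfamiliar with the racy lazy-key idea, here is a minimal portable sketch of it. It is only a model: `alloc_key` and `free_key` are hypothetical stand-ins for FlsAlloc and FlsFree, and `KEY`, `lazy_key` and `demo_lazy_key` are illustrative names, not the ones in the standard library.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Sentinel meaning "no key has been published yet".
const UNINIT: usize = 0;

// Stand-ins for the OS allocator so the sketch runs on any platform.
static NEXT_KEY: AtomicUsize = AtomicUsize::new(1);
static LIVE_KEYS: AtomicUsize = AtomicUsize::new(0);

fn alloc_key() -> usize {
    LIVE_KEYS.fetch_add(1, Ordering::SeqCst);
    NEXT_KEY.fetch_add(1, Ordering::SeqCst)
}

fn free_key(_key: usize) {
    LIVE_KEYS.fetch_sub(1, Ordering::SeqCst);
}

// The process-wide key, initialized lazily and racily.
static KEY: AtomicUsize = AtomicUsize::new(UNINIT);

/// Returns the shared key, allocating it on first use. If several threads
/// race, each allocates a candidate, one wins the compare-exchange, and the
/// losers free their candidate and adopt the winner's key.
fn lazy_key() -> usize {
    match KEY.load(Ordering::Acquire) {
        UNINIT => {
            let fresh = alloc_key();
            match KEY.compare_exchange(UNINIT, fresh, Ordering::Release, Ordering::Acquire) {
                Ok(_) => fresh,
                Err(winner) => {
                    free_key(fresh);
                    winner
                }
            }
        }
        key => key,
    }
}

/// Races eight threads on `lazy_key` and reports whether they all observed
/// the same key, plus how many keys are still allocated afterwards.
fn demo_lazy_key() -> (bool, usize) {
    let handles: Vec<_> = (0..8).map(|_| std::thread::spawn(lazy_key)).collect();
    let keys: Vec<usize> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    let all_same = keys.iter().all(|&k| k == keys[0]);
    (all_same, LIVE_KEYS.load(Ordering::SeqCst))
}
```

`demo_lazy_key()` returns `(true, 1)`: every thread ends up with the same key, and any extra keys allocated by losing threads have been freed.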

While TLS slots are limited to 1088, FLS slots are currently limited to 4000 per process.

Miri

Because Miri is aware of the thread local implementation, I also implemented these functions and support for them in the interpreter here:

https://github.com/rust-lang/miri/compare/master...ohadravid:miri:windows-fls-support?expand=1

I guess that this will need to be merged before this PR (if this is accepted) - let me know and I'll open that PR as well.

Targets without target_thread_local

On *-gnu Windows targets, the target_thread_local feature is unavailable.

We could also change the "key" (non-target_thread_local) Windows impl at
library\std\src\sys\thread_local\key\windows.rs
to be based on the Fls functions. I can add it to this PR, or as a separate PR, if you think this is preferable.

Also, I used a Cell in a #[thread_local] to store the resulting key, like the other implementations.
This works, but I'm not sure if this is 100% OK given that we have these targets as well.


Labels

  • disposition-merge: This issue / PR is in PFCP or FCP with a disposition to merge it.
  • O-windows: Operating system: Windows
  • proposed-final-comment-period: Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off.
  • S-waiting-on-review: Status: Awaiting review from the assignee but also interested parties.
  • T-libs: Relevant to the library team, which will review and decide on the PR/issue.

10 participants