Switch the destructors implementation for thread locals on Windows to use FLS #148799
Open
ohadravid wants to merge 5 commits into rust-lang:main
Summary
Switch the thread local destructors implementation on Windows to use the Fiber Local Storage APIs, which provide native support for setting a callback to be called on thread termination, replacing the current `tls_callback` symbol-based implementation.

Except for some spellchecking, no LLMs were used to produce code / comments / text in this PR.
Current Implementation
On Windows, in order to support thread locals with destructors, the standard library uses a special `tls_callback` symbol that is used to call the `destructors::run()` hook on thread termination.

This has two downsides:
`LocalKey`'s documentation.

As an example of point 2, this code, which uses `JoinHandle::join` in a thread local `Drop` impl, will deadlock on stable:

Join-on-Drop Deadlock Example
Proposed Change
We can use the `Fls{Alloc,SetValue,GetValue,Free}` functions (see https://devblogs.microsoft.com/oldnewthing/20191011-00/?p=102989) to implement the dtor callback needed for thread locals that have a `Drop` implementation.
We allocate a single key, and use its destructor callback to run all the registered destructors when a thread is shutting down.
With this implementation, the above code sample will not deadlock (but it still might not be a good idea to do this!).
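To illustrate the "single key, one callback runs everything" idea, here is a minimal portable sketch. It is a model and not the code in this PR: `register`, `run_all`, `count_dtor` and `demo` are hypothetical names, and `run_all` is called by hand where the real implementation would have it invoked by the FLS callback at thread exit.

```rust
use std::cell::{Cell, RefCell};

// Type-erased destructor: takes a pointer to the value it should drop.
type Dtor = unsafe fn(*mut u8);

thread_local! {
    // Per-thread list of (object, destructor) pairs (hypothetical layout).
    static DTORS: RefCell<Vec<(*mut u8, Dtor)>> = const { RefCell::new(Vec::new()) };
    // Counts how many destructors ran on this thread, for the demo only.
    static RAN: Cell<usize> = const { Cell::new(0) };
}

/// Registers a destructor for the current thread.
pub fn register(ptr: *mut u8, dtor: Dtor) {
    DTORS.with(|d| d.borrow_mut().push((ptr, dtor)));
}

/// Runs every registered destructor. In this sketch it is called explicitly;
/// the assumption is that the single FLS callback would call it instead.
pub fn run_all() {
    loop {
        // Release the borrow before calling the destructor, so that a
        // destructor may itself register further destructors.
        let next = DTORS.with(|d| d.borrow_mut().pop());
        match next {
            Some((ptr, dtor)) => unsafe { dtor(ptr) },
            None => break,
        }
    }
}

// Demo destructor: ignores the pointer and bumps the per-thread counter.
unsafe fn count_dtor(_ptr: *mut u8) {
    RAN.with(|r| r.set(r.get() + 1));
}

/// Registers two destructors, runs them, and returns how many ran.
pub fn demo() -> usize {
    register(std::ptr::null_mut(), count_dtor);
    register(std::ptr::null_mut(), count_dtor);
    run_all();
    RAN.with(|r| r.get())
}
```

Popping one entry at a time, instead of iterating over the list, keeps the list usable while a destructor is running.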
Safety and Compatibility
Destructors will only run once: we use the common `thread_local` + atomic pattern to only set the FLS marker value once. The destructor callback is only called when that value is non-zero, so we are guaranteed that it will only be called once.

Destructors will only run at thread exit: we verify that we are not running in a fiber during the destructors callback. This means that using fibers (which is very rare) will result in thread locals being leaked, unless the fiber is converted back to a thread using `ConvertFiberToThread` before thread termination. This is not ideal, but should be OK as destructors are not guaranteed to run, but it needs to be documented.

`rt` module).

It might be possible for the user to use something like the current `tls_callback` to observe already-freed thread locals, which is something that can also happen in the current implementation.

Destructors will only run on the correct thread: they are registered to a `thread_local` list, so fiber movement between threads does not matter.
Users cannot observe different locals because they are using fibers: because we only use an Fls local marker to trigger the destructors callback, we don't change anything about how users interact with "normal" thread locals and fiber locals.
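The "run once" argument above can be sketched as follows. This is a portable model under stated assumptions, not the PR's code: `SLOT` stands in for the per-thread value of the FLS slot, and `enable` and `simulate_thread_exit` are hypothetical names. On Windows the OS would own the slot and decide when to invoke the callback.

```rust
use std::cell::Cell;

thread_local! {
    // Whether this thread has already armed the callback.
    static REGISTERED: Cell<bool> = const { Cell::new(false) };
    // Stand-in for this thread's FLS slot value; 0 means "unset".
    static SLOT: Cell<usize> = const { Cell::new(0) };
}

/// Arms the exit callback for the current thread at most once.
/// Returns true only for the call that actually set the marker.
pub fn enable() -> bool {
    if REGISTERED.with(|r| r.replace(true)) {
        return false; // already armed on this thread
    }
    // Any non-zero marker value is enough to make the callback fire.
    SLOT.with(|s| s.set(1));
    true
}

/// Models the callback being invoked at thread exit: it only fires when the
/// slot holds a non-zero value, and the slot is cleared when it does.
/// Returns true if the destructors were run.
pub fn simulate_thread_exit(run_dtors: impl FnOnce()) -> bool {
    let value = SLOT.with(|s| s.replace(0));
    if value != 0 {
        run_dtors();
        true
    } else {
        false
    }
}
```

Calling `enable()` twice on the same thread sets the marker only once, and a second simulated exit finds the slot empty and does nothing.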
Other Notes
The implementation is based on the `key::racy` and `guard::apple` code, because we need a `LazyKey`-like racy static and an `enable` function.

While TLS slots are limited to 1088, FLS slots are currently limited to 4000 per process.
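For readers unfamiliar with the "racy static" pattern mentioned above, here is a rough sketch of it, assuming a fake allocator in place of the real FLS calls. `fake_fls_alloc`, `fake_fls_free`, `KEY_UNSET` and `key` are hypothetical names used only for illustration, and the actual `key::racy` code may differ in its details.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Sentinel meaning "no key allocated yet" (hypothetical choice for this sketch).
const KEY_UNSET: u32 = u32::MAX;

// The process-wide key, initialized lazily and racily.
static KEY: AtomicU32 = AtomicU32::new(KEY_UNSET);

// Fake slot allocator standing in for the real allocation call.
static NEXT_INDEX: AtomicU32 = AtomicU32::new(0);

fn fake_fls_alloc() -> u32 {
    NEXT_INDEX.fetch_add(1, Ordering::Relaxed)
}

fn fake_fls_free(_index: u32) {
    // Nothing to release in this model.
}

/// Returns the shared key, allocating it on first use.
pub fn key() -> u32 {
    match KEY.load(Ordering::Acquire) {
        KEY_UNSET => lazy_init(),
        k => k,
    }
}

fn lazy_init() -> u32 {
    // Several threads may get here at once and each allocate a slot.
    let new = fake_fls_alloc();
    match KEY.compare_exchange(KEY_UNSET, new, Ordering::Release, Ordering::Acquire) {
        // This thread won the race: its slot becomes the shared key.
        Ok(_) => new,
        // Another thread won: release the extra slot and use the winner's key.
        Err(winner) => {
            fake_fls_free(new);
            winner
        }
    }
}
```

The trade-off is that a losing thread briefly holds an extra slot, in exchange for not needing a lock or a `Once` on this path.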
Miri
Because Miri is aware of the thread local implementation, I also implemented these functions and support for them in the interpreter here:
https://github.com/rust-lang/miri/compare/master...ohadravid:miri:windows-fls-support?expand=1
I guess that this will need to be merged before this PR (if this is accepted) - let me know and I'll open that PR as well.
Targets without `target_thread_local`
In `*-gnu` Windows targets, the `target_thread_local` feature is unavailable.

We could also change the "key" (non-`target_thread_local`) Windows impl at `library\std\src\sys\thread_local\key\windows.rs` to be based on the Fls functions. I can add it to this PR, or as a separate PR, if you think this is preferable.
Also, I used a `Cell` in a `#[thread_local]` to store the resulting key, like the other implementations.

This works, but I'm not sure if this is 100% OK given that we have these targets as well.