Branch force-pushed from f07b921 to 9efcbff.
comm = MPI.COMM_WORLD
mpi_size = MPI.Comm_size(comm)
my_rank = MPI.Comm_rank(comm)

cores_per_numa = 16
threads_per_rank = Threads.nthreads()
# Assumes threads_per_rank <= cores_per_numa (otherwise ranks_per_numa is 0)
ranks_per_numa = div(cores_per_numa, threads_per_rank)

# Pin threads so that the threads of an MPI rank are pinned to cores with
# contiguous IDs. This ensures that:
# - With 16 or fewer threads per rank, all threads are pinned to the same
#   NUMA region as their master thread (sharing a memory controller within the Infinity Fabric)
# - With 8 or fewer threads per rank, all threads are pinned to the same
#   Core Complex Die (CCD)
# - With 4 or fewer threads per rank, all threads are pinned to the same
#   Core Complex (CCX, sharing an L3 cache)

# 1-based NUMA region index and 0-based slot of this rank within that region
my_numa, my_id_in_numa = divrem(my_rank, ranks_per_numa) .+ (1, 0)
# Offset the region's first cores by this rank's slot so ranks don't overlap
pinthreads(numa(my_numa, 1:Threads.nthreads()) .+ threads_per_rank .* my_id_in_numa)
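The rank-to-core arithmetic in the hunk above can be checked without MPI or ThreadPinning.jl. The sketch below uses a hypothetical helper `rank_cores` (not part of the PR) that reproduces the same `divrem` mapping. It assumes node-wide core IDs are numbered contiguously within each 16-core NUMA region, and that a Core Complex spans 4 consecutive cores, as the comments above describe for Archer2.

```julia
# Hypothetical helper (not part of the PR): reproduce the rank -> core mapping
# above, assuming NUMA region k (1-based) owns the contiguous cores
# (k-1)*cores_per_numa : k*cores_per_numa - 1.
function rank_cores(rank::Integer, threads_per_rank::Integer; cores_per_numa::Integer = 16)
    # A rank's threads must fit inside a single NUMA region.
    1 <= threads_per_rank <= cores_per_numa ||
        throw(ArgumentError("threads_per_rank must be in 1:cores_per_numa"))
    ranks_per_numa = div(cores_per_numa, threads_per_rank)
    # Same mapping as the PR: 1-based NUMA index, 0-based slot within it.
    my_numa, my_id_in_numa = divrem(rank, ranks_per_numa) .+ (1, 0)
    first_core = (my_numa - 1) * cores_per_numa + threads_per_rank * my_id_in_numa
    return first_core:(first_core + threads_per_rank - 1)
end

# Assumed Core Complex (L3-sharing group) of a core: 4 consecutive cores.
ccx(core) = div(core, 4)

for rank in 0:5
    cores = rank_cores(rank, 4)
    println("rank $rank -> cores $cores, CCX $(unique(ccx.(cores)))")
end
```

With 4 threads per rank, ranks 0 to 3 fill the first NUMA region one Core Complex at a time, and rank 4 starts the second region at core 16. Under these assumptions each rank's threads never straddle a CCX boundary.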
With ThreadPinning v0.7.3 you can simply use pinthreads(:affinitymask).
I think the failed CI jobs on the nightly build here may have been caused by the same upstream problems that were causing issues in #236 (comment), but we've now exceeded the 30-day window for re-running workflows. Otherwise this looks good to merge to me, apart from @giordano's suggestion above. Also, I just noticed this is set to merge into
I don't think this is relevant anymore. If we want to revisit this, it would be worth re-implementing it with the latest version of ThreadPinning.jl.
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@              Coverage Diff               @@
##           scaling_archer2     #233   +/-   ##
==================================================
  Coverage                 ?   79.29%
==================================================
  Files                    ?        7
  Lines                    ?      396
  Branches                 ?        0
==================================================
  Hits                     ?      314
  Misses                   ?       82
  Partials                 ?        0
I've added thread-pinning optimizations (thanks to @giordano) that could improve performance on Archer2. I would like to test the performance with