[PATCH] D138747: [Support] On Windows 11, fix an affinity mask issue on large core count machines

Sat Nov 26 11:52:49 PST 2022

aganea created this revision.
aganea added reviewers: mehdi_amini, rnk, MaskRay, wjschmidt, saudi, thieta.
Herald added subscribers: StephenFan, hiraditya.
Herald added a project: All.
aganea requested review of this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.

Windows 11 and Windows Server 2022 changed the way they assign 'processor groups' to a starting PE. Before Windows 11 and Windows Server 2022, only one processor group was assigned by default, and the program was responsible for dispatching its own threads across additional processor groups. This is what D71775 <https://reviews.llvm.org/D71775> did, allowing LLVM programs to use all threads on machines with many cores.

Starting with Windows 11 and Windows Server 2022, the OS takes care of that. This has an adverse effect, reported in PR56618 <https://github.com/llvm/llvm-project/issues/56618>: in some edge cases the `::GetProcessAffinityMask()` API now seems buggy. That API was used to detect whether an affinity mask was set, and to adjust the number of threads available to a `ThreadPool` accordingly.
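To illustrate the kind of check described above, here is a minimal portable sketch, not the code from Threading.inc. The function name `availableThreads` and its parameters are hypothetical; the two masks stand in for the values that `::GetProcessAffinityMask()` returns for the process and the system.

```cpp
#include <bitset>
#include <cstdint>

// Hypothetical helper: derive a thread count from the two masks reported by
// ::GetProcessAffinityMask(). If the process mask differs from the system
// mask, assume the user set a custom affinity and count only the bits set in
// the process mask; otherwise fall back to the hardware thread count.
inline unsigned availableThreads(uint64_t ProcessMask, uint64_t SystemMask,
                                 unsigned HardwareThreads) {
  if (ProcessMask != SystemMask)
    return static_cast<unsigned>(std::bitset<64>(ProcessMask).count());
  return HardwareThreads;
}
```

A check of this shape only works if the two masks are reported consistently, which is the assumption that no longer holds in the edge cases discussed here.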

The bug seems to be a TOCTOU: the OS assigns a default affinity mask based on the processor group where the PE is started, but the PE's `main()` thread later runs on a different-sized processor group. On Windows, an affinity mask holds at most 64 bits. In our case, when running on a `n2d-highcpu-224` GCE instance, we're seeing 4 processor groups, 2 of size 64 and 2 others of size 48, for a total of 224 vCPUs. The Windows OS randomly assigns a starting process mask of either `(2^64)-1` or `(2^48)-1`, that is, 64 or 48 bits set. In some edge cases, the thread calling `::GetProcessAffinityMask()` randomly runs on a different processor group, making it hard for a program to determine whether a custom affinity mask was set. This wasn't happening before Windows 11, since only one processor group was used on PE startup, even on machines with asymmetric processor groups.
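The arithmetic above can be sketched in portable C++. This is an illustration under assumptions, not the workaround in the patch: the helper names (`fullGroupMask`, `totalProcessors`, `matchesAnyFullGroupMask`) are hypothetical, and the last one shows only one possible heuristic for telling an OS-assigned default mask from a user-supplied one.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Full affinity mask for a processor group with `Size` logical processors.
// Shifting a 64-bit value by 64 is undefined, so Size == 64 is special-cased.
inline uint64_t fullGroupMask(unsigned Size) {
  assert(Size >= 1 && Size <= 64 && "a processor group holds 1..64 processors");
  return Size == 64 ? ~uint64_t(0) : (uint64_t(1) << Size) - 1;
}

// Total number of logical processors across all processor groups.
inline unsigned totalProcessors(const std::vector<unsigned> &GroupSizes) {
  return std::accumulate(GroupSizes.begin(), GroupSizes.end(), 0u);
}

// Hypothetical heuristic: a mask equal to the full mask of any group is
// probably the OS default for that group rather than a custom affinity.
inline bool matchesAnyFullGroupMask(uint64_t Mask,
                                    const std::vector<unsigned> &GroupSizes) {
  for (unsigned Size : GroupSizes)
    if (Mask == fullGroupMask(Size))
      return true;
  return false;
}
```

With the group sizes seen on the `n2d-highcpu-224` instance, `{64, 64, 48, 48}`, both `(2^64)-1` and `(2^48)-1` match a full group mask, whereas a narrower mask such as `0xF` does not.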

With this patch, on the one hand, on Windows 11 and Windows Server 2022 we disable manual dispatching of threads across processor groups, and instead let the Windows OS do that. On the other hand, a workaround is added to mitigate the issue described above (see Threading.inc, L226).
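For readers unfamiliar with what "manual dispatching" means here, the following is a rough portable sketch of mapping a thread index to a processor group. It is not the code from D71775 or from this patch; `groupForThread` is a hypothetical name, and on Windows the resulting group index would then be applied with something like `::SetThreadGroupAffinity()`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: pick the processor group for a given thread index by
// filling groups in order, one thread per logical processor, and wrapping
// around when there are more threads than processors.
// Returns -1 if there are no processors at all.
inline int groupForThread(unsigned ThreadIndex,
                          const std::vector<unsigned> &GroupSizes) {
  unsigned Total = 0;
  for (unsigned Size : GroupSizes)
    Total += Size;
  if (Total == 0)
    return -1;
  unsigned Slot = ThreadIndex % Total;
  for (std::size_t G = 0; G < GroupSizes.size(); ++G) {
    if (Slot < GroupSizes[G])
      return static_cast<int>(G);
    Slot -= GroupSizes[G];
  }
  return -1; // Not reached: Slot is always smaller than Total.
}
```

On Windows 11 and Windows Server 2022 this kind of mapping is what the patch stops doing, leaving the placement of threads to the OS.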

Fixes PR56618 <https://github.com/llvm/llvm-project/issues/56618>.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D138747

Files:
  llvm/include/llvm/Support/Windows/WindowsSupport.h
  llvm/lib/Support/Windows/Process.inc
  llvm/lib/Support/Windows/Threading.inc
  llvm/unittests/Support/ThreadPool.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D138747.478065.patch
Type: text/x-patch
Size: 8081 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20221126/5af70ac4/attachment.bin>