[PATCH] D71775: [ThreadPool] On Windows, extend usage to all CPU sockets and all NUMA groups

Mon Dec 30 13:48:15 PST 2019

rnk added subscribers: inglorion, gbiv.
rnk added a comment.
Herald added a subscriber: george.burgess.iv.

+ at gbiv @inglorion, other ThinLTO users.

> To solve this situation, we capture (and retain) the initial intention until the point of usage, through a new ThreadPoolStrategy class. The number of threads to use is deferred as late as possible, until the moment where the std::threads are created (ThreadPool in the case of ThinLTO).

That seems reasonable to me.

================
Comment at: llvm/include/llvm/ADT/SmallBitVector.h:487
+
+  int compare(const SmallBitVector &RHS) const {
+    for (unsigned I = 0; I < std::max(size(), RHS.size()); ++I) {
----------------
I guess we don't need these changes, since it looks like you use BitVector only below.

================
Comment at: llvm/include/llvm/ADT/SmallBitVector.h:500
+
+  bool operator<(const SmallBitVector &RHS) const { return compare(RHS) == -1; }
+
----------------
You return `A - B` but then compare for equality to -1 and 1. I guess it works because you are doing it bit by bit, but it's exciting.

================
Comment at: llvm/include/llvm/Support/Threading.h:153
+    // runtime might use a lower value (not higher).
+    unsigned ThreadCount = 0;
+
----------------
Let's name the fields in a way that indicates that these numbers are the requested number of threads, not the final number. So, `ThreadsRequested` or something like that.

================
Comment at: llvm/include/llvm/Support/Threading.h:158
+    // specific core).
+    bool HyperThreads = true;
+
----------------
This could be `UseHyperThreads`. The first time I read it, I guessed that it indicated if the system has hyperthreads.

================
Comment at: llvm/lib/Support/Host.cpp:1277
                  << "/proc/cpuinfo: " << EC.message() << "\n";
     return -1;
   }
----------------
Another -1 case.

================
Comment at: llvm/lib/Support/Host.cpp:1334
 // On other systems, return -1 to indicate unknown.
 static int computeHostNumPhysicalCores() { return -1; }
 #endif
----------------
Note: the -1 case.

================
Comment at: llvm/lib/Support/Threading.cpp:85-86

-#include <thread>
-unsigned llvm::heavyweight_hardware_concurrency() {
-  // Since we can't get here unless LLVM_ENABLE_THREADS == 1, it is safe to use
-  // `std::thread` directly instead of `llvm::thread` (and indeed, doing so
-  // allows us to not define `thread` in the llvm namespace, which conflicts
-  // with some platforms such as FreeBSD whose headers also define a struct
-  // called `thread` in the global namespace which can cause ambiguity due to
-  // ADL.
-  int NumPhysical = sys::getHostNumPhysicalCores();
-  if (NumPhysical == -1)
-    return std::thread::hardware_concurrency();
-  return NumPhysical;
-}
+int computeHostNumPhysicalCores();
+int computeHostNumPhysicalThreads();

----------------
We already have a public API, getHostNumPhysicalCores. Can this use that?

================
Comment at: llvm/lib/Support/Threading.cpp:89
+unsigned llvm::ThreadPoolStrategy::compute_thread_count() const {
+  unsigned MaxThreadCount = HyperThreads ? computeHostNumPhysicalThreads()
+                                         : computeHostNumPhysicalCores();
----------------
I see these `computeHostNum*` methods return int, and some versions seem to return -1 to indicate errors. I think you'll want to use `int` here and check if `MaxThreadCount <= 0`. Otherwise below we may do a signed/unsigned comparison mismatch, and return ~0U for MaxThreadCount.

================
Comment at: llvm/lib/Support/Unix/Threading.inc:273
+
+int computeHostNumPhysicalThreads() {
+#if defined(HAVE_SCHED_GETAFFINITY) && defined(HAVE_CPU_COUNT)
----------------
I'm not sure it makes sense to say "physical threads". I think "physical cores" was meant to distinguish between hardware threads and cores. Describing hardware execution resources as physical or non-physical isn't very precise or meaningful in the first place, but I don't think it should apply to hyper threads.

================
Comment at: llvm/lib/Support/Windows/Threading.inc:160
+
+static ArrayRef<ProcessorGroup> computeProcessorGroups() {
+  auto computeGroups = []() {
----------------
This is cached, so maybe `getProcessorGroups` to indicate that it is not intended to be expensive.

================
Comment at: llvm/lib/Support/Windows/Threading.inc:212
+template <typename R, typename UnaryPredicate>
+unsigned aggregate(R &&Range, UnaryPredicate P) {
+  unsigned I{};
----------------
Seems like `static` can be used here.

================
Comment at: llvm/unittests/Support/ThreadPool.cpp:31
 protected:
   // This is intended for platform as a temporary "XFAIL"
   bool isUnsupportedOSOrEnvironment() {
----------------
"Temporary" >_>

================
Comment at: llvm/unittests/Support/ThreadPool.cpp:180
+
+  //std::set<llvm::BitVector> ThreadsUsed;
+  llvm::DenseSet<llvm::BitVector> ThreadsUsed;
----------------
I guess this is why you added comparison operators. In any case, let's remove the commented out code in the final version.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71775/new/

https://reviews.llvm.org/D71775