[llvm] [llvm][transforms] Add a new algorithm to SplitModule (PR #95941)

Tue Jul 2 13:54:10 PDT 2024

================
@@ -268,6 +268,38 @@ void llvm::SplitModule(
   ClusterIDMapType ClusterIDMap;
   findPartitions(M, ClusterIDMap, N);
 
+  // Find empty modules and functions not mapped to modules in ClusterIDMap.
+  // Map these functions to the empty modules so that they skip being
+  // distributed by isInPartition() based on function name hashes below.
+  // This provides better uniformity of distribution of functions to modules
+  // in some cases - for example when the number of functions equals to N.
+  if (TryToAvoidEmptyModules) {
+    DenseSet<unsigned> NonEmptyModules;
+    SmallVector<const GlobalValue *> UnmappedFunctions;
+    for (const auto &F : M.functions()) {
+      if (F.isDeclaration() ||
+          F.getLinkage() != GlobalValue::LinkageTypes::ExternalLinkage)
+        continue;
+      auto It = ClusterIDMap.find(&F);
+      if (It == ClusterIDMap.end())
+        UnmappedFunctions.push_back(&F);
+      else
+        NonEmptyModules.insert(It->second);
+    }
+    SmallVector<unsigned> EmptyModules;
+    for (unsigned I = 0; I < N; ++I) {
+      if (!NonEmptyModules.contains(I))
+        EmptyModules.push_back(I);
+    }
+    auto NextEmptyModuleIt = EmptyModules.begin();
+    for (const auto F : UnmappedFunctions) {
+      if (NextEmptyModuleIt == EmptyModules.end())
+        break;
+      ClusterIDMap.insert({F, *NextEmptyModuleIt});
+      ++NextEmptyModuleIt;
+    }
----------------
joker-eph wrote:

> To get less useless empty modules.
>  reduces the number of empty modules as compared to what hash-based distribution does.

Right, I know it'll "work", but "to get less useless empty modules." isn't an obvious optimization metrics to me! It just seems like an odd incomplete solution to a load-balancing problem.

https://github.com/llvm/llvm-project/pull/95941