[llvm] [AMDGPU][PromoteAlloca] Whole-function alloca promotion to vector (PR #84735)

Mon Mar 11 03:32:40 PDT 2024

================
@@ -225,6 +253,49 @@ FunctionPass *llvm::createAMDGPUPromoteAllocaToVector() {
   return new AMDGPUPromoteAllocaToVector();
 }
 
+void AMDGPUPromoteAllocaImpl::sortAllocasToPromote(
+    SmallVectorImpl<AllocaInst *> &Allocas) {
+  DenseMap<AllocaInst *, unsigned> Scores;
+
+  LLVM_DEBUG(dbgs() << "Before sorting allocas:\n"; for (auto *A
+                                                         : Allocas) dbgs()
+                                                    << "  " << *A << "\n";);
+
+  for (auto *Alloca : Allocas) {
+    LLVM_DEBUG(dbgs() << "Scoring: " << *Alloca << "\n");
+    unsigned &Score = Scores[Alloca];
+    // Increment score by one for each user + a bonus for users within loops.
+    //
+    // Look through GEPs and bitcasts for additional users.
+    SmallVector<User *, 8> WorkList;
+    WorkList.append(Alloca->user_begin(), Alloca->user_end());
+    while (!WorkList.empty()) {
+      auto *Inst = dyn_cast<Instruction>(WorkList.pop_back_val());
+      if (!Inst)
+        continue;
+
+      if (isa<BitCastInst>(Inst) || isa<GetElementPtrInst>(Inst)) {
+        WorkList.append(Inst->user_begin(), Inst->user_end());
+        continue;
+      }
+
----------------
Pierre-vh wrote:

Currently the score only depends on users, so if we have a big alloca with 2 users and a small one with 3, it'll promote the small one first and may then run out of budget for the big one.

Making it aware of size is a good idea, but I'm not sure how big the impact should be. It's tricky to tune: big allocas may be better to promote, but they might also cause more spills, which would negate the benefits of the transform.

I'd say adding size awareness can wait, but if we want to add it now, I'd suggest a small penalty for small allocas (under a certain threshold, e.g. 8 bytes) and a small boost for bigger allocas (e.g. above 64 bytes).
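
To make the suggestion concrete, here is a rough standalone sketch of what such a size adjustment could look like. It is not code from the patch: `AllocaInfo`, `sizeAwareScore` and `sortBySizeAwareScore` are hypothetical names, the struct is a plain stand-in rather than an LLVM type, and the penalty/boost amounts of 1 are assumptions. Only the 8-byte and 64-byte thresholds come from the comment above.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for an alloca; not an LLVM type.
struct AllocaInfo {
  std::string Name;
  unsigned NumUsers;    // user-based score, as the pass computes it today
  uint64_t SizeInBytes; // allocation size
};

// Thresholds taken from the suggestion above.
constexpr uint64_t SmallThresholdBytes = 8;
constexpr uint64_t LargeThresholdBytes = 64;
// Adjustment amounts are assumptions; real values would need tuning.
constexpr unsigned SmallPenalty = 1;
constexpr unsigned LargeBoost = 1;

// Start from the user count, then nudge the score based on size.
unsigned sizeAwareScore(const AllocaInfo &A) {
  unsigned Score = A.NumUsers;
  if (A.SizeInBytes < SmallThresholdBytes)
    Score = Score > SmallPenalty ? Score - SmallPenalty : 0; // saturate at 0
  else if (A.SizeInBytes > LargeThresholdBytes)
    Score += LargeBoost;
  return Score;
}

// Highest score first; stable so equal scores keep their original order.
void sortBySizeAwareScore(std::vector<AllocaInfo> &Allocas) {
  std::stable_sort(Allocas.begin(), Allocas.end(),
                   [](const AllocaInfo &A, const AllocaInfo &B) {
                     return sizeAwareScore(A) > sizeAwareScore(B);
                   });
}
```

With these assumed values, the example above (a 128-byte alloca with 2 users versus a 4-byte alloca with 3 users) would sort the big one first. Whether that is the right trade-off given the spilling concern is exactly the part that needs tuning.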


https://github.com/llvm/llvm-project/pull/84735