[llvm] [LoopVectorize] Refine runtime memory check costs when there is an outer loop (PR #76034)

Fri Dec 22 12:11:29 PST 2023

================
@@ -2091,16 +2091,45 @@ class GeneratedRTChecks {
         LLVM_DEBUG(dbgs() << "  " << C << "  for " << I << "\n");
         RTCheckCost += C;
       }
-    if (MemCheckBlock)
+    if (MemCheckBlock) {
+      InstructionCost MemCheckCost = 0;
       for (Instruction &I : *MemCheckBlock) {
         if (MemCheckBlock->getTerminator() == &I)
           continue;
         InstructionCost C =
             TTI->getInstructionCost(&I, TTI::TCK_RecipThroughput);
         LLVM_DEBUG(dbgs() << "  " << C << "  for " << I << "\n");
-        RTCheckCost += C;
+        MemCheckCost += C;
+      }
+
+      // If the runtime memory checks are being created inside an outer loop
+      // we should find out if these checks are outer loop invariant. If so,
+      // the checks will be hoisted out and so the effective cost will reduce
+      // according to the outer loop trip count.
+      if (OuterLoop) {
+        ScalarEvolution *SE = MemCheckExp.getSE();
+        const SCEV *Cond = SE->getSCEV(MemRuntimeCheckCond);
----------------
fhahn wrote:

> I assume that getting SCEVs is quite expensive, unless they've already been cached before. So I was worried about the increase in compilation time by checking each instruction.

I think `SE->getSCEV(MemRuntimeCheckCond)` will cause SCEV expressions to be built anyway for all the instructions feeding `MemRuntimeCheckCond`, so the time spent in SCEV construction should be roughly the same.

> Even if a particular instruction in the sequence is invariant, there is no guarantee it will be hoisted if the use is not invariant for some reason. So I thought the most convincing case was when the final condition was invariant as that likely means the whole sequence will be hoisted.

I think LLVM's LICM will hoist any hoistable instruction, but the backend may sink loop-invariant instructions back into the loop (e.g. if that reduces register pressure or something like that), so at this point there are no hard guarantees, I think.

It might not be worth including this in the initial version, but it would at least be good to include a comment explaining why only the final condition is checked.
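
For reference, the amortisation described in the patch comment (the effective cost reduces according to the outer loop trip count when the check is outer-loop invariant) could be modelled roughly as below. This is a standalone sketch, not the code in the PR: `effectiveMemCheckCost`, its parameters, the integer division, and the floor of 1 are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper, not part of LoopVectorize: models how the cost of a
// hoistable runtime memory check could be amortised over the outer loop.
std::uint64_t effectiveMemCheckCost(std::uint64_t MemCheckCost,
                                    bool CondIsOuterLoopInvariant,
                                    std::uint64_t OuterTripCount) {
  // A check that varies with the outer loop runs on every outer iteration,
  // so it is charged in full.
  if (!CondIsOuterLoopInvariant)
    return MemCheckCost;

  // Nothing to amortise if the check is already free.
  if (MemCheckCost == 0)
    return 0;

  // Guard against a zero trip count estimate.
  std::uint64_t TC = std::max<std::uint64_t>(OuterTripCount, 1);

  // Assumption: divide by the trip count, but never let a non-zero check
  // cost drop below 1.
  return std::max<std::uint64_t>(MemCheckCost / TC, 1);
}
```

With these assumptions, a check costing 12 inside an outer loop with an estimated trip count of 4 would be charged as 3 if its final condition is invariant, and as 12 otherwise.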


https://github.com/llvm/llvm-project/pull/76034