[llvm] LoopVectorize: guard appending InstsToScalarize; fix bug (PR #88720)

Wed Apr 17 01:57:23 PDT 2024

================
@@ -5815,7 +5815,8 @@ void LoopVectorizationCostModel::collectInstsToScalarize(ElementCount VF) {
     if (!blockNeedsPredicationForAnyReason(BB))
       continue;
     for (Instruction &I : *BB)
-      if (isScalarWithPredication(&I, VF)) {
+      if (isScalarWithPredication(&I, VF) &&
+          !isScalarAfterVectorization(&I, VF)) {
----------------
david-arm wrote:

Thanks for the fix!

Honestly speaking I find it quite difficult to distinguish isScalarWithPredication from isScalarAfterVectorization, but this is not the fault of this patch. :)

If I understand correctly, the udiv is being treated as scalar-with-predication, which in this case means that we can't speculatively execute all lanes of the vector udiv because one of the lanes could fault, whereas in the original loop it would not, thanks to predication. However, isScalarAfterVectorization returns true if after vectorisation there will only be a single copy of the instruction. In our case we will replicate the udiv for each lane in its own predicated block to ensure that we maintain the same faulting behaviour as the original scalar loop.
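
To make the faulting argument concrete, here is a minimal standalone sketch of what "replicate per lane under a predicate" means. It is not LLVM code: `VF`, `Vec`, `Mask`, and `predicatedUDiv` are hypothetical names invented for illustration, and the fixed width of 4 is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed vector width, chosen only for this illustration.
constexpr std::size_t VF = 4;
using Vec = std::array<std::uint32_t, VF>;
using Mask = std::array<bool, VF>;

// Models scalar-with-predication: one scalar udiv per lane, each guarded by
// its mask bit. Masked-off lanes never execute the division, so a zero
// divisor in an inactive lane cannot fault.
Vec predicatedUDiv(const Vec &Num, const Vec &Den, const Mask &M,
                   const Vec &PassThru) {
  Vec Result = PassThru;
  for (std::size_t Lane = 0; Lane < VF; ++Lane) {
    if (M[Lane]) { // stands in for the per-lane predicated block
      // Active lanes were also executed by the original scalar loop, so
      // their divisors are assumed to be non-zero.
      assert(Den[Lane] != 0 && "active lane must be safe to divide");
      Result[Lane] = Num[Lane] / Den[Lane];
    }
  }
  return Result;
}
```

A widened udiv over all four lanes would divide by the zero divisors in the inactive lanes, which is exactly the speculative execution the predicated form avoids.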

The problem I see with this change is that it seems to assume there will be other instructions in the same block that will add BB to PredicatedBBsAfterVectorization, i.e. the store. I think that's why this patch seems to work. I think you still need to add the block when encountering the udiv in case there are no other instructions that will. For example, something like this:

```
      if (isScalarWithPredication(&I, VF)) {
        ScalarCostsTy ScalarCosts;
        // Do not apply discount if scalable, because that would lead to
        // invalid scalarization costs.
        // Do not apply discount logic if hacked cost is needed
        // for emulated masked memrefs.
        if (!isScalarAfterVectorization(&I, VF) && !VF.isScalable() &&
            !useEmulatedMaskMemRefHack(&I, VF) &&
            computePredInstDiscount(&I, ScalarCosts, VF) >= 0)
          ScalarCostsVF.insert(ScalarCosts.begin(), ScalarCosts.end());
        // Remember that BB will remain after vectorization.
        PredicatedBBsAfterVectorization[VF].insert(BB);
      }
```
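
The difference between the two guard placements can be shown with a toy model. This is only a sketch under stated assumptions, not the cost model itself: `ToyInst` and both functions are hypothetical, and the two flags stand in for the results of isScalarWithPredication and isScalarAfterVectorization.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for an instruction. Both flags are hypothetical inputs that
// model the answers of the two cost-model queries for a given VF.
struct ToyInst {
  bool ScalarWithPredication;
  bool ScalarAfterVectorization;
};

// Shape of the patch as posted: the extra check sits in the outer condition,
// so it also gates whether the block gets recorded.
bool recordsBlockOuterGuard(const std::vector<ToyInst> &Block) {
  bool Recorded = false;
  for (const ToyInst &I : Block)
    if (I.ScalarWithPredication && !I.ScalarAfterVectorization)
      Recorded = true;
  return Recorded;
}

// Shape of the suggestion: the extra check would only gate the discount
// logic (omitted here); the block is recorded for every instruction that is
// scalar-with-predication.
bool recordsBlockInnerGuard(const std::vector<ToyInst> &Block) {
  bool Recorded = false;
  for (const ToyInst &I : Block)
    if (I.ScalarWithPredication)
      Recorded = true;
  return Recorded;
}
```

For a block whose only predicated instruction has both flags set, the first variant never records the block while the second does. If the block also holds an instruction such as the store, both variants record it, which would explain why the patch appears to work on the test case.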

https://github.com/llvm/llvm-project/pull/88720