[PATCH] D99718: [LoopVectorize] Simplify scalar cost calculation in getInstructionCost

Tue Apr 20 00:47:16 PDT 2021

david-arm added a comment.

Hi @ctetreau and @fhahn, my concern is that the decisions for whether or not to multiply by VF is quite complicated and is lacking a well-documented description for the expected behaviour. I'm cautious that the updated assert may still not cover everything correctly and is therefore fragile.

On the other hand, if the condition is incorrect but not asserted, then in the worst case the cost would be different and the vectorizer might pick a different VF. If we add an assertion and the condition is not correct, the compiler would crash.

So it depends on how fragile the condition would be for handling new corner-cases, and if it's even worth the effort given that the impact of making wrong assumptions by this code is very low.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7433
     SmallVector<const Value *, 4> Operands(I->operand_values());
-    unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
-    return N * TTI.getArithmeticInstrCost(
-                   I->getOpcode(), VectorTy, CostKind,
-                   TargetTransformInfo::OK_AnyValue,
-                   Op2VK, TargetTransformInfo::OP_None, Op2VP, Operands, I);
+    return TTI.getArithmeticInstrCost(
+        I->getOpcode(), VectorTy, CostKind, TargetTransformInfo::OK_AnyValue,
----------------
ctetreau wrote:
> I'd feel better if there was an assert here that verifies that the conditions you state in the commit message are true. Same with all the other places N is eliminated.
> 
> At the very least, a comment that explains inline that, even though conceptually there should be an N, because of [reasons], we don't need it.
Hi @ctetreau, in the first iteration of this patch I did assert something like this:

```
  auto hasSingleCopyAfterVectorization = [this](Instruction *I,
                                                ElementCount VF) -> bool {
    if (VF.isScalar())
      return true;

    auto Scalarized = InstsToScalarize.find(VF);
    assert(Scalarized != InstsToScalarize.end() &&
           "VF not yet analyzed for scalarization profitability");
    return !Scalarized->second.count(I) &&
           llvm::all_of(I->users(), [&](User *U) {
             auto *UI = cast<Instruction>(U);
             return !Scalarized->second.count(UI);
           });
  };

  if (isScalarAfterVectorization(I, VF)) {
    VectorTy = RetTy;
    // With the exception of GEPs, after scalarization there should only be one
    // copy of the instruction generated in the loop. This is because the VF is
    // either 1, or any instructions that need scalarizing have already been
    // dealt with by the the time we get here. As a result, it means we don't
    // have to multiply the instruction cost by VF.
    assert(I->getOpcode() == Instruction::GetElementPtr ||
           hasSingleCopyAfterVectorization(I, VF));
```

and although it passed the LLVM tests, it failed after merging due to another missing case - that of Instruction::Phi.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99718/new/

https://reviews.llvm.org/D99718