[PATCH] D85794: [WIP][llvm][LV] Replace `unsigned VF` with `ElementCount VF` [NFCI]

Fri Aug 14 12:45:12 PDT 2020

ctetreau added a comment.

I understand that this is WIP, but I went ahead and pointed out several missing asserts. I assume the real patch will be based on this at least to some extent.

================
Comment at: llvm/include/llvm/Support/TypeSize.h:63
+  /// Ordering function for ElementCount, needed for `set` like containers.
+  bool operator<(const ElementCount &EC) const {
+    if (this->Scalable == EC.Scalable)
----------------
fpetrogalli wrote:
> ctetreau wrote:
> > I'm uncomfortable with having this comparitor. is `{true, 4} > {false, 4}`? Depends on the machine. I would personally prefer a named function that can be passed to the comparator parameter of containers. I understand that this makes the code uglier, so if you get significant pushback it may be worth caving on this issue.
> I think what you say makes sense.  I had to introduce a comparison operator because `llvm::SmallSet` is being used in VPlan. 
> 
> An alternative approach would be to remove the small set and have a hash function that could be used for std::unordered_set. I actually prefer this latter approach as it avoid introducing the concept of ordering ElementCount, which is kind of formally wrong as you have pointed out.
Honestly, I think it's reasonable to want to put an ElementCount in an ordered map or set. If it's just one place, it might be an easier sell to use an explicit named comparator.

I think it's really unfortunate that operator< is overloaded in this way in C++.

================
Comment at: llvm/include/llvm/Support/TypeSize.h:87
+  /// One or more elements.
+  bool isVector() const {
+    assert(!(Min == 0 && Scalable) &&
----------------
fpetrogalli wrote:
> vkmr wrote:
> > ctetreau wrote:
> > > Constraining the constructor doesn't help with the case where a valid `ElementCount` is constructed, and then `Min` set to zero later on.
> > > 
> > > Why exactly is `{true, 0}` forbidden? `forall N, N * 0 = 0` after all.
> > @ctetreau I see your point about constraining the constructor for scalable vectors; `ElementCount` is a counter and 0 is a valid count. Also, however one defines a "vector with 0 elements", it would be the same for fixed and scalar vectors. So, IF the constructor allows constructing  fixed vectors of 0 length, it should also allow scalable vectors of 0 length.
> > 
> > The perspective behind not allowing to have `Min = 0` (for both fixed and scalable vectors) is that `ElementCount`'s only purpose is to count the number of elements in a vector, and a vector with no elements is illegal vector. But that discussion would be out of scope of this patch.
> > 
> > The updated condition for `isVector()` still makes sense though.
> 
> Mathematically you are right. ElementCount is  a monomial of grade 0 (a*X^0) or 1 (a*X^1). It doesn't really matter if you are multiplying by 0*X^0 or 0*X^1, both values will null the result of the operation independently of the EC in input.
> 
> I think that @vkmr was mostly concerned about potential bugs that can happen when we check if we are working in scalable mode (so Scalable == true) but we forget to check that we have actually a non zero value by checking Min > 0. It is very unlikely to happen, but I guess there is some argument for choosing this approach. 
> 
> All in all, I'd rather avoid `{0, true}`, at list in debug mode, just to catch things that might end up weird if we end up producing this value.
So you're telling me that there's no good reason to construct a {0, true}, so as a debugging aid you're asserting that it doesn't happen? Why is {0, false} OK? I could see some valid code doing something like:

```
ElementCount EC = {std::min(someQuantity, someOtherQuantity), true};
if (!EC.isVector())
   return false;
```

Sure, I could delay construction of the ElementCount, but there's no good reason that I should have to do that.

Really though, as long as the Min and Scalable fields are public, I don't think we should try to preserve any invariants for them. They should either be made private (probably outside the scope of what you are trying to do), or its functionality should be expected to work for any combination of {Min, Scalable}.

================
Comment at: llvm/include/llvm/Support/TypeSize.h:95
+  /// Return the ElementCount instance representing a scalar.
+  static ElementCount getScalar() { return {1, false}; }
+  /// Return the ElementCount instance representing a zero.
----------------
I don't think we should add a getScalar and getZero, because it would be easy to assume these work for scalable vectors end up writing a bunch of bad code. Sure it's an easy fix, but best to just avoid the opportunity for misuse.

================
Comment at: llvm/include/llvm/Support/TypeSize.h:90
+  /// Return the ElementCount instance representing a scalar.
+  static ElementCount getScalar() { return {1, false}; }
 };
----------------
fpetrogalli wrote:
> rengolin wrote:
> > fpetrogalli wrote:
> > > rengolin wrote:
> > > > Should we have a `getScalable(int Min)`, `getNonScalable(int Min)` and `getDisabled()` too?
> > > > 
> > > > There are a number of places you changed from `VF` to `{VF, false}` and places you use literal `{0, false}` that would benefit from having proper names as the static builders. Using them everywhere would make the code safer and clearer to read.
> > > > 
> > > > Should we have a getScalable(int Min), getNonScalable(int Min) and getDisabled() too?
> > > 
> > > I'd rather not, for the following reasons. Please let me know if you disagree.
> > > 
> > > 1.  `getScalable(int Min)`, `getNonScalable(int Min) ` for replacing uses when using `{VF, false}`
> > > 
> > > The instances of `{VF, false}` are there just because I want to keep this a non-functional change. Those uses are mostly in places were VPlan is computing the possible ElementCount to evaluate. Once we extend VPlan to add also the scalable ECs, we will (likely) just add an extra level of loop iteration on the two values false/true of scalable, and use the ElementCount constructor directly. Once that is done, the two static methods you propose will become obsolete.
> > > 
> > > I agree that looking at the code with `{VF, false}` may be confusing. Would you prefer me to explicitly call the constructor instead of using an initializer list, so that we know we are dealing with ElementCount?
> > > 
> > > So, I see two approaches, let me know which one is the best in your opinion:
> > > 
> > > a. call the constructor explicitly instead of the initializer list
> > > b. have  a static method `getFixed` or `getNonScalable` (I prefer the former) that we use but mark it as deprecated as we shouldn't really be using it once the vectorizer doesn't care anymore about scalable and non scalable ECs.
> > > 
> > > 2. `getDisabled`
> > > 
> > > I am happy to add a method for the value of 'zero", but calling it `getDisabled` is misleading because ther eis no concept of Enabled/Disabled in ElementCount. I'd prefer to call it `getZero`. I am going this route for this change, let me know if you disagree, will figure something out.
> > > 
> > > Once we extend VPlan to add also the scalable ECs, we will (likely) just add an extra level of loop iteration on the two values false/true of scalable, and use the ElementCount constructor directly. Once that is done, the two static methods you propose will become obsolete.
> > 
> > I see what you mean, makes sense.
> > 
> > 
> > > I agree that looking at the code with `{VF, false}` may be confusing. Would you prefer me to explicitly call the constructor instead of using an initializer list, so that we know we are dealing with ElementCount?
> > 
> > No, init-list constants are usually more readable than explicit constructor calls. The point was not init-list vs c-tor, but constant vs named call, which won't work long term, as you explained above.
> > 
> > > 2. `getDisabled`
> > > 
> > > I am happy to add a method for the value of 'zero", but calling it `getDisabled` is misleading because ther eis no concept of Enabled/Disabled in ElementCount. I'd prefer to call it `getZero`. I am going this route for this change, let me know if you disagree, will figure something out.
> > 
> > LGTM. Thanks!
> > 
> > No, init-list constants are usually more readable than explicit constructor calls.
> 
> In the last update of the patch (before your comment) I have replaced the init-list with explicit constructor calls. I just want to make sure you are happy with this. I am am fan of init-list in general, but in this specific case I prefer the constructor, as it make it explicit that we are building something that is used to count the number of elements.
My thinking is that people shouldn't be manually constructing ElementCount objects very often. FixedVectorType and ScalableVectorType both have static getters that construct instances using only a number, and vectors have a method to get an ElementCount. I would think code that generically produces vectors that may either be fixed width or scalable from the ground up would not be common.

As for having a deprecated getFixed(), I'd like to point out that we have build bots that build with -werror, so deprecating methods with the intention of using them in upstream isn't going to work out.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h:235
+  /// vectorization is disabled.
+  ElementCount BestVF = ElementCount::getZero();
   unsigned BestUF = 0;
----------------
perhaps this should just be an llvm::Optional<ElementCount>?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:321
+  if (VF.isVector()) {
+    auto *VectorTy = FixedVectorType::get(Ty, VF.Min);
     return VF * DL.getTypeAllocSize(Ty) != DL.getTypeStoreSize(VectorTy);
----------------
this can just become `VectorType::get(Ty, VF)`

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:874
         !isa<DbgInfoIntrinsic>(Inst)) {
-      auto NewDIL = DIL->cloneByMultiplyingDuplicationFactor(UF * VF);
+      auto NewDIL = DIL->cloneByMultiplyingDuplicationFactor(UF * VF.Min);
       if (NewDIL)
----------------
should assert that Scalable is false

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1822
+                       ? ConstantVector::getSplat(VF, cast<Constant>(Mul))
+                       : Builder.CreateVectorSplat(VF.Min, Mul);
   Builder.restoreIP(CurrIP);
----------------
IRBuilder has a CreateVectorSplat that takes an ElementCount

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2141
+    unsigned LastLane =
+        Cost->isUniformAfterVectorization(I, VF) ? 0 : VF.Min - 1;
     auto *LastInst = cast<Instruction>(
----------------
assert not scalable

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2287
   unsigned InterleaveFactor = Group->getFactor();
-  auto *VecTy = FixedVectorType::get(ScalarTy, InterleaveFactor * VF);
+  auto *VecTy = FixedVectorType::get(ScalarTy, InterleaveFactor * VF.Min);

----------------
assert not scalable

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2485
+
+  auto *DataTy = FixedVectorType::get(ScalarDataTy, VF.Min);
   const Align Alignment = getLoadStoreAlignment(Instr);
----------------
assert not scalable

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2816
     CheckMinIters = Builder.CreateICmp(
-        P, Count, ConstantInt::get(Count->getType(), VF * UF),
+        P, Count, ConstantInt::get(Count->getType(), VF.Min * UF),
         "min.iters.check");
----------------
assert not scalable

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3271
+  // TODO: make this step a runtime value
+  Constant *Step = ConstantInt::get(IdxTy, VF.Min * UF);
   Value *CountRoundDown = getOrCreateVectorTripCount(Lp);
----------------
assert not scalable

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3737
+        UndefValue::get(FixedVectorType::get(VectorInit->getType(), VF.Min)),
+        VectorInit, Builder.getInt32(VF.Min - 1), "vector.recur.init");
   }
----------------
assert not scalable

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3963
     assert(!IsInLoopReductionPhi && "Unexpected truncated inloop reduction!");
-    Type *RdxVecTy = FixedVectorType::get(RdxDesc.getRecurrenceType(), VF);
+    Type *RdxVecTy = FixedVectorType::get(RdxDesc.getRecurrenceType(), VF.Min);
     Builder.SetInsertPoint(
----------------
assert not scalable

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5492
   // Clamp the interleave ranges to reasonable counts.
-  unsigned MaxInterleaveCount = TTI.getMaxInterleaveFactor(VF);
+  unsigned MaxInterleaveCount = TTI.getMaxInterleaveFactor(VF.Min);

----------------
assert not scalable

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5678
     unsigned TypeSize = DL.getTypeSizeInBits(Ty->getScalarType());
-    return std::max<unsigned>(1, VF * TypeSize / WidestRegister);
+    return std::max<unsigned>(1, VF.Min * TypeSize / WidestRegister);
   };
----------------
assert not scalable

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5906
+    unsigned ScalarCost =
+        VF.Min * getInstructionCost(I, ElementCount::getScalar()).first;

----------------
assert not scalable

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:6149
   unsigned InterleaveFactor = Group->getFactor();
-  auto *WideVecTy = FixedVectorType::get(ValTy, VF * InterleaveFactor);
+  auto *WideVecTy = FixedVectorType::get(ValTy, VF.Min * InterleaveFactor);

----------------
assert not scalable

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:6445
       // Return cost for branches around scalarized and predicated blocks.
-      auto *Vec_i1Ty =
-          FixedVectorType::get(IntegerType::getInt1Ty(RetTy->getContext()), VF);
-      return (TTI.getScalarizationOverhead(Vec_i1Ty, APInt::getAllOnesValue(VF),
-                                           false, true) +
-              (TTI.getCFInstrCost(Instruction::Br, CostKind) * VF));
-    } else if (I->getParent() == TheLoop->getLoopLatch() || VF == 1)
+      auto *Vec_i1Ty = FixedVectorType::get(
+          IntegerType::getInt1Ty(RetTy->getContext()), VF.Min);
----------------
assert not scalable

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:6672

-    unsigned N = isScalarAfterVectorization(I, VF) ? VF : 1;
+    unsigned N = isScalarAfterVectorization(I, VF) ? VF.Min : 1;
     return N *
----------------
assert not scalable

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:303
     State->Instance->Part = Part;
-    for (unsigned Lane = 0, VF = State->VF; Lane < VF; ++Lane) {
+    for (unsigned Lane = 0, VF = State->VF.Min; Lane < VF; ++Lane) {
       State->Instance->Lane = Lane;
----------------
assert not scalable

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:154
     assert(Instance.Part < UF && "Queried Scalar Part is too large.");
-    assert(Instance.Lane < VF && "Queried Scalar Lane is too large.");
+    assert(Instance.Lane < VF.Min && !VF.Scalable &&
+           "Queried Scalar Lane is too large.");
----------------
NIT: the message doesn't match the VF being scalable. Should be split out

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85794/new/

https://reviews.llvm.org/D85794