[PATCH] D93617: [LoopVectorize] Take non-instruction's size into consideration when computing leader's demanded bits

Thu Jan 7 22:32:52 PST 2021

guopeilin added inline comments.

================
Comment at: llvm/lib/Analysis/VectorUtils.cpp:539
+    if (!isa<Instruction>(Val)) {
+      if (!isa<Argument>(Val))
+        continue;
----------------
guopeilin wrote:
> guopeilin wrote:
> > nikic wrote:
> > > guopeilin wrote:
> > > > nikic wrote:
> > > > > I think we can't entirely ignore non-arguments (aka constants) here. Take a look at this modified form of `llvm/test/Transforms/LoopVectorize/avoid-truncate-icmp-operands.ll`:
> > > > > 
> > > > > ```
> > > > > ; RUN: opt -loop-vectorize -S < %s | FileCheck %s
> > > > > target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
> > > > > target triple = "aarch64-unknown-linux-gnu"
> > > > > 
> > > > > @a = dso_local local_unnamed_addr global i64 0, align 8
> > > > > @b = dso_local local_unnamed_addr global i16 0, align 4
> > > > > 
> > > > > define dso_local void @myFunc(i32 %d, i32 %e) {
> > > > > ; CHECK:        pred.store.continue2:
> > > > > ; CHECK-NEXT:      %{{[0-9]+}} = icmp ult <2 x i64> %broadcast.splat{{[0-9]*}}, %broadcast.splat{{[0-9]*}}
> > > > > for.body29.lr.ph:
> > > > >   br label %for.body29
> > > > > 
> > > > > for.cond25.for.cond.cleanup28_crit_edge:          ; preds = %for.inc
> > > > >   ret void
> > > > > 
> > > > > for.body29:                                       ; preds = %for.inc, %for.body29.lr.ph
> > > > >   %n.078 = phi i16 [ undef, %for.body29.lr.ph ], [ %add34, %for.inc ]
> > > > >   br i1 undef, label %for.inc, label %if.then
> > > > > 
> > > > > if.then:                                          ; preds = %for.body29
> > > > >   %conv31 = zext i8 undef to i64
> > > > >   store i64 %conv31, i64* @a, align 8
> > > > >   %cmp.i = icmp ult i32 %e, %d
> > > > >   %.sroa.speculated = select i1 %cmp.i, i64 10000000000, i64 20000000000
> > > > >   %conv32 = trunc i64 %.sroa.speculated to i16
> > > > >   store i16 %conv32, i16* @b, align 4
> > > > >   br label %for.inc
> > > > > 
> > > > > for.inc:                                          ; preds = %if.then, %for.body29
> > > > >   %add34 = add nsw i16 %n.078, 2
> > > > >   %cmp27 = icmp slt i16 %add34, 16
> > > > >   br i1 %cmp27, label %for.body29, label %for.cond25.for.cond.cleanup28_crit_edge, !llvm.loop !6
> > > > > }
> > > > > 
> > > > > !6 = distinct !{!6, !7}
> > > > > !7 = !{!"llvm.loop.vectorize.enable", i1 true}
> > > > > ```
> > > > > 
> > > > > This gets vectorized with i32 width, but the two select constants are larger than 32-bit and get truncated.
> > > > > 
> > > > > I think the logic here should be: For `ConstantInt`, take the `getActiveBits()` of the constant value, for everything else (including Arguments) take the getTypeSizeInBits (what you're effectively doing now).
> > > > Thanks a lot for reviewing this patch. And a new patch has been updated, which takes case `ConstantInt` into consideration, too.
> > > > And there still exists something that confuses me. Firstly, I don't know why `the two select constants are larger than 32-bit`? As far as I can see, both `%d` and `%e` are defined as an `i32 argument`, what is the purpose that we take `ConstantInt` into consideration here?
> > > > Secondly, for `Constant`, there are `ConstantFP`, `ConstantInt` and some other kind of constant, so shall we take them into consideration, too?
> > > > It would be a great appreciation if you could explain it a bit more. Again, thanks a lot for reviewing.
> > > > 
> > > > And there still exists something that confuses me. Firstly, I don't know why `the two select constants are larger than 32-bit`? As far as I can see, both `%d` and `%e` are defined as an `i32 argument`, what is the purpose that we take `ConstantInt` into consideration here?
> > > 
> > > While the `%d` and `%e` arguments used in the icmp are 32-bit wide, the two constants that the `select` chooses between are larger than 32-bit. Vectorizing with 32-bit elements will also truncate the select constant and lose the top bits. Is this clearer?
> > > 
> > > By the way, it would be good if you could also add the above test case with a check that it uses i64 vectorization, not i32.
> > > 
> > > > Secondly, for `Constant`, there are `ConstantFP`, `ConstantInt` and some other kind of constant, so shall we take them into consideration, too?
> > > > It would be a great appreciation if you could explain it a bit more. Again, thanks a lot for reviewing.
> > > 
> > > I'm not sure to what degree these are really relevant in this code (as demanded bits only works on integer instructions), but generally yes, they should be taken into account. I would suggest to write the code something like this, so only ConstantInt has special treatment (to avoid regressions) while everything else is handled as "else":
> > > 
> > > ```
> > > unsigned BitSize;
> > > if (const auto *CI = dyn_cast<ConstantInt>(Val)) {
> > >   BitSize = CI->getValue().getActiveBits();
> > > } else {
> > >   DataLayout DL = ...;
> > >   BitSize = DL.getTypeSizeInBits(Val->getType());
> > > }
> > > 
> > > // Bail out for same reason below.
> > > if (BitSize > 64)
> > >   return MapVector<Instruction *, uint64_t>();
> > > uint64_t V = APInt::getAllOnesValue(BitSize).getZExtValue();
> > > DBits[Leader] |= V;
> > > LLVM_DEBUG(dbgs() << "\t Values's dbits " << Twine::utohexstr(V)
> > >                           << '\n');
> > > ```
> > > 
> > > There's just one problem there: Getting the DataLayout. I don't see any obvious way to get it, so it may be necessary to pass it as an argument to computeMinimumValueSizes().
> > Hi, Thanks a lot for your review and suggestion.
> > And as for DataLayout, I found that `ConstantInt` can be `0`,  the ActiveBits will return 0. So we may still need to check the BitSize before initializing APInt with BitSize. So maybe DL is not so necessary here.
> > Also, I found that case 
> > ```
> > Transforms/LoopVectorize/ARM/tail-folding-reduces-vf.ll
> > ```
> > failed if we `handle everything else as 'else'`. So maybe we can only take Argument and ConstantInt into consideration, a conservative way.
> > 
> Sorry, It seems that the failure of test case `Transforms/LoopVectorize/ARM/tail-folding-reduces-vf.ll` is not caused by `handle everything else as 'else'`, I am figuring it out. Please wait.
Hi, nikic! I update a new patch.
This patch fix three things:
Firstly, a `ConstantInt` can be zero, which means `getActiveBits` can be `0`, we still need to check `bitsize == 0` even through  we have `DL.getTypeSizeInBits(Val->getType())`. So I guess we don't need DataLayout here.
Secondly, a ConstantInt can be negative, if we still use `getActiveBits`, that is `BitWidth - countLeadingZeros()`, we will always get the most conservative bits for negative value cause its first bit is `1`. That is the reason why I failed on the test case by using getActiveBits for all ConstantInt
```
Transforms/LoopVectorize/ARM/tail-folding-reduces-vf.ll
```
For this case, there exist a negative constant int `-28`, So I use `getMinSignedBits()` for negative cases.
Thirdly, for no negative constant int, I still use `getActiveBits` as before. And handle everything else as 'else' by using `Val->getType()->getScalarSizeInBits()`

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93617/new/

https://reviews.llvm.org/D93617