[PATCH] D93617: [LoopVectorize] Take non-instruction's size into consideration when computing leader's demanded bits

Wed Jan 6 18:08:25 PST 2021

guopeilin added inline comments.

================
Comment at: llvm/lib/Analysis/VectorUtils.cpp:539
+    if (!isa<Instruction>(Val)) {
+      if (!isa<Argument>(Val))
+        continue;
----------------
nikic wrote:
> guopeilin wrote:
> > nikic wrote:
> > > I think we can't entirely ignore non-arguments (aka constants) here. Take a look at this modified form of `llvm/test/Transforms/LoopVectorize/avoid-truncate-icmp-operands.ll`:
> > > 
> > > ```
> > > ; RUN: opt -loop-vectorize -S < %s | FileCheck %s
> > > target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
> > > target triple = "aarch64-unknown-linux-gnu"
> > > 
> > > @a = dso_local local_unnamed_addr global i64 0, align 8
> > > @b = dso_local local_unnamed_addr global i16 0, align 4
> > > 
> > > define dso_local void @myFunc(i32 %d, i32 %e) {
> > > ; CHECK:        pred.store.continue2:
> > > ; CHECK-NEXT:      %{{[0-9]+}} = icmp ult <2 x i64> %broadcast.splat{{[0-9]*}}, %broadcast.splat{{[0-9]*}}
> > > for.body29.lr.ph:
> > >   br label %for.body29
> > > 
> > > for.cond25.for.cond.cleanup28_crit_edge:          ; preds = %for.inc
> > >   ret void
> > > 
> > > for.body29:                                       ; preds = %for.inc, %for.body29.lr.ph
> > >   %n.078 = phi i16 [ undef, %for.body29.lr.ph ], [ %add34, %for.inc ]
> > >   br i1 undef, label %for.inc, label %if.then
> > > 
> > > if.then:                                          ; preds = %for.body29
> > >   %conv31 = zext i8 undef to i64
> > >   store i64 %conv31, i64* @a, align 8
> > >   %cmp.i = icmp ult i32 %e, %d
> > >   %.sroa.speculated = select i1 %cmp.i, i64 10000000000, i64 20000000000
> > >   %conv32 = trunc i64 %.sroa.speculated to i16
> > >   store i16 %conv32, i16* @b, align 4
> > >   br label %for.inc
> > > 
> > > for.inc:                                          ; preds = %if.then, %for.body29
> > >   %add34 = add nsw i16 %n.078, 2
> > >   %cmp27 = icmp slt i16 %add34, 16
> > >   br i1 %cmp27, label %for.body29, label %for.cond25.for.cond.cleanup28_crit_edge, !llvm.loop !6
> > > }
> > > 
> > > !6 = distinct !{!6, !7}
> > > !7 = !{!"llvm.loop.vectorize.enable", i1 true}
> > > ```
> > > 
> > > This gets vectorized with i32 width, but the two select constants are larger than 32-bit and get truncated.
> > > 
> > > I think the logic here should be: For `ConstantInt`, take the `getActiveBits()` of the constant value, for everything else (including Arguments) take the getTypeSizeInBits (what you're effectively doing now).
> > Thanks a lot for reviewing this patch. And a new patch has been updated, which takes case `ConstantInt` into consideration, too.
> > And there still exists something that confuses me. Firstly, I don't know why `the two select constants are larger than 32-bit`? As far as I can see, both `%d` and `%e` are defined as an `i32 argument`, what is the purpose that we take `ConstantInt` into consideration here?
> > Secondly, for `Constant`, there are `ConstantFP`, `ConstantInt` and some other kind of constant, so shall we take them into consideration, too?
> > It would be a great appreciation if you could explain it a bit more. Again, thanks a lot for reviewing.
> > 
> > And there still exists something that confuses me. Firstly, I don't know why `the two select constants are larger than 32-bit`? As far as I can see, both `%d` and `%e` are defined as an `i32 argument`, what is the purpose that we take `ConstantInt` into consideration here?
> 
> While the `%d` and `%e` arguments used in the icmp are 32-bit wide, the two constants that the `select` chooses between are larger than 32-bit. Vectorizing with 32-bit elements will also truncate the select constant and lose the top bits. Is this clearer?
> 
> By the way, it would be good if you could also add the above test case with a check that it uses i64 vectorization, not i32.
> 
> > Secondly, for `Constant`, there are `ConstantFP`, `ConstantInt` and some other kind of constant, so shall we take them into consideration, too?
> > It would be a great appreciation if you could explain it a bit more. Again, thanks a lot for reviewing.
> 
> I'm not sure to what degree these are really relevant in this code (as demanded bits only works on integer instructions), but generally yes, they should be taken into account. I would suggest to write the code something like this, so only ConstantInt has special treatment (to avoid regressions) while everything else is handled as "else":
> 
> ```
> unsigned BitSize;
> if (const auto *CI = dyn_cast<ConstantInt>(Val)) {
>   BitSize = CI->getValue().getActiveBits();
> } else {
>   DataLayout DL = ...;
>   BitSize = DL.getTypeSizeInBits(Val->getType());
> }
> 
> // Bail out for same reason below.
> if (BitSize > 64)
>   return MapVector<Instruction *, uint64_t>();
> uint64_t V = APInt::getAllOnesValue(BitSize).getZExtValue();
> DBits[Leader] |= V;
> LLVM_DEBUG(dbgs() << "\t Values's dbits " << Twine::utohexstr(V)
>                           << '\n');
> ```
> 
> There's just one problem there: Getting the DataLayout. I don't see any obvious way to get it, so it may be necessary to pass it as an argument to computeMinimumValueSizes().
Hi, Thanks a lot for your review and suggestion.
And as for DataLayout, I found that `ConstantInt` can be `0`,  the ActiveBits will return 0. So we may still need to check the BitSize before initializing APInt with BitSize. So maybe DL is not so necessary here.
Also, I found that case 
```
Transforms/LoopVectorize/ARM/tail-folding-reduces-vf.ll
```
failed if we `handle everything else as 'else'`. So maybe we can only take Argument and ConstantInt into consideration, a conservative way.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93617/new/

https://reviews.llvm.org/D93617