[PATCH] D93617: [LoopVectorize] Take non-instruction's size into consideration when computing leader's demanded bits

Tue Jan 5 13:50:46 PST 2021

nikic added inline comments.

================
Comment at: llvm/lib/Analysis/VectorUtils.cpp:539
+    if (!isa<Instruction>(Val)) {
+      if (!isa<Argument>(Val))
+        continue;
----------------
guopeilin wrote:
> nikic wrote:
> > I think we can't entirely ignore non-arguments (aka constants) here. Take a look at this modified form of `llvm/test/Transforms/LoopVectorize/avoid-truncate-icmp-operands.ll`:
> > 
> > ```
> > ; RUN: opt -loop-vectorize -S < %s | FileCheck %s
> > target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
> > target triple = "aarch64-unknown-linux-gnu"
> > 
> > @a = dso_local local_unnamed_addr global i64 0, align 8
> > @b = dso_local local_unnamed_addr global i16 0, align 4
> > 
> > define dso_local void @myFunc(i32 %d, i32 %e) {
> > ; CHECK:        pred.store.continue2:
> > ; CHECK-NEXT:      %{{[0-9]+}} = icmp ult <2 x i64> %broadcast.splat{{[0-9]*}}, %broadcast.splat{{[0-9]*}}
> > for.body29.lr.ph:
> >   br label %for.body29
> > 
> > for.cond25.for.cond.cleanup28_crit_edge:          ; preds = %for.inc
> >   ret void
> > 
> > for.body29:                                       ; preds = %for.inc, %for.body29.lr.ph
> >   %n.078 = phi i16 [ undef, %for.body29.lr.ph ], [ %add34, %for.inc ]
> >   br i1 undef, label %for.inc, label %if.then
> > 
> > if.then:                                          ; preds = %for.body29
> >   %conv31 = zext i8 undef to i64
> >   store i64 %conv31, i64* @a, align 8
> >   %cmp.i = icmp ult i32 %e, %d
> >   %.sroa.speculated = select i1 %cmp.i, i64 10000000000, i64 20000000000
> >   %conv32 = trunc i64 %.sroa.speculated to i16
> >   store i16 %conv32, i16* @b, align 4
> >   br label %for.inc
> > 
> > for.inc:                                          ; preds = %if.then, %for.body29
> >   %add34 = add nsw i16 %n.078, 2
> >   %cmp27 = icmp slt i16 %add34, 16
> >   br i1 %cmp27, label %for.body29, label %for.cond25.for.cond.cleanup28_crit_edge, !llvm.loop !6
> > }
> > 
> > !6 = distinct !{!6, !7}
> > !7 = !{!"llvm.loop.vectorize.enable", i1 true}
> > ```
> > 
> > This gets vectorized with i32 width, but the two select constants are larger than 32-bit and get truncated.
> > 
> > I think the logic here should be: For `ConstantInt`, take the `getActiveBits()` of the constant value, for everything else (including Arguments) take the getTypeSizeInBits (what you're effectively doing now).
> Thanks a lot for reviewing this patch. And a new patch has been updated, which takes case `ConstantInt` into consideration, too.
> And there still exists something that confuses me. Firstly, I don't know why `the two select constants are larger than 32-bit`? As far as I can see, both `%d` and `%e` are defined as an `i32 argument`, what is the purpose that we take `ConstantInt` into consideration here?
> Secondly, for `Constant`, there are `ConstantFP`, `ConstantInt` and some other kind of constant, so shall we take them into consideration, too?
> It would be a great appreciation if you could explain it a bit more. Again, thanks a lot for reviewing.
> 
> And there still exists something that confuses me. Firstly, I don't know why `the two select constants are larger than 32-bit`? As far as I can see, both `%d` and `%e` are defined as an `i32 argument`, what is the purpose that we take `ConstantInt` into consideration here?

While the `%d` and `%e` arguments used in the icmp are 32-bit wide, the two constants that the `select` chooses between are larger than 32-bit. Vectorizing with 32-bit elements will also truncate the select constant and lose the top bits. Is this clearer?

By the way, it would be good if you could also add the above test case with a check that it uses i64 vectorization, not i32.

> Secondly, for `Constant`, there are `ConstantFP`, `ConstantInt` and some other kind of constant, so shall we take them into consideration, too?
> It would be a great appreciation if you could explain it a bit more. Again, thanks a lot for reviewing.

I'm not sure to what degree these are really relevant in this code (as demanded bits only works on integer instructions), but generally yes, they should be taken into account. I would suggest to write the code something like this, so only ConstantInt has special treatment (to avoid regressions) while everything else is handled as "else":

```
unsigned BitSize;
if (const auto *CI = dyn_cast<ConstantInt>(Val)) {
  BitSize = CI->getValue().getActiveBits();
} else {
  DataLayout DL = ...;
  BitSize = DL.getTypeSizeInBits(Val->getType());
}

// Bail out for same reason below.
if (BitSize > 64)
  return MapVector<Instruction *, uint64_t>();
uint64_t V = APInt::getAllOnesValue(BitSize).getZExtValue();
DBits[Leader] |= V;
LLVM_DEBUG(dbgs() << "\t Values's dbits " << Twine::utohexstr(V)
                          << '\n');
```

There's just one problem there: Getting the DataLayout. I don't see any obvious way to get it, so it may be necessary to pass it as an argument to computeMinimumValueSizes().

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93617/new/

https://reviews.llvm.org/D93617