[PATCH] D128911: Emit table lookup from TargetLowering::expandCTTZ()
Shubham Narlawar via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Aug 4 03:39:12 PDT 2022
gsocshubham added inline comments.
================
Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:7960
+ long unsigned BitWidth) const {
+ if (BitWidth == 8 || BitWidth == 16)
+ return SDValue();
----------------
dmgreen wrote:
> This needs to handle other types too - just doing `if (BitWidth != 32 && BitWidth != 64)` is probably easiest.
Updated accordingly.
================
Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:7976
+
+ SmallVector<uint16_t> Table(BitWidth, 0);
+ for (long unsigned i = 0; i < BitWidth; i++) {
----------------
dmgreen wrote:
> Why has this become uint16_t? I don't think it needs to be any bigger than i8 to hold all the values.
Reverted to `uint8_t`.
For `@llvm.cttz.i64(i64 %x, i1 true)` compiled with `riscv64`, I am getting table sequence as -
```
.ascii "\000\001\002\007\003\r\b\023\004\031\016\034\t\"\024(\005\021\032&\017.\0350\n\037#6\0252)9?\006\f\022\030\033!'\020%-/\036518>\013\027 $,47=\026+3<*;:"
```
Hence, earlier I had changed it to `uint16_t` where I was getting -
```
.half 0 # 0x0
.half 1 # 0x1
.half 2 # 0x2
.half 7 # 0x7
.half 3 # 0x3
.half 13 # 0xd
.half 8 # 0x8
.half 19 # 0x13
.half 4 # 0x4
.half 25 # 0x19
.half 14 # 0xe
.half 28 # 0x1c
.half 9 # 0x9
.half 34 # 0x22
.half 20 # 0x14
.half 40 # 0x28
.half 5 # 0x5
.half 17 # 0x11
.half 26 # 0x1a
.half 38 # 0x26
.half 15 # 0xf
.half 46 # 0x2e
.half 29 # 0x1d
.half 48 # 0x30
.half 10 # 0xa
.half 31 # 0x1f
.half 35 # 0x23
.half 54 # 0x36
.half 21 # 0x15
.half 50 # 0x32
.half 41 # 0x29
.half 57 # 0x39
.half 63 # 0x3f
.half 6 # 0x6
.half 12 # 0xc
.half 18 # 0x12
.half 24 # 0x18
.half 27 # 0x1b
.half 33 # 0x21
.half 39 # 0x27
.half 16 # 0x10
.half 37 # 0x25
.half 45 # 0x2d
.half 47 # 0x2f
.half 30 # 0x1e
.half 53 # 0x35
.half 49 # 0x31
.half 56 # 0x38
.half 62 # 0x3e
.half 11 # 0xb
.half 23 # 0x17
.half 32 # 0x20
.half 36 # 0x24
.half 44 # 0x2c
.half 52 # 0x34
.half 55 # 0x37
.half 61 # 0x3d
.half 22 # 0x16
.half 43 # 0x2b
.half 51 # 0x33
.half 60 # 0x3c
.half 42 # 0x2a
.half 59 # 0x3b
.half 58 # 0x3a
```
================
Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:7987
+ CA, getPointerTy(TD),
+ TD.getPrefTypeAlign(VT.getTypeForEVT(*DAG.getContext())));
+ Align Alignment = cast<ConstantPoolSDNode>(CPIdx)->getAlign();
----------------
barannikov88 wrote:
> craig.topper wrote:
> > The Type for the alignment needs to be the type of the elements in `CA` which is based on what data type is used for the `Table` SmallVector.
>
> Why the type of the elements specifically and not of the CA itself?
> AFAIK the alignment passed to the load is the "base alignment"; the alignment of the accessed element will be inferred based on the base alignment and the offset of the element.
Understood. Thanks!
Now, the alignment is of `CA`.
================
Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:7989
+ Align Alignment = cast<ConstantPoolSDNode>(CPIdx)->getAlign();
+ SDValue Load = DAG.getLoad(VT, dl, DAG.getEntryNode(),
+ DAG.getMemBasePlusOffset(CPIdx, Lookup, dl),
----------------
dmgreen wrote:
> It doesn't need to create a load just to create another load. It can just use the getExtLoad method with the MemoryVT set to MVT::i8. I think the other parameters can be default/left out.
Removed Load. It is just `getExtLoad` with `MVT::i8` now.
================
Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:7998
+ LD->getMemoryVT(), Alignment, MMOFlags, LD->getAAInfo());
+ if (isOperationLegalOrCustom(ISD::CTTZ_ZERO_UNDEF, VT)) {
+ EVT SetCCVT =
----------------
dmgreen wrote:
> gsocshubham wrote:
> > @craig.topper - What should be the check here for case `llvm.cttz(i32 %x, i1 false)`?
> >
> > It can't be `isOperationLegalOrCustom(ISD::CTTZ_ZERO_UNDEF, VT)` because that is taken care before this new lowering.
> I think, if I understand, this should be `if (Node->getOpcode() != ISD::CTLZ_ZERO_UNDEF) {`
> But I'm not sure what this does to the profitability of the transform.
>
> It might be possible to just encode it into the table. It involves changing how the table to have bitwidth+1 elements, and using a different debruijn constant so that the zero element can be the bitwidth.
I have added above check!
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D128911/new/
https://reviews.llvm.org/D128911
More information about the llvm-commits
mailing list