[PATCH] D128911: Emit table lookup from TargetLowering::expandCTTZ()

Thu Aug 4 03:39:12 PDT 2022

gsocshubham added inline comments.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:7960
+                                        long unsigned BitWidth) const {
+  if (BitWidth == 8 || BitWidth == 16)
+    return SDValue();
----------------
dmgreen wrote:
> This needs to handle other types too - just doing `if (BitWidth != 32 && BitWidth != 64)` is probably easiest.
Updated accordingly.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:7976
+
+  SmallVector<uint16_t> Table(BitWidth, 0);
+  for (long unsigned i = 0; i < BitWidth; i++) {
----------------
dmgreen wrote:
> Why has this become uint16_t? I don't think it needs to be any bigger than i8 to hold all the values.
Reverted to `uint8_t`.

For `@llvm.cttz.i64(i64 %x, i1 true)` compiled with `riscv64`, I am getting table sequence as -

```
 .ascii  "\000\001\002\007\003\r\b\023\004\031\016\034\t\"\024(\005\021\032&\017.\0350\n\037#6\0252)9?\006\f\022\030\033!'\020%-/\036518>\013\027 $,47=\026+3<*;:"
```

Hence, earlier I had changed it to `uint16_t` where I was getting -

```
        .half   0                               # 0x0
        .half   1                               # 0x1
        .half   2                               # 0x2
        .half   7                               # 0x7
        .half   3                               # 0x3
        .half   13                              # 0xd
        .half   8                               # 0x8
        .half   19                              # 0x13
        .half   4                               # 0x4
        .half   25                              # 0x19
        .half   14                              # 0xe
        .half   28                              # 0x1c
        .half   9                               # 0x9
        .half   34                              # 0x22
        .half   20                              # 0x14
        .half   40                              # 0x28
        .half   5                               # 0x5
        .half   17                              # 0x11
        .half   26                              # 0x1a
        .half   38                              # 0x26
        .half   15                              # 0xf
        .half   46                              # 0x2e
        .half   29                              # 0x1d
        .half   48                              # 0x30
        .half   10                              # 0xa
        .half   31                              # 0x1f
        .half   35                              # 0x23
        .half   54                              # 0x36
        .half   21                              # 0x15
        .half   50                              # 0x32
        .half   41                              # 0x29
        .half   57                              # 0x39
        .half   63                              # 0x3f
        .half   6                               # 0x6
        .half   12                              # 0xc
        .half   18                              # 0x12
        .half   24                              # 0x18
        .half   27                              # 0x1b
        .half   33                              # 0x21
        .half   39                              # 0x27
        .half   16                              # 0x10
        .half   37                              # 0x25
        .half   45                              # 0x2d
        .half   47                              # 0x2f
        .half   30                              # 0x1e
        .half   53                              # 0x35
        .half   49                              # 0x31
        .half   56                              # 0x38
        .half   62                              # 0x3e
        .half   11                              # 0xb
        .half   23                              # 0x17
        .half   32                              # 0x20
        .half   36                              # 0x24
        .half   44                              # 0x2c
        .half   52                              # 0x34
        .half   55                              # 0x37
        .half   61                              # 0x3d
        .half   22                              # 0x16
        .half   43                              # 0x2b
        .half   51                              # 0x33
        .half   60                              # 0x3c
        .half   42                              # 0x2a
        .half   59                              # 0x3b
        .half   58                              # 0x3a
```

================
Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:7987
+      CA, getPointerTy(TD),
+      TD.getPrefTypeAlign(VT.getTypeForEVT(*DAG.getContext())));
+  Align Alignment = cast<ConstantPoolSDNode>(CPIdx)->getAlign();
----------------
barannikov88 wrote:
> craig.topper wrote:
> > The Type for the alignment needs to be the type of the elements in `CA` which is based on what data type is used for the `Table` SmallVector.
> 
> Why the type of the elements specifically and not of the CA itself?
> AFAIK the alignment passed to the load is the "base alignment"; the alignment of the accessed element will be inferred based on the base alignment and the offset of the element.
Understood. Thanks!
Now, the alignment is of `CA`.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:7989
+  Align Alignment = cast<ConstantPoolSDNode>(CPIdx)->getAlign();
+  SDValue Load = DAG.getLoad(VT, dl, DAG.getEntryNode(),
+                             DAG.getMemBasePlusOffset(CPIdx, Lookup, dl),
----------------
dmgreen wrote:
> It doesn't need to create a load just to create another load. It can just use the getExtLoad method with the MemoryVT set to MVT::i8. I think the other parameters can be default/left out.
Removed Load. It is just `getExtLoad` with `MVT::i8` now.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:7998
+                     LD->getMemoryVT(), Alignment, MMOFlags, LD->getAAInfo());
+  if (isOperationLegalOrCustom(ISD::CTTZ_ZERO_UNDEF, VT)) {
+    EVT SetCCVT =
----------------
dmgreen wrote:
> gsocshubham wrote:
> > @craig.topper - What should be the check here for case `llvm.cttz(i32 %x, i1 false)`?
> > 
> > It can't be `isOperationLegalOrCustom(ISD::CTTZ_ZERO_UNDEF, VT)` because that is taken care before this new lowering.
> I think, if I understand, this should be `if (Node->getOpcode() != ISD::CTLZ_ZERO_UNDEF) {`
> But I'm not sure what this does to the profitability of the transform.
> 
> It might be possible to just encode it into the table. It involves changing how the table to have bitwidth+1 elements, and using a different debruijn constant so that the zero element can be the bitwidth.
I have added above check!

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128911/new/

https://reviews.llvm.org/D128911