[PATCH] D128911: Emit table lookup from TargetLowering::expandCTTZ()

Sun Jul 24 11:39:50 PDT 2022

gsocshubham added inline comments.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:7870
+    SDValue CPIdx = DAG.getConstantPool(CA, getPointerTy(TD),
+					TD.getPrefTypeAlign(Elts[0]->getType()));
+    Align Alignment = cast<ConstantPoolSDNode>(CPIdx)->getAlign();
----------------
barannikov88 wrote:
> gsocshubham wrote:
> > barannikov88 wrote:
> > > You should use the alignment requirement of the array (i.e. CA), not of its element. They may differ.
> > From the assembly dump of SPARC/cttz.ll, I am not sure whether to use the array element alignment or the array alignment.
> > 
> > If I use the array alignment `CA`, I get the assembly below, as opposed to the `SPARC/cttz.ll` assembly produced when the array element alignment is used. What do you think? Should I update from `CPIdx` to `CA`?
> > 
> > ```
> > f:                                      ! @f
> >         .cfi_startproc
> > ! %bb.0:                                ! %entry
> >         mov     %o0, %o1
> >         cmp %o0, 0
> >         be      .LBB0_2
> >         mov     %g0, %o0
> > ! %bb.1:                                ! %entry
> >         sub %o0, %o1, %o0
> >         and %o1, %o0, %o0
> >         sethi 122669, %o1
> >         or %o1, 305, %o1
> >         smul %o0, %o1, %o0
> >         srl %o0, 27, %o0
> >         sethi %hi(.LCPI0_0), %o1
> >         add %o1, %lo(.LCPI0_0), %o1
> >         add %o1, %o0, %o2
> >         ldub [%o2+2], %o3
> >         ldub [%o2+3], %o4
> >         ldub [%o1+%o0], %o0
> >         ldub [%o2+1], %o1
> >         sll %o3, 8, %o2
> >         or %o2, %o4, %o2
> >         sll %o0, 8, %o0
> >         or %o0, %o1, %o0
> >         sll %o0, 16, %o0
> >         or %o0, %o2, %o0
> > ```
> I meant you should call
> `TD.getPrefTypeAlign(Elts->getType())`
> instead of
> `TD.getPrefTypeAlign(Elts[0]->getType())`
> Is the above assembly a result of such change, or did you do something different?
I did something different, but it seems fine now. Since `Elts` no longer exists, I am taking the alignment of an element of `RshrArr`. Is that OK?

================
Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:7801
+      DAG.getNode(ISD::MUL, dl, VT, DAG.getNode(ISD::AND, dl, VT, Op, Neg),
+                  DAG.getConstant(DeBruijn.getZExtValue(), dl, VT)),
+      DAG.getConstant(ShiftAmt.getZExtValue(), dl, VT));
----------------
dmgreen wrote:
> I think we can make a constant from an APInt directly.
I tried replacing `DAG.getConstant(DeBruijn.getZExtValue(), dl, VT))` with just `DeBruijn`, but there does not seem to be a direct conversion from `APInt` to `SDValue`?

================
Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:7806
+  std::vector<Constant *> Elts;
+  uint8_t RshrArr[32];
+
----------------
dmgreen wrote:
> This shouldn't be a plain C array. Its size is dependent on the BitWidth. `SmallVector<uint8_t> RshrArr(BitWidth, 0)` should create an array that is initialized to 0's with the correct size.
Changed it to SmallVector. Thanks!

================
Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:7810
+    APInt Lshr = DeBruijn.rotl(i);
+    unsigned int Rshr = Lshr.getZExtValue() >> ShiftAmt.getZExtValue();
+    RshrArr[Rshr] = i;
----------------
dmgreen wrote:
> `APInt Rshr = Lshr.lshr(ShiftAmt)`, then use `Rshr.getZExtValue()` in the line below. It is a little strange to use getZExtValue in an array index, but so long as the array is a safe type, it should complain if the value is out of bounds.
Done.
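
For readers following the thread, here is a minimal standalone sketch of the table construction and lookup being discussed. It is not the patch itself: it uses plain `uint32_t` instead of `APInt`, a left shift instead of `rotl` (the two agree for this constant once only the top five bits are kept), and assumes the 32-bit De Bruijn constant 0x077CB531, which is the value the `sethi 122669` / `or 305` pair in the SPARC listing materializes. `buildCttzTable` and `cttz32` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Assumed constants for a 32-bit cttz: the De Bruijn value and 32 - log2(32).
constexpr uint32_t DeBruijn = 0x077CB531u;
constexpr unsigned ShiftAmt = 27;

// Hypothetical helper: fill the lookup table so that the entry at the hashed
// position of (1 << I) holds I.
std::array<uint8_t, 32> buildCttzTable() {
  std::array<uint8_t, 32> Table{};
  for (unsigned I = 0; I < 32; ++I) {
    uint32_t Index = (DeBruijn << I) >> ShiftAmt;
    assert(Index < Table.size() && "table index out of bounds");
    Table[Index] = static_cast<uint8_t>(I);
  }
  return Table;
}

// Hypothetical helper: isolate the lowest set bit, hash it with the De Bruijn
// multiply, and look the answer up. Returns 0 for X == 0 in this sketch.
uint32_t cttz32(uint32_t X) {
  static const std::array<uint8_t, 32> Table = buildCttzTable();
  uint32_t LowestBit = X & (0u - X);
  return Table[(LowestBit * DeBruijn) >> ShiftAmt];
}
```

The bounds assert plays the role of the "safe type" mentioned above: an index outside the table is reported instead of silently writing past the array.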

================
Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:7814-7819
+  for (unsigned int i = 0; i < NumBitsPerElt; i++) {
+    SDValue Index = DAG.getConstant(RshrArr[i], dl, VT);
+    ConstantSDNode *IndexNode = cast<ConstantSDNode>(Index);
+    ConstantInt *CI =
+        const_cast<ConstantInt *>(IndexNode->getConstantIntValue());
+    Elts.push_back(CI);
----------------
dmgreen wrote:
> gsocshubham wrote:
> > dmgreen wrote:
> > > Do we need this loop, or can we create the array from the constant pool directly? The elements should be MVT::i8.
> > > ```
> > > auto *CA = ConstantDataArray::get(*DAG.getContext(), RshrArr);
> > > ```
> > If I directly use `RshrArr`, I get below table in assembly -
> > 
> > ```
> > .LCPI0_0:
> >         .ascii  "\000\001\034\002\035\016\030\003\036\026\024\017\031\021\004\b\037\033\r\027\025\023\020\007\032\f\022\006\013\005\n\t"
> > ```
> > 
> > instead of -
> > 
> > ```
> > .LCPI0_0:
> >         .word   0                               ! 0x0
> >         .word   1                               ! 0x1
> >         .word   28                              ! 0x1c
> >         .word   2                               ! 0x2
> >         .word   29                              ! 0x1d
> >         .word   14                              ! 0xe
> >         .word   24                              ! 0x18
> >         .word   3                               ! 0x3
> >         .word   30                              ! 0x1e
> >         .word   22                              ! 0x16
> >         .word   20                              ! 0x14
> >         .word   15                              ! 0xf
> >         .word   25                              ! 0x19
> >         .word   17                              ! 0x11
> >         .word   4                               ! 0x4
> >         .word   8                               ! 0x8
> >         .word   31                              ! 0x1f
> >         .word   27                              ! 0x1b
> >         .word   13                              ! 0xd
> >         .word   23                              ! 0x17
> >         .word   21                              ! 0x15
> >         .word   19                              ! 0x13
> >         .word   16                              ! 0x10
> >         .word   7                               ! 0x7
> >         .word   26                              ! 0x1a
> >         .word   12                              ! 0xc
> >         .word   18                              ! 0x12
> >         .word   6                               ! 0x6
> >         .word   11                              ! 0xb
> >         .word   5                               ! 0x5
> >         .word   10                              ! 0xa
> >         .word   9                               ! 0x9
> >         .text
> >         .globl  f
> > 
> > ```
> Yes - that seems better to me, so long as it is loading i8's from the array. The .word's will be i32, I think, which uses much more data than needed, as all the values are in the range 0-BitWidth.
Done accordingly.
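
As a rough standalone check of the size argument, the sketch below transcribes both listings from the quoted comment into C++ and confirms that they hold the same 32 values, with the byte form taking 32 bytes and the word form 128. The names `AsciiTable`, `WordTable`, `sameValues` and `byteEntry` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// The .ascii directive as a C++ literal (same octal escapes). sizeof counts
// the implicit trailing NUL, so the payload is sizeof(AsciiTable) - 1 bytes.
constexpr char AsciiTable[] =
    "\000\001\034\002\035\016\030\003\036\026\024\017\031\021\004\b"
    "\037\033\r\027\025\023\020\007\032\f\022\006\013\005\n\t";

// The .word listing, one 32-bit entry per value.
constexpr uint32_t WordTable[32] = {0,  1,  28, 2,  29, 14, 24, 3,
                                    30, 22, 20, 15, 25, 17, 4,  8,
                                    31, 27, 13, 23, 21, 19, 16, 7,
                                    26, 12, 18, 6,  11, 5,  10, 9};

// Compare the two encodings element by element.
constexpr bool sameValues() {
  for (std::size_t I = 0; I < 32; ++I)
    if (static_cast<uint8_t>(AsciiTable[I]) != WordTable[I])
      return false;
  return true;
}

// Read one entry from the byte-encoded table.
uint8_t byteEntry(std::size_t I) {
  assert(I < 32 && "index out of bounds");
  return static_cast<uint8_t>(AsciiTable[I]);
}

static_assert(sizeof(AsciiTable) - 1 == 32, "byte table is 32 bytes");
static_assert(sizeof(WordTable) == 128, "word table is 128 bytes");
static_assert(sameValues(), "both listings encode the same values");
```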

================
Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:7829
+  Align Alignment = cast<ConstantPoolSDNode>(CPIdx)->getAlign();
+  return DAG.getLoad(
+      VT, dl, DAG.getEntryNode(), DAG.getMemBasePlusOffset(CPIdx, Lookup, dl),
----------------
dmgreen wrote:
> The load should be loading MVT::i8, extending the result to VT.
Could you please elaborate? I did not understand this. Do you mean changing `VT` to `MVT::i8` in the return statement?
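
One possible reading of the request, sketched outside of SelectionDAG: once the table holds bytes, each lookup should read a single byte and zero-extend it to the result width, rather than read a full 32-bit word starting at the byte's address. In the DAG this would presumably be an extending load (for example `getExtLoad` with `ISD::ZEXTLOAD` and a memory type of `MVT::i8`), but that mapping is an assumption here, not something confirmed in this thread. `loadByteZext` and `loadWordBigEndian` are hypothetical helpers; the second mimics the four `ldub` plus shift/or sequence in the SPARC listing above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Table values copied from the listings in this review.
constexpr uint8_t Table[32] = {0,  1,  28, 2,  29, 14, 24, 3,
                               30, 22, 20, 15, 25, 17, 4,  8,
                               31, 27, 13, 23, 21, 19, 16, 7,
                               26, 12, 18, 6,  11, 5,  10, 9};

// Hypothetical helper: one byte load, zero-extended to 32 bits.
uint32_t loadByteZext(std::size_t Index) {
  assert(Index < 32 && "index out of bounds");
  return static_cast<uint32_t>(Table[Index]);
}

// Hypothetical helper: what a 32-bit big-endian load at the same address
// would produce. It pulls in three neighbouring entries as well.
uint32_t loadWordBigEndian(std::size_t Index) {
  assert(Index + 3 < 32 && "wide load would run past the table");
  return (uint32_t{Table[Index]} << 24) | (uint32_t{Table[Index + 1]} << 16) |
         (uint32_t{Table[Index + 2]} << 8) | uint32_t{Table[Index + 3]};
}
```

For index 2 the byte load yields 28, while the wide load yields 0x1C021D0E, which is not a valid trailing-zero count.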

================
Comment at: llvm/test/CodeGen/SPARC/cttz.ll:4
+
+@f.table = internal unnamed_addr constant [32 x i8] c"\00\01\1C\02\1D\0E\18\03\1E\16\14\0F\19\11\04\08\1F\1B\0D\17\15\13\10\07\1A\0C\12\06\0B\05\0A\09", align 1
+
----------------
barannikov88 wrote:
> Unused
Removed.

================
Comment at: llvm/test/CodeGen/SPARC/cttz.ll:26
+  %0 = call i32 @llvm.cttz.i32(i32 %x, i1 true)
+  %1 = icmp eq i32 %x, 0
+  %2 = select i1 %1, i32 0, i32 %0
----------------
barannikov88 wrote:
> Why not just `ret i32 %0` ?
Updated test accordingly.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128911/new/

https://reviews.llvm.org/D128911