[PATCH] D128911: Emit table lookup from TargetLowering::expandCTTZ()

Tue Jul 26 06:21:35 PDT 2022

gsocshubham added inline comments.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:7968
+                  DAG.getConstant(DeBruijn.getZExtValue(), dl, VT)),
+      DAG.getConstant(ShiftAmt, dl, VT));
+
----------------
craig.topper wrote:
> This should be getShiftAmountConstant.
Updated accordingly.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:7972
+  for (unsigned i = 0; i < BitWidth; i++) {
+    APInt Lshr = DeBruijn.rotl(i);
+    APInt Rshr = Lshr.lshr(ShiftAmt);
----------------
craig.topper wrote:
> What does `Lshr` mean? `shr` is usually "shift right", and lshr is logical shift right. But here `L` means left? But that means I don't know what `shr` means.
It was a typo in naming. I have updated it accordingly.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:7978
+      cast<ConstantSDNode>(DAG.getConstant(RshrArr[0], dl, VT));
+  ConstantInt *CI = const_cast<ConstantInt *>(Index->getConstantIntValue());
+
----------------
craig.topper wrote:
> We shouldn't need a ConstantSDNode and ConstantInt to get the Type*. You can use `VT.getTypeForEVT(*DAG.getContext())` in place of `CI->getType()`
> 
> 
> Though really we should be using an array of Int8Ty and doing a zextload from i8 to VT.
Done.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:7986
+  return DAG.getLoad(
+      VT, dl, DAG.getEntryNode(), DAG.getMemBasePlusOffset(CPIdx, Lookup, dl),
+      MachinePointerInfo::getConstantPool(DAG.getMachineFunction()), Alignment);
----------------
craig.topper wrote:
> `Lookup` needs a sextOrTrunc to the target's pointer type.
I have created a new variable `TargetLookup` using `sextOrTrunc()`.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:7795
+  APInt DeBruijn(32, 0x077CB531U);
+  APInt BitWidth(32, NumBitsPerElt);
+  APInt ShiftAmt(32, (BitWidth - BitWidth.logBase2()).getZExtValue());
----------------
dmgreen wrote:
> gsocshubham wrote:
> > dmgreen wrote:
> > > Sorry the suggestion for using APInt wasn't very clear. The point is that DeBruijn isn't necessarily a 32bit quantity - there are other values for 8/16/64. We should add i64 at least, so long as they look beneficial (which they should for 64, not sure about the other values). APInt makes that simpler because it can store any size without overflow, and the operations are performed on the right size.
> > > 
> > > BitWidth and ShiftAmt can remain as unsigned - the values will always easily fit in an unsigned value. ShiftWidth can be `Bitwidth - Log2_32(BitWidth)`. I would rename the "NumBitsPerElt" argument to "Bitwidth" too, to make it clear what it is and avoid the need for the two.
> > Here, do you mean to change `APInt DeBruijn(32, 0x077CB531U)` to `APInt DeBruijn(64, 0x0218A392CD3D5DBF)`?
> > 
> > `NumBitsPerElt` is 32 even in the case for `call i64 @llvm.cttz.i64(i64 %x, i1 true)`
> It would be based on the BitWidth, providing that the BitWidth is known to be 32 or 64:
> ```
> APInt DeBruijn = BitWidth == 32 ? APInt(32, 0x077CB531U) : APInt(64, 0x0218A392CD3D5DBFULL)
> ```
> 
> For some targets the i64 cttz will be legalized to a i32 cttz. It is for 64bit targets that the 64bit variant is useful.
Done.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:7829
+  Align Alignment = cast<ConstantPoolSDNode>(CPIdx)->getAlign();
+  return DAG.getLoad(
+      VT, dl, DAG.getEntryNode(), DAG.getMemBasePlusOffset(CPIdx, Lookup, dl),
----------------
dmgreen wrote:
> gsocshubham wrote:
> > dmgreen wrote:
> > > The load should be loading MVT::i8, extended the result into VT.
> > Can you please elaborate it? I did not understand it. Do you mean to change `VT` to `MVT::i8` in the return statement?
> We want to create an array of i8 elements, load an i8 from it at the right index, and extend that to the original VT. That can either be done by creating a load of an i8 and calling DAG.getZExtOrTrunc - but that might introduce an illegal type where one cannot be created. So it may need to create a ZEXTLOAD load directly.
Created a load using `getZExtOrTrunc`. If type is illegal then I am returning `getExtLoad(ISD::ZEXTLOAD....`

Let me know your suggestions on above.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:7853
+    unsigned int Lshr = DeBrujin << i;
+    unsigned int Rshr = Lshr >> 27;
+    RshrArr[Rshr] = i;
----------------
craig.topper wrote:
> gsocshubham wrote:
> > dmgreen wrote:
> > > gsocshubham wrote:
> > > > dmgreen wrote:
> > > > > I think this should technically be a rotl, because the DeBruijn sequences are cyclic. The top element in the sequence is always 0 though, so it probably doesn't make a difference in practice.
> > > > > 
> > > > > It may be better to us APInt for all the DeBrujin constants though. It is easier to be more precise with them, and it's useful when i64 constants are supported.
> > > > I can use `DAG.getNode(ISD::ROTL, dl, VT, Op, DAG.getConstant(i, dl, SHVT));` instead of `<<` but this changes variable from `unsigned int` to `SDNode`.
> > > That seems to be mixing up the instructions we want to produce in the output (DAG.getNode) and the calculations we are doing at compile time.
> > > If DeBruijn is an APInt of the correct size `APInt DeBruijn(32, 0x077CB531U)`, then it will have a rotl method. The advantage of APInt is that they will also be able to work with other bitwidths, once those are added.
> > Understood. I have used - `APInt DeBruijn(32, 0x077CB531U)`
> > 
> > Is it correct way? Can you suggest a way to simplify below usage of APInt?
> Is it really cyclic? The multiply is a shift by a power of 2. So emulating the multiply should be a shl not rotl.
Updated to `shl()`.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128911/new/

https://reviews.llvm.org/D128911