[PATCH] D123782: [AArch64] Generate AND in place of CSEL for Table Based CTTZ lowering in -O3

Fri Apr 22 12:20:36 PDT 2022

rahular-rrlogic marked an inline comment as done.
rahular-rrlogic added a comment.

In D123782#3467076 <https://reviews.llvm.org/D123782#3467076>, @dmgreen wrote:

>>> As far as I understand this should be testing for select (icmp eq X, 0), 0, cttz X and converting it to and(cttz X, #bw-1). If so there are more conditions that need to be checked.
>>
>> What other conditions do you think should be added other than checking for 0 and cttz?
>
> It looks like this checks for the select/CSEL, and the icmp/SUBS with a 0, but not the "eq" yet, and not the "0" in the select/csel. It also doesn't check that the X in the icmp and the X in the cttz are the same?

Is the last check really needed? Both icmp and cttz use the value from the same register in the IR itself.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17464-17465
+    if (N1.getOpcode() == ISD::CTTZ) {
+      SDValue NumBitsMinusOne =
+          DAG.getConstant(31, SDLoc(N), N1.getValueType());
+      return DAG.getNode(ISD::AND, SDLoc(N), N1.getValueType(), N1,
----------------
dmgreen wrote:
> Why is it hard-coding 31, as opposed to checking the size of the type? Why can't we get here with an i64?
I had misunderstood this part. Fixed now, thank you.

================
Comment at: llvm/test/CodeGen/AArch64/table-based-cttz.ll:27
+  %1 = icmp eq i64 %x, 0
+  %2 = trunc i64 %0 to i32
+  %3 = select i1 %1, i32 0, i32 %2
----------------
dmgreen wrote:
> Can you add an i64 case without the truncate down to an i32?
Added.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123782/new/

https://reviews.llvm.org/D123782