[PATCH] D113291: [WIP] Add LowerTableBasedCTZ and anable it for AARCH64 in -O3

Fri Nov 5 09:15:14 PDT 2021

craig.topper added inline comments.

================
Comment at: llvm/lib/Transforms/Scalar/LowerTableBasedCTZ.cpp:229
+  uint64_t MulConst, ShiftConst;
+  // Aarch64 has i64 for getelementpointer index so this match will probably
+  // fail for other targets.
----------------
Seems like this should be a TODO or FIXME or NOTE to make it more noticeable

================
Comment at: llvm/lib/Transforms/Scalar/LowerTableBasedCTZ.cpp:264
+
+  bool DefinedForZero = ConstData->getElementAsInteger(0) == InputBits;
+
----------------
If the value in element 0 isn't InputBits, don't you still need to make the lowering produce the same value as the table would have? For the example, in the bug report the value is 0 and the suggested lowering was ctlz(x) & 0x1f.

================
Comment at: llvm/lib/Transforms/Scalar/LowerTableBasedCTZ.cpp:267
+  IRBuilder<> B(LI);
+  ConstantInt *BoolConst = DefinedForZero ? B.getTrue() : B.getFalse();
+  auto Cttz = B.CreateIntrinsic(Intrinsic::cttz, {ArgXType}, {ArgX, BoolConst});
----------------
The polarity of this seems backwards. The second argument to ctlz intrinsic is called "zero_undef". It should be true if you don't care what value is produced for 0.

================
Comment at: llvm/test/Transforms/LowerTableBasedCTZ/AARCH64/lower-table-based-ctz-basics.ll:125
+  %idxprom = zext i32 %shr to i64
+  %arrayidx = getelementptr inbounds [32 x i8], [32 x i8]* @ctz7.table, i64 0, i64 %idxprom
+  %0 = load i8, i8* %arrayidx, align 1, !tbaa !8
----------------
Why does this table name not match the function name?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113291/new/

https://reviews.llvm.org/D113291