[PATCH] D113291: [AggressiveInstCombine] Lower Table Based CTTZ

Thu Aug 18 09:23:14 PDT 2022

djtodoro marked 3 inline comments as done.
djtodoro added a comment.

Thanks for the comments.

================
Comment at: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp:493
+
+  return Matched == InputBits;
+}
----------------
dmgreen wrote:
> If the length of the table is larger than the InputBits, how are we sure that the matched elements will be the correct ones? Is this always guaranteed?
> I think I would have expected a check that `for each i=0..InputBits-1, Table[(Mul<<i)>>Shift] == i`. With a check that the index is in range. Are they always equivalent with the larger tables too?
Hmmm, can you please walk me through out an example?

================
Comment at: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp:552
+      LI->getPointerOperand()->getType()->isOpaquePointerTy()
+          ? LI->getType()
+          : LI->getPointerOperandType()->getNonOpaquePointerElementType();
----------------
dmgreen wrote:
> Can this always just use the Load type?
I think yes, nothing comes up to my mind that can break it.

================
Comment at: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp:590
+
+  unsigned InputBits = ConstIsWide ? 64 : 32;
+
----------------
dmgreen wrote:
> It might be better to switch this logical around -
> ```
> unsigned InputBits = X1->getType()->getScalarSizeInBits();
> if (InputBits != 32 && InputBits != 64)
>   return false;
> ```
Sounds good.

================
Comment at: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp:593-594
+  // Shift should extract top 5..7 bits.
+  if (InputBits - Log2_32_Ceil(InputBits) != ShiftConst &&
+      InputBits - Log2_32_Ceil(InputBits) - 1 != ShiftConst)
+    return false;
----------------
dmgreen wrote:
> Log2_32_Ceil -> Log2_Ceil if we know the InputBits is a power of 2.
> 
> The -1 case is for a larger table with more elements but that can handle zero values?
>Log2_32_Ceil -> Log2_Ceil if we know the InputBits is a power of 2.
Right...
Bu you meant `Log2_64()`, right? It is a power of 2, since it is either 32 or 64, so no need to add any assert here.

>The -1 case is for a larger table with more elements but that can handle zero values?
```
int ctz2(unsigned x)
{
#define u 0
 static short table[64] =
  {
    32, 0, 1, 12, 2, 6, u, 13, 3, u, 7, u, u, u, u, 14,
    10, 4, u, u, 8, u, u, 25, u, u, u, u, u, 21, 27, 15,
    31, 11, 5, u, u, u, u, u, 9, u, u, 24, u, u, 20, 26,
    30, u, u, u, u, 23, u, 19, 29, u, 22, 18, 28, 17, 16, u
   };
 x = (x & -x) * 0x0450FBAF;
 return table[x >> 26];
 }
```

================
Comment at: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp:485
+             m_ZExtOrSelf(m_LShr(
+                 m_ZExtOrSelf(m_Mul(m_c_And(m_Neg(m_Value(X1)), m_Deferred(X1)),
+                                    m_ConstantInt(MulConst))),
----------------
dmgreen wrote:
> djtodoro wrote:
> > dmgreen wrote:
> > > Does the extend between the lshr and the mul every happen? From what I can tell, the type of the VT should be based on the type of these operations.
> > It does not happen in all the cases.
> Do you have a test case where the extend is between the shift and the mul?
I was completely sure that I had a case for it, but I am not able to produce it actually -- so I deleted it for now.

================
Comment at: llvm/test/Transforms/AggressiveInstCombine/lower-table-based-cttz-basics.ll:123
+
+define i32 @ctz7(i32 %x) {
+; CHECK-LABEL: @ctz7(
----------------
spatel wrote:
> This is identical to the previous test, so it is not adding value here.
> 
> I realize that the source example it intended to model is slightly different. If you want to verify that we end up with cttz from the IR produced by clang, then I'd recommend adding a file to test/Transforms/PhaseOrdering and "RUN -O3". 
> 
> When I create tests like that, I grab the unoptimized IR using "clang -S -o - -emit-llvm -Xclang -disable-llvm-optzns". Then reduce it by running it through "opt -mem2reg", so it's not completely full of junk IR.
> This is identical to the previous test, so it is not adding value here.
Right, thanks!

>I realize that the source example it intended to model is slightly different. If you want to verify that we end up with cttz from the IR produced by clang, then I'd recommend adding a file to test/Transforms/PhaseOrdering and "RUN -O3".
>
>When I create tests like that, I grab the unoptimized IR using "clang -S -o - -emit-llvm -Xclang -disable-llvm-optzns". Then reduce it by running it through "opt -mem2reg", so it's not completely full of junk IR.

I usually do create tests that way, but these may be stale a bit. I will add the ```PhaseOrdering``` test, thanks for the suggestion.

================
Comment at: llvm/test/Transforms/AggressiveInstCombine/lower-table-based-cttz-non-argument-value.ll:78
+  call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %0)
+  %call = call i32 (i8*, ...) @__isoc99_scanf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @.str, i64 0, i64 0), i32* nonnull %x)
+  %1 = load i32, i32* %x, align 4
----------------
spatel wrote:
> dmgreen wrote:
> > It is better to pass x as a parameter, although I'm not sure it matter much where x comes from for the rest of the pattern.
> Right - as far as this patch is concerned, this is identical to the previous test, so it shouldn't be here.
> 
> See my earlier comment about PhaseOrdering tests if we want more end-to-end coverage for `opt -O3`.
Yes, agree.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113291/new/

https://reviews.llvm.org/D113291