[PATCH] D113291: [AggressiveInstCombine] Lower Table Based CTTZ

Dave Green via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Aug 19 02:01:53 PDT 2022


dmgreen added inline comments.


================
Comment at: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp:547
+  LoadInst *LI = dyn_cast<LoadInst>(&I);
+  if (!LI) return false;
+
----------------
One think I forgot to mention - llvm has a code style that is best explained as "just run clang-format on the patch". These returns are all in the wrong place, for example, and could do with a cleanup.


================
Comment at: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp:595-596
+
+  Type *XType = X1->getType();
+  if (!XType->isIntegerTy(InputBits)) return false;
+
----------------
This is true by definition now.


================
Comment at: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp:493
+
+  return Matched == InputBits;
+}
----------------
djtodoro wrote:
> dmgreen wrote:
> > If the length of the table is larger than the InputBits, how are we sure that the matched elements will be the correct ones? Is this always guaranteed?
> > I think I would have expected a check that `for each i=0..InputBits-1, Table[(Mul<<i)>>Shift] == i`. With a check that the index is in range. Are they always equivalent with the larger tables too?
> Hmmm, can you please walk me through out an example?
Hmm. No, I'm not sure I can.

I was thinking about the ctz2 case, and whether there could be cases where the u's have different values that make them "match", but the values are different that make them wrong. So the items in the table accessed by the DeBruijn constant would produce incorrect values, but there are still InputBits number of matches.
```
;; int ctz2(unsigned x)
;; {
;; #define u 0
;;  static short table[64] =
;;   {
;;     32, 0, 1, 12, 2, 6, u, 13, 3, u, 7, u, u, u, u, 14,
;;     10, 4, u, u, 8, u, u, 25, u, u, u, u, u, 21, 27, 15,
;;     31, 11, 5, u, u, u, u, u, 9, u, u, 24, u, u, 20, 26,
;;     30, u, u, u, u, 23, u, 19, 29, u, 22, 18, 28, 17, 16, u
;;   };
;; x = (x & -x) * 0x0450FBAF;
;; return table[x >> 26];
;; }
```

But I don't think that is something that can come up. I was finding it hard to prove, but if the Mul is InputBits in length there are only at most InputBits separate elements that it can access. And multiple elements cannot map successfully back to the same i. 

I ran a sat solver overnight, and it is still going but hasn't found any counter examples, which is a good sign. (It is able to find valid DeBruijn CTTZ tables given the chance).

It might be worth adding a comment explaining why this correctly matches the table in all cases.


================
Comment at: llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp:593-594
+  // Shift should extract top 5..7 bits.
+  if (InputBits - Log2_32_Ceil(InputBits) != ShiftConst &&
+      InputBits - Log2_32_Ceil(InputBits) - 1 != ShiftConst)
+    return false;
----------------
djtodoro wrote:
> dmgreen wrote:
> > Log2_32_Ceil -> Log2_Ceil if we know the InputBits is a power of 2.
> > 
> > The -1 case is for a larger table with more elements but that can handle zero values?
> >Log2_32_Ceil -> Log2_Ceil if we know the InputBits is a power of 2.
> Right...
> Bu you meant `Log2_64()`, right? It is a power of 2, since it is either 32 or 64, so no need to add any assert here.
> 
> >The -1 case is for a larger table with more elements but that can handle zero values?
> ```
> int ctz2(unsigned x)
> {
> #define u 0
>  static short table[64] =
>   {
>     32, 0, 1, 12, 2, 6, u, 13, 3, u, 7, u, u, u, u, 14,
>     10, 4, u, u, 8, u, u, 25, u, u, u, u, u, 21, 27, 15,
>     31, 11, 5, u, u, u, u, u, 9, u, u, 24, u, u, 20, 26,
>     30, u, u, u, u, 23, u, 19, 29, u, 22, 18, 28, 17, 16, u
>    };
>  x = (x & -x) * 0x0450FBAF;
>  return table[x >> 26];
>  }
> ```
Ah, yeah - I meant Log2_32, but delete the wrong part of the function name.


================
Comment at: llvm/test/Transforms/AggressiveInstCombine/lower-table-based-cttz-dereferencing-pointer.ll:23
+
+define dso_local i32 @ctz6(i64* nocapture readonly %b) {
+; CHECK-LABEL: @ctz6(
----------------
I usually remove dso_local


================
Comment at: llvm/test/Transforms/PhaseOrdering/lower-table-based-cttz.ll:22
+
+; Function Attrs: nounwind uwtable
+define dso_local i32 @ctz1(i32 noundef %x) {
----------------
Can remove the `; Function Attrs`


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113291/new/

https://reviews.llvm.org/D113291



More information about the llvm-commits mailing list