[PATCH] D113291: [AggressiveInstCombine] Lower Table Based CTTZ and enable it for AARCH64 in -O3

Craig Topper via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu May 26 11:03:20 PDT 2022


craig.topper added a comment.

In D113291#3540303 <https://reviews.llvm.org/D113291#3540303>, @dmgreen wrote:

> In D113291#3539965 <https://reviews.llvm.org/D113291#3539965>, @efriedma wrote:
>
>>> as for the second I'm not sure when it would be profitable to transform back and emit the table
>>
>> You really just have to weigh it against the current default expansion on targets where ctlz/cttz aren't legal, which is `popcount(~v & (v - 1))`.  It should be a straightforward comparison, generally.  If you have popcount, use it.  If multiply is legal, use a table lookup.  Otherwise... maybe stick with the popcount expansion?  Probably any approach is expensive at that point.
>>
>> Compare the generated code for arm-eabi.
>>
>>> You may not want to do that for non-hot ctzs?
>>
>> As opposed to what, calling into compiler-rt?
>
> I meant that it can be difficult for the compiler to recognize _when_ a ctz is performance critical. If the size of the table is large (which I was possibly over-estimating in my mind), then you may not want to emit the table for every ctz in the program. Currently, the places where this is used have said, by the fact that they were written this way, that these ctzs are important. It just depends on whether converting to a table is always better, and whether the table is small enough to be reasonable in the common case. 32 x i8 doesn't sound too big compared to what I originally imagined, if that is all it needs.

For the table lookup, is there an algorithm for creating the special constant that is used in the multiply? Or would we just hardcode known constants for common sizes?
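
For reference, here is a minimal sketch of one way such a constant could be generated rather than hardcoded, assuming the constant is a de Bruijn sequence B(2, 5) with five leading zero bits. It uses the greedy "prefer-one" construction and then derives the 32 x i8 table from the result. All function and variable names are hypothetical and not taken from the patch, and the constant this produces need not match any constant that existing source code uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: build a 32-bit de Bruijn constant B(2, 5) with the
 * greedy "prefer-one" rule. Start from the window 00000, then append a 1
 * whenever the resulting 5-bit window is new, otherwise append a 0. */
static uint32_t make_debruijn32(void) {
    bool seen[32] = { false };
    uint32_t seq = 0;     /* bits emitted so far, most significant first */
    uint32_t window = 0;  /* the last five bits emitted */
    seen[0] = true;       /* the five leading zeros form the first window */
    for (int pos = 5; pos < 32; ++pos) {
        uint32_t with_one = ((window << 1) | 1u) & 31u;
        uint32_t with_zero = (window << 1) & 31u;
        uint32_t next = seen[with_one] ? with_zero : with_one;
        assert(!seen[next]);  /* the construction never repeats a window */
        seen[next] = true;
        window = next;
        seq = (seq << 1) | (next & 1u);
    }
    return seq;
}

static uint32_t cttz_magic;     /* the multiply constant */
static uint8_t cttz_table[32];  /* the 32 x i8 lookup table */

/* Fill the table: multiplying by (1 << i) is a left shift by i, so the top
 * five bits of the product identify i uniquely. */
static void init_cttz_table(void) {
    bool filled[32] = { false };
    cttz_magic = make_debruijn32();
    for (unsigned i = 0; i < 32; ++i) {
        uint32_t idx = (uint32_t)(cttz_magic << i) >> 27;
        assert(!filled[idx]);  /* every slot is written exactly once */
        filled[idx] = true;
        cttz_table[idx] = (uint8_t)i;
    }
}

/* Table-based count of trailing zeros; v must be nonzero. */
static unsigned cttz32_table(uint32_t v) {
    uint32_t lowest_bit = v & (0u - v);  /* isolate the lowest set bit */
    return cttz_table[(uint32_t)(lowest_bit * cttz_magic) >> 27];
}

/* Straightforward loop used only as a reference; v must be nonzero. */
static unsigned cttz32_reference(uint32_t v) {
    unsigned n = 0;
    while ((v & 1u) == 0) {
        v >>= 1;
        ++n;
    }
    return n;
}
```

Call `init_cttz_table()` once before using `cttz32_table()`. Any valid de Bruijn constant works with a matching table, so a pattern matcher would presumably need to validate the constant and table together rather than compare against a single known value.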


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113291/new/

https://reviews.llvm.org/D113291


