[PATCH] D32605: Recognize CTLZ builtin

Tue May 9 16:05:13 PDT 2017

joerg added a comment.

In https://reviews.llvm.org/D32605#750243, @evstupac wrote:

> In https://reviews.llvm.org/D32605#750202, @joerg wrote:
>
> >   if the CPU has no direct lowering for the intrinsic, this transform is beneficial only if the resulting intrinsic can be constant folded
>
>
> Why?
>  What about converting loop to countable?
>  What about clear range of CTLZ(X):  0 <= CTLZ(X) <= bitwidth(X)?

I don't think this transform changes anything about the countability of the loop; SCEV should certainly be able to do

> What about InstCombine optimizations (not sure how useful they are, but still)?
> 
>   // fold (srl (ctlz x), "5") -> x  iff x has one bit set (the low bit).
>   // select_cc seteq X, 0, sizeof(X), ctlz(X) -> ctlz(X) 
>   // select_cc seteq X, 0, sizeof(X), ctlz_zero_undef(X) -> ctlz(X)
>   // select_cc seteq X, 0, sizeof(X), cttz(X) -> cttz(X)
>   // select_cc seteq X, 0, sizeof(X), cttz_zero_undef(X) -> cttz(X)
> 
> .....

In https://reviews.llvm.org/D32605#750271, @evstupac wrote:

> In https://reviews.llvm.org/D32605#750247, @joerg wrote:
>
> > Yeah, but even the generic expansion results in ~19 instructions on ARMv4. Compare that to one instruction in the loop and it can hardly be said to be a general win.
>
>
> There should be at least 3 instructions in the loop: add, shift and branch. For 32 bit instruction it will run 16 iterations average, so 16 * 3 > 19.

You are miscounting. The very example you originally gave turns a shift+count based loop into a clz + increment based loop. Naively speaking, and ignoring the subtleties of the architecture, that saves one instruction in the loop. Any expansion of clz will be worse most of the time.
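
To make the trade-off concrete, here is a minimal C sketch of the three shapes under discussion: a linear shift-and-count loop, a loop-free software expansion, and the builtin. The function names are hypothetical and not taken from the patch, the expansion is only one possible fallback and not necessarily what LLVM emits, and `__builtin_clz` assumes a GCC- or Clang-compatible compiler with a 32-bit `unsigned int`.

```c
#include <stdint.h>

/* Linear loop: shift right until the value is zero, counting down.
   Returns the number of leading zero bits (32 for x == 0).
   The loop body is small, but it runs once per significant bit. */
static unsigned clz_loop(uint32_t x)
{
    unsigned n = 32;
    while (x != 0) {
        x >>= 1;
        n--;
    }
    return n;
}

/* Hypothetical software expansion for targets without a clz instruction:
   a binary search over the bit positions. No loop, but noticeably more
   straight-line code than the loop body above. */
static unsigned clz_expand(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    if ((x & 0xFFFF0000u) == 0) { n += 16; x <<= 16; }
    if ((x & 0xFF000000u) == 0) { n += 8;  x <<= 8;  }
    if ((x & 0xF0000000u) == 0) { n += 4;  x <<= 4;  }
    if ((x & 0xC0000000u) == 0) { n += 2;  x <<= 2;  }
    if ((x & 0x80000000u) == 0) { n += 1; }
    return n;
}

/* With hardware support the builtin typically lowers to a single
   instruction. __builtin_clz is undefined for 0, so guard that case. */
static unsigned clz_builtin(uint32_t x)
{
    return x ? (unsigned)__builtin_clz(x) : 32u;
}
```

All three return the same value for every input; they differ only in how many instructions are executed and how much code is emitted, which is the point of contention here.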

The udivmodsi4.S implementation in compiler-rt is a good example -- if clz can be used, it provides a nice optimization. Otherwise the stupid linear checking performs better in the majority of cases. Please also keep in mind that we are talking about potentially executing fewer instructions versus blowing up .text here.

Repository:
  rL LLVM

https://reviews.llvm.org/D32605