[PATCH] D32605: Recognize CTLZ builtin

Tue May 9 16:36:12 PDT 2017

joerg added a comment.

In https://reviews.llvm.org/D32605#750309, @evstupac wrote:

> If the loop is just converted to a countable one, other optimizations such as unrolling, LSR and vectorization become applicable, with potentially great impact.

That is something SCEV should be able to discover on its own.
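The sketch below is a hypothetical illustration of what "countable" means here. The exact loop shapes matched by D32605 are not shown in this message. The loop exits on a data-dependent condition. Even so, for a 32-bit `x` its trip count is exactly 32 - ctlz(x), or 0 when `x` is 0, and once that count is written with CTLZ the loop has a computable trip count. The function names are made up for this sketch. `__builtin_clz` is the GCC/Clang builtin, and the code assumes a 32-bit `unsigned int`.

```c
#include <assert.h>
#include <stdint.h>

/* Loop with a data-dependent exit: it runs once per significant bit of x. */
static unsigned active_bits_loop(uint32_t x) {
    unsigned n = 0;
    while (x != 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* The same count in closed form (hypothetical name). __builtin_clz is
   undefined for 0, so that case is handled explicitly.
   Assumes unsigned int is 32 bits wide. */
static unsigned active_bits_ctlz(uint32_t x) {
    return x == 0 ? 0u : 32u - (unsigned)__builtin_clz(x);
}
```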

> 
> 
>> The udivmodsi4.S implementation in compiler-rt is a good example: if clz can be used, it provides a nice optimization. Otherwise, the naive linear check performs better in the majority of cases. Please also keep in mind that we are talking about potentially executing fewer instructions versus blowing up .text here.
> 
> Is it OK if we apply the optimization only when the whole loop is converted to CTLZ?
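The udivmodsi4.S point can be made concrete with a simplified C sketch of shift-and-subtract unsigned division. This is not the compiler-rt assembly, and `udiv32_clz` is a hypothetical name. CTLZ gives the starting shift in one step, as clz(d) - clz(n), so the quotient loop only visits bit positions that can actually be set. Without a fast CTLZ, the quoted comment notes that a plain linear check usually wins. The sketch assumes a 32-bit `unsigned int` for `__builtin_clz`.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified shift-and-subtract division (hypothetical sketch).
   Returns n / d and stores n % d in *rem. d must be nonzero. */
static uint32_t udiv32_clz(uint32_t n, uint32_t d, uint32_t *rem) {
    assert(d != 0);
    if (n < d) {
        *rem = n;
        return 0;
    }
    /* Here n >= d > 0, so both clz calls are well defined. The highest
       quotient bit that can be set is clz(d) - clz(n), and d << shift
       still fits in 32 bits. */
    unsigned shift = (unsigned)(__builtin_clz(d) - __builtin_clz(n));
    uint32_t q = 0;
    for (int i = (int)shift; i >= 0; i--) {
        /* (n >> i) >= d is the same test as n >= d * 2^i, without overflow. */
        if ((n >> i) >= d) {
            n -= d << i;
            q |= (uint32_t)1 << i;
        }
    }
    *rem = n;
    return q;
}
```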

Replacing the full loop with the intrinsic is OK. The current default lowering is broken, but improving it is orthogonal. In other words, from a code size perspective, trading the loop for a libcall is still an improvement when an optimized library version is used.
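The next sketch covers the case where the whole loop is replaced. It uses the same assumptions as before: hypothetical names and a 32-bit `unsigned int`. The explicit counting loop and the single builtin compute the same value. The loop defines a result for x == 0 (32), so a faithful replacement must define one too. LLVM's `llvm.ctlz` intrinsic carries a flag stating whether a zero input is defined. The C builtin is undefined for 0, hence the explicit check.

```c
#include <assert.h>
#include <stdint.h>

/* Explicit leading-zero count: shift left until the top bit is set.
   Returns 32 for x == 0. */
static unsigned clz_loop(uint32_t x) {
    unsigned n = 0;
    while (n < 32 && (x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Whole-loop replacement (hypothetical name). Zero is special-cased to
   keep the loop's result of 32, because __builtin_clz(0) is undefined. */
static unsigned clz_builtin(uint32_t x) {
    return x == 0 ? 32u : (unsigned)__builtin_clz(x);
}
```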

> If we only convert the loop to a countable one, there could be corner cases, which we can guard with TTI for architectures that regress (if any do).

Correct: hoisting the computation out of the loop without removing the loop should be guarded by CPU support for CTLZ.
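The last sketch covers the "hoist but keep the loop" case and is also hypothetical. The loop body stays, but the trip count is computed with CTLZ before entry. That turns a data-dependent exit into a counted loop whose iterations are independent. Per the reply above, a compiler should only choose this form when the target has fast CTLZ support. That target query (TTI) happens inside the compiler and is not modelled in this C code. The function names are made up, and a 32-bit `unsigned int` is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Original form: the exit depends on x, so the trip count is unknown on
   entry. Writes the bits of x (LSB first) to out and returns how many. */
static unsigned bits_loop(uint32_t x, uint8_t out[32]) {
    unsigned i = 0;
    while (x != 0) {
        out[i++] = (uint8_t)(x & 1u);
        x >>= 1;
    }
    return i;
}

/* Same loop with the trip count hoisted in front of it via CTLZ.
   The body is kept; each iteration now reads x >> i directly, so the
   iterations no longer depend on each other. */
static unsigned bits_counted(uint32_t x, uint8_t out[32]) {
    unsigned tc = (x == 0) ? 0u : 32u - (unsigned)__builtin_clz(x);
    for (unsigned i = 0; i < tc; i++)
        out[i] = (uint8_t)((x >> i) & 1u);
    return tc;
}
```

Both functions write the same bytes to `out` and return the same count. Only the second exposes its iteration count before the loop starts.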

Repository:
  rL LLVM

https://reviews.llvm.org/D32605