[PATCH] D23446: [X86] Enable setcc to srl(ctlz) transformation on btver2 architectures.

Sanjay Patel via llvm-commits llvm-commits at lists.llvm.org
Fri Sep 16 10:03:07 PDT 2016


spatel added a comment.

In https://reviews.llvm.org/D23446#544797, @pgousseau wrote:

> I am more confident the OR case brings better performance because we will be replacing
>
>   48 85 FF                test     rdi,rdi
>   0F 94 C0                sete     al
>   48 85 F6                test     rsi,rsi
>   0F 94 C1                sete     cl
>   08 C1                   or       cl,al
>   0F B6 C1                movzx    eax,cl
>   C3                      ret
>   
>
> by this:
>
>  
>   F3 0F BD CE             lzcnt    ecx,esi
>   F3 0F BD C7             lzcnt    eax,edi
>   09 C8                   or       eax,ecx
>   C1 E8 05                shr      eax,5
>   C3                      ret
>   
>
> My plan now is to make the patch to handle the OR case only, what do you guys think?
>  Would X86ISelLowering still be the best place if only supporting the OR case?


The OR case certainly looks better in isolation (fewer instructions, smaller code size). If you are measuring perf improvements from that alone, I think we can be more confident that the transform to lzcnt is the source of that improvement. It's still not clear to me how the micro-benchmark was improved so much for the simpler case.
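
For reference, here is a rough sketch of why the two sequences quoted above should agree. It is only a model, not code from the patch: it assumes 32-bit operands (matching the lzcnt on esi/edi in the second listing) and the documented lzcnt behavior of returning the operand size for a zero input. The function names are made up for illustration. The point is that lzcnt yields 32 only for a zero input and 0..31 otherwise, so bit 5 of the OR is set exactly when at least one input is zero.

```python
MASK32 = 0xFFFFFFFF

def lzcnt32(x: int) -> int:
    """Model of 32-bit lzcnt: count of leading zero bits, 32 when x == 0."""
    x &= MASK32
    return 32 - x.bit_length()

def either_zero_setcc(a: int, b: int) -> int:
    """Reference form: (a == 0) | (b == 0), as test/sete/or computes it."""
    return int((a & MASK32) == 0) | int((b & MASK32) == 0)

def either_zero_lzcnt(a: int, b: int) -> int:
    """Transformed form: (lzcnt(a) | lzcnt(b)) >> 5."""
    return (lzcnt32(a) | lzcnt32(b)) >> 5

# Spot-check a few boundary values; both forms should agree on every pair.
samples = [0, 1, 2, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]
for a in samples:
    for b in samples:
        assert either_zero_setcc(a, b) == either_zero_lzcnt(a, b)
```

The shift amount 5 is log2(32); a 64-bit variant of the same idea would presumably shift by 6 instead.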

To match the OR pattern, I think you would either add some code to visitOr() or add tablegen patterns if it is possible to match the DAG nodes that way.


https://reviews.llvm.org/D23446




