[PATCH] D23446: [X86] Enable setcc to srl(ctlz) transformation on btver2 architectures.
Sanjay Patel via llvm-commits
llvm-commits at lists.llvm.org
Fri Sep 16 10:03:07 PDT 2016
spatel added a comment.
In https://reviews.llvm.org/D23446#544797, @pgousseau wrote:
> I am more confident the OR case brings better performance because we will be replacing
>
> 48 85 FF test rdi,rdi
> 0F 94 C0 sete al
> 48 85 F6 test rsi,rsi
> 0F 94 C1 sete cl
> 08 C1 or cl,al
> 0F B6 C1 movzx eax,cl
> C3 ret
>
>
> by this:
>
>
> F3 0F BD CE lzcnt ecx,esi
> F3 0F BD C7 lzcnt eax,edi
> 09 C8 or eax,ecx
> C1 E8 05 shr eax,5
> C3 ret
>
>
> My plan now is to make the patch to handle the OR case only, what do you guys think?
> Would X86ISelLowering still be the best place if only supporting the OR case?
The OR case certainly looks better in isolation (fewer instructions, smaller code size). If you are measuring perf improvements from that alone, I think we can be more confident that the transform to lzcnt is the source of that improvement. It's still not clear to me how the micro-benchmark was improved so much for the simpler case.
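For anyone who wants to convince themselves that the two sequences quoted above compute the same value, here is a minimal standalone C++ sketch. It is not code from the patch: the function names are made up for illustration, and lzcnt32() is a portable stand-in that assumes LZCNT semantics (a zero input yields 32). It only covers 32-bit operands, matching the lzcnt ecx/eax form; a 64-bit version would shift by 6 instead of 5.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: portable stand-in for the 32-bit LZCNT instruction.
// Counts leading zero bits and returns 32 when the input is zero.
unsigned lzcnt32(std::uint32_t v) {
    unsigned n = 0;
    for (std::uint32_t mask = 0x80000000u; mask != 0 && (v & mask) == 0; mask >>= 1)
        ++n;
    return n;
}

// Original form: two compares against zero (test/sete pairs) combined with OR.
unsigned eitherZeroSetcc(std::uint32_t a, std::uint32_t b) {
    return static_cast<unsigned>(a == 0) | static_cast<unsigned>(b == 0);
}

// Transformed form: lzcnt yields a value in [0, 32], and bit 5 is set only
// for the value 32, i.e. only for a zero input. OR-ing the two counts keeps
// bit 5 set if either input was zero, and the shift by 5 extracts that bit.
unsigned eitherZeroLzcnt(std::uint32_t a, std::uint32_t b) {
    return (lzcnt32(a) | lzcnt32(b)) >> 5;
}
```

The largest value the OR can produce is 32 | 31 = 63, so the shifted result is always 0 or 1 and no extra masking is needed.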
To match the OR pattern, I think you would either add some code to visitOr() or add tablegen patterns if it is possible to match the DAG nodes that way.
https://reviews.llvm.org/D23446