[PATCH] D23446: [X86] Enable setcc to srl(ctlz) transformation on btver2 architectures.
Sanjay Patel via llvm-commits
llvm-commits at lists.llvm.org
Thu Aug 18 09:33:23 PDT 2016
spatel added a comment.
In https://reviews.llvm.org/D23446#519225, @pgousseau wrote:
> - Disable transform if optForSize is true
optForSize is a very gray area: we allow speed optimizations even if they increase size if the speed vs. size trade-off is "large" for some definition of "large".
Can you post the detailed perf and size differences you're seeing with this change? I don't think the size change can be that big from what you've posted: lzcnt+shr is 7 bytes; {test/inc}+set is 5/6 bytes, but if there's an xor leading into it, that's 7/8 bytes.
Are there other size-increasing changes happening as side effects that I'm not accounting for? It's also not clear why this is a perf win for Jaguar. Sorry for taking this long to ask, but why is test+set slower?
================
Comment at: test/CodeGen/X86/lzcnt-zext-cmp.ll:86-93
@@ +85,10 @@
+define i16 @foo5(i16 %a) {
+; CHECK-LABEL: foo5:
+; CHECK: # BB#0:
+; CHECK-NEXT: xorl %eax, %eax
+; CHECK-NEXT: testw %di, %di
+; CHECK-NEXT: sete %al
+; CHECK-NEXT: # kill: %AX<def> %AX<kill> %EAX<kill>
+; CHECK-NEXT: retq
+;
+; NOFASTLZCNT-LABEL: foo5:
----------------
Oh...yuck. So we really should've done 'xor %eax, %eax' ahead of the lzcnt in that case? Please add a note about why we don't do 16-bit in the code comments and also here in the test case.
https://reviews.llvm.org/D23446
More information about the llvm-commits
mailing list