[PATCH] D23446: [X86] Enable setcc to srl(ctlz) transformation on btver2 architectures.

Thu Aug 18 09:33:23 PDT 2016

spatel added a comment.

In https://reviews.llvm.org/D23446#519225, @pgousseau wrote:

> - Disable transform if optForSize is true

optForSize is a very gray area: we allow speed optimizations even if they increase size if the speed vs. size trade-off is "large" for some definition of "large".

Can you post the detailed perf and size differences you're seeing with this change? I don't think the size change can be that big from what you've posted: lzcnt+shr is 7 bytes; {test/inc}+set is 5/6 bytes, but if there's an xor leading into it, that's 7/8 bytes.

Are there other size-increasing changes happening as side effects that I'm not accounting for? It's also not clear why this is a perf win for Jaguar. Sorry for taking this long to ask, but why is test+set slower?

================
Comment at: test/CodeGen/X86/lzcnt-zext-cmp.ll:86-93
@@ +85,10 @@
+define i16 @foo5(i16 %a) {
+; CHECK-LABEL: foo5:
+; CHECK:       # BB#0:
+; CHECK-NEXT:    xorl %eax, %eax
+; CHECK-NEXT:    testw %di, %di
+; CHECK-NEXT:    sete %al
+; CHECK-NEXT:    # kill: %AX<def> %AX<kill> %EAX<kill>
+; CHECK-NEXT:    retq
+;
+; NOFASTLZCNT-LABEL: foo5:
----------------
Oh...yuck. So we really should've done 'xor %eax, %eax' ahead of the lzcnt in that case? Please add a note about why we don't do 16-bit in the code comments and also here in the test case.

https://reviews.llvm.org/D23446