[PATCH] D23446: [X86] Enable setcc to srl(ctlz) transformation on btver2 architectures.

Wed Aug 17 07:44:35 PDT 2016

spatel added a comment.

In https://reviews.llvm.org/D23446#516878, @pgousseau wrote:

> Looking again at the codegen of openssl without the "hasInterestingUses" constraint, the codegen does not seem worse in terms of speed; only the size is not as good as it could be, but I think it is ok? Something must have gone wrong during my initial testing, I suppose ...

I wasn't expecting a size difference. Can you provide more details about the size and perf changes that you see with this change? We may want to gate the transform based on 'optForSize'.


================
Comment at: lib/CodeGen/SelectionDAG/TargetLowering.cpp:3566-3567
@@ -3565,3 +3565,4 @@
 
-SDValue TargetLowering::lowerCmpEqZeroToCtlzSrl(SDValue Op,
-                                                SelectionDAG &DAG) const {
+llvm::SDValue
+llvm::TargetLowering::lowerCmpEqZeroToCtlzSrl(SDValue Op, EVT ExtTy,
+                                              SelectionDAG &DAG) const {
----------------
Remove "llvm::"

================
Comment at: lib/Target/X86/X86.td:265-266
@@ -264,1 +264,4 @@
                        "true", "Vector SQRT is fast (disable Newton-Raphson)">;
+// On some architectures, such as AMD's Jaguar, LZCNT is fast (as in as fast as
+// equivalent instructions).
+def FeatureFastLZCNT
----------------
The description still seems too vague. The key point is that lzcnt must have the same latency and throughput as test/set, right?

How about: "If lzcnt has equivalent latency/throughput to most simple integer ops, it can be used to replace test/set sequences."
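
For reference, the identity behind replacing a test/set sequence with lzcnt can be sketched in plain C++. This is only an illustration of the arithmetic, not the code in the patch: `lzcnt32`, `isZeroSetcc`, `isZeroCtlzSrl` and `selfCheck` are hypothetical names, and the loop is a portable stand-in for the hardware instruction.

```cpp
#include <cassert>
#include <cstdint>

// Portable stand-in (assumption) for the 32-bit LZCNT instruction: counts
// leading zero bits and returns 32 for a zero input.
inline unsigned lzcnt32(std::uint32_t x) {
  unsigned n = 0;
  for (std::uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0;
       mask >>= 1)
    ++n;
  return n;
}

// test/sete form: materialize (x == 0) as 0 or 1.
inline std::uint32_t isZeroSetcc(std::uint32_t x) { return x == 0 ? 1u : 0u; }

// lzcnt/shr form: the count reaches 32 only when x == 0, so bit 5 of the
// count (count >> log2(32)) is exactly the comparison result.
inline std::uint32_t isZeroCtlzSrl(std::uint32_t x) { return lzcnt32(x) >> 5; }

// Spot-check that both forms agree on a few boundary values.
inline void selfCheck() {
  const std::uint32_t samples[] = {0u, 1u, 2u, 0x7fffffffu, 0x80000000u,
                                   0xffffffffu};
  for (std::uint32_t x : samples)
    assert(isZeroSetcc(x) == isZeroCtlzSrl(x));
}
```

Whether the second form is profitable depends on lzcnt being as cheap as the instructions it replaces, which is what the feature description needs to state.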

================
Comment at: test/CodeGen/X86/lzcnt-zext-cmp.ll:85-92
@@ +84,10 @@
+; Test 16-bit input, 16-bit output.
+define i16 @foo5(i16 %a) {
+; CHECK-LABEL: foo5:
+; CHECK:       # BB#0:
+; CHECK-NEXT:    xorl %eax, %eax
+; CHECK-NEXT:    testw %di, %di
+; CHECK-NEXT:    sete %al
+; CHECK-NEXT:    # kill: %AX<def> %AX<kill> %EAX<kill>
+; CHECK-NEXT:    retq
+;
----------------
lzcnt has a 16-bit variant in the ISA. Is there some reason not to use it here?
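
To make the question concrete, here is a rough C++ sketch of how the same identity would look for a 16-bit input. It assumes nothing about what the patch actually emits or why; `lzcnt16`, `lzcnt32Wide`, `isZero16Narrow` and `isZero16Wide` are hypothetical names, and the loops stand in for the hardware instruction.

```cpp
#include <cstdint>

// Portable stand-in (assumption) for 16-bit LZCNT; returns 16 for zero.
inline unsigned lzcnt16(std::uint16_t x) {
  unsigned n = 0;
  for (unsigned mask = 0x8000u; mask != 0 && (x & mask) == 0; mask >>= 1)
    ++n;
  return n;
}

// Portable stand-in (assumption) for 32-bit LZCNT; returns 32 for zero.
inline unsigned lzcnt32Wide(std::uint32_t x) {
  unsigned n = 0;
  for (std::uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0;
       mask >>= 1)
    ++n;
  return n;
}

// Narrow form: the 16-bit count is 16 only for x == 0, so shift by log2(16).
inline std::uint16_t isZero16Narrow(std::uint16_t x) {
  return static_cast<std::uint16_t>(lzcnt16(x) >> 4);
}

// Widened form: zero-extend to 32 bits, then reuse the 32-bit identity.
inline std::uint16_t isZero16Wide(std::uint16_t x) {
  return static_cast<std::uint16_t>(
      lzcnt32Wide(static_cast<std::uint32_t>(x)) >> 5);
}
```

Both forms yield 1 for a zero input and 0 otherwise, so either would be a valid lowering of the i16 compare in principle.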


https://reviews.llvm.org/D23446