[PATCH] D23446: [X86] Enable setcc to srl(ctlz) transformation on btver2 architectures.

pierre gousseau via llvm-commits llvm-commits at lists.llvm.org
Mon Aug 15 11:04:10 PDT 2016


pgousseau added inline comments.

================
Comment at: lib/Target/X86/X86.td:182-184
@@ -181,2 +181,5 @@
                                       "Support LZCNT instruction">;
+// On some architectures, such as AMD's Jaguar, LZCNT is fast.
+def FeatureFastLZCNT : SubtargetFeature<"fastlzcnt", "HasFastLZCNT", "true",
+                                        "LZCNT instructions are fast">;
 def FeatureBMI     : SubtargetFeature<"bmi", "HasBMI", "true",
----------------
spatel wrote:
> I would move this down with the other 'fake' features (ie, the other fast/slow attributes). Someday, we may come up with a better way to distinguish performance "features" from architectural ones. 
> 
> It would also be good to explain exactly what we mean by "fast" in this context.
> 
> Finally, use a hyphen to make this more readable: "fast-lzcnt".
Sounds good will do.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:30780-30796
@@ -30775,1 +30779,19 @@
 
+  // Patterns with interesting uses do not benefit from srl(ctlz)
+  // transformation.
+  auto hasInterestingUses = [](SDNode *N) {
+    bool HasInterestingUses = false;
+    for (auto Use : N->uses()) {
+      if (Use->getOpcode() == ISD::AND || Use->getOpcode() == ISD::OR ||
+          Use->getOpcode() == ISD::XOR || Use->getOpcode() == ISD::ADD ||
+          Use->getOpcode() == ISD::SUB || Use->getOpcode() == ISD::MUL ||
+          Use->getOpcode() == ISD::SDIV || Use->getOpcode() == ISD::UDIV ||
+          Use->getOpcode() == ISD::SREM || Use->getOpcode() == ISD::UREM ||
+          Use->getOpcode() == ISD::SELECT) {
+        HasInterestingUses = true;
+        break;
+      }
+    }
+    return HasInterestingUses;
+  };
+
----------------
spatel wrote:
> I don't understand the need for this check, so at the least it needs a code comment to explain why it is here.
> Related: if we're matching the pattern starting from a zext, doesn't that miss the icmp/icmp/or patterns that you were originally hoping to optimize in D22038?
Sounds good, I will think of a better comment.
I added this check to be conservative for now as I noticed several worst code gen occurences (around 50% of matches in openssl).
I hope to address this and the icmp/icmp/OR case in another patch.

================
Comment at: test/CodeGen/X86/lzcnt-zext-cmp.ll:81-100
@@ +80,21 @@
+
+define i16 @foo5(i16 %a) {
+; CHECK-LABEL: foo5:
+; CHECK:       # BB#0:
+; CHECK-NEXT:    xorl %eax, %eax
+; CHECK-NEXT:    testw %di, %di
+; CHECK-NEXT:    sete %al
+; CHECK-NEXT:    # kill: %AX<def> %AX<kill> %EAX<kill>
+; CHECK-NEXT:    retq
+;
+; NOFASTLZCNT-LABEL: foo5:
+; NOFASTLZCNT:       # BB#0:
+; NOFASTLZCNT-NEXT:    xorl %eax, %eax
+; NOFASTLZCNT-NEXT:    testw %di, %di
+; NOFASTLZCNT-NEXT:    sete %al
+; NOFASTLZCNT-NEXT:    # kill: %AX<def> %AX<kill> %EAX<kill>
+; NOFASTLZCNT-NEXT:    retq
+  %cmp = icmp eq i16 %a, 0
+  %conv = zext i1 %cmp to i16
+  ret i16 %conv
+}
----------------
spatel wrote:
> There should be at least one test of the 'HasInterestingUses' logic (if that logic really belongs in this patch).
Yes I meant to add those, will do thanks.


https://reviews.llvm.org/D23446





More information about the llvm-commits mailing list