[PATCH] D23446: [X86] Enable setcc to srl(ctlz) transformation on btver2 architectures.

Sanjay Patel via llvm-commits llvm-commits at lists.llvm.org
Mon Aug 15 09:15:30 PDT 2016


spatel added inline comments.

================
Comment at: lib/Target/X86/X86.td:182-184
@@ -181,2 +181,5 @@
                                       "Support LZCNT instruction">;
+// On some architectures, such as AMD's Jaguar, LZCNT is fast.
+def FeatureFastLZCNT : SubtargetFeature<"fastlzcnt", "HasFastLZCNT", "true",
+                                        "LZCNT instructions are fast">;
 def FeatureBMI     : SubtargetFeature<"bmi", "HasBMI", "true",
----------------
I would move this down with the other 'fake' features (ie, the other fast/slow attributes). Someday, we may come up with a better way to distinguish performance "features" from architectural ones. 

It would also be good to explain exactly what we mean by "fast" in this context.

Finally, use a hyphen to make this more readable: "fast-lzcnt".

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:30780-30796
@@ -30775,1 +30779,19 @@
 
+  // Patterns with interesting uses do not benefit from srl(ctlz)
+  // transformation.
+  auto hasInterestingUses = [](SDNode *N) {
+    bool HasInterestingUses = false;
+    for (auto Use : N->uses()) {
+      if (Use->getOpcode() == ISD::AND || Use->getOpcode() == ISD::OR ||
+          Use->getOpcode() == ISD::XOR || Use->getOpcode() == ISD::ADD ||
+          Use->getOpcode() == ISD::SUB || Use->getOpcode() == ISD::MUL ||
+          Use->getOpcode() == ISD::SDIV || Use->getOpcode() == ISD::UDIV ||
+          Use->getOpcode() == ISD::SREM || Use->getOpcode() == ISD::UREM ||
+          Use->getOpcode() == ISD::SELECT) {
+        HasInterestingUses = true;
+        break;
+      }
+    }
+    return HasInterestingUses;
+  };
+
----------------
I don't understand the need for this check, so at the least it needs a code comment to explain why it is here.
Related: if we're matching the pattern starting from a zext, doesn't that miss the icmp/icmp/or patterns that you were originally hoping to optimize in D22038?

================
Comment at: test/CodeGen/X86/lzcnt-zext-cmp.ll:5
@@ +4,3 @@
+; RUN: llc < %s -mtriple=x86_64-pc-linux -mcpu=btver2 | FileCheck %s
+; RUN: llc < %s -mtriple=x86_64-pc-linux -mattr=+lzcnt -mcpu=haswell | FileCheck --check-prefix=NOFASTLZCNT %s
+
----------------
Instead of specifying a different CPU, this RUN would be better if it also used btver2, but explicitly disabled the 'fast-lzcnt' attribute. That way we verify the codegen with and without the attribute while simultaneously verifying that btver2 has this attribute by default.

================
Comment at: test/CodeGen/X86/lzcnt-zext-cmp.ll:7
@@ +6,3 @@
+
+define i32 @foo1(i32 %a) {
+; CHECK-LABEL: foo1:
----------------
Please give each test a meaningful name and/or add a comment to explain exactly what each test is checking.

================
Comment at: test/CodeGen/X86/lzcnt-zext-cmp.ll:81-100
@@ +80,21 @@
+
+define i16 @foo5(i16 %a) {
+; CHECK-LABEL: foo5:
+; CHECK:       # BB#0:
+; CHECK-NEXT:    xorl %eax, %eax
+; CHECK-NEXT:    testw %di, %di
+; CHECK-NEXT:    sete %al
+; CHECK-NEXT:    # kill: %AX<def> %AX<kill> %EAX<kill>
+; CHECK-NEXT:    retq
+;
+; NOFASTLZCNT-LABEL: foo5:
+; NOFASTLZCNT:       # BB#0:
+; NOFASTLZCNT-NEXT:    xorl %eax, %eax
+; NOFASTLZCNT-NEXT:    testw %di, %di
+; NOFASTLZCNT-NEXT:    sete %al
+; NOFASTLZCNT-NEXT:    # kill: %AX<def> %AX<kill> %EAX<kill>
+; NOFASTLZCNT-NEXT:    retq
+  %cmp = icmp eq i16 %a, 0
+  %conv = zext i1 %cmp to i16
+  ret i16 %conv
+}
----------------
There should be at least one test of the 'HasInterestingUses' logic (if that logic really belongs in this patch).


https://reviews.llvm.org/D23446





More information about the llvm-commits mailing list