[PATCH] D23446: [X86] Enable setcc to srl(ctlz) transformation on btver2 architectures.
Sanjay Patel via llvm-commits
llvm-commits at lists.llvm.org
Mon Aug 15 09:15:30 PDT 2016
spatel added inline comments.
================
Comment at: lib/Target/X86/X86.td:182-184
@@ -181,2 +181,5 @@
"Support LZCNT instruction">;
+// On some architectures, such as AMD's Jaguar, LZCNT is fast.
+def FeatureFastLZCNT : SubtargetFeature<"fastlzcnt", "HasFastLZCNT", "true",
+ "LZCNT instructions are fast">;
def FeatureBMI : SubtargetFeature<"bmi", "HasBMI", "true",
----------------
I would move this down with the other 'fake' features (ie, the other fast/slow attributes). Someday, we may come up with a better way to distinguish performance "features" from architectural ones.
It would also be good to explain exactly what we mean by "fast" in this context.
Finally, use a hyphen to make this more readable: "fast-lzcnt".
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:30780-30796
@@ -30775,1 +30779,19 @@
+ // Patterns with interesting uses do not benefit from srl(ctlz)
+ // transformation.
+ auto hasInterestingUses = [](SDNode *N) {
+ bool HasInterestingUses = false;
+ for (auto Use : N->uses()) {
+ if (Use->getOpcode() == ISD::AND || Use->getOpcode() == ISD::OR ||
+ Use->getOpcode() == ISD::XOR || Use->getOpcode() == ISD::ADD ||
+ Use->getOpcode() == ISD::SUB || Use->getOpcode() == ISD::MUL ||
+ Use->getOpcode() == ISD::SDIV || Use->getOpcode() == ISD::UDIV ||
+ Use->getOpcode() == ISD::SREM || Use->getOpcode() == ISD::UREM ||
+ Use->getOpcode() == ISD::SELECT) {
+ HasInterestingUses = true;
+ break;
+ }
+ }
+ return HasInterestingUses;
+ };
+
----------------
I don't understand the need for this check, so at the least it needs a code comment to explain why it is here.
Related: if we're matching the pattern starting from a zext, doesn't that miss the icmp/icmp/or patterns that you were originally hoping to optimize in D22038?
================
Comment at: test/CodeGen/X86/lzcnt-zext-cmp.ll:5
@@ +4,3 @@
+; RUN: llc < %s -mtriple=x86_64-pc-linux -mcpu=btver2 | FileCheck %s
+; RUN: llc < %s -mtriple=x86_64-pc-linux -mattr=+lzcnt -mcpu=haswell | FileCheck --check-prefix=NOFASTLZCNT %s
+
----------------
Instead of specifying a different CPU, this RUN would be better if it also used btver2, but explicitly disabled the 'fast-lzcnt' attribute. That way we verify the codegen with and without the attribute while simultaneously verifying that btver2 has this attribute by default.
================
Comment at: test/CodeGen/X86/lzcnt-zext-cmp.ll:7
@@ +6,3 @@
+
+define i32 @foo1(i32 %a) {
+; CHECK-LABEL: foo1:
----------------
Please give each test a meaningful name and/or add a comment to explain exactly what each test is checking.
================
Comment at: test/CodeGen/X86/lzcnt-zext-cmp.ll:81-100
@@ +80,21 @@
+
+define i16 @foo5(i16 %a) {
+; CHECK-LABEL: foo5:
+; CHECK: # BB#0:
+; CHECK-NEXT: xorl %eax, %eax
+; CHECK-NEXT: testw %di, %di
+; CHECK-NEXT: sete %al
+; CHECK-NEXT: # kill: %AX<def> %AX<kill> %EAX<kill>
+; CHECK-NEXT: retq
+;
+; NOFASTLZCNT-LABEL: foo5:
+; NOFASTLZCNT: # BB#0:
+; NOFASTLZCNT-NEXT: xorl %eax, %eax
+; NOFASTLZCNT-NEXT: testw %di, %di
+; NOFASTLZCNT-NEXT: sete %al
+; NOFASTLZCNT-NEXT: # kill: %AX<def> %AX<kill> %EAX<kill>
+; NOFASTLZCNT-NEXT: retq
+ %cmp = icmp eq i16 %a, 0
+ %conv = zext i1 %cmp to i16
+ ret i16 %conv
+}
----------------
There should be at least one test of the 'HasInterestingUses' logic (if that logic really belongs in this patch).
https://reviews.llvm.org/D23446
More information about the llvm-commits
mailing list