[PATCH] D23446: [X86] Enable setcc to srl(ctlz) transformation on btver2 architectures.
pierre gousseau via llvm-commits
llvm-commits at lists.llvm.org
Mon Aug 15 11:04:10 PDT 2016
pgousseau added inline comments.
================
Comment at: lib/Target/X86/X86.td:182-184
@@ -181,2 +181,5 @@
"Support LZCNT instruction">;
+// On some architectures, such as AMD's Jaguar, LZCNT is fast.
+def FeatureFastLZCNT : SubtargetFeature<"fastlzcnt", "HasFastLZCNT", "true",
+ "LZCNT instructions are fast">;
def FeatureBMI : SubtargetFeature<"bmi", "HasBMI", "true",
----------------
spatel wrote:
> I would move this down with the other 'fake' features (ie, the other fast/slow attributes). Someday, we may come up with a better way to distinguish performance "features" from architectural ones.
>
> It would also be good to explain exactly what we mean by "fast" in this context.
>
> Finally, use a hyphen to make this more readable: "fast-lzcnt".
Sounds good will do.
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:30780-30796
@@ -30775,1 +30779,19 @@
+ // Patterns with interesting uses do not benefit from srl(ctlz)
+ // transformation.
+ auto hasInterestingUses = [](SDNode *N) {
+ bool HasInterestingUses = false;
+ for (auto Use : N->uses()) {
+ if (Use->getOpcode() == ISD::AND || Use->getOpcode() == ISD::OR ||
+ Use->getOpcode() == ISD::XOR || Use->getOpcode() == ISD::ADD ||
+ Use->getOpcode() == ISD::SUB || Use->getOpcode() == ISD::MUL ||
+ Use->getOpcode() == ISD::SDIV || Use->getOpcode() == ISD::UDIV ||
+ Use->getOpcode() == ISD::SREM || Use->getOpcode() == ISD::UREM ||
+ Use->getOpcode() == ISD::SELECT) {
+ HasInterestingUses = true;
+ break;
+ }
+ }
+ return HasInterestingUses;
+ };
+
----------------
spatel wrote:
> I don't understand the need for this check, so at the least it needs a code comment to explain why it is here.
> Related: if we're matching the pattern starting from a zext, doesn't that miss the icmp/icmp/or patterns that you were originally hoping to optimize in D22038?
Sounds good, I will think of a better comment.
I added this check to be conservative for now as I noticed several worst code gen occurences (around 50% of matches in openssl).
I hope to address this and the icmp/icmp/OR case in another patch.
================
Comment at: test/CodeGen/X86/lzcnt-zext-cmp.ll:81-100
@@ +80,21 @@
+
+define i16 @foo5(i16 %a) {
+; CHECK-LABEL: foo5:
+; CHECK: # BB#0:
+; CHECK-NEXT: xorl %eax, %eax
+; CHECK-NEXT: testw %di, %di
+; CHECK-NEXT: sete %al
+; CHECK-NEXT: # kill: %AX<def> %AX<kill> %EAX<kill>
+; CHECK-NEXT: retq
+;
+; NOFASTLZCNT-LABEL: foo5:
+; NOFASTLZCNT: # BB#0:
+; NOFASTLZCNT-NEXT: xorl %eax, %eax
+; NOFASTLZCNT-NEXT: testw %di, %di
+; NOFASTLZCNT-NEXT: sete %al
+; NOFASTLZCNT-NEXT: # kill: %AX<def> %AX<kill> %EAX<kill>
+; NOFASTLZCNT-NEXT: retq
+ %cmp = icmp eq i16 %a, 0
+ %conv = zext i1 %cmp to i16
+ ret i16 %conv
+}
----------------
spatel wrote:
> There should be at least one test of the 'HasInterestingUses' logic (if that logic really belongs in this patch).
Yes I meant to add those, will do thanks.
https://reviews.llvm.org/D23446
More information about the llvm-commits
mailing list