[PATCH] D11428: Improve EmitLoweredSelect for contiguous pseudo CMOV instructions.
Ahmed Bougacha
ahmed.bougacha at gmail.com
Thu Jul 30 10:56:24 PDT 2015
ab requested changes to this revision.
ab added a comment.
This revision now requires changes to proceed.
A few nits here and there.
Testing all types is nice, but isn't IMO the most interesting part: the PHI trickery is!
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:19785
@@ -19784,1 +19784,3 @@
+// Return true of it is OK for this CMOV pseudo-opcode to be cascaded
+// together with other CMOV pseudo-opcodes into a single basic-block with
----------------
of -> if
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:19788
@@ +19787,3 @@
+// conditional jump around it.
+static bool InstOKForCMOVGrouping(MachineInstr *MI) {
+ switch (MI->getOpcode()) {
----------------
What about something like the IMO more descriptive "isCMOVPseudo" ?
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:19790-19810
@@ +19789,23 @@
+ switch (MI->getOpcode()) {
+ case X86::CMOV_GR8:
+ case X86::CMOV_FR32:
+ case X86::CMOV_FR64:
+ case X86::CMOV_V4F32:
+ case X86::CMOV_V2F64:
+ case X86::CMOV_V2I64:
+ case X86::CMOV_V8F32:
+ case X86::CMOV_V4F64:
+ case X86::CMOV_V4I64:
+ case X86::CMOV_V16F32:
+ case X86::CMOV_V8F64:
+ case X86::CMOV_V8I64:
+ case X86::CMOV_GR16:
+ case X86::CMOV_GR32:
+ case X86::CMOV_RFP32:
+ case X86::CMOV_RFP64:
+ case X86::CMOV_RFP80:
+ case X86::CMOV_V8I1:
+ case X86::CMOV_V16I1:
+ case X86::CMOV_V32I1:
+ case X86::CMOV_V64I1:
+ return true;
----------------
Can you sort these? I see the pseudo lowering switch is equally unordered, can you fix that too?
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:19814
@@ +19813,3 @@
+ default:
+ break;
+ }
----------------
return false here?
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:20042
@@ +20041,3 @@
+ std::next(MachineBasicBlock::iterator(LastCMOV));
+ MachineBasicBlock::iterator MIIt;
+ MachineBasicBlock::iterator SinkInsertionPoint = sinkMBB->begin();
----------------
Since this isn't shared between the loops, can you define it in the for(;;) instead?
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:20044-20047
@@ +20043,6 @@
+ MachineBasicBlock::iterator SinkInsertionPoint = sinkMBB->begin();
+ struct RewriteRegT {
+ unsigned Op1Reg;
+ unsigned Op2Reg;
+ };
+ DenseMap<unsigned, RewriteRegT> RewriteTable;
----------------
IMO the field names don't add much information, as they're always accompanied with the same-name variable. What about a pair instead of this struct?
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:20097
@@ -19965,2 +20096,3 @@
- MI->eraseFromParent(); // The pseudo instruction is gone now.
+ // Now remove all the CMOV(s).
+ for (MIIt = MIItBegin; MIIt != MIItEnd; )
----------------
That "all" bothers me ;)
-> "Now remove the CMOV(s)." ?
================
Comment at: test/CodeGen/X86/pseudo_cmov_lower.ll:1-2
@@ +1,3 @@
+; RUN: llc < %s -mcpu=pentium -mtriple=i386-linux-gnu -o - | FileCheck %s
+; RUN: llc < %s -mcpu=pentium4 -mattr=-cmov -mtriple=i386-linux-gnu -o - | FileCheck %s
+
----------------
Why the explicit cpu?
Also, IIRC we always use the CMOV pseudos for SSE selects, so can you do x86_64 tests for those as well? I don't expect interesting differences (except much more readable tests), so you can probably only use i386 for non-SSE types.
================
Comment at: test/CodeGen/X86/pseudo_cmov_lower.ll:7
@@ +6,3 @@
+; CHECK-LABEL: foo1:
+; CHECK: js
+; CHECK-NOT: js
----------------
I think these could use more explicit tests, for the copies and PHIs.
================
Comment at: test/CodeGen/X86/pseudo_cmov_lower.ll:217-222
@@ +216,8 @@
+; on the same condition.
+; Contrary to my expectations, this doesn't exercise the code for
+; CMOV_V8I1, CMOV_V16I1, CMOV_V32I1, or CMOV_V64I1. Instead the selects all
+; get lowered into vector length number of selects, which all eventually turn
+; into a huge number of CMOV_GR8, which are all contiguous, so the optimization
+; kicks in as long as CMOV_GR8 is supported. I couldn't find a way to get
+; CMOV_V*I1 pseudo-opcodes to get generated.
+;
----------------
These probably require AVX512. You can ignore them, I think.
http://reviews.llvm.org/D11428
More information about the llvm-commits
mailing list