[PATCH] D11428: Improve EmitLoweredSelect for contiguous pseudo CMOV instructions.

Ahmed Bougacha ahmed.bougacha at gmail.com
Thu Jul 30 10:56:24 PDT 2015


ab requested changes to this revision.
ab added a comment.
This revision now requires changes to proceed.

A few nits here and there.
Testing all types is nice, but IMO it isn't the most interesting part: the PHI trickery is!


================
Comment at: lib/Target/X86/X86ISelLowering.cpp:19785
@@ -19784,1 +19784,3 @@
 
+// Return true of it is OK for this CMOV pseudo-opcode to be cascaded
+// together with other CMOV pseudo-opcodes into a single basic-block with
----------------
of -> if

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:19788
@@ +19787,3 @@
+// conditional jump around it.
+static bool InstOKForCMOVGrouping(MachineInstr *MI) {
+  switch (MI->getOpcode()) {
----------------
What about something like the IMO more descriptive "isCMOVPseudo"?

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:19790-19810
@@ +19789,23 @@
+  switch (MI->getOpcode()) {
+  case X86::CMOV_GR8:
+  case X86::CMOV_FR32:
+  case X86::CMOV_FR64:
+  case X86::CMOV_V4F32:
+  case X86::CMOV_V2F64:
+  case X86::CMOV_V2I64:
+  case X86::CMOV_V8F32:
+  case X86::CMOV_V4F64:
+  case X86::CMOV_V4I64:
+  case X86::CMOV_V16F32:
+  case X86::CMOV_V8F64:
+  case X86::CMOV_V8I64:
+  case X86::CMOV_GR16:
+  case X86::CMOV_GR32:
+  case X86::CMOV_RFP32:
+  case X86::CMOV_RFP64:
+  case X86::CMOV_RFP80:
+  case X86::CMOV_V8I1:
+  case X86::CMOV_V16I1:
+  case X86::CMOV_V32I1:
+  case X86::CMOV_V64I1:
+    return true;
----------------
Can you sort these? I see the pseudo-lowering switch is equally unordered; can you fix that too?

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:19814
@@ +19813,3 @@
+  default:
+    break;
+  }
----------------
return false here?

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:20042
@@ +20041,3 @@
+    std::next(MachineBasicBlock::iterator(LastCMOV));
+  MachineBasicBlock::iterator MIIt;
+  MachineBasicBlock::iterator SinkInsertionPoint = sinkMBB->begin();
----------------
Since this isn't shared between the loops, can you define it in the for(;;) instead?

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:20044-20047
@@ +20043,6 @@
+  MachineBasicBlock::iterator SinkInsertionPoint = sinkMBB->begin();
+  struct RewriteRegT {
+    unsigned Op1Reg;
+    unsigned Op2Reg;
+  };
+  DenseMap<unsigned, RewriteRegT> RewriteTable;
----------------
IMO the field names don't add much information, as they're always accompanied by the same-name variable. What about a pair instead of this struct?

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:20097
@@ -19965,2 +20096,3 @@
 
-  MI->eraseFromParent();   // The pseudo instruction is gone now.
+  // Now remove all the CMOV(s).
+  for (MIIt = MIItBegin; MIIt != MIItEnd; )
----------------
That "all" bothers me ;)

-> "Now remove the CMOV(s)." ?

================
Comment at: test/CodeGen/X86/pseudo_cmov_lower.ll:1-2
@@ +1,3 @@
+; RUN: llc < %s -mcpu=pentium -mtriple=i386-linux-gnu -o - | FileCheck %s 
+; RUN: llc < %s -mcpu=pentium4 -mattr=-cmov -mtriple=i386-linux-gnu -o - | FileCheck %s 
+
----------------
Why the explicit CPU?

Also, IIRC we always use the CMOV pseudos for SSE selects, so can you add x86_64 tests for those as well? I don't expect interesting differences (except much more readable tests), so you can probably use i386 only for non-SSE types.
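Concretely, an extra RUN line along these lines could cover the SSE selects on x86_64 (the exact triple and check prefix are just a suggestion):

```llvm
; RUN: llc < %s -mtriple=x86_64-linux-gnu -o - | FileCheck %s --check-prefix=CHECK64
```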

================
Comment at: test/CodeGen/X86/pseudo_cmov_lower.ll:7
@@ +6,3 @@
+; CHECK-LABEL: foo1:
+; CHECK: js
+; CHECK-NOT: js
----------------
I think these could use more explicit tests, for the copies and PHIs.
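For instance, FileCheck pattern variables can pin down the branch target and verify the sink block actually follows; the label regex below is an assumption about how llc prints basic-block labels, so adjust it to the actual output:

```llvm
; CHECK: js [[SINKBB:\.LBB[0-9]+_[0-9]+]]
; CHECK: [[SINKBB]]:
```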

================
Comment at: test/CodeGen/X86/pseudo_cmov_lower.ll:217-222
@@ +216,8 @@
+; on the same condition.
+; Contrary to my expectations, this doesn't exercise the code for
+; CMOV_V8I1, CMOV_V16I1, CMOV_V32I1, or CMOV_V64I1.  Instead the selects all
+; get lowered into vector length number of selects, which all eventually turn
+; into a huge number of CMOV_GR8, which are all contiguous, so the optimization
+; kicks in as long as CMOV_GR8 is supported. I couldn't find a way to get
+; CMOV_V*I1 pseudo-opcodes to get generated.
+;
----------------
These probably require AVX512.  You can ignore them, I think.


http://reviews.llvm.org/D11428


More information about the llvm-commits mailing list