[PATCH] D11428: Improve EmitLoweredSelect for contiguous pseudo CMOV instructions.

Thu Jul 30 12:20:11 PDT 2015

kbsmith1 added a comment.

Ahmed, thank you for the review.  Please see my responses to your comments, and I will be uploading an improved version in just a bit.


================
Comment at: lib/Target/X86/X86ISelLowering.cpp:19785
@@ -19784,1 +19784,3 @@
 
+// Return true of it is OK for this CMOV pseudo-opcode to be cascaded
+// together with other CMOV pseudo-opcodes into a single basic-block with
----------------
ab wrote:
> of -> if
OK.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:19788
@@ +19787,3 @@
+// conditional jump around it.
+static bool InstOKForCMOVGrouping(MachineInstr *MI) {
+  switch (MI->getOpcode()) {
----------------
ab wrote:
> What about something like the IMO more descriptive "isCMOVPseudo" ?
OK.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:19790-19810
@@ +19789,23 @@
+  switch (MI->getOpcode()) {
+  case X86::CMOV_GR8:
+  case X86::CMOV_FR32:
+  case X86::CMOV_FR64:
+  case X86::CMOV_V4F32:
+  case X86::CMOV_V2F64:
+  case X86::CMOV_V2I64:
+  case X86::CMOV_V8F32:
+  case X86::CMOV_V4F64:
+  case X86::CMOV_V4I64:
+  case X86::CMOV_V16F32:
+  case X86::CMOV_V8F64:
+  case X86::CMOV_V8I64:
+  case X86::CMOV_GR16:
+  case X86::CMOV_GR32:
+  case X86::CMOV_RFP32:
+  case X86::CMOV_RFP64:
+  case X86::CMOV_RFP80:
+  case X86::CMOV_V8I1:
+  case X86::CMOV_V16I1:
+  case X86::CMOV_V32I1:
+  case X86::CMOV_V64I1:
+    return true;
----------------
ab wrote:
> Can you sort these?  I see the pseudo lowering switch is equally unordered, can you fix that too?
OK.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:19814
@@ +19813,3 @@
+  default:
+    break;
+  }
----------------
ab wrote:
> return false here?
OK.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:20042
@@ +20041,3 @@
+    std::next(MachineBasicBlock::iterator(LastCMOV));
+  MachineBasicBlock::iterator MIIt;
+  MachineBasicBlock::iterator SinkInsertionPoint = sinkMBB->begin();
----------------
ab wrote:
> Since this isn't shared between the loops, can you define it in the for(;;) instead?
OK.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:20044-20047
@@ +20043,6 @@
+  MachineBasicBlock::iterator SinkInsertionPoint = sinkMBB->begin();
+  struct RewriteRegT {
+    unsigned Op1Reg;
+    unsigned Op2Reg;
+  };
+  DenseMap<unsigned, RewriteRegT> RewriteTable;
----------------
ab wrote:
> IMO the field names don't add much information, as they're always accompanied with the same-name variable.   What about a pair instead of this struct?
Let me try a pair and see whether the code looks cleaner.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:20097
@@ -19965,2 +20096,3 @@
 
-  MI->eraseFromParent();   // The pseudo instruction is gone now.
+  // Now remove all the CMOV(s).
+  for (MIIt = MIItBegin; MIIt != MIItEnd; )
----------------
ab wrote:
> That "all" bothers me ;)
> 
> -> "Now remove the CMOV(s)." ?
OK.

================
Comment at: test/CodeGen/X86/pseudo_cmov_lower.ll:1-2
@@ +1,3 @@
+; RUN: llc < %s -mcpu=pentium -mtriple=i386-linux-gnu -o - | FileCheck %s 
+; RUN: llc < %s -mcpu=pentium4 -mattr=-cmov -mtriple=i386-linux-gnu -o - | FileCheck %s 
+
----------------
ab wrote:
> Why the explicit cpu?
> 
> Also, IIRC we always use the CMOV pseudos for SSE selects, so can you do x86_64 tests for those as well?  I don't expect interesting differences (except much more readable tests), so you can probably only use i386 for non-SSE types.
This specific test is meant to test the 32 bit CPU case, as that is where most of the CMOV pseudos occur. For this specific test, the few FP things in here need to use 32 bit cpu to test RFP register type CMOV pseudos. pseudo_cmov_lower1.ll tests the SSE/SSE2 type pseudo CMOVs (for 32 bit cpu) by using -mcpu=pentium, which I thought would be nice to explicitly say that this is a CPU without CMOV support.  I also wanted to make sure that the test worked properly when a CPU with CMOV support was used, but when cmov was turned off explicitly. I can add a x86_64 test based on pseudo_cmov_lower1.ll, or perhaps just as another run line in that test. I don't understand why x86_64 will make the tests any more readable, can you elaborate on that?

================
Comment at: test/CodeGen/X86/pseudo_cmov_lower.ll:7
@@ +6,3 @@
+; CHECK-LABEL: foo1:
+; CHECK: js
+; CHECK-NOT: js
----------------
ab wrote:
> I think these could use more explicit tests, for the copies and PHIs.
Can you explain more what you are looking for here? I cannot really test for copies and phis in the resulting output assembly code, so maybe I just don't quite understand your comment.

================
Comment at: test/CodeGen/X86/pseudo_cmov_lower.ll:217-222
@@ +216,8 @@
+; on the same condition.
+; Contrary to my expectations, this doesn't exercise the code for
+; CMOV_V8I1, CMOV_V16I1, CMOV_V32I1, or CMOV_V64I1.  Instead the selects all
+; get lowered into vector length number of selects, which all eventually turn
+; into a huge number of CMOV_GR8, which are all contiguous, so the optimization
+; kicks in as long as CMOV_GR8 is supported. I couldn't find a way to get
+; CMOV_V*I1 pseudo-opcodes to get generated.
+;
----------------
ab wrote:
> These probably require AVX512.  You can ignore them, I think.
Thanks for that info.


http://reviews.llvm.org/D11428