[PATCH] D11428: Improve EmitLoweredSelect for contiguous pseudo CMOV instructions.

Fri Jul 31 11:06:14 PDT 2015

ab added a comment.

Thanks for the update!  A couple answers inline.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:20074-20075
@@ +20073,4 @@
+    // Add this PHI to the rewrite table.
+    RegRewriteTable[DestReg].first = Op1Reg;
+    RegRewriteTable[DestReg].second = Op2Reg;
+  }
----------------
Merge both lines into

    RegRewriteTable[DestReg] = std::make_pair(Op1Reg, Op2Reg)

?

================
Comment at: test/CodeGen/X86/pseudo_cmov_lower.ll:2-3
@@ +1,4 @@
+; RUN: llc < %s -mcpu=pentium -mtriple=i386-linux-gnu -o - | FileCheck %s 
+; RUN: llc < %s -mcpu=pentium4 -mattr=-cmov -mtriple=i386-linux-gnu -o - | FileCheck %s 
+
+; This test checks that only a single js gets generated in the final code
----------------
> This specific test is meant to test the 32 bit CPU case, as that is where most of the CMOV pseudos occur. For this specific test, the few FP things in here need to use 32 bit cpu to test RFP register type CMOV pseudos.

Makes sense

> pseudo_cmov_lower1.ll tests the SSE/SSE2 type pseudo CMOVs (for 32 bit cpu) by using -mcpu=pentium, which I thought would be nice to explicitly say that this is a CPU without CMOV support. I also wanted to make sure that the test worked properly when a CPU with CMOV support was used, but when cmov was turned off explicitly.

That sounds like testing -mattr and CPU features, which should be done elsewhere, no?  Here, if all you want is to assert that we're compiling for a target without CMOV, -mattr=-cmov does exactly that (but could be left out with just i386).  IMO, explicit CPUs should only be used when the CPU actually matters (e.g. when testing scheduling models?).

> I can add a x86_64 test based on pseudo_cmov_lower1.ll, or perhaps just as another run line in that test.

Great!

> I don't understand why x86_64 will make the tests any more readable, can you elaborate on that?

Only because of the less noisy ABI, really.  For tests like foo4/foo5, with very explicit checks.

================
Comment at: test/CodeGen/X86/pseudo_cmov_lower.ll:8
@@ +7,3 @@
+; CHECK: js
+; CHECK-NOT: js
+define i32 @foo1(i32 %v1, i32 %v2, i32 %v3) nounwind {
----------------
Right now the tests check for a specific number of branches.  I'm also interested in the flow of data and register assignments, leading to the final return value.  Basically, the various MOVs in each block, and the final SUB.

Consider, for instance, a bug where one of the PHIs has inverted operands (or even the feature where opposite CCs lead to inverted PHI operands): we should test for that.

By the way, I was testing this out when I noticed that this:

```
define i32 @foo(i32 %v0, i32 %v1, i32 %v2, i32 %v3, i32 %v4) nounwind {
entry:
  %cmp = icmp slt i32 %v0, 0  ;; <- %v0 disables the opt,  %v1 enables it
  %v3.v4 = select i1 %cmp, i32 %v3, i32 %v4
  %v1.v2 = select i1 %cmp, i32 %v1, i32 %v2
  %sub = sub i32 %v1.v2, %v3.v4
  ret i32 %sub
}
```

seems to not trigger the optimization (I get JS twice).  Shouldn't it?

http://reviews.llvm.org/D11428