[PATCH] First stage of call lowering for Mips fast-isel

Tue Oct 21 06:46:49 PDT 2014

This direction is a big improvement on the custom C++ O32 implementation used by SelectionDAG. I'm not suggesting we do this straight away but I think there are further improvements we can make. In particular, I think we can remove the conversion from stack arguments to register arguments fairly easily.

The main rules that make O32 tricky are:
1. The fixed portion of a vararg function is unaffected by the use of varargs.
2. The variable portion of a vararg function is passed using integer registers and the stack only.
3. The second argument is passed in a floating point register if it is a floating point argument and the first argument was also floating point

1 and 2 are tricky because the tablegen-erated code isn't given information about which arguments are part of the variable portion. The information we need is in ArgListEntry::IsFixed. I've already had to solve this problem to remove our custom implementation of CCState::AnalyzeCallOperands. This change is part of a fairly long WIP branch so here's a link to the commit on my github repo https://github.com/dsandersimgtec/llvm/commit/0a9991fc220dbf8954607f1c9af2fc61d0378532. To summarize the approach, MipsCCState (a subclass of CCState) holds a vector containing the values of ArgListEntry::IsFixed for each argument and is populated before calling CCState::AnalyzeCallOperands, we then use CCIf to access this vector from the tablegen-erated code. The same trick is used to gain access to the original argument type so that we can identify f128 arguments.

3 is tricky because the rule depends on outcome of the previous argument. I haven't tried this yet but I believe we can use CCAssignToRegWithShadow to implement the rule. Just to illustrate my thoughts, here is a toy example:
  CCIfType<[f32], CCAssignToRegWithShadow<[F12, F14], [A0, A1]>>
  CCAssignToRegWithShadow<[A0, A1, A2, A3], [F12AndF14, F12AndF14, F12AndF14, F12AndF14]>
where F12AndF14 is a register containing F12, and F14 as subregisters. I believe this behaves as follows:
* Given (i32 %a, f32 %b): %a is allocated to A0 and prevents the allocation of F12 and F14. %b is allocated to A1.
* Given (f32 %a, f32 %b): %a is allocated to F12 and prevents the allocation of A0. %b is allocated to F14 and prevents the allocation of A1
* Given (f32 %a, i32 %b): %a is allocated to F12 and prevents the allocation of A0. %b is allocated to A1 and prevents the allocation of F14.

================
Comment at: lib/Target/Mips/MipsFastISel.cpp:770
@@ +769,3 @@
+
+  emitInst(Mips::ADJCALLSTACKDOWN).addImm(16);
+  // Process the args.
----------------
The 16-bytes reserved space is allocated by the prologue of the callee not the caller.

However, I discovered something strange in the SelectionDAG implementation this morning. MipsTargetLowering::LowerCall() calls CCInfo.AllocateStack() to allocate the reserved argument area even though LowerCall is lowering from the callers point of view. Removing the call breaks the frame size calculation so it's clearly important but I haven't figured out why. I would have expected the one in MipsTargetLowering::LowerFormalArguments() to be the important one.

================
Comment at: lib/Target/Mips/MipsFastISel.cpp:786
@@ +785,3 @@
+    } else if (i == 1) {
+      if ((firstMVT == MVT::f32) || (firstMVT == MVT::f64)) {
+        if (ArgVT == MVT::f32) {
----------------
We don't need to track the precise MVT. Just whether it was floating point or not.

================
Comment at: lib/Target/Mips/MipsFastISel.cpp:828-829
@@ +827,4 @@
+    }
+    case CCValAssign::AExt:
+    // Intentional fall-through.
+    case CCValAssign::ZExt: {
----------------
It makes no difference for 32-bit but AExt is normally treated the same way as SExt by our backend.

================
Comment at: lib/Target/Mips/MipsFastISel.cpp:843-849
@@ +842,9 @@
+    // Now copy/store arg to correct locations.
+    if (VA.isRegLoc() && !VA.needsCustom()) {
+      BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
+              TII.get(TargetOpcode::COPY), VA.getLocReg()).addReg(ArgReg);
+      CLI.OutRegs.push_back(VA.getLocReg());
+    } else if (VA.needsCustom()) {
+      // FIXME: Handle custom args.
+      return false;
+    } else {
----------------
We don't use custom arguments. We ought to delete the references to VA.needsCustom()

================
Comment at: lib/Target/Mips/MipsFastISel.cpp:857
@@ +856,3 @@
+      // Need to store on the stack.
+      unsigned ArgSize = (ArgVT.getSizeInBits() + 7) / 8;
+
----------------
We should use RoundUpToAlignment()

================
Comment at: lib/Target/Mips/MipsFastISel.cpp:878-879
@@ +877,4 @@
+
+  if (NumBytes < 16)
+    NumBytes = 16;
+  return true;
----------------
Didn't we do this on line 767 too?

================
Comment at: lib/Target/Mips/MipsFastISel.cpp:971
@@ +970,3 @@
+  MachineInstrBuilder MIB =
+      BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(Mips::JALR),
+              Mips::RA).addReg(Mips::T9);
----------------
This isn't correct for MIPS32r6 and I've realised that we don't guard against MIPS32r6. hasMips32() is true for MIPS32, MIPS32r2, MIPS32r6, MIPS64, MIPS64r2, and MIPS64r6. Just correct the main guard for now.

================
Comment at: lib/Target/Mips/MipsISelLowering.h:605
@@ +604,3 @@
+
+  class MipsCCState : public CCState {
+  private:
----------------
I think this class should get it's own header and implementation file.

http://reviews.llvm.org/D5714