[PATCH] Fix fastcc/tailcallopt for ARM64

Wed May 7 02:55:41 PDT 2014

Hi Jiangning,

Thanks very much for working on this! It's an interesting area, but a very tricky one. Mostly it looks OK, I've just got a few comments:

================
Comment at: lib/Target/ARM64/ARM64FrameLowering.cpp:477
@@ +476,3 @@
+  // ARM64TargetLowering::LowerCall figures out ArgumentPopSize and keeps
+  // it as the 2nd argument of ARM64ISD::TC_RETURN.
+  NumBytes += ArgumentPopSize;
----------------
TCRETURN in the ARM64 world.

================
Comment at: lib/Target/ARM64/ARM64ISelLowering.cpp:1769
@@ +1768,3 @@
+      // any case:
+      StackArgSize = RoundUpToAlignment(StackArgSize, 16);
+    }
----------------
Darwin definitely does need the stack to be aligned to 16 bytes: a segfault occurs if a misaligned SP is ever used.

================
Comment at: lib/Target/ARM64/ARM64ISelLowering.cpp:2146-2148
@@ +2145,5 @@
+  // AAPCS64 requires stack pointer must be 16-byte aligned.
+  unsigned NumBytes = Subtarget->isTargetDarwin() ?
+    CCInfo.getNextStackOffset() :
+    RoundUpToAlignment(CCInfo.getNextStackOffset(), 16);
+
----------------
Again, Darwin has the same constraints as other platforms here.

I also think this might be the wrong place to do this, on reflection. I'd like to try & keep NumBytes unaligned (it's mostly corrected for by a RoundUpToAlignment in ARM64FrameLowering::eliminateCallFramePseudoInstr and the more accurate value is useful to PrologEpilogInserter).

So perhaps it's best to call RoundUpToAlignment only where it's actually needed: line 2169 & 2390 by the looks of it. On the other hand, if it can be moved to eliminateCallFramePseudoInstr that would be even more consistent (that may make calculations impossible or unnecessarily difficult in ISelLowering though, so  twist yourself into contortions trying it).

================
Comment at: lib/Target/ARM64/ARM64ISelLowering.cpp:2236-2241
@@ -2092,3 +2235,8 @@
       unsigned LocMemOffset = VA.getLocMemOffset();
-      SDValue PtrOff = DAG.getIntPtrConstant(LocMemOffset);
-      PtrOff = DAG.getNode(ISD::ADD, DL, getPointerTy(), StackPtr, PtrOff);
+      uint32_t OpSize = Flags.isByVal() ? Flags.getByValSize()*8 :
+                                          VA.getLocVT().getSizeInBits();
+      OpSize = (OpSize + 7) / 8;
+      uint32_t BEAlign = 0;
+      if (OpSize < 8 && !Subtarget->isLittleEndian())
+        BEAlign = 8-OpSize;
+      int32_t Offset = LocMemOffset + BEAlign;
----------------
I think this is probably dodgy for byval arguments. Perhaps leave the big-endian changes for James to thrash out?

================
Comment at: lib/Target/ARM64/ARM64ISelLowering.cpp:2247
@@ +2246,3 @@
+        int FI = MF.getFrameInfo()->CreateFixedObject(OpSize, Offset, true);
+      
+        DstAddr = DAG.getFrameIndex(FI, getPointerTy());
----------------
There's some trailing spaces on this line (& others). Might be an idea to clang-format the patch if they're not obvious in your editor.

================
Comment at: lib/Target/ARM64/ARM64InstrInfo.td:89
@@ -88,3 +88,3 @@
                                            SDTCisSameAs<0,3>]>;
-def SDT_ARM64TCRET : SDTypeProfile<0, 1, [SDTCisPtrTy<0>]>;
+def SDT_ARM64TCRET : SDTypeProfile<0, -1, [SDTCisPtrTy<0>]>;
 def SDT_ARM64PREFETCH : SDTypeProfile<0, 2, [SDTCisVT<0, i32>, SDTCisPtrTy<1>]>;
----------------
Couldn't this be a 2-argument node rather than a variadic one? You seem to have modified all uses.

http://reviews.llvm.org/D3633