[PATCH] Fix fastcc/tailcallopt for ARM64

Thu May 8 02:26:36 PDT 2014

Hi Tim,

I updated with new patch and see my comments below.

Thanks,
-Jiangning

================
Comment at: lib/Target/ARM64/ARM64FrameLowering.cpp:477
@@ +476,3 @@
+  // ARM64TargetLowering::LowerCall figures out ArgumentPopSize and keeps
+  // it as the 2nd argument of ARM64ISD::TC_RETURN.
+  NumBytes += ArgumentPopSize;
----------------
Tim Northover wrote:
> TCRETURN in the ARM64 world.
Tim, it should be TC_RETURN rather than TCRETURN, I think.

================
Comment at: lib/Target/ARM64/ARM64ISelLowering.cpp:2146-2148
@@ +2145,5 @@
+  // AAPCS64 requires stack pointer must be 16-byte aligned.
+  unsigned NumBytes = Subtarget->isTargetDarwin() ?
+    CCInfo.getNextStackOffset() :
+    RoundUpToAlignment(CCInfo.getNextStackOffset(), 16);
+
----------------
Tim Northover wrote:
> Again, Darwin has the same constraints as other platforms here.
> 
> I also think this might be the wrong place to do this, on reflection. I'd like to try & keep NumBytes unaligned (it's mostly corrected for by a RoundUpToAlignment in ARM64FrameLowering::eliminateCallFramePseudoInstr and the more accurate value is useful to PrologEpilogInserter).
> 
> So perhaps it's best to call RoundUpToAlignment only where it's actually needed: line 2169 & 2390 by the looks of it. On the other hand, if it can be moved to eliminateCallFramePseudoInstr that would be even more consistent (that may make calculations impossible or unnecessarily difficult in ISelLowering though, so  twist yourself into contortions trying it).
OK. I modified this and keep the raw stack size in NumBytes, and added RoundUpToAlignment to the locations you pointed out. I think you are correct.

I also think it's quite normal to calculate and keep the stack size info in LowerCall, because at this moment we can easily know the callee we are handling, and logic is interleaved with judging if it is a tailcall. Moving to eliminateCallFramePseudoInstr sounds not quite reasonable to me.

================
Comment at: lib/Target/ARM64/ARM64ISelLowering.cpp:2236-2241
@@ -2092,3 +2235,8 @@
       unsigned LocMemOffset = VA.getLocMemOffset();
-      SDValue PtrOff = DAG.getIntPtrConstant(LocMemOffset);
-      PtrOff = DAG.getNode(ISD::ADD, DL, getPointerTy(), StackPtr, PtrOff);
+      uint32_t OpSize = Flags.isByVal() ? Flags.getByValSize()*8 :
+                                          VA.getLocVT().getSizeInBits();
+      OpSize = (OpSize + 7) / 8;
+      uint32_t BEAlign = 0;
+      if (OpSize < 8 && !Subtarget->isLittleEndian())
+        BEAlign = 8-OpSize;
+      int32_t Offset = LocMemOffset + BEAlign;
----------------
James Molloy wrote:
> Tim Northover wrote:
> > I think this is probably dodgy for byval arguments. Perhaps leave the big-endian changes for James to thrash out?
> Yes, this looks like code that was in AArch64 but has been refactored for ARM64. It should all fall out in the wash as this code is now fully committed to ARM64 ToT.
I merged with Jame's fix on TOT, but the logic should still be similar to this, and I think James almost kept the same logic here.

================
Comment at: lib/Target/ARM64/ARM64ISelLowering.cpp:2247
@@ +2246,3 @@
+        int FI = MF.getFrameInfo()->CreateFixedObject(OpSize, Offset, true);
+      
+        DstAddr = DAG.getFrameIndex(FI, getPointerTy());
----------------
Tim Northover wrote:
> There's some trailing spaces on this line (& others). Might be an idea to clang-format the patch if they're not obvious in your editor.
I see. Sorry about that! Now I know I can use git-clang-format to clean up my patch. Thanks for your kindly reminder.

================
Comment at: lib/Target/ARM64/ARM64InstrInfo.td:89
@@ -88,3 +88,3 @@
                                            SDTCisSameAs<0,3>]>;
-def SDT_ARM64TCRET : SDTypeProfile<0, 1, [SDTCisPtrTy<0>]>;
+def SDT_ARM64TCRET : SDTypeProfile<0, -1, [SDTCisPtrTy<0>]>;
 def SDT_ARM64PREFETCH : SDTypeProfile<0, 2, [SDTCisVT<0, i32>, SDTCisPtrTy<1>]>;
----------------
Tim Northover wrote:
> Couldn't this be a 2-argument node rather than a variadic one? You seem to have modified all uses.
Yeah. I tried and it's amazing 2-argument works. Actually with this patch the TC_RETURN may have more arguments rather than 2, because addTokenForArgument will add dependence to register argument and RA will guarantee not to clobber them. Actually I don't quite understand why the pattern match can pass the argument check if I set argument to 2. But anyway, given that number 2 works, I get it changed in my new patch.

http://reviews.llvm.org/D3633