[PATCH] Fix fastcc/tailcallopt for ARM64

Mon May 12 00:10:03 PDT 2014

Hi Tim,

Ping... Are you OK with this new patch?

Thanks,
-Jiangning

2014-05-08 17:26 GMT+08:00 Jiangning Liu <liujiangning1 at gmail.com>:

> Hi Tim,
>
> I updated with new patch and see my comments below.
>
> Thanks,
> -Jiangning
>
> ================
> Comment at: lib/Target/ARM64/ARM64FrameLowering.cpp:477
> @@ +476,3 @@
> +  // ARM64TargetLowering::LowerCall figures out ArgumentPopSize and keeps
> +  // it as the 2nd argument of ARM64ISD::TC_RETURN.
> +  NumBytes += ArgumentPopSize;
> ----------------
> Tim Northover wrote:
> > TCRETURN in the ARM64 world.
> Tim, it should be TC_RETURN rather than TCRETURN, I think.
>
> ================
> Comment at: lib/Target/ARM64/ARM64ISelLowering.cpp:2146-2148
> @@ +2145,5 @@
> +  // AAPCS64 requires stack pointer must be 16-byte aligned.
> +  unsigned NumBytes = Subtarget->isTargetDarwin() ?
> +    CCInfo.getNextStackOffset() :
> +    RoundUpToAlignment(CCInfo.getNextStackOffset(), 16);
> +
> ----------------
> Tim Northover wrote:
> > Again, Darwin has the same constraints as other platforms here.
> >
> > I also think this might be the wrong place to do this, on reflection.
> I'd like to try & keep NumBytes unaligned (it's mostly corrected for by a
> RoundUpToAlignment in ARM64FrameLowering::eliminateCallFramePseudoInstr and
> the more accurate value is useful to PrologEpilogInserter).
> >
> > So perhaps it's best to call RoundUpToAlignment only where it's actually
> needed: line 2169 & 2390 by the looks of it. On the other hand, if it can
> be moved to eliminateCallFramePseudoInstr that would be even more
> consistent (that may make calculations impossible or unnecessarily
> difficult in ISelLowering though, so  twist yourself into contortions
> trying it).
> OK. I modified this and keep the raw stack size in NumBytes, and added
> RoundUpToAlignment to the locations you pointed out. I think you are
> correct.
>
> I also think it's quite normal to calculate and keep the stack size info
> in LowerCall, because at this moment we can easily know the callee we are
> handling, and logic is interleaved with judging if it is a tailcall. Moving
> to eliminateCallFramePseudoInstr sounds not quite reasonable to me.
>
> ================
> Comment at: lib/Target/ARM64/ARM64ISelLowering.cpp:2236-2241
> @@ -2092,3 +2235,8 @@
>        unsigned LocMemOffset = VA.getLocMemOffset();
> -      SDValue PtrOff = DAG.getIntPtrConstant(LocMemOffset);
> -      PtrOff = DAG.getNode(ISD::ADD, DL, getPointerTy(), StackPtr,
> PtrOff);
> +      uint32_t OpSize = Flags.isByVal() ? Flags.getByValSize()*8 :
> +                                          VA.getLocVT().getSizeInBits();
> +      OpSize = (OpSize + 7) / 8;
> +      uint32_t BEAlign = 0;
> +      if (OpSize < 8 && !Subtarget->isLittleEndian())
> +        BEAlign = 8-OpSize;
> +      int32_t Offset = LocMemOffset + BEAlign;
> ----------------
> James Molloy wrote:
> > Tim Northover wrote:
> > > I think this is probably dodgy for byval arguments. Perhaps leave the
> big-endian changes for James to thrash out?
> > Yes, this looks like code that was in AArch64 but has been refactored
> for ARM64. It should all fall out in the wash as this code is now fully
> committed to ARM64 ToT.
> I merged with Jame's fix on TOT, but the logic should still be similar to
> this, and I think James almost kept the same logic here.
>
> ================
> Comment at: lib/Target/ARM64/ARM64ISelLowering.cpp:2247
> @@ +2246,3 @@
> +        int FI = MF.getFrameInfo()->CreateFixedObject(OpSize, Offset,
> true);
> +
> +        DstAddr = DAG.getFrameIndex(FI, getPointerTy());
> ----------------
> Tim Northover wrote:
> > There's some trailing spaces on this line (& others). Might be an idea
> to clang-format the patch if they're not obvious in your editor.
> I see. Sorry about that! Now I know I can use git-clang-format to clean up
> my patch. Thanks for your kindly reminder.
>
> ================
> Comment at: lib/Target/ARM64/ARM64InstrInfo.td:89
> @@ -88,3 +88,3 @@
>                                             SDTCisSameAs<0,3>]>;
> -def SDT_ARM64TCRET : SDTypeProfile<0, 1, [SDTCisPtrTy<0>]>;
> +def SDT_ARM64TCRET : SDTypeProfile<0, -1, [SDTCisPtrTy<0>]>;
>  def SDT_ARM64PREFETCH : SDTypeProfile<0, 2, [SDTCisVT<0, i32>,
> SDTCisPtrTy<1>]>;
> ----------------
> Tim Northover wrote:
> > Couldn't this be a 2-argument node rather than a variadic one? You seem
> to have modified all uses.
> Yeah. I tried and it's amazing 2-argument works. Actually with this patch
> the TC_RETURN may have more arguments rather than 2, because
> addTokenForArgument will add dependence to register argument and RA will
> guarantee not to clobber them. Actually I don't quite understand why the
> pattern match can pass the argument check if I set argument to 2. But
> anyway, given that number 2 works, I get it changed in my new patch.
>
> http://reviews.llvm.org/D3633
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140512/80a07e52/attachment.html>