[PATCH] Fix return sequence on armv4 thumb

Renato Golin renato.golin at linaro.org
Fri Aug 1 08:05:14 PDT 2014


================
Comment at: llvm/lib/Target/ARM/Thumb1FrameLowering.cpp:495
@@ +494,3 @@
+  //   POP {r3}
+  //   BX r3
+  if (IsReturn && STI.hasV4TOps() && !STI.hasV5TOps()) {
----------------
Jon Roelofs wrote:
> Renato Golin wrote:
> > This is probably going to create two POPs. I think a better way would be to detect the LR in the loop above and change it from PC to R3 and just add the BX magic here. It should be safe to assume R3 is fine, but you can add a check if you want.
> Yes, it will create two POPs. ISTR that POPs require their register lists to be in ascending order with no duplicates, so I think swapping PC for R3 only works if reglist contains at most {r0,r1,r2,pc}. Please correct me here if I'm mistaken.
> 
> If I use a virtual register as Tim suggests, I think I'm going to have a hard time detecting the few cases where it is possible to coalesce these two POPs. Any suggestions on how to make that happen?
That's a good point. If you can use LR, than the problem is solved. If not, let's leave the two POPs for now and create a FIXME comment to that effect.

================
Comment at: llvm/lib/Target/ARM/Thumb1FrameLowering.cpp:502-503
@@ +501,4 @@
+    // Epilogue: pop LR to R3 and branch off it.
+    AddDefaultPred(BuildMI(MBB, MI, dl, TII.get(ARM::tPOP)))
+      .addReg(ARM::R3, RegState::Define);
+
----------------
Jon Roelofs wrote:
> Tim Northover wrote:
> > I think R3 is potentially reserved for a return value. E.g.
> > 
> >     define i128 @foo() {
> >      ret i128 0
> >     }
> > 
> > It's rather an edge-case (particularly on v4t, but then v4t is one big edge case). But we probably shouldn't break it anyway.
> > 
> > The safest thing to do would be use the register scavenger. And if the machinery is wired up to restoreCalleeSavedRegisters (fingers crossed) the easiest way to do *that* is to just create a virtual register (even though it is post regalloc, it's got special handling in PrologEpilogInserter).
> Ohhhh, good point. What about i256? That would eat up _all_ the lo registers, leaving us with no option to pop the link address. I guess for the ABI to be sane, that would have to go on the stack...
The AAPCS only allows you to use R0~R3, everything else goes on the stack. The confusion from using R3 comes from ARM's own docs suggesting it's a reasonable way of doing it (even not coalescing the POPs), but it fails to mention that large return values could use it.

I had a read on the old APCS and couldn't find any mentions and to why R3 should be used as a return register (though it says R0~R3 can be used for returning values). But since we don't follow the APCS any more, even on ARMv4T, I think we should discard any reference to that document.

Any other register would have to be spilled in the caller, which is just not viable. Tim's suggestion would be the only reasonable way of doing it as a generic case.

Without thinking too much, we may get away with POPing it into LR again, and BX LR. I haven't covered all tail-return / recursive cases, but it would be a way of not needing the register scavenger.

http://reviews.llvm.org/D4748






More information about the llvm-commits mailing list