[PATCH] Fix return sequence on armv4 thumb
Jon Roelofs
jonathan at codesourcery.com
Fri Aug 1 07:36:36 PDT 2014
I'll add some tests, and look into the register scavenger thing. I think I'm going to hold off on coalescing the POPs though until I'm convinced that it's even possible to do.
Thanks for the quick review comments!
Jon
================
Comment at: llvm/lib/Target/ARM/Thumb1FrameLowering.cpp:474
@@ -472,1 +473,3 @@
continue;
+ IsReturn = true;
+ if (STI.hasV4TOps() && !STI.hasV5TOps())
----------------
Renato Golin wrote:
> Since you're already creating a new boolean, why not make it IsV4TReturn and set it inside the conditional below, so that your special lowering only checks for a flag and not a bunch of them.
Ok.
================
Comment at: llvm/lib/Target/ARM/Thumb1FrameLowering.cpp:475
@@ +474,3 @@
+ IsReturn = true;
+ if (STI.hasV4TOps() && !STI.hasV5TOps())
+ continue;
----------------
Renato Golin wrote:
> A comment here hinting at the block you just added below would be nice. Something like:
>
> // ARMv4T requires BX, see below
Ok.
================
Comment at: llvm/lib/Target/ARM/Thumb1FrameLowering.cpp:495
@@ +494,3 @@
+ // POP {r3}
+ // BX r3
+ if (IsReturn && STI.hasV4TOps() && !STI.hasV5TOps()) {
----------------
Renato Golin wrote:
> This is probably going to create two POPs. I think a better way would be to detect the LR in the loop above and change it from PC to R3 and just add the BX magic here. It should be safe to assume R3 is fine, but you can add a check if you want.
Yes, it will create two POPs. ISTR that POPs require their register lists to be in ascending order with no duplicates, so I think swapping PC for R3 only works if reglist contains at most {r0,r1,r2,pc}. Please correct me here if I'm mistaken.
If I use a virtual register as Tim suggests, I think I'm going to have a hard time detecting the few cases where it is possible to coalesce these two POPs. Any suggestions on how to make that happen?
================
Comment at: llvm/lib/Target/ARM/Thumb1FrameLowering.cpp:502-503
@@ +501,4 @@
+ // Epilogue: pop LR to R3 and branch off it.
+ AddDefaultPred(BuildMI(MBB, MI, dl, TII.get(ARM::tPOP)))
+ .addReg(ARM::R3, RegState::Define);
+
----------------
Tim Northover wrote:
> I think R3 is potentially reserved for a return value. E.g.
>
> define i128 @foo() {
> ret i128 0
> }
>
> It's rather an edge-case (particularly on v4t, but then v4t is one big edge case). But we probably shouldn't break it anyway.
>
> The safest thing to do would be use the register scavenger. And if the machinery is wired up to restoreCalleeSavedRegisters (fingers crossed) the easiest way to do *that* is to just create a virtual register (even though it is post regalloc, it's got special handling in PrologEpilogInserter).
Ohhhh, good point. What about i256? That would eat up _all_ the lo registers, leaving us with no option to pop the link address. I guess for the ABI to be sane, that would have to go on the stack...
http://reviews.llvm.org/D4748
More information about the llvm-commits
mailing list