[PATCH] D38143: Dynamic stack alignment for Thumb1
Renato Golin via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Oct 4 01:37:45 PDT 2017
rengolin added subscribers: compnerd, joerg.
rengolin added inline comments.
================
Comment at: lib/Target/ARM/Thumb1FrameLowering.cpp:357
+ const unsigned NrBitsToZero = countTrailingZeros(MFI.getMaxAlignment());
+ // Emit the following sequence, using R4 as a temporary, since we cannot use
+ // SP as a source or destination register for the shifts:
----------------
chill wrote:
> chill wrote:
> > asl wrote:
> > > rengolin wrote:
> > > > hard-coding r4 here is bound to create problems. You need to make sure it's saved earlier (and popped later).
> > > >
> > > > You also add four instructions to the prologue, which in Thumb1 is not great. It' better than bad codegen, of course, but you need to make sure how often the realignment will hit (from being fatal, I'm guessing not often), or if there's another way to do this (I can't think of anything).
> > > >
> > > > Welcoming comments from people with more Thumb1 experience than myself.
> > > Do we have register scavenger available here, so we could scavenge a spare register? Or it's too early to to this - before the frame is set up ?
> > Thanks for pointing that out. I've fixed the patch to add R4 to the set of callee saved register.
> >
> > Unfortunately, I don't know of a better way to do the alignment in Thumb1. The only other alternative can think of (mov + bic) would need one more scratch register and would have more limiting range of possible alignments.
> I've experimented with using a virtual scratch register and having the register scavenger later find a suitable hard register for it. I works ... kinda. The register R4 is used as a scratch anyway in function epilogue to restore the SP from the frame pointer, so using some other available register does not buy as anything. There's the option of using a virtual scratch register in epilogue too - but (besides triggering some bugs) the `llvm::emitThumbRegPlusImmediate` does not emit great code if the destination register is not a low register, e.g. we may get:
>
> ```
> movs r1, #-8
> rsbs r1, r1, #0
> add r1, r7
> mov sp, r1
> ```
>
> Also, this approach of using R4 as scratch is also employed by ARM/Thumb2 stack align code and I think this should be addressed by a separate patch.
Right, the fix works for me (rather, I can't think of anything better). @asl @compnerd @joerg?
https://reviews.llvm.org/D38143
More information about the llvm-commits
mailing list