[PATCH] D114611: [AVR] Expand STDWSPQRr & STDSPQRr, approach #2

Dylan McKay via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Dec 15 04:27:05 PST 2021


dylanmckay added inline comments.


================
Comment at: llvm/lib/Target/AVR/AVRExpandPseudoInsts.cpp:1211
+template <>
+bool AVRExpandPseudo::expand<AVR::STDSPQRr>(Block &MBB, BlockIt MBBI) {
+  MachineInstr &MI = *MBBI;
----------------
note to self (and others if they care)

considerations I haven't verified yet:



  - AVRFrameLowering, the original location of this code, is earlier on in the codegen process than AVRExpandPseudoInsts. Before this patch, the allocation of Z and the replacements of the stack pointer with it were isolated to the "frame setup" pass(es) of LLVM. After this patch, the allocation of Z stays in the frame setup pass, but the substitutions of it are moved to a later AVRExpandPseudos pass. It's conceivable that other passes may clobber the valid ranges of that the Z instruction contains the frame pointer and inserting another usage of it later in the process will actually read an invalid valid. I haven't verified this yet



================
Comment at: llvm/lib/Target/AVR/AVRExpandPseudoInsts.cpp:1221
+                              ? AVR::R29R28
+                              : AVR::R31R30);
+
----------------
Patryk27 wrote:
> couchand wrote:
> > Perhaps I'm missing something, but where is Z being initialized?  I'm happy with the use of Y (I'm fine trusting the comment in the old code that Y is guaranteed to hold a copy of SP, though maybe I shouldn't just trust that).
> > 
> > The old code initialized Z immediately prior to converting the pseudo stores into real stores.  I can't see the same guarantee being made here.  Do we know that Z still holds the adjusted frame pointer value?  At first glance it seems like intervening instructions could easily clobber it.
> That's a great question, and the answer is: I'm not sure 😅
> 
> On a parallel matter, I wonder why we're using the `Z` in the first place - couldn't we always refer to R29R28? Seems like that's what the other backends (e.g. RISC-V) do.
It is safe to assume that the existing code is correctly initializing `Y` as the frame pointer, it does so here: https://github.com/llvm/llvm-project/blob/36b0325c442a3669c2eb2c6fcaeb2cb57445c851/llvm/lib/Target/AVR/AVRFrameLowering.cpp#L109-L111

The main complication around the frame pointer in the existing backend which is what I was worried about (not anymore), is some special hax in the somewhere in the backend that detects when the register allocator has allocated the Y register already but then subsequently figures out the function needs stack spills, which then forbids the previously valid allocation of the now-FP Y (the backend fixes this clash somehow). But this patch doesn't touch any of that so no worries there.

> The old code initialized Z immediately prior to converting the pseudo stores into real stores. I can't see the same guarantee being made here. Do we know that Z still holds the adjusted frame pointer value? At first glance it seems like intervening instructions could easily clobber it.

Yes, I agree. 

Obvious prerequisite context: at function entry, the stack pointer register `(SP)` is the frame pointer. The stack pointer never disappears.

If the function has no variable stack allocations (= a fixed stack frame size), then you don't actually need to have a frame pointer register to operate on the stack because you know the size of the stack (therefore the layout of the stack too) so you if you wanted to read/write to the frame within the function body, you could achieve this without a dedicated frame pointer register by directly reading/writing from SP into a temp reg, adjusting the pointer as needed as you know the stack layout, and executing the memory operation). If you were to write the assembly this way, you get the benefit of having 33.333..% extra pointer variables available to your code, because you never reserved Z(R28R29) for the frame pointer. The downside however is that every stack operation inside a function without a dedicated FP is now maybe 3-4 instructions rather than 1, etc.

The AVR backend does this in many cases. All functions have access to SP register no matter what, obvious. Some functions have access to a dedicated frame pointer register  
The AVR backend will dedicate Y(R28R29) as the dedicated frame pointer, and initialize it in function prologue, if and only if a dedicated frame pointer is "needed". Currently, a dedicated frame pointer is needed if [these conditions are satisfied](https://github.com/llvm/llvm-project/blob/36b0325c442a3669c2eb2c6fcaeb2cb57445c851/llvm/lib/Target/AVR/AVRFrameLowering.cpp#L232-L233).

So, if there are no spills, no `allocas`, and no arguments passed on the stack: then you won't have a dedicated frame pointer and if you wish to access the stack you'll need to manually store/save SP into a temp register of your choosing and do the operation that way. This is what the old code you're replacing was doing [here](https://github.com/llvm/llvm-project/blob/36b0325c442a3669c2eb2c6fcaeb2cb57445c851/llvm/lib/Target/AVR/AVRFrameLowering.cpp#L359-L371). At the very start and very end of the block, `Z`(R31R30) gets manually assigned as a temporary "frame pointer" register, **for the duration of the function epilogue basic block** that is passed into `fixStackStores` at the end. 

**Conclusion:**

So, the Z register can be considered a valid "temporary" frame pointer register and be used as such as you are doing. The key point though is that this substitution can only be done if: the function has no dedicated frame pointer (because only then will `AVRFrameLowering.cpp` store the stack pointer in Z register contain the frame pointer (that's still true after your patch) AND if you know for sure that the stack store/load instruction you're expanding originates from and only from the `AVRFrameLowering.cpp` `if (Opcode == TII.getCallFrameSetupOpcode())` conditional that is actually setting Z.


Perhaps we can add some assertion to this function that ensures that you're only doing the Z substitution if you know the previous criteria I mentioned are valid (no dedicated FP + the instruction must have come from the `if (Opcode == TII.getCallFrameSetupOpcode()) {` conditional below. There might be a better way to do it tho, I'm still thinking





CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114611/new/

https://reviews.llvm.org/D114611



More information about the llvm-commits mailing list