[clang] [AVR] make the AVR ABI Swift compatible (PR #72298)

Mon Jan 1 13:53:43 PST 2024

rjmccall wrote:

> That makes a lot of sense. Thank you, John. Here are my thoughts. As I understand it, `SwiftABIInfo` by default does something like "if something can be passed in 4 registers or fewer, pass it in registers; otherwise pass it indirectly"? I think that sweet spot (also sort of reflected in existential containers, I believe?) makes sense for 32-bit or 64-bit registers (and presumably for the sort of caching you'd expect on modern, larger Intel/ARM machine architectures). We have 8-bit registers, which is quite different...

Yes, I see.  Pointers are 16-bit and would be passed in two registers, right?  You need to do more accurate counting than the default implementation.  The default implementation breaks up "large" integers (also specified by the ABI) when determining the scalar sequence, and then it assumes that all scalars count the same towards the limit.  You probably want to keep e.g. `i16`s and `ptr`s in the scalar sequence instead of splitting them into `i8`s, but you want to count them properly as two registers rather than one.  And then, yeah, maybe it makes sense to put the cap at something like 6 or even 4 registers.
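To make the counting difference concrete, here is a minimal standalone C++ sketch. It is not clang's `SwiftABIInfo` code: `Scalar`, `scalarCount`, `avrRegisterCount`, and `shouldPassIndirectlyAVR` are hypothetical names, scalars are modeled only by their bit width, and the cap of 4 is just one of the values floated above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for one entry of an argument's scalar sequence.
// In clang the components would be LLVM IR types; here only the width matters.
struct Scalar {
  unsigned bits;
};

const Scalar kI8{8};
const Scalar kI16{16};
const Scalar kPtr{16}; // AVR data pointers are 16 bits wide

// Default-style counting: every scalar counts as one unit toward the cap.
inline unsigned scalarCount(const std::vector<Scalar> &seq) {
  return static_cast<unsigned>(seq.size());
}

// AVR-style counting: a scalar occupies one 8-bit register per byte,
// so i16 and pointers stay whole in the sequence but count as two registers.
inline unsigned avrRegisterCount(const std::vector<Scalar> &seq) {
  unsigned regs = 0;
  for (const Scalar &s : seq)
    regs += (s.bits + 7) / 8; // round the width up to whole bytes
  return regs;
}

// Pass by pointer once the sequence needs more registers than the cap.
// The default cap of 4 is an assumption, not a decided AVR Swift ABI value.
inline bool shouldPassIndirectlyAVR(const std::vector<Scalar> &seq,
                                    unsigned cap = 4) {
  return avrRegisterCount(seq) > cap;
}
```

With this counting, a `{ptr, i16, i8}` sequence is only three scalars, which is under a scalar-based cap of 4, but it needs five AVR registers, so it would be passed indirectly with a cap of 4 and directly with a cap of 6.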

> In our case, one complication is that stack manipulation is fairly painful. We may be able to improve the AVR back end a bit, but at the moment just moving the stack pointer down for something like an alloca takes 8 instructions (and then another 8 bytes in the function epilogue).

On most architectures, functions that include calls will need to perform a stack adjustment on entry anyway — at the very least, for the function's own frame — but you don't need an extra adjustment for individual call sites because you include space for the maximum stack argument space usage of all the calls in the function in that initial adjustment.  Is that not how it's done on AVR?
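The arithmetic behind a single up-front adjustment can be sketched as follows. This is a toy model with hypothetical names (`FunctionSummary`, `prologueAdjustment`, `perCallAdjustments`), not LLVM's frame-lowering code, and it ignores alignment and variable-sized allocas.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical per-function summary used only for illustration.
struct FunctionSummary {
  unsigned localBytes;               // fixed-size locals and spill slots
  std::vector<unsigned> callArgBytes; // stack-argument bytes needed at each call
};

// Reserved outgoing-argument area: the prologue makes one adjustment that
// covers the locals plus the largest stack-argument area of any call site.
inline unsigned prologueAdjustment(const FunctionSummary &f) {
  unsigned maxOutgoing = 0;
  for (unsigned bytes : f.callArgBytes)
    maxOutgoing = std::max(maxOutgoing, bytes);
  return f.localBytes + maxOutgoing;
}

// Without that reservation, every call that uses the stack-argument area
// needs its own pair of stack-pointer adjustments (before and after the call).
inline unsigned perCallAdjustments(const FunctionSummary &f) {
  unsigned count = 0;
  for (unsigned bytes : f.callArgBytes)
    if (bytes != 0)
      count += 2;
  return count;
}
```

For a function with 4 bytes of locals and calls needing 0, 6, and 2 bytes of stack arguments, the reserved approach does one 10-byte adjustment on entry, instead of four extra adjustments around the two calls that use the stack.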

> But equally, the default C ABI (the avr-gcc ABI) allows a struct to be split across as many as 18 registers, and when we have large structs we often produce fairly inefficient code like this, moving registers around a lot on either side of a call site. That is something I have really wanted to find a way to solve "one day" with Swift for Arduino/Microswift/Swift for AVR.

Yeah, 18 registers seems like it'd be way higher than you want.  I usually approach this from a code-size optimization perspective.  Passing in registers is great in the ideal situation: the caller is able to efficiently produce the argument into whatever argument register is convenient (e.g. loading, or loading an immediate, or taking the address of a local variable), and the callee is going to immediately use the component values of the argument in whatever register it came in.  When that *isn't* true, you don't want passing in registers to be a highly punitive mistake.  That's why the cap is normally expressed in terms of the number of scalars involved rather than the size of the data: the assumption is that each scalar will require a separate load/store if we're not in the ideal case.  And for ISAs like AVR with non-orthogonal register use, I feel like the chances that taking complex data in registers will be useful to the callee are somewhat lower; and if spilling an argument to the stack retroactively is expensive, that also changes the balance.  So having a relatively low cap seems like the right thing to do.

> It's probably reasonable to say that, ideally, when lowering Swift to AVR assembly, any struct larger than 8 bytes should be passed on the stack in our case. Do you think we can implement that?

It's important to distinguish the three ways of passing an argument: you can pass it in registers, you can pass it in the stack argument area, and you can pass a pointer to it.  I am generally of the opinion that passing arguments in the stack argument area is a waste and should only be used if you don't have a better choice available, e.g. because you have too many arguments and you've run out of registers.  This is because you usually can't forward such an argument efficiently: you have to copy it into a *new* stack argument area for the next call.  So the Swift CC usually says that if an argument is too big for the cap, it gets passed by pointer.

So I guess it depends on what you mean by "passed on the stack".  I would not recommend passing in the stack argument area.  But triggering pass-by-pointer on relatively small structures makes sense to me.
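The policy described above (registers when the argument is under the cap, a pointer when it is over, and the stack argument area only when registers run out) can be sketched as a small classifier. Everything here is hypothetical: `ArgPassing` and `classify` are illustrative names, the cap is a free parameter, and the model ignores splitting an argument between registers and the stack.

```cpp
#include <cassert>

// The three ways of passing an argument distinguished above.
enum class ArgPassing { Registers, StackArgumentArea, Indirect };

// Hypothetical classifier for a Swift-CC-like policy on AVR.
//   argRegisters  - 8-bit registers the argument's scalars would need
//   freeRegisters - argument registers still unassigned (updated in place)
//   cap           - largest argument passed directly (assumed tunable)
inline ArgPassing classify(unsigned argRegisters, unsigned &freeRegisters,
                           unsigned cap) {
  if (argRegisters > cap) {
    // Too big: pass a pointer to it. The 16-bit pointer itself takes two
    // registers if any are left (simplification: otherwise the pointer
    // would go to the stack argument area, which is not modeled here).
    if (freeRegisters >= 2)
      freeRegisters -= 2;
    return ArgPassing::Indirect;
  }
  if (argRegisters <= freeRegisters) {
    freeRegisters -= argRegisters;
    return ArgPassing::Registers;
  }
  // Under the cap, but out of registers: only now use the stack argument area.
  return ArgPassing::StackArgumentArea;
}
```

Starting from the 18 argument registers mentioned above with a cap of 4, a 4-register struct goes in registers (leaving 14), while a 10-register struct is passed by pointer and consumes only 2 more registers, rather than being spread over 10 registers or copied into the stack argument area.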

https://github.com/llvm/llvm-project/pull/72298