[llvm-dev] Handling register allocation on Propeller 2

Tue Feb 23 08:52:02 PST 2021

This is a complex situation, so I’m opting to ask the people in this list for assistance, as I’m still new to LLVM’s codebase.

The Parallax Propeller 2 (Henceforth P2) has 496 allocatable 32-bit registers, 512 total.
There are also two special registers, PTRA and PTRB, that have special semantics when used with memory reads/writes to permit incrementing/decrementing them in place and adding an index value to them. PTRA will likely be the stack register, but PTRB will likely be free for allocation.

First issue: Allocating all of them is a bad idea. Space needs to be left for interrupt handlers, core-local global data, etc. ideally the compiler would only use, say, 384 of them or less. Even more ideally, the amount a particular function uses would be configurable to permit situations where the developer needs more of the regfile to themself, but I have no idea how to approach that.

Second issue: When dealing with larger objects in the regfile, it is strongly advisable to keep them continuous. It’s possible to bulk-save/bulk-load any group of continuous registers in two instructions, at a rate of one saved per cycle. What’s be the best way to utilize this? As this also impacts, say, loading small arrays and structs into the regfile.

Third issue: The regfile can be indexed indirectly for cheap, and instructions exist to load individual aligned nibbles, bytes, and words from a reg into another reg (even indirectly, so array access works). Memory reads/writes are slow individually (9-26 cycles and 3-20 cycles respectively) so ideally this’ll be taken advantage of somehow. This would permit rapidly loading a small array or similar into the regfile and indexing it from there, which, if the array is used multiple times, would almost always be faster if it’s small, as the bulk read would take roughly 9-25 + read_amount cycles.

As far as I can tell, this isn’t an easy thing to take quick advantage of, as the regfile can be treated as a bank of fast core local memory (I.e. a zero page), and LLVM doesn’t seem immediately happy with this idea.

Fourth issue: The P2 has two flag regs, C and Z. All instructions that write them have the ability to control, individually, if the flag is written. Alongside this, Boolean operations and moves with the flags, and between the flags and any bit of a register, are all single instruction (and cheap-as-a-move) operations. What’d be a good way to take advantage of this?

The P2 as of now has no standard C calling convention (nor any calling convention suitable for that), so I’m also stuck trying to define a calling convention for this architecture. Any help with that would be appreciated as well, because I’m not familiar with the requirements nor general advice.

Sorry if this is a bit much to ask, any help and/or advice is appreciated.
—Braden N.