[llvm-dev] Register spilling fix for experimental 6502 backend
David Chisnall via llvm-dev
llvm-dev at lists.llvm.org
Sun Feb 14 02:07:53 PST 2016
On 13 Feb 2016, at 22:54, N. E. C. via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Other devs suggested reserving the zero page of memory (addresses $00-$FF) and
> treating it as a giant set of registers. And I don't know, because doesn't that
> sort of defeat the purpose of register allocation? It might be inevitable that
> we have to lie to LLVM about the 6502's registers, but, for now, I want to see
> how far I can get using the allocator as intended.
I don’t think that it contradicts the purpose of register allocation. Register allocation is really shorthand for operand location selection (though registers do have the nice property that, on the compiler’s side of the instruction decoder, they never alias). On a RISC-like register-register architecture, these are precisely the same. On modern x86 processors (at least when running in 32-bit mode), if you don’t move the stack pointer then the top few stack slots are stored in rename registers, so register-memory instructions that operate on them can use the same rename logic as register-register instructions. It would therefore be entirely valid to treat a few esp-relative addresses as registers, provided that the compiler guarantees that they are not address-taken.
For the 6502, you might want to look at the papers describing compilers for the Xerox Alto as a good reference. As was common in that era, they defined a virtual instruction set implemented in microcode as the compiler target. The Alto’s instruction set was specifically designed to be used in this way, with compilers generating bytecode and shipping a bytecode interpreter. If you think that this is slow, remember that Smalltalk on the Alto ran an entire multitasking GUI on a processor not much faster than a 6502, though with a bit more RAM. Without long pipelines, a jump-threaded bytecode interpreter is very fast. See also P-code from the UCSD Pascal compiler.
SWEET16 is an example of the same concept for the 6502, providing effectively a 16-bit virtual ISA for compilers and interpreters to target. When targeting a processor like the 6502 from a vaguely high-level language (which, given the amount of structure that it implies, includes LLVM IR in this context), you are going to find so many repeated instruction sequences that it will be very difficult to fit within the memory constraints if you expand them at every use.
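As a rough illustration of the density argument, the sketch below counts bytes for a 16-bit add (R0 += Rn, with both values held in zero-page pairs) expanded inline as native 6502 code, against the single-byte SWEET16 "ADD Rn" opcode. The names (SIZE_6502, NATIVE_ADD16, total_bytes) are made up for this example, and it deliberately ignores the one-time size of the interpreter itself and the bytes needed to enter and leave it, so treat the numbers as a back-of-the-envelope comparison only.

```python
# Encoded sizes in bytes of the 6502 instructions used below
# (zero-page addressing for LDA/ADC/STA, implied addressing for CLC).
SIZE_6502 = {"CLC": 1, "LDA zp": 2, "ADC zp": 2, "STA zp": 2}

# Native 6502 sequence for a 16-bit add of two zero-page pairs: R0 += Rn.
NATIVE_ADD16 = [
    "CLC",
    "LDA zp", "ADC zp", "STA zp",  # low byte
    "LDA zp", "ADC zp", "STA zp",  # high byte, carry propagates from the low add
]

# SWEET16 encodes "ADD Rn" as one opcode byte (operation in the high nibble,
# register number in the low nibble).
SWEET16_ADD_BYTES = 1


def native_bytes(seq):
    """Total encoded size of a native instruction sequence."""
    return sum(SIZE_6502[op] for op in seq)


def total_bytes(uses, inline):
    """Bytes spent on `uses` 16-bit adds, inlined natively or as bytecode."""
    per_use = native_bytes(NATIVE_ADD16) if inline else SWEET16_ADD_BYTES
    return uses * per_use


if __name__ == "__main__":
    print("inline native:", total_bytes(100, inline=True))
    print("virtual ISA:  ", total_bytes(100, inline=False))
```

The fixed cost of the interpreter is paid once, so the gap widens with every additional use of the operation.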
Even Clang and GCC targeting modern architectures end up doing this to some degree, for example by providing software floating point routines in compiler-rt / libgcc. If you tried to inline them at every use, then you’d end up with code that would put pressure on the instruction cache on larger processors and completely fail to fit on M-profile processors.
If I were writing a 6502 back end, then I would define a SWEET16-derived virtual ISA, include the 6502 assembly for that code in compiler-rt, and target that ISA from the LLVM back end. I’d end up with much denser code at the end and not have to fight the LLVM infrastructure so much.
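To make the shape of such a virtual ISA concrete, here is a minimal sketch of an interpreter for a tiny register-op subset loosely modelled on SWEET16: sixteen 16-bit registers, R0 as the accumulator, the operation in the high nibble of the opcode and the register number in the low nibble. The subset, the function name run, and the use of opcode 0x00 as a plain halt are assumptions made for this example. A real implementation would be 6502 assembly shipped in compiler-rt, and Python's handler table only stands in for the jump-table dispatch such an interpreter would use.

```python
def run(code):
    """Interpret a tiny SWEET16-style bytecode and return the 16 registers."""
    regs = [0] * 16  # sixteen 16-bit registers; R0 acts as the accumulator
    pc = 0

    def op_set(n):
        # SET Rn, imm16: the two bytes after the opcode are a little-endian constant.
        nonlocal pc
        regs[n] = code[pc] | (code[pc + 1] << 8)
        pc += 2

    def op_ld(n):
        regs[0] = regs[n]  # LD Rn: R0 = Rn

    def op_st(n):
        regs[n] = regs[0]  # ST Rn: Rn = R0

    def op_add(n):
        regs[0] = (regs[0] + regs[n]) & 0xFFFF  # ADD Rn: R0 += Rn, 16-bit wrap

    def op_sub(n):
        regs[0] = (regs[0] - regs[n]) & 0xFFFF  # SUB Rn: R0 -= Rn, 16-bit wrap

    # Handler table indexed by the high nibble of the opcode.
    dispatch = {0x1: op_set, 0x2: op_ld, 0x3: op_st, 0xA: op_add, 0xB: op_sub}

    while True:
        opcode = code[pc]
        pc += 1
        if opcode == 0x00:  # treated as "halt" in this sketch
            return regs
        dispatch[opcode >> 4](opcode & 0x0F)


# R3 = 0x1234 + 0x0011, encoded in ten bytes.
PROGRAM = bytes([
    0x11, 0x34, 0x12,  # SET R1, 0x1234
    0x12, 0x11, 0x00,  # SET R2, 0x0011
    0x21,              # LD  R1
    0xA2,              # ADD R2
    0x33,              # ST  R3
    0x00,              # halt
])

if __name__ == "__main__":
    print(hex(run(PROGRAM)[3]))
```

The back end would then emit bytes like PROGRAM rather than native 6502 sequences, and the register file would live in zero page, which is also where the "zero page as registers" suggestion and the virtual ISA approach meet.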