[llvm-dev] Question about target instruction optimization
Michael Stellmann via llvm-dev
llvm-dev at lists.llvm.org
Thu Jul 26 00:23:52 PDT 2018
Yes, "crippled" is the right word to describe some areas of the
instruction set.
You are right about the comparison with X86's registers.
Indeed, the idea is to save IX and assign the stack pointer to it in a
function's prologue, and in the epilogue to restore the SP from IX and
then IX's original value, *if* the function needs a frame (i.e. has
parameters or variables on the stack).
Plus, even adjusting the stack pointer past the local storage area
requires sacrificing the HL register pair for larger amounts (or a
series of "inc sp" instructions or dummy pushes for smaller ones). So
the emitted code for the frame setup will be quite dynamic, depending on
the circumstances, but can be done with ~3-4 cases.
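For illustration, a sketch of what such a prologue / epilogue could look
like (the local-storage size of 10 bytes is just a placeholder):

    ; prologue, frame needed, 10 bytes of locals
        push ix            ; save caller's IX
        ld   ix,0
        add  ix,sp         ; IX = frame base
        ld   hl,-10
        add  hl,sp
        ld   sp,hl         ; reserve local storage (sacrifices HL)

    ; epilogue
        ld   sp,ix         ; discard locals
        pop  ix            ; restore caller's IX
        ret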
Stack access to locals / spilled vars within the boundaries of the
offset range (-128 to +127) will be done via IX, and any access outside
that range will require address calculation from the current stack base
via HL (loading the offset into HL and adding SP).
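For example, an access to a slot outside the IX displacement range might
come out like this (the offset of +200 from SP is just made up):

        ld   hl,200
        add  hl,sp         ; HL = address of the stack slot
        ld   e,(hl)
        inc  hl
        ld   d,(hl)        ; DE = 16-bit value from the slot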
And even then, saving / spilling a physical register (16 bit pair) into
a stack position within the range of IX is a costly sequence (LD
(IX+n),low8 / LD (IX+(n+1)),high8 to store a value, and the reverse to
restore it): 12 bytes in total, and IX/IY instructions are always slower
than operations with other regs. So "push" and "pop" for temporary
short-term (single) register spilling would be preferred: those need
only 1 byte each and save / restore the whole 16 bit value in one go.
Not sure how to tell LLVM to do so, though.
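Side by side (the IX offsets are just placeholders), spilling HL via the
frame pointer:

        ld   (ix-2),l      ; 3 bytes, 19 T-states each
        ld   (ix-1),h
        ; ...
        ld   l,(ix-2)      ; reload
        ld   h,(ix-1)

versus a short-lived spill via the stack:

        push hl            ; 1 byte, 11 T-states
        ; ...
        pop  hl            ; 1 byte, 10 T-states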
However, for functions small enough to do all computation in the
available registers, or where spilling can be limited to a few push and
pop operations, the whole call frame setup can be skipped entirely. If a
few params can be passed through registers, the resulting code can be as
efficient as hand-written assembly (or even better, given the
capabilities of SSA).
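A frameless leaf function could then boil down to something like this
(the register-based parameter passing shown here is just an assumption,
nothing is fixed yet):

    ; int16 add16(int16 a, int16 b), a in HL, b in DE
    add16:
        add  hl,de         ; result in HL, no frame at all
        ret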
Knowing that, a developer will have control over it to at least *some*
degree. Making a local var "static" would allow the compiler to use the
efficient instruction to store and load a 16 bit variable directly at a
memory address (at the cost of losing recursion and the chance that the
optimizer could keep it in a register).
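That is, something along these lines (the variable name is made up):

    counter: defw 0        ; 16-bit static storage at a fixed address

        ld   hl,(counter)  ; 3 bytes, 16 T-states
        inc  hl
        ld   (counter),hl  ; 3 bytes, 16 T-states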
But then, using existing C compilers for the Z80 (from *way* back then)
was always a game of compiling, checking the emitted code, rearranging,
building and checking again if the function was time-critical, or
writing it in assembly. In most cases, the code just needs to be "good
enough", and good compilers achieved 50-90% of the performance of
hand-written assembly.
And of course, I expect LLVM to beat that >;->
I started off with jacobly0's (E)Z80 backend heritage, made it build
with a recent version of LLVM again, and tried to understand the
shortcomings and areas of improvement.
It is targeted at the eZ80's more powerful instruction set, introduced a
lot of custom code into the LLVM base to support the eZ80's 24 bit
native pointers and custom binary file output (which caused it to no
longer work with the current LLVM codebase), and, last but not least, is
incomplete.
I created a project area on GitHub
(https://github.com/MI-CHI/llvm-z80-backend) where I've started working
with a friend from the MSX community on that, with - apart from making a
decent Z80 backend - some long term goals such as banking support for a
"far" memory model, to be able to compile "bigger" applications such as
a ZIP archiver. Might end up horribly slow, though :D
As of now, we are in the phase of getting a feel for how to do things in
the LLVM backend and of defining the rules for instruction lowering,
calling conventions and frame setup (shamelessly studying the code that
old Z80 C compilers emit ;-)
Michael