[llvm-dev] Question about target instruction optimization

Wed Jul 25 10:42:21 PDT 2018

This is a question about optimizing code generation in a (new) Z80
backend:

The CPU has several 8-bit physical registers, e.g. H, L, D and E,
which are overlaid to form the 16-bit register pairs HL and DE (H and D
hold the high bytes, L and E the low bytes).
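
To make the overlay concrete, here is a minimal C++ model. It is an
illustration only: `Z80Regs` and its member functions are hypothetical
names, not LLVM API. Writing a pair writes both of its 8-bit halves, so
after `LD HL,0x1234` the register H holds 0x12 and L holds 0x34:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the overlaid Z80 registers (not LLVM code).
// Each 16-bit pair is stored as its two 8-bit halves; the pair value is
// (high << 8) | low, so the 8-bit and 16-bit views can never disagree.
struct Z80Regs {
  uint8_t H = 0, L = 0, D = 0, E = 0;

  uint16_t getHL() const { return static_cast<uint16_t>((H << 8) | L); }
  uint16_t getDE() const { return static_cast<uint16_t>((D << 8) | E); }

  // LD HL,imm16: the high byte lands in H, the low byte in L.
  void setHL(uint16_t v) {
    H = static_cast<uint8_t>(v >> 8);
    L = static_cast<uint8_t>(v & 0xFF);
  }

  // LD DE,imm16: the high byte lands in D, the low byte in E.
  void setDE(uint16_t v) {
    D = static_cast<uint8_t>(v >> 8);
    E = static_cast<uint8_t>(v & 0xFF);
  }
};
```

In this model `LD D,H` followed by `LD E,L` is simply `D = H; E = L;`,
after which `getDE() == getHL()`.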

It also has a native instruction to load a 16-bit immediate value into a
16-bit register pair (HL or DE), e.g.:

     LD HL,<imm16>

Now, when two 16-bit register pairs are loaded one after the other with
the *same* immediate value, the simple approach is:

     LD HL,<imm16>
     LD DE,<imm16>

However, the second instruction can be replaced with something shorter
(fewer opcode bytes and cycles) that copies the 8-bit halves of HL (H and
L) into the 8-bit halves of DE (D and E), so the desired result is:

     ; optimized version: saves 1 byte and 2 T-states (clock cycles)
     LD D,H    ; sets the high 8 bits of DE from the high 8 bits of HL
     LD E,L    ; same for the low 8 bits
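
The savings follow from the standard Z80 timings: `LD rr,nn` for BC, DE
or HL takes 3 bytes and 10 T-states, while `LD r,r'` takes 1 byte and 4
T-states, so the two copies cost 2 bytes and 8 T-states. The arithmetic as
a compile-time C++ check (`Cost` and the constant names are illustrative,
not taken from any existing backend):

```cpp
#include <cassert>

// Size and timing of one Z80 instruction (timing in T-states).
struct Cost {
  int bytes;
  int tstates;
};

// Standard Z80 figures for the two instruction forms involved.
constexpr Cost kLdPairImm16{3, 10}; // LD DE,nn / LD HL,nn
constexpr Cost kLdRegReg{1, 4};     // LD D,H / LD E,L etc.

// Cost of copying a register pair with two 8-bit register moves.
constexpr Cost pairCopyCost() {
  return Cost{2 * kLdRegReg.bytes, 2 * kLdRegReg.tstates};
}

// Savings of "LD D,H ; LD E,L" over a second "LD DE,nn".
constexpr Cost pairCopySavings() {
  return Cost{kLdPairImm16.bytes - pairCopyCost().bytes,
              kLdPairImm16.tstates - pairCopyCost().tstates};
}

static_assert(pairCopySavings().bytes == 1, "saves one byte");
static_assert(pairCopySavings().tstates == 2, "saves two T-states");
```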

Another example: if register pair DE needs to be loaded with imm16 = 0,
and some other physical(!) register is already known to hold 0 (from an
earlier immediate load, directly or indirectly), say L = 0 (H may hold
anything), then the following code:

     LD DE,0x0000

should become:

     LD D,L
     LD E,L
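
Both examples are instances of one rule: for each byte of the immediate,
look for an 8-bit register already known to hold that byte, and keep the
split form only if it beats `LD DE,nn`. A sketch of that decision in C++,
assuming the set of known register values is supplied by some earlier
analysis; `Reg8`, `KnownValues` and `materializeDE` are hypothetical names,
and only registers A, B, C, D, E, H and L are modeled:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <optional>
#include <string>
#include <vector>

// Hypothetical sketch, not LLVM code. std::nullopt means "value unknown".
enum Reg8 { A, B, C, D, E, H, L, NumReg8 };
using KnownValues = std::array<std::optional<uint8_t>, NumReg8>;

const char *regName(Reg8 r) {
  static const char *const names[] = {"A", "B", "C", "D", "E", "H", "L"};
  return names[r];
}

std::string hexImm(unsigned v, int digits) {
  char buf[8];
  std::snprintf(buf, sizeof buf, "0x%0*X", digits, v);
  return buf;
}

// Returns instructions that set DE to `imm`. Per-byte costs: already
// correct = 0, LD r,r' = 1 byte / 4 T-states, LD r,n = 2 bytes / 7 T-states.
std::vector<std::string> materializeDE(uint16_t imm, const KnownValues &known) {
  const uint8_t want[2] = {static_cast<uint8_t>(imm >> 8),
                           static_cast<uint8_t>(imm & 0xFF)};
  const Reg8 dst[2] = {D, E};
  std::vector<std::string> code;
  int bytes = 0, tstates = 0;
  for (int i = 0; i < 2; ++i) {
    if (known[dst[i]] == want[i])
      continue; // register already holds the right byte
    // Look for another register holding this byte. D and E are never used
    // as sources, since one of them may be overwritten before it is read.
    std::optional<Reg8> src;
    for (int r = 0; r < NumReg8; ++r) {
      if (r != D && r != E && known[r] == want[i]) {
        src = static_cast<Reg8>(r);
        break;
      }
    }
    if (src) {
      code.push_back(std::string("LD ") + regName(dst[i]) + "," + regName(*src));
      bytes += 1;
      tstates += 4;
    } else {
      code.push_back(std::string("LD ") + regName(dst[i]) + "," + hexImm(want[i], 2));
      bytes += 2;
      tstates += 7;
    }
  }
  // LD DE,nn costs 3 bytes / 10 T-states: keep the split form only if it is
  // smaller, or the same size and faster.
  if (bytes < 3 || (bytes == 3 && tstates < 10))
    return code;
  return {"LD DE," + hexImm(imm, 4)};
}
```

With L = 0 and imm16 = 0 this yields `LD D,L` / `LD E,L`; with H = 0x12,
L = 0x34 and imm16 = 0x1234 it yields `LD D,H` / `LD E,L`; with nothing
known it falls back to `LD DE,0x1234`.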

I would expect this to be done in a peephole optimization pass, since
physical registers are not yet assigned during lowering.

Now my questions:
1. Is that correct (peephole instead of lowering)? Should lowering
always emit the generic, not always optimal "LD DE,<imm16>"? Or should
lowering always split the 16-bit immediate load into two 8-bit
immediate loads (via two new virtual 8-bit registers) and rely on later
passes to eliminate the redundant ones automatically?
2. And if a peephole is the better choice, which is recommended: the
SSA-based Machine Code Optimizations or the Late Machine Code
Optimizations? Both sections of the LLVM code generator docs just say
"To be written", so I don't really know which one to choose... or should
I even write a custom pass?

...and, more importantly: how can I check whether any physical register
contains a specific known value at a given point (in which case the
optimization can be applied)?
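
One common way to approach this is a forward scan over each basic block
that tracks, per 8-bit register, whether it currently holds a known
constant: immediate loads set the value, register copies propagate it,
and any other definition or call forgets it. The sketch below models this
outside LLVM with a hypothetical `Inst` type (not `MachineInstr`) and
assumes nothing is known on block entry. In a real backend the same idea
would be applied while walking a machine basic block after register
allocation, treating every instruction that defines or clobbers a
register conservatively:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified instruction model (not LLVM's MachineInstr).
enum Reg8 { A, B, C, D, E, H, L, NumReg8 };
using KnownValues = std::array<std::optional<uint8_t>, NumReg8>;

enum class Op {
  LoadImm8,  // LD r,n
  LoadImm16, // LD rr,nn (dst = high register, dst2 = low register)
  Copy8,     // LD r,r'
  Clobber,   // anything else that writes dst (a clobbered pair = 2 entries)
  Call,      // unknown callee: assume every register is clobbered
};

struct Inst {
  Op op;
  Reg8 dst = A;
  Reg8 dst2 = A; // low register of a pair (LoadImm16 only)
  Reg8 src = A;  // source register (Copy8 only)
  uint16_t imm = 0;
};

// Applies one instruction's effect to the set of known register values.
void transfer(const Inst &I, KnownValues &known) {
  switch (I.op) {
  case Op::LoadImm8:
    known[I.dst] = static_cast<uint8_t>(I.imm);
    break;
  case Op::LoadImm16:
    known[I.dst] = static_cast<uint8_t>(I.imm >> 8);
    known[I.dst2] = static_cast<uint8_t>(I.imm & 0xFF);
    break;
  case Op::Copy8:
    known[I.dst] = known[I.src]; // also propagates "unknown"
    break;
  case Op::Clobber:
    known[I.dst].reset();
    break;
  case Op::Call:
    known.fill(std::nullopt);
    break;
  }
}

// Known register values just before instruction `index` of a block. The
// block is entered with nothing known (conservative); a full pass could
// instead merge the states of all predecessor blocks.
KnownValues knownBefore(const std::vector<Inst> &block, std::size_t index) {
  KnownValues known{};
  for (std::size_t i = 0; i < index && i < block.size(); ++i)
    transfer(block[i], known);
  return known;
}
```

At each `LD DE,<imm16>` the state for that position tells which 8-bit
registers can serve as copy sources instead of the immediate load.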

Michael