<div dir="ltr">This is so far down the list of problems you'll have (and the difference to program size and speed is so trivial) that I think you should ignore it until you have a working compiler.<div><br></div><div>As for two registers getting the same value, that should be picked up by common subexpression elimination in the optimiser anyway.</div><div><br></div><div>You might want to consider having a pseudo-instruction for LD {BC,DE,HL,IX,IY},<span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">{BC,DE,HL,IX,IY} (all combinations are valid except those containing two of HL, IX, IY). You could expand this very late in the assembler, or during legalisation.</span></div><div><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jul 25, 2018 at 10:42 AM, Michael Stellmann via llvm-dev <span dir="ltr">&lt;<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">This is a question about optimizing the code generation in a (new) Z80 backend:<br>

<br>

The CPU has several 8-bit physical registers, e.g. H, L, D and E, which are overlaid to form 16-bit register pairs named HL and DE.<br>

<br>

It also has a native instruction to load a 16-bit immediate value into a 16-bit register pair (HL or DE), e.g.:<br>

<br>

    LD HL,&lt;imm16&gt;<br>

<br>

When two 16-bit register pairs are loaded in sequence with the *same* immediate value, the simple approach is:<br>

<br>

    LD HL,&lt;imm16&gt;<br>

    LD DE,&lt;imm16&gt;<br>

<br>

However, the second line can be shortened (in opcode bytes and cycles) by copying the overlaid 8-bit registers of HL (H and L) into the overlaid 8-bit registers of DE (D and E), so the desired result is:<br>

<br>

    ; optimized version: saves 1 byte and 2 cycles<br>

    LD D,H    (sets the high 8 bits of DE from the high 8 bits of HL)<br>

    LD E,L    (same for lower 8 bits)<br>

<br>
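The saving quoted above can be checked with a little arithmetic. This is a minimal sketch, assuming the documented Z80 timings (3 bytes and 10 T-states for LD rr,nn; 1 byte and 4 T-states for LD r,r'). The names COST and total_cost are made up for illustration, and machines that insert wait states will see different cycle counts:<br>

```python
# (bytes, T-states) per instruction form, assuming the documented Z80 timings.
# COST and total_cost are illustrative names, not part of any real tool.
COST = {
    "LD rr,nn": (3, 10),  # 16-bit immediate load into BC, DE or HL
    "LD r,r'": (1, 4),    # 8-bit register-to-register copy
}

def total_cost(forms):
    """Sum the size in bytes and the time in T-states of a sequence."""
    size = sum(COST[f][0] for f in forms)
    tstates = sum(COST[f][1] for f in forms)
    return size, tstates

generic = total_cost(["LD rr,nn"])            # LD DE,imm16
copied = total_cost(["LD r,r'", "LD r,r'"])   # LD D,H then LD E,L
saving = (generic[0] - copied[0], generic[1] - copied[1])
print(saving)  # (bytes saved, T-states saved)
```

<br>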

<br>

Another example: if register pair DE needs to be loaded with imm16 = 0, and another physical(!) register is known to be 0 (from a previous immediate load, directly or indirectly), say L = 0 while H may hold something else, then the following code:<br>

<br>

    LD DE,0x0000<br>

<br>

should become:<br>

<br>

    LD D,L<br>

    LD E,L<br>

<br>

I would expect that this needs to be done in a peephole optimizer pass, as during the lowering process, the physical registers are not yet assigned.<br>

<br>

Now my questions:<br>

1. Is that correct (peephole instead of lowering)? Should the lowering always emit the generic, not always optimal "LD DE,&lt;imm16&gt;"? Or should the lowering process always split the 16-bit immediate load into two 8-bit immediate loads (via two new virtual 8-bit registers), which would be eliminated later automatically?<br>

2. If peephole is the better choice, which of these is recommended: the SSA-based Machine Code Optimizations or the Late Machine Code Optimizations? Both sections in the LLVM code generator docs say "To be written", so I don't really know which one to choose... or should I even write a custom pass?<br>

<br>

...and more importantly, how would I check whether any physical register contains a specific fixed value at a certain point (in which case the optimization can be done)?<br>

<br>
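One possible way to approach the last question is a forward scan over a single basic block after register allocation. The scan records which 8-bit registers are known to hold which constants and forgets a register whenever something else writes it. The sketch below is a toy model in Python, not LLVM code: the tuple-based instruction encoding, the PAIRS table, find_source and peephole are all invented for illustration. A real pass would also have to treat calls, memory loads and every other register-defining instruction as a clobber.<br>

```python
# Toy block-local peephole. Instructions are tuples (hypothetical encoding):
#   ("LD16", pair, imm)  load a 16-bit immediate into a register pair
#   ("LD8", dst, src)    copy one 8-bit register into another
#   ("CLOBBER", reg)     any other write to an 8-bit register
PAIRS = {"BC": ("B", "C"), "DE": ("D", "E"), "HL": ("H", "L")}

def find_source(known, value, exclude):
    """Return a register known to hold value, ignoring those in exclude."""
    for reg, val in known.items():
        if val == value and reg not in exclude:
            return reg
    return None

def peephole(block):
    known = {}  # 8-bit register -> constant it is known to hold
    out = []
    for ins in block:
        if ins[0] == "LD16":
            hi_reg, lo_reg = PAIRS[ins[1]]
            hi_val, lo_val = ins[2] >> 8, ins[2] & 0xFF
            # Never read from the pair being written: its halves are
            # overwritten by the copies themselves.
            hi_src = find_source(known, hi_val, (hi_reg, lo_reg))
            lo_src = find_source(known, lo_val, (hi_reg, lo_reg))
            if hi_src and lo_src:
                out.append(("LD8", hi_reg, hi_src))
                out.append(("LD8", lo_reg, lo_src))
            else:
                out.append(ins)  # keep the generic 16-bit load
            known[hi_reg], known[lo_reg] = hi_val, lo_val
        elif ins[0] == "LD8":
            out.append(ins)
            if ins[2] in known:
                known[ins[1]] = known[ins[2]]  # constant propagates
            else:
                known.pop(ins[1], None)        # value is now unknown
        else:  # CLOBBER: the register no longer holds a known constant
            out.append(ins)
            known.pop(ins[1], None)
    return out

same_imm = peephole([("LD16", "HL", 0x1234), ("LD16", "DE", 0x1234)])
zero_lo = peephole([("LD16", "HL", 0x1200), ("LD16", "DE", 0x0000)])
print(same_imm)
print(zero_lo)
```

On the two examples from this message, the second load becomes LD D,H / LD E,L in the first case and LD D,L / LD E,L in the second. The sketch only models the bookkeeping. Where such a scan would live in the LLVM pipeline is a separate question, but it presumably has to run after register allocation, since physical registers are not known before that.<br>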

Michael<br>


</blockquote></div><br></div>