[LLVMdev] Register Pairing

Sat Nov 27 08:56:34 PST 2010

Hello, some months ago i wrote to the mailing list asking some questions
about register pairing, i've been experimenting several things with the help
i got back then.

Some background first: this issue is for a backend for an 8bit
microcontroller with only 8bit regs, however it has a few 16bit instructions
that only work with fixed register pairs, so it doesnt allow all
combinations of regs. This introduces some problems because if data wider
than 8bits is expanded into 8bit operations the 16bit instructions never get
selected, also the reg allocator should allocate adjacent regs to form the
pairs. The most important 16 bit instruction is data movement, this
instruction can move register pairs in a single cycle, doing something like
this:
mov r25, r23
mov r24, r22
into:
movw r25:r24, r23:r22
The key point here is that the movw instruction can only move data of fixed
pairs in this way. movw Rdest+1:Rdest, Rorig+1:Rorig, so movw R25:R23,
R21:R18 is illegal because registers pairs aren't adjacent.

Explaining this as if it was for x86 may make things more clear. Suppose we
only have 8bit regs (ah, al, cl, ch, bl, bh, etc...) with 8 bit instructions
and a few 16 bit instrs that only work with ax, bx, cx,.... We could pair
ah:al to form ax, and so on. Then we have a movw instruction that is able to
work with reg pairs, such as ax, bx, cx, etc... This way if we store a short
in ah:al we could copy it into another pair, but storing it into al:cl would
be bad because movw wouldnt be emitted. Also these reg pairs would be stored
in a pseudo reg class so the 16 bit intrs would be emitted, otherwise none
of them would be produced.

Going back to my case I want to favor storing 16bit or wider data in
adjacent regs to favor the insertion of this instruction like
HARD_REGNO_MODE_OK in gcc. To overcome this problem and to select 16bit
instructions automatically by LLVM i thought on working always with 16 bit
register pairs. I created a pseudo reg class that contains the reg pairs
marking each pair with its 2 subregs of 8 bits. Then i marked the i16 type
legal with addRegisterClass(), although this is nearly a lie because i16 is
only legal in a few specific cases, it makes LLVM think that working with
the pairs is fine.
Now for example to perform a 16 bit sum consider this code:
typedef unsigned short t;
t foo(t a, t b, t c)
{
    return a+b;
}
Incoming args come in register pairs, split them into the 8bit subregs
perform the addition and combine them again into the pair ready for the next
operation, in this case we're returning the result. This gives me this code
before instr sel:
# Machine code for function foo:
Function Live Ins: %R25R24 in reg%16384, %R23R22 in reg%16385
Function Live Outs: %R25R24

BB#0: derived from LLVM BB %entry
    Live Ins: %R25R24 %R23R22
    %reg16385<def> = COPY %R23R22; WDREGS:%reg16385 // COPY B
    %reg16384<def> = COPY %R25R24; WDREGS:%reg16384  //  COPY A
    %reg16387<def> = COPY %reg16384:ssub_0; GPR8:%reg16387 WDREGS:%reg16384
// EXTRACT LO BYTE OF A
    %reg16388<def> = COPY %reg16385:ssub_0; GPR8:%reg16388 WDREGS:%reg16385
// EXTRACT LO BYTE OF B
    %reg16389<def> = COPY %reg16384:ssub_1; GPR8:%reg16389 WDREGS:%reg16384
// EXTRACT HI BYTE OF A
    %reg16390<def> = COPY %reg16385:ssub_1; GPR8:%reg16390 WDREGS:%reg16385
// EXTRACT HI BYTE OF B
    %reg16391<def> = ADDRdRr %reg16388, %reg16387<kill>, %SREG<imp-def>;
GPR8:%reg16391,16388,16387 // ADD LO BYTES
    %reg16392<def> = ADCRdRr %reg16390, %reg16389<kill>,
%SREG<imp-def,dead>, %SREG<imp-use>; GPR8:%reg16392,16390,16389 // ADDC HI
BYTES
    %reg16393<def> = REG_SEQUENCE %reg16391<kill>, ssub_0, %reg16392<kill>,
ssub_1; WDREGS:%reg16393 GPR8:%reg16391,16392 // COMBINE INTO REG PAIR AGAIN
    %R25R24<def> = COPY %reg16393; WDREGS:%reg16393 // COPY REG PAIR INTO
RETURN REG.
    RET

This is fine until we get to the register allocation stage, there it does:
BB#0: derived from LLVM BB %entry
    Live Ins: %R25R24 %R23R22
    %R18<def> = COPY %R24
    %R19<def> = COPY %R25
    %R24<def> = COPY %R22<kill>, %R25R24<imp-def>
    %R24<def> = ADDRdRr %R24, %R18<kill>, %SREG<imp-def>
    %R25<def> = COPY %R23<kill>
    %R25<def> = ADCRdRr %R25, %R19<kill>, %SREG<imp-def,dead>,
%SREG<imp-use,kill>
    RET %R25R24<imp-use,kill>

Here instead of splitting the pair into its own subregs, it's copying each
subreg into r18 and r19 making useless movements.
The optimal code should be:
add r24, r22
adc r25, r23

So my first question would be how to solve this?
And going on further, is this the correct way to model the pairs to get the
behaviour i need, or is there a better and direct way of handling this? You
can think of the x86 example i wrote before.

Thanks for your help.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20101127/6fa166b2/attachment.html>