[LLVMdev] Subregister liveness tracking
Akira Hatanaka
ahatanak at gmail.com
Tue Oct 8 14:06:30 PDT 2013
What I didn't mention in r192119 is that mthi/lo clobbers the other
sub-register only if the contents of hi and lo were produced by mult or
other arithmetic instructions (div, madd, etc.). It doesn't have this
side effect if they were produced by another mthi/lo. So I don't think
making mthi/lo clobber the other half would work.
For example, this is an illegal sequence of instructions, where instruction
3 makes $hi unpredictable:
1. mult $lo<def>, $hi<def>, $2, $3 // $lo<def>, $hi<def> = $2 * $3
2. mflo $4, $lo<use> // $4 <- $lo
3. mtlo $lo<def>, $6 // $lo <- $6. effectively clobbers $hi too.
4. mfhi $5, $hi<use> // $5 <- $hi
5. mthi $hi<def>, $7 // $hi <- $7
6. madd $lo<def>, $hi<def>, $8, $9, $lo<use>, $hi<use> // $lo<def>,
$hi<def> = $8 * $9 + (lo,hi)
Unlike the mtlo instruction in the example above, instruction 5 in the next
example does not clobber $hi:
1. mult $lo<def>, $hi<def>, $2, $3 // $lo<def>, $hi<def> = $2 * $3
2. mflo $4, $lo<use> // $4 <- $lo
3. mfhi $5, $hi<use> // $5 <- $hi
4. mthi $hi<def>, $7 // $hi <- $7.
5. mtlo $lo<def>, $6 // $lo <- $6. This does not clobber $hi.
6. madd $lo<def>, $hi<def>, $8, $9, $lo<use>, $hi<use> // $lo<def>,
$hi<def> = $8 * $9 + (lo,hi)
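The rule illustrated by the two sequences above can be written down as a
small state machine. This is only a sketch in Python, not code from the
MIPS backend: the state names and the function first_bad_read are made up
for illustration, and it assumes the only rule is the one described here
(mthi/mtlo makes the other half unpredictable only when that half was last
written by an arithmetic instruction).

```python
ARITH = "arith"                  # last written by mult/div/madd
MOVED = "moved"                  # last written by mthi/mtlo
UNPREDICTABLE = "unpredictable"  # clobbered as a side effect

OTHER_HALF = {"hi": "lo", "lo": "hi"}


def first_bad_read(ops):
    """Return the 1-based index of the first instruction that reads an
    unpredictable hi/lo value, or None if no such read occurs.

    Reads of a half that was never written are not modeled here.
    """
    state = {"hi": None, "lo": None}
    for idx, op in enumerate(ops, start=1):
        if op in ("mfhi", "mflo"):
            # Plain read of one half.
            if state[op[2:]] == UNPREDICTABLE:
                return idx
        elif op in ("mthi", "mtlo"):
            half = op[2:]
            other = OTHER_HALF[half]
            # Side effect only if the other half came from arithmetic.
            if state[other] == ARITH:
                state[other] = UNPREDICTABLE
            state[half] = MOVED
        elif op in ("mult", "div", "madd"):
            # madd also reads both halves before writing them.
            if op == "madd" and UNPREDICTABLE in state.values():
                return idx
            state = {"hi": ARITH, "lo": ARITH}
        else:
            raise ValueError("unknown instruction: " + op)
    return None


# First sequence: mtlo (3) makes hi unpredictable, so mfhi (4) is a bad read.
illegal = ["mult", "mflo", "mtlo", "mfhi", "mthi", "madd"]
# Second sequence: hi was written by mthi before mtlo, so nothing is lost.
legal = ["mult", "mflo", "mfhi", "mthi", "mtlo", "madd"]
```

With these inputs, first_bad_read(illegal) reports instruction 4 and
first_bad_read(legal) reports no bad read.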
I can probably define a pseudo instruction "mthilo" that defines both lo
and hi and expands to mthi and mtlo after register allocation, which will
force the register allocator to spill/restore the whole register in most
cases (the only exception I can think of is the inline-assembly constraint
'l' for the 'lo' register).
On Tue, Oct 8, 2013 at 1:04 PM, Matthias Braun <matze at braunis.de> wrote:
>
> Currently it will always spill / restore the whole vreg but only
> spilling the parts that are actually live would be a nice addition in the
> future.
>
> Looking at r192119: if “mtlo” writes to $LO and sets $HI to an
> unpredictable value, then it should just have an additional (dead) def
> operand for $hi, shouldn’t it?
>
> Greetings
> Matthias
>
> On 10/8/13, 11:03 AM, Akira Hatanaka wrote:
>
> Hi,
>
> I have a question about the way sub-registers are spilled and restored
> that is related to the changes I made in r192119.
>
> Suppose I have the following piece of code with four
> instructions. %vreg0 and %vreg1 consist of two sub-registers indexed by
> sub_lo and sub_hi.
>
> instr0 %vreg0<def>
> instr1 %vreg1:sub_lo<def,read-undef>
> instr2 %vreg0<use>
> instr3 %vreg1:sub_hi<def>
>
> If the register allocator decides to insert spill and restore
> instructions for %vreg0, will it spill the whole register, including
> sub-registers lo and hi?
>
> instr0 %vreg0<def>
> spill0 %vreg0
> instr1 %vreg1:sub_lo<def,read-undef>
> spill1 %vreg1:sub_lo
> restore0 %vreg0
> instr2 %vreg0<use>
> restore1 %vreg1:sub_lo
> instr3 %vreg1:sub_hi<def>
>
> Or will it spill just the lo sub-register?
>
> instr0 %vreg0<def>
> spill0 %vreg0:sub_lo
> instr1 %vreg1:sub_lo<def,read-undef>
> spill1 %vreg1:sub_lo
> restore0 %vreg0:sub_lo
> instr2 %vreg0<use>
> restore1 %vreg1:sub_lo
> instr3 %vreg1:sub_hi<def>
>
> If it spills the whole register (both sub-registers lo and hi), the
> changes I made should be fine. Otherwise, I will have to find another way
> to prevent the problems I mentioned in r192119's commit log.
>
>
>
> On Mon, Oct 7, 2013 at 1:11 PM, Matthias Braun <matze at braunis.de> wrote:
>
>> I've been working on patches to improve subregister liveness tracking in
>> llvm and I wanted to inform the llvm community about the overall
>> design/motivation for them. I will send the patches to llvm-commits later
>> today.
>>
>> Greetings
>> Matthias Braun
>>
>>
>> Subregisters in llvm
>> ====================
>>
>> Some targets can access registers in different ways, resulting in wider
>> or narrower accesses. For example, on ARM NEON one of the single-precision
>> floating-point registers is called 'S0'. You may also access 'D0', which
>> is the combination of 'S0' and 'S1' and can store one double-precision
>> number or two single-precision floats. 'Q0' is the combination of 'S0',
>> 'S1', 'S2' and 'S3' (or 'D0' and 'D1'), and so on.
>>
>> Before register allocation, llvm machine code accesses values through
>> virtual registers; these get assigned to physical registers later. Each
>> virtual register has an assigned register class, which is a set of
>> physical registers. So for example on ARM you have a register class
>> containing all the 'SXX' registers and another one containing all the
>> 'DXX' registers, ...
>>
>> But sometimes you want to mix narrow and wide accesses to values, like
>> loading the 'D0' register but later reading the 'S0' and 'S1' components
>> separately. This is modeled with subregister operands, which specify that
>> only parts of a wider value are accessed. For example, the register class
>> of the 'DXX' registers supports subregisters called 'ssub_0' and
>> 'ssub_1', which would result in 'S4' and 'S5' getting used if 'D2' is
>> assigned to the virtual register later.
>>
>> Typical operations are decomposing wider values or composing wide values
>> with multiple smaller defs:
>>
>> Decomposing:
>> %vreg1<def> = produce a 'D' value
>> = use 'S' value %vreg1:ssub_0
>> = use 'S' value %vreg1:ssub_1
>>
>> Composing:
>> %vreg1:ssub_0<def,read-undef> = produce an 'S' value
>> %vreg1:ssub_1<def> = produce an 'S' value
>> = use a 'D' value %vreg1
>>
>> Problems / Motivation
>> =====================
>>
>> Currently the llvm register allocator tracks liveness for whole virtual
>> registers. This can lead to suboptimal code:
>>
>> %vreg0:ssub_0<def,read-undef> = produce an 'S' value
>> %vreg0:ssub_1<def> = produce an 'S' value
>> = use a 'D' value %vreg0
>> %vreg1 = produce an 'S' value
>> = use an 'S' value %vreg1
>> = use an 'S' value %vreg0:ssub_0
>>
>> The current code will realize that vreg0 and vreg1 interfere and assign
>> them to different registers like D0+S2 aka S0+S1+S2, while in reality
>> after the full use of %vreg0 only %vreg0:ssub_0 must remain in a register
>> and the subregister used for %vreg0:ssub_1 can be reassigned to %vreg1.
>> An ideal assignment would be D0+S1 aka S0+S1.
>>
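The difference between whole-register and per-subregister interference in
the example above can be checked with a few lines of Python. This is only
a sketch and not the allocator's data structures: instructions are numbered
0 to 5 in program order, each live interval is a half-open pair (def, last
use), and the helper overlaps is a made-up name.

```python
def overlaps(a, b):
    """True if two half-open intervals (start, end) share any point."""
    return a[0] < b[1] and b[0] < a[1]


# Instruction indices, in program order:
#   0: def %vreg0:ssub_0   1: def %vreg0:ssub_1   2: use %vreg0
#   3: def %vreg1          4: use %vreg1          5: use %vreg0:ssub_0
lanes_vreg0 = {"ssub_0": (0, 5), "ssub_1": (1, 2)}
vreg1 = (3, 4)

# Whole-register view: %vreg0 is live from its first def to its last use.
whole_vreg0 = (
    min(start for start, _ in lanes_vreg0.values()),
    max(end for _, end in lanes_vreg0.values()),
)
whole_interferes = overlaps(whole_vreg0, vreg1)

# Subregister view: only lanes whose own interval overlaps %vreg1 conflict.
free_lanes = [
    lane for lane, seg in lanes_vreg0.items() if not overlaps(seg, vreg1)
]
```

Here whole_interferes is true, while free_lanes contains only ssub_1, which
is the lane whose physical register (S1 when %vreg0 is assigned D0) could
be given to %vreg1.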
>> An even more pressing problem is artificial dependencies in the schedule
>> graph. This is a side effect of llvm's live range information being
>> represented in a static-single-assignment-like fashion: every definition
>> of a vreg starts a new interval with a new value number. This means that
>> partial register writes must be modeled as an implicit use of the
>> unwritten parts of a register and force the creation of a new value
>> number. This in turn leads to artificial dependencies in the schedule
>> graph for code like the following, where all defs should be independent:
>>
>> %vreg0:ssub_0<def,read-undef> = produce an 'S' value
>> %vreg0:ssub_1<def> = produce an 'S' value
>> %vreg0:ssub_2<def> = produce an 'S' value
>> %vreg0:ssub_3<def> = produce an 'S' value
>>
>>
>> Subregister liveness tracking
>> =============================
>>
>> I developed a set of patches which enable liveness tracking on the
>> subregister level to overcome the problems mentioned above. After these
>> changes you can have separate live ranges for subregisters of a virtual
>> register. With these patches the following code:
>>
>> 16B %vreg0:ssub_0<def,read-undef> = ...
>> 32B %vreg0:ssub_1<def> = ...
>> 48B = %vreg0
>> 64B = %vreg0:ssub_0
>> 80B %vreg0 = ...
>> 96B = %vreg0:ssub_1
>>
>> will be represented as the following live range(s):
>>
>> Common LiveRange: [16r,32r)[32r,64r)[80r,96r)
>> SubRange with Mask 0x0004 (=ssub_0): [16r,64r)[80r,80d)
>> SubRange with Mask 0x0008 (=ssub_1): [32r,48r)[80r,96r)
>>
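The two SubRange lines above can be reproduced with a simplified walk over
the six instructions. This is a sketch in Python, not the LiveIntervals
implementation: slot indexes are plain integers, the register/dead slot
suffixes (r, d) are dropped, a dead def shows up as a zero-length pair such
as (80, 80), and the names program and lane_segments are made up here.

```python
FULL = ("ssub_0", "ssub_1")  # an operand without subregister index

# (slot, kind, lanes touched) for the six instructions in the example.
program = [
    (16, "def", ("ssub_0",)),
    (32, "def", ("ssub_1",)),
    (48, "use", FULL),
    (64, "use", ("ssub_0",)),
    (80, "def", FULL),
    (96, "use", ("ssub_1",)),
]


def lane_segments(program, lane):
    """Return (def slot, last use slot) pairs for one lane, in order."""
    segments = []
    start = end = None
    for slot, kind, lanes in program:
        if lane not in lanes:
            continue
        if kind == "def":
            # A new def closes the previous segment at its last use.
            if start is not None:
                segments.append((start, end))
            start = end = slot
        elif start is not None:
            # A use extends the currently open segment.
            end = slot
    if start is not None:
        segments.append((start, end))
    return segments


ssub_0 = lane_segments(program, "ssub_0")
ssub_1 = lane_segments(program, "ssub_1")
```

The result for ssub_0 is (16, 64) followed by the dead def (80, 80), and
for ssub_1 it is (32, 48) followed by (80, 96), matching the two SubRange
lines.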
>> Patches/Changes:
>> * Move live range management code in the LiveInterval class to a new
>> class LiveRange; move the previous LiveRange class (which was just a
>> single interval inside a live range) to LiveRange::Segment.
>> LiveInterval is made a subclass of LiveRange; other code paths like
>> register units liveness use LiveRange instead of LiveInterval now.
>> * Introduce a linked list of SubRange objects to the LiveInterval class.
>> A SubRange is a subclass of LiveRange and contains a LaneMask indicating
>> which subregisters are represented.
>> * Various algorithms have been adapted to calculate/preserve subregister
>> liveness.
>> * The register allocator has been adapted to track interference at the
>> subregister level (LaneMasks are mapped to register units).
>>
>> Note that subregister liveness tracking has to be explicitly enabled by
>> the target architecture, as it does not provide enough benefit for the
>> cost on some targets (e.g. having subregister liveness for the lower/upper
>> 8-bit regs on x86 provided nearly no benefit in the llvm-testsuite, so
>> you can't justify more computation/memory usage for that).
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev