[LLVMdev] Subregister liveness tracking
Matthias Braun
mbraun at apple.com
Wed Oct 9 13:03:14 PDT 2013
On Oct 8, 2013, at 2:06 PM, Akira Hatanaka <ahatanak at gmail.com> wrote:
> What I didn't mention in r192119 is that mthi/lo clobbers the other sub-register only if the contents of hi and lo are produced by mult or other arithmetic instructions (div, madd, etc.). It doesn't have this side effect if they are produced by another mthi/lo. So I don't think making mthi/lo clobber the other half would work.
Uh, that is indeed nasty, and I don’t think it can be expressed like that in the current RA framework.
>
> For example, this is an illegal sequence of instructions, where instruction 3 makes $hi unpredictable:
>
> 1. mult $lo<def>, $hi<def>, $2, $3 // $lo<def>, $hi<def> = $2 * $3
> 2. mflo $4, $lo<use> // $4 <- $lo
> 3. mtlo $lo<def>, $6 // $lo <- $6. effectively clobbers $hi too.
> 4. mfhi $5, $hi<use> // $5 <- $hi
> 5. mthi $hi<def>, $7 // $hi <- $7
> 6. madd $lo<def>, $hi<def>, $8, $9, $lo<use>, $hi<use> // $lo<def>, $hi<def> = $8 * $9 + (lo,hi)
>
> Unlike the mtlo instruction in the example above, instruction 5 in the next example does not clobber $hi:
>
> 1. mult $lo<def>, $hi<def>, $2, $3 // $lo<def>, $hi<def> = $2 * $3
> 2. mflo $4, $lo<use> // $4 <- $lo
> 3. mfhi $5, $hi<use> // $5 <- $hi
> 4. mthi $hi<def>, $7 // $hi <- $7.
> 5. mtlo $lo<def>, $6 // $lo <- $6. This does not clobber $hi.
> 6. madd $lo<def>, $hi<def>, $8, $9, $lo<use>, $hi<use> // $lo<def>, $hi<def> = $8 * $9 + (lo,hi)
>
> Probably I can define a pseudo instruction "mthilo" that defines both lo and hi and expands to mthi and mtlo after register allocation, which will force the register allocator to spill/restore the whole register in most cases (the only exception I can think of is the inline-assembly constraint 'l' for the 'lo' register).
That is probably the cleanest solution, with the only downside being that the scheduler can’t place instructions between the mthi and mtlo anymore.
Greetings
Matthias
>
>
>
> On Tue, Oct 8, 2013 at 1:04 PM, Matthias Braun <matze at braunis.de> wrote:
>
> Currently it will always spill / restore the whole vreg but only spilling the parts that are actually live would be a nice addition in the future.
>
> Looking at r192119: if “mtlo” writes to $LO and sets $HI to an unpredictable value, then it should just have an additional (dead) def operand for $hi, shouldn’t it?
>
> Greetings
> Matthias
>
> Am 10/8/13, 11:03 AM, schrieb Akira Hatanaka:
>> Hi,
>>
>> I have a question about the way sub-registers are spilled and restored that is related to the changes I made in r192119.
>>
>> Suppose I have the following piece of code with four instructions. %vreg0 and %vreg1 consist of two sub-registers indexed by sub_lo and sub_hi.
>>
>> instr0 %vreg0<def>
>> instr1 %vreg1:sub_lo<def,read-undef>
>> instr2 %vreg0<use>
>> instr3 %vreg1:sub_hi<def>
>>
>> If the register allocator decides to insert spill and restore instructions for %vreg0, will it spill the whole register that includes sub-registers lo and hi?
>>
>> instr0 %vreg0<def>
>> spill0 %vreg0
>> instr1 %vreg1:sub_lo<def,read-undef>
>> spill1 %vreg1:sub_lo
>> restore0 %vreg0
>> instr2 %vreg0<use>
>> restore1 %vreg1:sub_lo
>> instr3 %vreg1:sub_hi<def>
>>
>> Or will it spill just the lo sub-register?
>>
>> instr0 %vreg0<def>
>> spill0 %vreg0:sub_lo
>> instr1 %vreg1:sub_lo<def,read-undef>
>> spill1 %vreg1:sub_lo
>> restore0 %vreg0:sub_lo
>> instr2 %vreg0<use>
>> restore1 %vreg1:sub_lo
>> instr3 %vreg1:sub_hi<def>
>>
>> If it spills the whole register (both sub-registers lo and hi), the changes I made should be fine. Otherwise, I will have to find another way to prevent the problems I mentioned in r192119's commit log.
>>
>>
>>
>> On Mon, Oct 7, 2013 at 1:11 PM, Matthias Braun <matze at braunis.de> wrote:
>> I've been working on patches to improve subregister liveness tracking in LLVM and I wanted to inform the LLVM community about the overall design/motivation for them. I will send the patches to llvm-commits later today.
>>
>> Greetings
>> Matthias Braun
>>
>>
>> Subregisters in llvm
>> ====================
>>
>> Some targets can access registers in different ways resulting in wider or
>> narrower accesses. For example on ARM NEON one of the single precision
>> floating point registers is called 'S0'. You may also access 'D0' on ARM which
>> is the combination of 'S0' and 'S1' and can store a double precision number or
>> 2 single precision floats. 'Q0' is the combination of 'S0', 'S1', 'S2' and
>> 'S3' (or 'D0' and 'D1') and so on.
>>
>> Before register allocation llvm machine code accesses values through virtual
>> registers; these get assigned to physical registers later. Each virtual
>> register has an assigned register class which is a set of physical registers.
>> So for example on ARM you have a register class containing all the 'SXX'
>> registers and another one containing all the 'DXX' registers, ...
>>
>> But sometimes you want to mix narrow and wide accesses to values. Like loading
>> the 'D0' register but later reading the 'S0' and 'S1' components separately.
>> This is modeled with subregister operands which specify that only parts of a
>> wider value are accessed. For example the register class of the 'DXX'
>> registers supports subregisters called 'ssub_0' and 'ssub_1' which would
>> result in 'S4' and 'S5' getting used if 'D2' is assigned to the virtual
>> register later.
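The mapping described above can be sketched in a few lines. This is only an illustration, not LLVM code: the helper name `resolve_subreg` is hypothetical, and it relies on the ARM register layout in which Dn overlaps S(2n) and S(2n+1) for D0 through D15.

```python
def resolve_subreg(d_reg: int, subidx: str) -> str:
    """Return the S register that a subregister index of D<d_reg> refers to."""
    # Offsets of the two single-precision halves inside a D register.
    offsets = {"ssub_0": 0, "ssub_1": 1}
    if not 0 <= d_reg <= 15:
        raise ValueError("only D0-D15 overlap S registers")
    return f"S{2 * d_reg + offsets[subidx]}"


# A virtual register assigned to D2 and accessed through its subregisters:
print(resolve_subreg(2, "ssub_0"))  # S4
print(resolve_subreg(2, "ssub_1"))  # S5
```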
>>
>> Typical operations are decomposing wider values or composing wide values with
>> multiple smaller defs:
>>
>> Decomposing:
>> %vreg1<def> = produce a 'D' value
>> = use 'S' value %vreg1:ssub_0
>> = use 'S' value %vreg1:ssub_1
>>
>> Composing:
>> %vreg1:ssub_0<def,read-undef> = produce an 'S' value
>> %vreg1:ssub_1<def> = produce an 'S' value
>> = use a 'D' value %vreg1
>>
>> Problems / Motivation
>> =====================
>>
>> Currently the llvm register allocator tracks liveness for whole virtual
>> registers. This can lead to suboptimal code:
>>
>> %vreg0:ssub_0<def,read-undef> = produce an 'S' value
>> %vreg0:ssub_1<def> = produce an 'S' value
>> = use a 'D' value %vreg0
>> %vreg1 = produce an 'S' value
>> = use an 'S' value %vreg1
>> = use an 'S' value %vreg0:ssub_0
>>
>> The current code will realize that vreg0 and vreg1 interfere and assign them
>> to different registers like D0+S2 aka S0+S1+S2; while in reality after the
>> full use of %vreg0 only %vreg0:ssub_0 must remain in a register while the
>> subregister used for %vreg0:ssub_1 can be reassigned to %vreg1. An ideal
>> assignment would be D0+S1 aka S0+S1.
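To make the difference concrete, here is a small Python model of the example above. It is a sketch under simplifying assumptions (straight-line code, one def per lane, intervals measured in instruction indices); the names `lane_intervals`, `whole_interval`, `overlaps` and the lane name "whole" are made up for the illustration and do not correspond to LLVM APIs.

```python
# Each entry is (kind, vreg, lanes). "whole" is an illustrative lane name
# for a virtual register that has no subregisters.
PROGRAM = [
    ("def", "vreg0", {"ssub_0"}),
    ("def", "vreg0", {"ssub_1"}),
    ("use", "vreg0", {"ssub_0", "ssub_1"}),
    ("def", "vreg1", {"whole"}),
    ("use", "vreg1", {"whole"}),
    ("use", "vreg0", {"ssub_0"}),
]


def lane_intervals(program):
    """Map (vreg, lane) to (index of its def, index of its last use)."""
    intervals = {}
    for idx, (kind, vreg, lanes) in enumerate(program):
        for lane in lanes:
            key = (vreg, lane)
            if kind == "def":
                intervals.setdefault(key, (idx, idx))
            else:
                start, _ = intervals[key]
                intervals[key] = (start, idx)
    return intervals


def whole_interval(intervals, vreg):
    """Interval seen when liveness is tracked for the whole virtual register."""
    spans = [iv for (v, _), iv in intervals.items() if v == vreg]
    return (min(s for s, _ in spans), max(e for _, e in spans))


def overlaps(a, b):
    """True if two (def, last use) intervals are live at the same time."""
    return max(a[0], b[0]) < min(a[1], b[1])


iv = lane_intervals(PROGRAM)
vreg1 = iv[("vreg1", "whole")]
print(overlaps(whole_interval(iv, "vreg0"), vreg1))  # True: whole-vreg view
print(overlaps(iv[("vreg0", "ssub_0")], vreg1))      # True: ssub_0 still live
print(overlaps(iv[("vreg0", "ssub_1")], vreg1))      # False: ssub_1 is free
```

With whole-register tracking %vreg1 conflicts with all of %vreg0; with per-lane tracking it only conflicts with ssub_0, so it may reuse the register behind ssub_1.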
>>
>> An even more pressing problem is artificial dependencies in the schedule
>> graph. This is a side effect of LLVM's live range information being represented
>> in a static-single-assignment-like fashion: every definition of a vreg starts
>> a new interval with a new value number. This means that partial register
>> writes must be modeled as an implicit use of the unwritten parts of a register
>> and force the creation of a new value number. This in turn leads to artificial
>> dependencies in the schedule graph for code like the following where all defs
>> should be independent:
>>
>> %vreg0:ssub_0<def,read-undef> = produce an 'S' value
>> %vreg0:ssub_1<def> = produce an 'S' value
>> %vreg0:ssub_2<def> = produce an 'S' value
>> %vreg0:ssub_3<def> = produce an 'S' value
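The effect on the schedule graph can be illustrated with a toy dependency builder. This is a sketch, not the LLVM scheduler: the functions `whole_register_deps` and `per_lane_deps` are hypothetical, and the model only covers the four defs listed above.

```python
# (lane written, read-undef flag) for the four partial defs of %vreg0.
DEFS = [
    ("ssub_0", True),
    ("ssub_1", False),
    ("ssub_2", False),
    ("ssub_3", False),
]


def whole_register_deps(defs):
    """Edges (earlier, later) when a partial def implicitly reads the vreg."""
    edges = []
    for i, (_, read_undef) in enumerate(defs):
        # A partial def without read-undef uses the value of the previous def.
        if i > 0 and not read_undef:
            edges.append((i - 1, i))
    return edges


def per_lane_deps(defs):
    """Edges when a def only depends on an earlier def of the same lane."""
    edges = []
    last_def = {}
    for i, (lane, _) in enumerate(defs):
        if lane in last_def:
            edges.append((last_def[lane], i))
        last_def[lane] = i
    return edges


print(whole_register_deps(DEFS))  # [(0, 1), (1, 2), (2, 3)]
print(per_lane_deps(DEFS))        # []
```

In the whole-register model the four defs form a chain and cannot be reordered; in the per-lane model there are no edges between them.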
>>
>>
>> Subregister liveness tracking
>> =============================
>>
>> I developed a set of patches which enable liveness tracking on the subregister
>> level, to overcome the problems mentioned above. After these changes you can
>> have separate live ranges for subregisters of a virtual register. With these
>> patches the following code:
>>
>> 16B %vreg0:ssub_0<def,read-undef> = ...
>> 32B %vreg0:ssub_1<def> = ...
>> 48B = %vreg0
>> 64B = %vreg0:ssub_0
>> 80B %vreg0 = ...
>> 96B = %vreg0:ssub_1
>>
>> will be represented as the following live range(s):
>>
>> Common LiveRange: [16r,32r)[32r,64r)[80r,96r)
>> SubRange with Mask 0x0004 (=ssub_0): [16r,64r)[80r,80d)
>> SubRange with Mask 0x0008 (=ssub_1): [32r,48r)[80r,96r)
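The ranges listed above can be reproduced with a short Python model. It is a sketch under assumptions, not the LiveIntervals implementation: it handles only straight-line code, the `Instr` record and the functions `segments`, `subrange` and `main_range` are invented for the illustration, and the rule "a partial def without read-undef counts as a use in the main range" is taken from the description of partial writes earlier in this mail.

```python
from dataclasses import dataclass

LANES = ("ssub_0", "ssub_1")


@dataclass
class Instr:
    slot: int                 # slot index, e.g. 16 for "16B"
    kind: str                 # "def" or "use"
    lanes: tuple              # lanes touched by the operand
    read_undef: bool = False  # partial def that does not read other lanes


PROGRAM = [
    Instr(16, "def", ("ssub_0",), read_undef=True),
    Instr(32, "def", ("ssub_1",)),
    Instr(48, "use", LANES),
    Instr(64, "use", ("ssub_0",)),
    Instr(80, "def", LANES),
    Instr(96, "use", ("ssub_1",)),
]


def segments(events):
    """Turn (slot, kind) events into printed segments, one per def."""
    out, start, last_use = [], None, None

    def close():
        if start is None:
            return
        if last_use is None:
            out.append(f"[{start}r,{start}d)")  # def that is never read
        else:
            out.append(f"[{start}r,{last_use}r)")

    for slot, kind in events:
        if kind == "use":
            last_use = slot
        else:
            close()
            start, last_use = slot, None
    close()
    return "".join(out)


def subrange(program, lane):
    """Live range of a single lane: only operands touching that lane count."""
    return segments([(i.slot, i.kind) for i in program if lane in i.lanes])


def main_range(program):
    """Whole-vreg range: a partial def also reads the unwritten lanes."""
    events = []
    for i in program:
        partial = set(i.lanes) != set(LANES)
        if i.kind == "def" and partial and not i.read_undef:
            events.append((i.slot, "use"))
        events.append((i.slot, i.kind))
    return segments(events)


print(main_range(PROGRAM))          # [16r,32r)[32r,64r)[80r,96r)
print(subrange(PROGRAM, "ssub_0"))  # [16r,64r)[80r,80d)
print(subrange(PROGRAM, "ssub_1"))  # [32r,48r)[80r,96r)
```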
>>
>> Patches/Changes:
>> * Move live range management code in the LiveInterval class to a new
>> class LiveRange, move the previous LiveRange class (which was just a single
>> interval inside a live range) to LiveRange::Segment.
>> LiveInterval is made a subclass of LiveRange, other code paths like
>> register units liveness use LiveRange instead of LiveInterval now.
>> * Introduce a linked list of SubRange objects to the LiveInterval class.
>> A SubRange is a subclass of LiveRange and contains a LaneMask indicating
>> which subregisters are represented.
>> * Various algorithms have been adapted to calculate/preserve subregister
>> liveness.
>> * The register allocator has been adapted to track interference at the
>> subregister level (LaneMasks are mapped to register units)
>>
>> Note that subregister liveness tracking has to be explicitly enabled by the
>> target architecture, as it does not provide enough benefit for the cost on
>> some targets (e.g. having subregister liveness for the lower/upper 8-bit regs
>> on x86 provided nearly no benefit in the llvm-testsuite, so you can't justify
>> the extra computation and memory usage there).
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev