[llvm-dev] [LLVMdev] Subregister liveness tracking

Sergey Yakoushkin via llvm-dev llvm-dev at lists.llvm.org
Tue Jul 19 09:26:08 PDT 2016


Hi all,

I'm working on an LLVM-based back-end. We have a custom solution for
sub-register liveness tracking.

Recently we tried to replace it with LLVM's enableSubRegLiveness and ran into
multiple issues.
I can't share specific IR snippets now, but the failures are related to
spilling, undef sub-registers, etc.

There is Bug 17557 open since 2013: Enable subregister liveness for
scheduling and register allocation.
https://llvm.org/bugs/show_bug.cgi?id=17557

with TODO list:

"These are the remaining steps to get Matthias' subregister liveness
fully integrated:
- Fix LiveRegUnits to correctly handle regmasks.
- Benchmark/tune compile time.
- Enable subreg liveness on x86 for testing purposes.
- Use LiveRegUnits to fix ARM VMOV widening.
- Fix the scheduler's DAG builder to use bundler iterator, not operand index.
- Discard the master live range after coalescing so that LiveInterval
updates don't need to preserve it when we reorder subregister defs.
- Enable subreg scheduling on all targets that enable MI scheduler."


Is anyone still working on bug fixes and enhancements? Are there any pending
patches?

The R600 back-end uses sub-reg liveness, e.g. r238999 - R600: Re-enable
sub-reg liveness (June 2015).
But its requirements and use cases seem to be GPU-specific.

Does anyone use sub-reg liveness for RISC/CISC+SIMD targets?

Thanks,
Sergey


On Wed, Oct 9, 2013 at 11:03 PM, Matthias Braun <mbraun at apple.com> wrote:

>
> On Oct 8, 2013, at 2:06 PM, Akira Hatanaka <ahatanak at gmail.com> wrote:
>
> What I didn't mention in r192119 is that mthi/lo clobbers the other
> sub-register only if the contents of hi and lo were produced by mult or
> other arithmetic instructions (div, madd, etc.). It doesn't have this
> side-effect if they were produced by another mthi/lo. So I don't think making
> mthi/lo clobber the other half would work.
>
> Uh that is indeed nasty, and can’t really be expressed like that in the
> current RA framework I think.
>
>
> For example, this is an illegal sequence of instructions, where
> instruction 3 makes $hi unpredictable:
>
> 1. mult $lo<def>, $hi<def>, $2, $3 // $lo<def>, $hi<def> = $2 * $3
> 2. mflo $4, $lo<use> // $4 <- $lo
> 3. mtlo $lo<def>, $6 // $lo <- $6. effectively clobbers $hi too.
> 4. mfhi $5, $hi<use> // $5 <- $hi
> 5. mthi $hi<def>, $7 // $hi <- $7
> 6. madd $lo<def>, $hi<def>, $8, $9, $lo<use>, $hi<use> // $lo<def>,
> $hi<def> = $8 * $9 + (lo,hi)
>
> Unlike the mtlo instruction in the example above, instruction 5 in the
> next example does not clobber $hi:
>
> 1. mult $lo<def>, $hi<def>, $2, $3 // $lo<def>, $hi<def> = $2 * $3
> 2. mflo $4, $lo<use> // $4 <- $lo
> 3. mfhi $5, $hi<use> // $5 <- $hi
> 4. mthi $hi<def>, $7 // $hi <- $7.
> 5. mtlo $lo<def>, $6 // $lo <- $6. This does not clobber $hi.
> 6. madd $lo<def>, $hi<def>, $8, $9, $lo<use>, $hi<use> // $lo<def>,
> $hi<def> = $8 * $9 + (lo,hi)
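
The rule described above can be sketched as a small state tracker. This is
only my reading of the two examples, not code from the Mips back-end: the
instruction tables and the name first_unpredictable_read are hypothetical,
and the model assumes a move to one half makes the other half unpredictable
only when that other half was last written by an arithmetic instruction.

```python
from enum import Enum


class Origin(Enum):
    ARITH = "written by mult/div/madd"
    MOVED = "written by mthi/mtlo"
    UNPREDICTABLE = "clobbered by a move to the other half"


# Hypothetical instruction tables for this sketch (not taken from LLVM).
ARITH_OPS = {"mult", "div", "madd"}      # write both hi and lo
MOVE_TO = {"mthi": "hi", "mtlo": "lo"}   # write exactly one half
READS = {"mfhi": ["hi"], "mflo": ["lo"], "madd": ["hi", "lo"]}
OTHER = {"hi": "lo", "lo": "hi"}


def first_unpredictable_read(ops):
    """Return the 1-based number of the first instruction that reads an
    unpredictable hi/lo value, or None if no such read exists."""
    state = {"hi": None, "lo": None}  # None: not written yet
    for number, op in enumerate(ops, start=1):
        # Reads happen before the instruction's own writes.
        for reg in READS.get(op, []):
            if state[reg] is Origin.UNPREDICTABLE:
                return number
        if op in ARITH_OPS:
            state["hi"] = state["lo"] = Origin.ARITH
        elif op in MOVE_TO:
            written = MOVE_TO[op]
            state[written] = Origin.MOVED
            # Assumed rule: only an arithmetic result in the other half
            # is clobbered by the move.
            if state[OTHER[written]] is Origin.ARITH:
                state[OTHER[written]] = Origin.UNPREDICTABLE
    return None


illegal = ["mult", "mflo", "mtlo", "mfhi", "mthi", "madd"]
legal = ["mult", "mflo", "mfhi", "mthi", "mtlo", "madd"]
```

Under this model the first sequence is flagged at instruction 4 (the mfhi
after mtlo), while the second sequence passes.
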
>
> Probably I can define a pseudo instruction "mthilo" that defines both lo
> and hi and expands to mthi and mtlo after register allocation, which will
> force the register allocator to spill/restore the whole register in most
> cases (the only exception I can think of is the inline-assembly constraint
> 'l' for the 'lo' register).
>
> That is probably the cleanest solution, with the only downside being that
> the scheduler can’t place instructions between the mthi and mtlo anymore.
>
> Greetings
> Matthias
>
>
>
>
> On Tue, Oct 8, 2013 at 1:04 PM, Matthias Braun <matze at braunis.de> wrote:
>
>>
>> Currently it will always spill / restore the whole vreg but only spilling
>> the parts that are actually live would be a nice addition in the future.
>>
>> Looking at r192119: if “mtlo” writes to $LO and sets $HI to an
>> unpredictable value, then it should just have an additional (dead) def
>> operand for $hi, shouldn’t it?
>>
>> Greetings
>>     Matthias
>>
>> Am 10/8/13, 11:03 AM, schrieb Akira Hatanaka:
>>
>> Hi,
>>
>> I have a question about the way sub-registers are spilled and restored
>> that is related to the changes I made in r192119.
>>
>> Suppose I have the following piece of code with four instructions. %vreg0
>> and %vreg1 consist of two sub-registers indexed by sub_lo and sub_hi.
>>
>> instr0 %vreg0<def>
>> instr1 %vreg1:sub_lo<def,read-undef>
>> instr2 %vreg0<use>
>> instr3 %vreg1:sub_hi<def>
>>
>> If the register allocator decides to insert spill and restore instructions
>> for %vreg0, will it spill the whole register, including sub-registers lo
>> and hi?
>>
>> instr0 %vreg0<def>
>> spill0 %vreg0
>> instr1 %vreg1:sub_lo<def,read-undef>
>> spill1 %vreg1:sub_lo
>> restore0 %vreg0
>> instr2 %vreg0<use>
>> restore1 %vreg1:sub_lo
>> instr3 %vreg1:sub_hi<def>
>>
>> Or will it spill just the lo sub-register?
>>
>> instr0 %vreg0<def>
>> spill0 %vreg0:sub_lo
>> instr1 %vreg1:sub_lo<def,read-undef>
>> spill1 %vreg1:sub_lo
>> restore0 %vreg0:sub_lo
>> instr2 %vreg0<use>
>> restore1 %vreg1:sub_lo
>> instr3 %vreg1:sub_hi<def>
>>
>> If it spills the whole register (both sub-registers lo and hi), the
>> changes I made should be fine. Otherwise, I will have to find another way
>> to prevent the problems I mentioned in r192119's commit log.
>>
>>
>>
>> On Mon, Oct 7, 2013 at 1:11 PM, Matthias Braun <matze at braunis.de> wrote:
>>
>>> I've been working on patches to improve subregister liveness tracking in
>>> LLVM and I wanted to inform the LLVM community about the overall
>>> design/motivation for them. I will send the patches to llvm-commits later
>>> today.
>>>
>>> Greetings
>>>     Matthias Braun
>>>
>>>
>>> Subregisters in llvm
>>> ====================
>>>
>>> Some targets can access registers in different ways, resulting in wider or
>>> narrower accesses. For example, on ARM NEON one of the single precision
>>> floating point registers is called 'S0'. You may also access 'D0' on ARM,
>>> which is the combination of 'S0' and 'S1' and can store a double precision
>>> number or 2 single precision floats. 'Q0' is the combination of 'S0', 'S1',
>>> 'S2' and 'S3' (or 'D0' and 'D1') and so on.
>>>
>>> Before register allocation, LLVM machine code accesses values through
>>> virtual registers; these get assigned to physical registers later. Each
>>> virtual register has an assigned register class, which is a set of physical
>>> registers. So for example on ARM you have a register class containing all
>>> the 'SXX' registers and another one containing all the 'DXX' registers, ...
>>>
>>> But sometimes you want to mix narrow and wide accesses to values, like
>>> loading the 'D0' register but later reading the 'S0' and 'S1' components
>>> separately. This is modeled with subregister operands, which specify that
>>> only parts of a wider value are accessed. For example, the register class
>>> of the 'DXX' registers supports subregisters called 'ssub_0' and 'ssub_1',
>>> which would result in 'S4' and 'S5' getting used if 'D2' is assigned to the
>>> virtual register later.
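
To make the D2 example concrete, here is a minimal sketch of how a
subregister index resolves to a physical register once the 'D' register is
known. The function name s_registers_of is made up for illustration, and the
sketch assumes the ARM layout where only D0..D15 overlap the S registers,
with D<n> covering S<2n> and S<2n+1>.

```python
def s_registers_of(d_register):
    """Map a 'D<n>' register name to its two 'S' halves, keyed by the
    subregister index names used in the text (ssub_0, ssub_1)."""
    if not d_register.startswith("D") or not d_register[1:].isdigit():
        raise ValueError(f"not a D register: {d_register!r}")
    n = int(d_register[1:])
    # Assumption of this sketch: only D0..D15 are composed of S registers.
    if not 0 <= n <= 15:
        raise ValueError(f"{d_register} has no S subregisters")
    return {"ssub_0": f"S{2 * n}", "ssub_1": f"S{2 * n + 1}"}
```

For a virtual register assigned to 'D2', an operand written as
%vreg:ssub_1 would then end up using s_registers_of("D2")["ssub_1"].
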
>>>
>>> Typical operations are decomposing wider values or composing wide values
>>> with multiple smaller defs:
>>>
>>> Decomposing:
>>> %vreg1<def> = produce a 'D' value
>>>             = use 'S' value %vreg1:ssub_0
>>>             = use 'S' value %vreg1:ssub_1
>>>
>>> Composing:
>>> %vreg1:ssub_0<def,read-undef> = produce an 'S' value
>>> %vreg1:ssub_1<def>            = produce an 'S' value
>>>            = use a 'D' value %vreg1
>>>
>>> Problems / Motivation
>>> =====================
>>>
>>> Currently the llvm register allocator tracks liveness for whole virtual
>>> registers. This can lead to suboptimal code:
>>>
>>> %vreg0:ssub_0<def,read-undef> = produce an 'S' value
>>> %vreg0:ssub_1<def> = produce an 'S' value
>>>        = use a 'D' value %vreg0
>>> %vreg1 = produce an 'S' value
>>>        = use an 'S' value %vreg1
>>>        = use an 'S' value %vreg0:ssub_0
>>>
>>> The current code will realize that vreg0 and vreg1 interfere and assign
>>> them to different registers like D0+S2 aka S0+S1+S2. In reality, after the
>>> full use of %vreg0, only %vreg0:ssub_0 must remain in a register, and the
>>> subregister used for %vreg0:ssub_1 can be reassigned to %vreg1. An ideal
>>> assignment would be D0+S1 aka S0+S1.
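
The difference can be sketched with plain interval overlap checks. This is
an illustration only, not the allocator's algorithm: the six instructions of
the example are numbered 0 to 5, each range is a half-open [def, last use)
pair, and the helper names are hypothetical.

```python
def overlaps(a, b):
    """True if two half-open ranges [start, end) share a program point."""
    return a[0] < b[1] and b[0] < a[1]


# Instruction numbers in the example above:
# 0: def vreg0:ssub_0   1: def vreg0:ssub_1   2: use vreg0
# 3: def vreg1          4: use vreg1          5: use vreg0:ssub_0
vreg0_lanes = {"ssub_0": (0, 5), "ssub_1": (1, 2)}
vreg1 = (3, 4)


def whole_register_range(lanes):
    """Whole-vreg tracking: one range covering every lane's range."""
    return (min(start for start, _ in lanes.values()),
            max(end for _, end in lanes.values()))


def reusable_lanes(lanes, other):
    """Lane-level tracking: lanes whose own range does not overlap `other`."""
    return sorted(name for name, rng in lanes.items()
                  if not overlaps(rng, other))


whole_conflict = overlaps(whole_register_range(vreg0_lanes), vreg1)
free_for_vreg1 = reusable_lanes(vreg0_lanes, vreg1)
```

With whole-register ranges the two vregs conflict; with per-lane ranges
only ssub_0 conflicts, which is what makes the D0+S1 assignment possible.
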
>>>
>>> An even more pressing problem is artificial dependencies in the schedule
>>> graph. This is a side effect of LLVM's live range information being
>>> represented in a static single assignment like fashion: every definition of
>>> a vreg starts a new interval with a new value number. This means that
>>> partial register writes must be modeled as an implicit use of the unwritten
>>> parts of a register and force the creation of a new value number. This in
>>> turn leads to artificial dependencies in the schedule graph for code like
>>> the following, where all defs should be independent:
>>>
>>> %vreg0:ssub_0<def,read-undef> = produce an 'S' value
>>> %vreg0:ssub_1<def>            = produce an 'S' value
>>> %vreg0:ssub_2<def>            = produce an 'S' value
>>> %vreg0:ssub_3<def>            = produce an 'S' value
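
A toy model of the two views of these four defs, under my own simplifying
assumptions: a partial def without read-undef is treated as reading the
previous def of the whole vreg, while lane-level tracking only links defs of
the same lane. The function names are hypothetical and this is not the
scheduler's DAG builder.

```python
# (lane, read_undef) for each def of %vreg0, in program order.
defs = [("ssub_0", True), ("ssub_1", False),
        ("ssub_2", False), ("ssub_3", False)]


def whole_register_deps(defs):
    """Each partial def that is not read-undef implicitly uses the value
    produced by the previous def of the same vreg."""
    edges = []
    last_def = None
    for index, (_lane, read_undef) in enumerate(defs):
        if last_def is not None and not read_undef:
            edges.append((last_def, index))
        last_def = index
    return edges


def lane_deps(defs):
    """A def only depends on an earlier def of the same lane."""
    edges = []
    last_def_of_lane = {}
    for index, (lane, _read_undef) in enumerate(defs):
        if lane in last_def_of_lane:
            edges.append((last_def_of_lane[lane], index))
        last_def_of_lane[lane] = index
    return edges
```

The whole-register view chains the defs as 0 -> 1 -> 2 -> 3, while the
lane-level view yields no edges, so the four defs can be reordered freely.
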
>>>
>>>
>>> Subregister liveness tracking
>>> =============================
>>>
>>> I developed a set of patches which enable liveness tracking on the
>>> subregister level, to overcome the problems mentioned above. After these
>>> changes you can have separate live ranges for subregisters of a virtual
>>> register. With these patches the following code:
>>>
>>>   16B  %vreg0:ssub_0<def,read-undef> = ...
>>>   32B  %vreg0:ssub_1<def>            = ...
>>>   48B               = %vreg0
>>>   64B               = %vreg0:ssub_0
>>>   80B  %vreg0 = ...
>>>   96B         = %vreg0:ssub_1
>>>
>>> will be represented as the following live range(s):
>>>
>>>   Common LiveRange: [16r,32r)[32r,64r)[80r,96r)
>>>   SubRange with Mask 0x0004 (=ssub_0): [16r,64r)[80r,80d)
>>>   SubRange with Mask 0x0008 (=ssub_1): [32r,48r)[80r,96r)
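
The sub ranges listed above can be modeled with a small data structure to
see which lanes are live at a given point. This is a sketch and not the
LiveInterval API: the class layout and live_lanes_at are made up here, slot
indexes are encoded as (instruction number, slot) tuples, and the only
ordering it relies on is that the 'r' slot of an instruction sorts before
its 'd' slot.

```python
from dataclasses import dataclass, field

R, D = 0, 1  # register slot and dead slot of one instruction (r before d)


@dataclass
class SubRange:
    lane_mask: int
    segments: list  # half-open [start, end) pairs of (number, slot) tuples

    def live_at(self, point):
        return any(start <= point < end for start, end in self.segments)


@dataclass
class LiveInterval:
    subranges: list = field(default_factory=list)

    def live_lanes_at(self, point):
        """OR together the lane masks of all sub ranges live at `point`."""
        mask = 0
        for subrange in self.subranges:
            if subrange.live_at(point):
                mask |= subrange.lane_mask
        return mask


# The two sub ranges from the example, copied from the listing above.
vreg0 = LiveInterval([
    SubRange(0x4, [((16, R), (64, R)), ((80, R), (80, D))]),  # ssub_0
    SubRange(0x8, [((32, R), (48, R)), ((80, R), (96, R))]),  # ssub_1
])
```

For instance, vreg0.live_lanes_at((56, R)) only reports the ssub_0 lane,
because the ssub_1 value written at 32 is dead after its last read at 48.
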
>>>
>>> Patches/Changes:
>>> * Move live range management code in the LiveInterval class to a new
>>>   class LiveRange, and move the previous LiveRange class (which was just a
>>>   single interval inside a live range) to LiveRange::Segment.
>>>   LiveInterval is made a subclass of LiveRange; other code paths like
>>>   register units liveness use LiveRange instead of LiveInterval now.
>>> * Introduce a linked list of SubRange objects to the LiveInterval class.
>>>   A SubRange is a subclass of LiveRange and contains a LaneMask indicating
>>>   which subregisters are represented.
>>> * Various algorithms have been adapted to calculate/preserve subregister
>>>   liveness.
>>> * The register allocator has been adapted to track interference at the
>>>   subregister level (LaneMasks are mapped to register units).
>>>
>>> Note that subregister liveness tracking has to be explicitly enabled by
>>> the target architecture, as it does not provide enough benefits for the
>>> costs on some targets (e.g. having subregister liveness for the lower/upper
>>> 8bit regs on x86 provided nearly no benefits in the llvm-testsuite, so you
>>> can't justify more computations/memory usage for that).
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>