[llvm-dev] [LLVMdev] Subregister liveness tracking

Tue Jul 19 10:48:48 PDT 2016

Oh, that is an old ticket; the items in the TODO list below are all done (some are obsolete).

Subregister liveness tracking has been available in LLVM for roughly a year now, and for a few months we have also had scheduler extensions in place that allow independent scheduling of subregister definitions.

While I had subregister liveness tracking working for all CPU targets, I decided against enabling it in the end: I could not measure any benefit in the generated code, so I couldn't justify the added compile time.

We do have subregister liveness tracking enabled in production for AMDGPU and an out-of-tree target, so it certainly works today.

- Matthias

> On Jul 19, 2016, at 9:26 AM, Sergey Yakoushkin <sergey.yakoushkin at gmail.com> wrote:
> 
> Hi all,
> 
> I'm working on an LLVM-based back-end. We have a custom LLVM solution for sub-register liveness tracking.
> 
> Recently we tried to replace it with LLVM's enableSubRegLiveness and ran into multiple issues.
> I can't share specific IR snippets now, but failures are related to spilling, undef sub-registers, etc.
> 
> There is Bug 17557 open since 2013: Enable subregister liveness for scheduling and register allocation.
> https://llvm.org/bugs/show_bug.cgi?id=17557
> 
> with TODO list:
> "These are the remaining steps to get Matthias' subregister liveness fully integrated:
> - Fix LiveRegUnits to correctly handle regmasks.
> - Benchmark/tune compile time.
> - Enable subreg liveness on x86 for testing purposes.
> - Use LiveRegUnits to fix ARM VMOV widening.
> - Fix the scheduler's DAG builder to use bundler iterator, not operand index.
> - Discard the master live range after coalescing so that LiveInterval updates don't need to preserve it when we reorder subregister defs.
> - Enable subreg scheduling on all targets that enable MI scheduler."
> 
> Is anyone still working on bug fixes and enhancements? Are there any pending patches?
> 
> The R600 back-end uses sub-reg liveness, e.g. r238999 - R600: Re-enable sub-reg liveness (June 2015).
> But it seems the requirements and use cases are GPU-specific.
> 
> Does anyone use sub-reg liveness for RISC/CISC+SIMD targets?
> 
> Thanks,
> Sergey
> 
> 
> On Wed, Oct 9, 2013 at 11:03 PM, Matthias Braun <mbraun at apple.com> wrote:
> 
> On Oct 8, 2013, at 2:06 PM, Akira Hatanaka <ahatanak at gmail.com> wrote:
> 
>> What I didn't mention in r192119 is that mthi/mtlo clobbers the other sub-register only if the contents of hi and lo were produced by mult or other arithmetic instructions (div, madd, etc.). It doesn't have this side effect if they were produced by another mthi/mtlo. So I don't think making mthi/mtlo clobber the other half would work.
> 
> Uh, that is indeed nasty, and I think it can’t really be expressed like that in the current RA framework.
> 
>> 
>> For example, this is an illegal sequence of instructions, where instruction 3 makes $hi unpredictable:
>> 
>> 1. mult $lo<def>, $hi<def>, $2, $3 // $lo<def>, $hi<def> = $2 * $3
>> 2. mflo $4, $lo<use> // $4 <- $lo
>> 3. mtlo $lo<def>, $6 // $lo <- $6. effectively clobbers $hi too.
>> 4. mfhi $5, $hi<use> // $5 <- $hi
>> 5. mthi $hi<def>, $7 // $hi <- $7
>> 6. madd $lo<def>, $hi<def>, $8, $9, $lo<use>, $hi<use> // $lo<def>, $hi<def> = $8 * $9 + (lo,hi)
>> 
>> Unlike the mtlo instruction in the example above, instruction 5 in the next example does not clobber $hi:
>> 
>> 1. mult $lo<def>, $hi<def>, $2, $3 // $lo<def>, $hi<def> = $2 * $3
>> 2. mflo $4, $lo<use> // $4 <- $lo
>> 3. mfhi $5, $hi<use> // $5 <- $hi
>> 4. mthi $hi<def>, $7 // $hi <- $7.
>> 5. mtlo $lo<def>, $6 // $lo <- $6. This does not clobber $hi.
>> 6. madd $lo<def>, $hi<def>, $8, $9, $lo<use>, $hi<use> // $lo<def>, $hi<def> = $8 * $9 + (lo,hi)
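To make the rule concrete, here is a minimal Python sketch (hypothetical, not code from LLVM's MIPS backend) that replays the two sequences above and reports reads of an unpredictable HI or LO. It assumes the rule exactly as stated in this message: an mthi/mtlo makes the other half unpredictable only when the pair was last written by an arithmetic instruction such as mult, div or madd.

```python
# Hypothetical model of the HI/LO rule described above (not LLVM code).
ARITH = {"mult", "div", "madd"}  # instructions that write the HI/LO pair

def find_unpredictable_reads(seq):
    """Return 0-based indices of instructions that read an unpredictable half."""
    valid = {"hi": False, "lo": False}  # does each half hold a defined value?
    from_arith = False                  # was the pair last written by arithmetic?
    bad = []
    for i, (op, reads) in enumerate(seq):
        if any(not valid[r] for r in reads):
            bad.append(i)
        if op in ARITH:
            valid["hi"] = valid["lo"] = True
            from_arith = True
        elif op in ("mthi", "mtlo"):
            half, other = ("hi", "lo") if op == "mthi" else ("lo", "hi")
            valid[half] = True
            if from_arith:
                valid[other] = False  # side effect: the other half is clobbered
            from_arith = False
    return bad

# (opcode, halves read); register operands other than HI/LO are omitted.
ILLEGAL = [("mult", ()), ("mflo", ("lo",)), ("mtlo", ()),
           ("mfhi", ("hi",)), ("mthi", ()), ("madd", ("lo", "hi"))]
LEGAL = [("mult", ()), ("mflo", ("lo",)), ("mfhi", ("hi",)),
         ("mthi", ()), ("mtlo", ()), ("madd", ("lo", "hi"))]

print(find_unpredictable_reads(ILLEGAL))  # [3]: the mfhi (instruction 4)
print(find_unpredictable_reads(LEGAL))    # []
```

In the second sequence the mthi does clobber $lo, but that is harmless because the following mtlo redefines it before madd reads it.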
>> 
>> I can probably define a pseudo instruction "mthilo" that defines both lo and hi and expands to mthi and mtlo after register allocation. This will force the register allocator to spill/restore the whole register in most cases (the only exception I can think of is the inline-assembly constraint 'l' for the 'lo' register).
> 
> That is probably the cleanest solution, with the only downside being that the scheduler can’t place instructions between the mthi and mtlo anymore.
> 
> Greetings
> 	Matthias
> 
>> 
>> 
>> 
>> On Tue, Oct 8, 2013 at 1:04 PM, Matthias Braun <matze at braunis.de> wrote:
>> 
>> Currently it will always spill/restore the whole vreg, but spilling only the parts that are actually live would be a nice addition in the future.
>> 
>> Looking at r192119: if “mtlo” writes to $LO and sets $HI to an unpredictable value, then it should just have an additional (dead) def operand for $hi, shouldn’t it?
>> 
>> Greetings
>>     Matthias
>> 
>> On 10/8/13, 11:03 AM, Akira Hatanaka wrote:
>>> Hi,
>>> 
>>> I have a question about the way sub-registers are spilled and restored that is related to the changes I made in r192119.
>>> 
>>> Suppose I have the following piece of code with four instructions. %vreg0 and %vreg1 consist of two sub-registers indexed by sub_lo and sub_hi.
>>> 
>>> instr0 %vreg0<def>
>>> instr1 %vreg1:sub_lo<def,read-undef>
>>> instr2 %vreg0<use>
>>> instr3 %vreg1:sub_hi<def>
>>> 
>>> If the register allocator decides to insert spill and restore instructions for %vreg0, will it spill the whole register, including sub-registers lo and hi?
>>> 
>>> instr0 %vreg0<def>
>>> spill0 %vreg0
>>> instr1 %vreg1:sub_lo<def,read-undef>
>>> spill1 %vreg1:sub_lo
>>> restore0 %vreg0
>>> instr2 %vreg0<use>
>>> restore1 %vreg1:sub_lo
>>> instr3 %vreg1:sub_hi<def>
>>> 
>>> Or will it spill just the lo sub-register?
>>> 
>>> instr0 %vreg0<def>
>>> spill0 %vreg0:sub_lo
>>> instr1 %vreg1:sub_lo<def,read-undef>
>>> spill1 %vreg1:sub_lo
>>> restore0 %vreg0:sub_lo
>>> instr2 %vreg0<use>
>>> restore1 %vreg1:sub_lo
>>> instr3 %vreg1:sub_hi<def>
>>> 
>>> If it spills the whole register (both sub-registers lo and hi), the changes I made should be fine. Otherwise, I will have to find another way to prevent the problems I mentioned in r192119's commit log.
>>> 
>>> 
>>> 
>>> On Mon, Oct 7, 2013 at 1:11 PM, Matthias Braun <matze at braunis.de> wrote:
>>> I've been working on patches to improve subregister liveness tracking in llvm and I wanted to inform the llvm community about the overall design/motivation for them. I will send the patches to llvm-commits later today.
>>> 
>>> Greetings
>>>     Matthias Braun
>>> 
>>> 
>>> Subregisters in llvm
>>> ====================
>>> 
>>> Some targets can access registers in different ways, resulting in wider or
>>> narrower accesses. For example, on ARM NEON one of the single precision
>>> floating point registers is called 'S0'. You may also access 'D0' on ARM, which
>>> is the combination of 'S0' and 'S1' and can store a double precision number or
>>> 2 single precision floats. 'Q0' is the combination of 'S0', 'S1', 'S2' and
>>> 'S3' (or 'D0' and 'D1'), and so on.
>>> 
>>> Before register allocation, llvm machine code accesses values through virtual
>>> registers; these get assigned to physical registers later. Each virtual
>>> register has an assigned register class, which is a set of physical registers.
>>> So, for example, on ARM you have a register class containing all the 'SXX'
>>> registers, another one containing all the 'DXX' registers, and so on.
>>> 
>>> But sometimes you want to mix narrow and wide accesses to values, like loading
>>> the 'D0' register but later reading the 'S0' and 'S1' components separately.
>>> This is modeled with subregister operands, which specify that only parts of a
>>> wider value are accessed. For example, the register class of the 'DXX'
>>> registers supports subregisters called 'ssub_0' and 'ssub_1', which would
>>> result in 'S4' and 'S5' getting used if 'D2' is assigned to the virtual
>>> register later.
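The overlap between S, D and Q registers, and how a subregister index picks a physical register once the wide register has been assigned, can be sketched as follows. This is a simplified Python model, not LLVM's TableGen-generated register info; the helper names s_regs and sub_reg are hypothetical, and only the registers that have S aliases (D0-D15, Q0-Q7) are meant to be covered.

```python
# Simplified model of ARM VFP/NEON register overlap (hypothetical helpers).
# D<n> = S<2n>,S<2n+1>; Q<n> = D<2n>,D<2n+1> = S<4n>..S<4n+3>.
WIDTH = {"S": 1, "D": 2, "Q": 4}  # size of each register kind in S registers

def s_regs(reg):
    """Return the S registers that make up 'reg', e.g. 'D1' -> ['S2', 'S3']."""
    kind, n = reg[0], int(reg[1:])
    width = WIDTH[kind]
    return [f"S{width * n + i}" for i in range(width)]

def sub_reg(reg, index):
    """Resolve a subregister index ('ssub_K' or 'dsub_K') on physical 'reg'."""
    kind, n = reg[0], int(reg[1:])
    sub_kind, k = index.split("_")
    k = int(k)
    if sub_kind == "ssub" and k < WIDTH[kind]:
        return s_regs(reg)[k]
    if sub_kind == "dsub" and kind == "Q" and k < 2:
        return f"D{2 * n + k}"
    raise ValueError(f"{index} is not a valid subregister of {reg}")

# A vreg of the 'DXX' class accessed through ssub_0/ssub_1, assigned to D2:
print(sub_reg("D2", "ssub_0"), sub_reg("D2", "ssub_1"))  # S4 S5
print(s_regs("Q0"))                                      # ['S0', 'S1', 'S2', 'S3']
```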
>>> 
>>> Typical operations are decomposing wider values or composing wide values with
>>> multiple smaller defs:
>>> 
>>> Decomposing:
>>> %vreg1<def> = produce a 'D' value
>>>             = use 'S' value %vreg1:ssub_0
>>>             = use 'S' value %vreg1:ssub_1
>>> 
>>> Composing:
>>> %vreg1:ssub_0<def,read-undef> = produce an 'S' value
>>> %vreg1:ssub_1<def>            = produce an 'S' value
>>>            = use a 'D' value %vreg1
>>> 
>>> Problems / Motivation
>>> =====================
>>> 
>>> Currently the llvm register allocator tracks liveness for whole virtual
>>> registers. This can lead to suboptimal code:
>>> 
>>> %vreg0:ssub_0<def,read-undef> = produce an 'S' value
>>> %vreg0:ssub_1<def> = produce an 'S' value
>>>        = use a 'D' value %vreg0
>>> %vreg1 = produce an 'S' value
>>>        = use an 'S' value %vreg1
>>>        = use an 'S' value %vreg0:ssub_0
>>> 
>>> The current code will realize that vreg0 and vreg1 interfere and assign them
>>> to different registers like D0+S2 aka S0+S1+S2, while in reality after the
>>> full use of %vreg0 only %vreg0:ssub_0 must remain in a register, and the
>>> subregister used for %vreg0:ssub_1 can be reassigned to %vreg1. An ideal
>>> assignment would be D0+S1 aka S0+S1.
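The difference between whole-register and per-subregister interference in the example above can be sketched with half-open slot intervals. The slot numbers 1-6 (one per instruction) and the helper names are hypothetical; this is a toy model, not LLVM's LiveInterval or LiveRegMatrix API.

```python
# Toy model of the example above: one slot per instruction, 1..6, and
# half-open live intervals [def slot, last use slot). Hypothetical names.

def overlaps(a, b):
    """True if the half-open intervals a and b intersect."""
    return a[0] < b[1] and b[0] < a[1]

VREG1 = (4, 5)  # %vreg1: defined at 4, last used at 5

# Whole-register liveness: %vreg0 is live from its first def to its last use.
VREG0_WHOLE = (1, 6)
# Per-subregister liveness: ssub_1 dies at the full use in slot 3.
VREG0_LANES = {"ssub_0": (1, 6), "ssub_1": (2, 3)}

def shareable_lanes(lanes, other):
    """Subregisters of a wide vreg whose S register could also hold 'other'."""
    return [name for name, rng in lanes.items() if not overlaps(rng, other)]

print(overlaps(VREG0_WHOLE, VREG1))         # True: needs a separate register
print(shareable_lanes(VREG0_LANES, VREG1))  # ['ssub_1']
```

With per-lane intervals, %vreg1 only conflicts with ssub_0, which is what makes the D0+S1 assignment possible.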
>>> 
>>> Artificial dependencies in the schedule graph are an even more pressing
>>> problem. They are a side effect of llvm's live range information being represented
>>> in a static-single-assignment-like fashion: every definition of a vreg starts
>>> a new interval with a new value number. This means that partial register
>>> writes must be modeled as an implicit use of the unwritten parts of a register
>>> and force the creation of a new value number. This in turn leads to artificial
>>> dependencies in the schedule graph for code like the following, where all defs
>>> should be independent:
>>> 
>>> %vreg0:ssub_0<def,read-undef> = produce an 'S' value
>>> %vreg0:ssub_1<def>            = produce an 'S' value
>>> %vreg0:ssub_2<def>            = produce an 'S' value
>>> %vreg0:ssub_3<def>            = produce an 'S' value
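A toy Python model of the dependency edges for the four defs above, assuming illustrative one-bit lane masks per subregister (not real LLVM LaneBitmask values): without subregister liveness, each def that is not read-undef implicitly reads the previous value and therefore depends on the preceding def; with per-lane liveness, defs of disjoint lanes need no edge.

```python
# Toy model of the four partial defs above. Lane masks are illustrative.
# Each entry: (subregister, lane mask, has read-undef flag).
DEFS = [("ssub_0", 0b0001, True),
        ("ssub_1", 0b0010, False),
        ("ssub_2", 0b0100, False),
        ("ssub_3", 0b1000, False)]

def deps_whole_register(defs):
    """Whole-vreg liveness: a partial def without read-undef implicitly reads
    the previous value, so it depends on the def right before it."""
    return [(i - 1, i) for i, (_, _, undef) in enumerate(defs)
            if i > 0 and not undef]

def deps_per_lane(defs):
    """Per-lane liveness: a def only depends on earlier defs of the same lanes."""
    return [(j, i) for i, (_, mask, _) in enumerate(defs)
            for j in range(i) if defs[j][1] & mask]

print(deps_whole_register(DEFS))  # [(0, 1), (1, 2), (2, 3)]: a serial chain
print(deps_per_lane(DEFS))        # []: all four defs are independent
```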
>>> 
>>> 
>>> Subregister liveness tracking
>>> =============================
>>> 
>>> I developed a set of patches which enable liveness tracking on the subregister
>>> level, to overcome the problems mentioned above. After these changes you can
>>> have separate live ranges for subregisters of a virtual register. With these
>>> patches the following code:
>>> 
>>>   16B  %vreg0:ssub_0<def,read-undef> = ...
>>>   32B  %vreg0:ssub_1<def>            = ...
>>>   48B               = %vreg0
>>>   64B               = %vreg0:ssub_0
>>>   80B  %vreg0 = ...
>>>   96B         = %vreg0:ssub_1
>>> 
>>> will be represented as the following live range(s):
>>> 
>>>   Common LiveRange: [16r,32r)[32r,64r)[80r,96r)
>>>   SubRange with Mask 0x0004 (=ssub_0): [16r,64r)[80r,80d)
>>>   SubRange with Mask 0x0008 (=ssub_1): [32r,48r)[80r,96r)
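The subranges above can be reproduced with a small per-lane scan. This is a hypothetical Python sketch, not LLVM's LiveIntervals code: it assumes the lane masks 0x0004/0x0008 from the example, treats each instruction's uses as happening before its defs, and prints a def with no later use as a dead def. It does not compute the common live range, where the partial def at 32 also starts a new value number.

```python
# Hypothetical per-lane scan reproducing the SubRanges listed above.
SSUB_0, SSUB_1 = 0x0004, 0x0008   # lane masks from the example
FULL = SSUB_0 | SSUB_1

# (slot, lanes defined, lanes read) for each instruction of %vreg0
PROGRAM = [
    (16, SSUB_0, 0),     # %vreg0:ssub_0<def,read-undef> = ...
    (32, SSUB_1, 0),     # %vreg0:ssub_1<def>            = ...
    (48, 0, FULL),       #               = %vreg0
    (64, 0, SSUB_0),     #               = %vreg0:ssub_0
    (80, FULL, 0),       # %vreg0 = ...
    (96, 0, SSUB_1),     #        = %vreg0:ssub_1
]

def subrange(program, mask):
    """Segments of the lanes in 'mask'; a def without a later use is dead."""
    segments = []
    start = last_use = None

    def close():
        if start is not None:
            end = f"{last_use}r" if last_use is not None else f"{start}d"
            segments.append(f"[{start}r,{end})")

    for slot, defs, uses in program:
        if uses & mask:          # uses are read before the instruction's defs
            last_use = slot
        if defs & mask:          # a new def ends the previous value's segment
            close()
            start, last_use = slot, None
    close()
    return "".join(segments)

print(subrange(PROGRAM, SSUB_0))  # [16r,64r)[80r,80d)
print(subrange(PROGRAM, SSUB_1))  # [32r,48r)[80r,96r)
```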
>>> 
>>> Patches/Changes:
>>> * Move the live range management code in the LiveInterval class to a new
>>>   class LiveRange, and move the previous LiveRange class (which was just a single
>>>   interval inside a live range) to LiveRange::Segment.
>>>   LiveInterval is made a subclass of LiveRange, other code paths like
>>>   register units liveness use LiveRange instead of LiveInterval now.
>>> * Introduce a linked list of SubRange objects to the LiveInterval class.
>>>   A SubRange is a subclass of LiveRange and contains a LaneMask indicating
>>>   which subregisters are represented.
>>> * Various algorithms have been adapted to calculate/preserve subregister
>>>   liveness.
>>> * The register allocator has been adapted to track interference at the
>>>   subregister level (LaneMasks are mapped to register units)
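The class layout described in the list above can be mirrored in a few Python dataclasses. This is an illustrative sketch only: the names follow the list (LiveRange, Segment, SubRange, lane masks), but the fields, the plain-integer slot indices, and the interfering_lanes helper are assumptions, not LLVM's C++ API. The dead def [80r,80d) from the earlier example is approximated as the segment [80,81).

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    """Half-open interval [start, end) of slot indices (cf. LiveRange::Segment)."""
    start: int
    end: int

    def overlaps(self, other):
        return self.start < other.end and other.start < self.end

@dataclass
class LiveRange:
    """A list of segments."""
    segments: list = field(default_factory=list)

    def overlaps(self, other):
        return any(a.overlaps(b) for a in self.segments for b in other.segments)

@dataclass
class SubRange(LiveRange):
    """Liveness of only the lanes selected by lane_mask."""
    lane_mask: int = 0

@dataclass
class LiveInterval(LiveRange):
    """Main range of a virtual register plus optional per-lane subranges."""
    subranges: list = field(default_factory=list)

    def interfering_lanes(self, other, all_lanes):
        """Hypothetical query: lanes whose liveness overlaps the range 'other'."""
        if not self.subranges:  # no subregister liveness: all lanes conflict
            return all_lanes if self.overlaps(other) else 0
        mask = 0
        for sub in self.subranges:
            if sub.overlaps(other):
                mask |= sub.lane_mask
        return mask

# %vreg0 from the example above (dead def [80r,80d) approximated as [80,81)).
vreg0 = LiveInterval(
    segments=[Segment(16, 32), Segment(32, 64), Segment(80, 96)],
    subranges=[SubRange([Segment(16, 64), Segment(80, 81)], 0x0004),
               SubRange([Segment(32, 48), Segment(80, 96)], 0x0008)])
other = LiveRange([Segment(50, 60)])  # hypothetical 'S' value live in [50,60)

print(hex(vreg0.interfering_lanes(other, 0x000C)))  # 0x4: only ssub_0 conflicts
```

Without subranges the same query has to report every lane as interfering, which is the whole-register behavior described in the motivation section.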
>>> 
>>> Note that subregister liveness tracking has to be explicitly enabled by the
>>> target architecture, as it does not provide enough benefits for the costs on
>>> some targets (e.g. having subregister liveness for the lower/upper 8-bit regs
>>> on x86 provided nearly no benefits in the llvm-testsuite, so you can't justify
>>> more computations/memory usage for that).
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>         http://llvm.cs.uiuc.edu <http://llvm.cs.uiuc.edu/>
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>
>>> 
>> 
> 
> 
> 
