[llvm-dev] [LLVMdev] Subregister liveness tracking

Tue Jul 19 13:46:21 PDT 2016

> On Jul 19, 2016, at 1:11 PM, Sergey Yakoushkin <sergey.yakoushkin at gmail.com> wrote:
> 
> Hi Matthias,
> 
> Thanks for the quick reply. I have a few questions about sub-register support.
> 
> I'm currently using LLVM 3.8. Could you send links to recent patches/reviews?
Given when the 3.8 branch was created, it should definitely have all the subregister liveness tracking code. It may not have everything necessary to enable the scheduler enhancements, though.

> 
> 
> 1) Does LLVM maintain sub-register liveness while splitting live intervals?
No, currently subregister liveness information is dropped if registers get split or spilled.
There are currently some issues with recalculating liveness after invasive changes, because you cannot set undef/dead flags at subregister granularity in today's llvm. There is no such problem when the liveness information is computed before register coalescing, and the coalescer itself has code to update the liveness information rather than recomputing it.
The spilling and splitting code currently takes the easy way out and drops the subregister information (so you are only left with the liveness of the superregister). This hasn't been a problem for GPUs, as those typically have many registers, so spilling and splitting are atypical operations.

> 
> 2) How should sub-register kill/dead flags and BB live-ins look after regalloc?
The kill and dead flags describe the whole machine operand:
     %vreg0<dead>              // dead def of the whole vreg0
     %vreg1.sub1<dead>         // dead def of the sub1 subregister (the other subregs may be live)
         = %vreg2<undef>       // undef %vreg2 usage <=> the operand does not affect vreg2's liveness
         = %vreg2.sub2<undef>  // undef %vreg2.sub2 usage

   %AL = xxx
            = use %EAX          // this is legal: you only need part of the physreg to be defined for a use to be fine
            = use %EBX<undef>   // this still requires the undef flag as the complete register is undefined

> 
> E.g. the target has instructions for both sub-registers and registers.
> A composite 64b reg contains 2x32b regs: Rxy = { Rx, Ry }, with scalar/vector add/ld/etc.
> 
> LiveRangeCalc::findReachingDefs expects to see all sub-registers in BB liveins together with pair registers.
> 
> BB1:
>   Rxy<def> = ...
> 
> BB2: LiveIn: Rxy and Rx ?
You can either have Rxy in the live-in list, or have Rx and Ry in the list separately.
(Note that each live-in also has a lanemask assigned, so you can have situations in which Rxy is in the live-in list
but the lanemask tells you that just one of Rx/Ry is actually live.)
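
For illustration only, here is a minimal self-contained C++ sketch of the two live-in encodings. It does not use the LLVM headers: LiveIn, lanesOf, liveLanes and the lane bit assignment (bit 0 for Rx, bit 1 for Ry) are hypothetical names and values chosen for the example. Only the idea of a per-entry lanemask comes from the description above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model, not the LLVM API: one bit per 32-bit half of Rxy.
using LaneMask = std::uint32_t;
constexpr LaneMask LaneX = 1u << 0; // lane covered by Rx
constexpr LaneMask LaneY = 1u << 1; // lane covered by Ry

struct LiveIn {
  std::string Reg; // "Rxy", "Rx" or "Ry"
  LaneMask Mask;   // lanes of the entry that are actually live
};

// Lanes of Rxy covered by a register name in this toy target.
LaneMask lanesOf(const std::string &Reg) {
  if (Reg == "Rxy") return LaneX | LaneY;
  if (Reg == "Rx") return LaneX;
  if (Reg == "Ry") return LaneY;
  return 0;
}

// Union of the lanes of Rxy that a live-in list declares live.
LaneMask liveLanes(const std::vector<LiveIn> &LiveIns) {
  LaneMask Live = 0;
  for (const LiveIn &LI : LiveIns)
    Live |= lanesOf(LI.Reg) & LI.Mask;
  return Live;
}
```

In this model a list holding Rxy with a full mask and a list holding Rx and Ry separately describe the same live lanes, while Rxy with mask LaneX says that only Rx is live.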

> 
>   ... = ADD   Rx<use>
>   ... = VADD Rxy<use>
> 
>     if (TargetRegisterInfo::isPhysicalRegister(PhysReg) &&
>         !MBB->isLiveIn(PhysReg)) {
>       MBB->getParent()->verify();
>       errs() << "The register " << PrintReg(PhysReg)
>              << " needs to be live in to BB#" << MBB->getNumber()
>              << ", but is missing from the live-in list.\n";
>       llvm_unreachable("Invalid global physical register");
>     }
You may be running into the recalculation problems here. Tricky situations may look like this:

BB0:
    %vreg0 = xxx
    jmp BB2

BB1:
    %vreg0.sub1<read-undef> = yyy
    jmp BB2

BB2:
      = use %vreg0    // The instruction only cares about vreg0.sub1 but for some reason encodes the full register.
                      // However, as we lack a way to add an "undef" flag for vreg0.sub0, the liveness calculation traces
                      // backwards and fails to find an actual sub0 definition in BB1.
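
For illustration only, the failing backward trace can be modeled with a small self-contained C++ sketch. It is not the LiveRangeCalc code: Block, missingLanes, exampleCFG and the lane bits are hypothetical, and the walk assumes an acyclic CFG. It only shows why a full-register use in BB2 leaves sub0 without a reaching definition on the path through BB1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using LaneMask = std::uint32_t;
constexpr LaneMask Sub0 = 1u << 0; // hypothetical lane bit for %vreg0.sub0
constexpr LaneMask Sub1 = 1u << 1; // hypothetical lane bit for %vreg0.sub1

struct Block {
  LaneMask Defined;       // lanes of %vreg0 defined somewhere in this block
  std::vector<int> Preds; // predecessor block numbers
};

// Lanes of Needed for which some path ending in block BB has no definition.
// Assumes an acyclic CFG, which is enough for the three-block example.
LaneMask missingLanes(const std::vector<Block> &CFG, int BB, LaneMask Needed) {
  assert(BB >= 0 && static_cast<std::size_t>(BB) < CFG.size());
  LaneMask Remaining = Needed & ~CFG[BB].Defined;
  if (Remaining == 0)
    return 0; // everything needed is defined in this block
  if (CFG[BB].Preds.empty())
    return Remaining; // ran out of predecessors without finding a def
  LaneMask Missing = 0;
  for (int Pred : CFG[BB].Preds)
    Missing |= missingLanes(CFG, Pred, Remaining);
  return Missing;
}

// The situation above: BB0 defines all of %vreg0, BB1 defines only sub1,
// and BB2 contains the use.
std::vector<Block> exampleCFG() {
  return {
      {Sub0 | Sub1, {}}, // BB0
      {Sub1, {}},        // BB1
      {0, {0, 1}},       // BB2
  };
}
```

Asking for Sub0 | Sub1 at BB2 reports Sub0 as missing (the path through BB1), whereas asking only for Sub1 reports nothing missing. The second query is what a per-subregister undef flag would let the use express.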

If that is your actual problem, there is more discussion and a possible solution in https://reviews.llvm.org/D21189. I must say, though, that I don't feel comfortable with the complexity added to the code there...

- Matthias

> 3) LLVM coalesces composite register and sub-registers, leaving some sub-registers undefined, but information is lost after regalloc rewriting.
> Is it expected?
> 
> before regalloc:
> %vreg:item1<def,read-undef> = MOV_imm 0
> STORE <..#0>, 0, %vreg56
> 
> after regalloc:
> %Rx<def> = MOV_imm 0
> STORE <..#0>, 0, %Rxy
> 
> Regards,
> Sergey
> 
> 
> On Tue, Jul 19, 2016 at 8:48 PM, Matthias Braun <mbraun at apple.com> wrote:
> Oh that is an old ticket, the items in the TODO below are all done (some are obsolete).
> 
> Subregister liveness tracking has been available in llvm for roughly a year now, and for a few months we have also had scheduler extensions in place to allow independent scheduling of subregister definitions.
> 
> While I had subregister liveness tracking working for all CPU targets, I decided against enabling it in the end. This is because I could not measure any benefits in the generated code, so I couldn't justify the added compile time.
> 
> We do have subregister liveness tracking enabled in production for AMDGPU and an out of tree target, so it certainly works today.
> 
> - Matthias
> 
>> On Jul 19, 2016, at 9:26 AM, Sergey Yakoushkin <sergey.yakoushkin at gmail.com> wrote:
>> 
>> Hi all,
>> 
>> I'm working on an LLVM-based back-end. We have a custom LLVM solution for sub-register liveness tracking.
>> 
>> Recently we tried to replace it with LLVM's enableSubRegLiveness and ran into multiple issues.
>> I can't share specific IR snippets now, but failures are related to spilling, undef sub-registers, etc.
>> 
>> There is Bug 17557 open since 2013: Enable subregister liveness for scheduling and register allocation.
>> https://llvm.org/bugs/show_bug.cgi?id=17557
>> 
>> with TODO list:
>> "These are the remaining steps to get Matthias' subregister liveness fully integrated:
>> - Fix LiveRegUnits to correctly handle regmasks.
>> - Benchmark/tune compile time.
>> - Enable subreg liveness on x86 for testing purposes.
>> - Use LiveRegUnits to fix ARM VMOV widening.
>> - Fix the scheduler's DAG builder to use bundler iterator, not operand index.
>> - Discard the master live range after coalescing so that LiveInterval updates don't need to preserve it when we reorder subregister defs.
>> - Enable subreg scheduling on all targets that enable MI scheduler."
>> 
>> Is someone still working on bug fixes and enhancements? Are there any pending patches?
>> 
>> R600 back-end is using sub-reg liveness: e.g. r238999 - R600: Re-enable sub-reg liveness (June 2015).
>> But it seems requirements and use cases are GPU-specific.
>> 
>> Does anyone use sub-reg liveness for RISC/CISC+SIMD targets?
>> 
>> Thanks,
>> Sergey
>> 
>> 
>> On Wed, Oct 9, 2013 at 11:03 PM, Matthias Braun <mbraun at apple.com> wrote:
>> 
>> On Oct 8, 2013, at 2:06 PM, Akira Hatanaka <ahatanak at gmail.com> wrote:
>> 
>>> What I didn't mention in r192119 is that mthi/lo clobbers the other sub-register only if the contents of hi and lo are produced by mult or other arithmetic instructions (div, madd, etc.). It doesn't have this side-effect if it is produced by another mthi/lo. So I don't think making mthi/lo clobber the other half would work.
>> 
>> Uh that is indeed nasty, and can’t really be expressed like that in the current RA framework I think.
>> 
>>> 
>>> For example, this is an illegal sequence of instructions, where instruction 3 makes $hi unpredictable:
>>> 
>>> 1. mult $lo<def>, $hi<def>, $2, $3 // $lo<def>, $hi<def> = $2 * $3
>>> 2. mflo $4, $lo<use> // $4 <- $lo
>>> 3. mtlo $lo<def>, $6 // $lo <- $6. effectively clobbers $hi too.
>>> 4. mfhi $5, $hi<use> // $5 <- $hi
>>> 5. mthi $hi<def>, $7 // $hi <- $7
>>> 6. madd $lo<def>, $hi<def>, $8, $9, $lo<use>, $hi<use> // $lo<def>, $hi<def> = $8 * $9 + (lo,hi)
>>> 
>>> Unlike the mtlo instruction in the example above, instruction 5 in the next example does not clobber $hi:
>>> 
>>> 1. mult $lo<def>, $hi<def>, $2, $3 // $lo<def>, $hi<def> = $2 * $3
>>> 2. mflo $4, $lo<use> // $4 <- $lo
>>> 3. mfhi $5, $hi<use> // $5 <- $hi
>>> 4. mthi $hi<def>, $7 // $hi <- $7.
>>> 5. mtlo $lo<def>, $6 // $lo <- $6. This does not clobber $hi.
>>> 6. madd $lo<def>, $hi<def>, $8, $9, $lo<use>, $hi<use> // $lo<def>, $hi<def> = $8 * $9 + (lo,hi)
>>> 
>>> Probably I can define a pseudo instruction "mthilo" that defines both lo and hi and expands to mthi and mtlo after register allocation, which will force the register allocator to spill/restore the whole register in most cases (the only exception I can think of is the inline-assembly constraint 'l' for the 'lo' register).
>> 
>> That is probably the cleanest solution, with the only downside being that the scheduler can’t place instructions between the mthi and mtlo anymore.
>> 
>> Greetings
>> 	Matthias
>> 
>>> 
>>> 
>>> 
>>> On Tue, Oct 8, 2013 at 1:04 PM, Matthias Braun <matze at braunis.de> wrote:
>>> 
>>> Currently it will always spill / restore the whole vreg but only spilling the parts that are actually live would be a nice addition in the future.
>>> 
>>> Looking at r192119: if “mtlo” writes to $LO and sets $HI to an unpredictable value, then it should just have an additional (dead) def operand for $hi, shouldn’t it?
>>> 
>>> Greetings
>>>     Matthias
>>> 
>>> On 10/8/13, 11:03 AM, Akira Hatanaka wrote:
>>>> Hi,
>>>> 
>>>> I have a question about the way sub-registers are spilled and restored that is related to the changes I made in r192119.
>>>> 
>>>> Suppose I have the following piece of code with four instructions. %vreg0 and %vreg1 consist of two sub-registers indexed by sub_lo and sub_hi.
>>>> 
>>>> instr0 %vreg0<def>
>>>> instr1 %vreg1:sub_lo<def,read-undef>
>>>> instr2 %vreg0<use>
>>>> instr3 %vreg1:sub_hi<def>
>>>> 
>>>> If the register allocator decides to insert spill and restore instructions for %vreg0, will it spill the whole register that includes sub-registers lo and hi?
>>>> 
>>>> instr0 %vreg0<def>
>>>> spill0 %vreg0
>>>> instr1 %vreg1:sub_lo<def,read-undef>
>>>> spill1 %vreg1:sub_lo
>>>> restore0 %vreg0
>>>> instr2 %vreg0<use>
>>>> restore1 %vreg1:sub_lo
>>>> instr3 %vreg1:sub_hi<def>
>>>> 
>>>> Or will it spill just the lo sub-register?
>>>> 
>>>> instr0 %vreg0<def>
>>>> spill0 %vreg0:sub_lo
>>>> instr1 %vreg1:sub_lo<def,read-undef>
>>>> spill1 %vreg1:sub_lo
>>>> restore0 %vreg0:sub_lo
>>>> instr2 %vreg0<use>
>>>> restore1 %vreg1:sub_lo
>>>> instr3 %vreg1:sub_hi<def>
>>>> 
>>>> If it spills the whole register (both sub-registers lo and hi), the changes I made should be fine. Otherwise, I will have to find another way to prevent the problems I mentioned in r192119's commit log.
>>>> 
>>>> 
>>>> 
>>>> On Mon, Oct 7, 2013 at 1:11 PM, Matthias Braun <matze at braunis.de> wrote:
>>>> I've been working on patches to improve subregister liveness tracking in llvm and I wanted to inform the llvm community about the overall design/motivation for them. I will send the patches to llvm-commits later today.
>>>> 
>>>> Greetings
>>>>     Matthias Braun
>>>> 
>>>> 
>>>> Subregisters in llvm
>>>> ====================
>>>> 
>>>> Some targets can access registers in different ways resulting in wider or
>>>> narrower accesses. For example on ARM NEON one of the single precision
>>>> floating point registers is called 'S0'. You may also access 'D0' on ARM, which
>>>> is the combination of 'S0' and 'S1' and can store a double precision number or
>>>> 2 single precision floats. 'Q0' is the combination of 'S0', 'S1', 'S2' and
>>>> 'S3' (or 'D0' and 'D1') and so on.
>>>> 
>>>> Before register allocation llvm machine code accesses values through virtual
>>>> registers; these get assigned to physical registers later. Each virtual
>>>> register has an assigned register class which is a set of physical registers.
>>>> So for example on ARM you have a register class containing all the 'SXX'
>>>> registers and another one containing all the 'DXX' registers, ...
>>>> 
>>>> But sometimes you want to mix narrow and wide accesses to values. Like loading
>>>> the 'D0' register but later reading the 'S0' and 'S1' components separately.
>>>> This is modeled with subregister operands which specify that only parts of a
>>>> wider value are accessed. For example the register class of the 'DXX'
>>>> registers supports subregisters called 'ssub_0' and 'ssub_1' which would
>>>> result in 'S4' and 'S5' getting used if 'D2' is assigned to the virtual
>>>> register later.
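
For illustration only, the mapping from a subregister index to a physical register can be sketched in self-contained C++. The function physSubReg and the enum are hypothetical and do not reflect how llvm's TargetRegisterInfo is implemented. The sketch relies only on the overlap described above (D<n> covers S<2n> and S<2n+1>) and assumes that only D0..D15 have S subregisters.

```cpp
#include <cassert>
#include <string>

// Hypothetical subregister indices, named after the ones in the text.
enum SubRegIndex { ssub_0 = 0, ssub_1 = 1 };

// Name of the S register selected by Idx once a virtual register of the
// 'D' class has been assigned to physical register D<DReg>.
std::string physSubReg(unsigned DReg, SubRegIndex Idx) {
  assert(DReg < 16 && "assumption: only D0..D15 overlap S registers");
  return "S" + std::to_string(2 * DReg + Idx);
}
```

With D2 assigned, physSubReg(2, ssub_0) names S4 and physSubReg(2, ssub_1) names S5, matching the example in the text.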
>>>> 
>>>> Typical operations are decomposing wider values or composing wide values with
>>>> multiple smaller defs:
>>>> 
>>>> Decomposing:
>>>> %vreg1<def> = produce a 'D' value
>>>>             = use 'S' value %vreg1:ssub_0
>>>>             = use 'S' value %vreg1:ssub_1
>>>> 
>>>> Composing:
>>>> %vreg1:ssub_0<def,read-undef> = produce an 'S' value
>>>> %vreg1:ssub_1<def>            = produce an 'S' value
>>>>            = use a 'D' value %vreg1
>>>> 
>>>> Problems / Motivation
>>>> =====================
>>>> 
>>>> Currently the llvm register allocator tracks liveness for whole virtual
>>>> registers. This can lead to suboptimal code:
>>>> 
>>>> %vreg0:ssub_0<def,read-undef> = produce an 'S' value
>>>> %vreg0:ssub_1<def> = produce an 'S' value
>>>>        = use a 'D' value %vreg0
>>>> %vreg1 = produce an 'S' value
>>>>        = use an 'S' value %vreg1
>>>>        = use an 'S' value %vreg0:ssub_0
>>>> 
>>>> The current code will realize that vreg0 and vreg1 interfere and assign them
>>>> to different registers like D0+S2 aka S0+S1+S2; while in reality after the
>>>> full use of %vreg0 only %vreg0:ssub_0 must remain in a register while the
>>>> subregister used for %vreg0:ssub_1 can be reassigned to %vreg1. An ideal
>>>> assignment would be D0+S1 aka S0+S1.
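
For illustration only, the interference argument can be checked with a small self-contained C++ sketch. The instruction positions 0..5 (one per line of the listing above), the Segment type and the constants are hypothetical; live ranges are modeled as half-open intervals from the def to the last use.

```cpp
#include <cassert>

// Half-open interval [Start, End) over instruction positions 0..5.
struct Segment {
  int Start;
  int End;
};

// Two live ranges interfere if their intervals share a position.
constexpr bool overlaps(Segment A, Segment B) {
  return A.Start < B.End && B.Start < A.End;
}

// Whole-register view: %vreg0 is live from its first def to its last use.
constexpr Segment Vreg0Whole{0, 5};
// Per-subregister view of the same code.
constexpr Segment Vreg0Ssub0{0, 5}; // still read by the last instruction
constexpr Segment Vreg0Ssub1{1, 2}; // dead after the 'D' use
// %vreg1 is defined at position 3 and used at position 4.
constexpr Segment Vreg1{3, 4};
```

With whole-register tracking %vreg1 overlaps %vreg0, so three S registers are needed. With per-subregister ranges %vreg1 overlaps only ssub_0, so it can reuse the register that held ssub_1.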
>>>> 
>>>> An even more pressing problem is artificial dependencies in the schedule
>>>> graph. This is a side effect of llvm's live range information being represented
>>>> in a static single assignment like fashion: Every definition of a vreg starts
>>>> a new interval with a new value number. This means that partial register
>>>> writes must be modeled as an implicit use of the unwritten parts of a register
>>>> and force the creation of a new value number. This in turn leads to artificial
>>>> dependencies in the schedule graph for code like the following where all defs
>>>> should be independent:
>>>> 
>>>> %vreg0:ssub_0<def,read-undef> = produce an 'S' value
>>>> %vreg0:ssub_1<def>            = produce an 'S' value
>>>> %vreg0:ssub_2<def>            = produce an 'S' value
>>>> %vreg0:ssub_3<def>            = produce an 'S' value
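
For illustration only, the effect on the schedule graph can be counted with a self-contained C++ sketch. It is not the scheduler's DAG builder: countDefDeps and the lane bits are hypothetical, and the model only looks at ordering edges between successive writes to one virtual register.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using LaneMask = std::uint32_t;

// Counts ordering edges between successive partial writes of one vreg.
// Defs[i] holds the lanes written by the i-th def. Without lane tracking
// every def depends on the previous def (it implicitly reads the unwritten
// lanes). With lane tracking a def only depends on the most recent earlier
// def that wrote an overlapping lane.
unsigned countDefDeps(const std::vector<LaneMask> &Defs, bool TrackLanes) {
  unsigned Edges = 0;
  for (std::size_t I = 1; I < Defs.size(); ++I) {
    for (std::size_t J = I; J-- > 0;) {
      if (!TrackLanes || (Defs[I] & Defs[J]) != 0) {
        ++Edges;
        break;
      }
    }
  }
  return Edges;
}
```

For the four defs above (lanes 0x1, 0x2, 0x4, 0x8 in this model) the whole-register view yields a chain of three edges, while the lane-aware view yields none, so the defs can be scheduled independently.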
>>>> 
>>>> 
>>>> Subregister liveness tracking
>>>> =============================
>>>> 
>>>> I developed a set of patches which enable liveness tracking on the subregister
>>>> level, to overcome the problems mentioned above. After these changes you can
>>>> have separate live ranges for subregisters of a virtual register. With these
>>>> patches the following code:
>>>> 
>>>>   16B  %vreg0:ssub_0<def,read-undef> = ...
>>>>   32B  %vreg0:ssub_1<def>            = ...
>>>>   48B               = %vreg0
>>>>   64B               = %vreg0:ssub_0
>>>>   80B  %vreg0 = ...
>>>>   96B         = %vreg0:ssub_1
>>>> 
>>>> will be represented as the following live range(s):
>>>> 
>>>>   Common LiveRange: [16r,32r)[32r,64r)[80r,96r)
>>>>   SubRange with Mask 0x0004 (=ssub_0): [16r,64r)[80r,80d)
>>>>   SubRange with Mask 0x0008 (=ssub_1): [32r,48r)[80r,96r)
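
For illustration only, the two SubRange lines can be reproduced with a self-contained C++ sketch. It is not llvm's liveness computation: Access, subRange and exampleCode are hypothetical, and the sketch handles straight-line code only. The lane masks 0x4 and 0x8 and the slot numbers are taken from the listing above; 'r' marks a register slot and 'd' the dead slot of a def that is never read.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using LaneMask = std::uint32_t;
constexpr LaneMask Ssub0 = 0x4; // mask values from the listing above
constexpr LaneMask Ssub1 = 0x8;

struct Access {
  unsigned Slot;  // slot index of the instruction (16, 32, ...)
  LaneMask Lanes; // lanes of %vreg0 touched by the operand
  bool IsDef;     // true for a def, false for a use
};

// Builds the printed live range of one lane for straight-line code.
std::string subRange(const std::vector<Access> &Code, LaneMask Lane) {
  std::string Out;
  bool Open = false, HasUse = false;
  unsigned Start = 0, LastUse = 0;
  auto Close = [&] {
    if (!Open)
      return;
    Out += "[" + std::to_string(Start) + "r,";
    // A def without any use only covers its own dead slot.
    Out += HasUse ? std::to_string(LastUse) + "r)"
                  : std::to_string(Start) + "d)";
    Open = false;
  };
  for (const Access &A : Code) {
    if (!(A.Lanes & Lane))
      continue;
    if (A.IsDef) {
      Close(); // a new def ends the previous segment at its last use
      Open = true;
      HasUse = false;
      Start = A.Slot;
    } else if (Open) {
      LastUse = A.Slot;
      HasUse = true;
    }
  }
  Close();
  return Out;
}

// The six instructions of the listing above.
std::vector<Access> exampleCode() {
  return {
      {16, Ssub0, true},          {32, Ssub1, true},
      {48, Ssub0 | Ssub1, false}, {64, Ssub0, false},
      {80, Ssub0 | Ssub1, true},  {96, Ssub1, false},
  };
}
```

The full-register def at 80B starts a new segment in both lanes. Because ssub_0 is not read afterwards, its segment collapses to the dead slot [80r,80d).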
>>>> 
>>>> Patches/Changes:
>>>> * Move live range management code in the LiveInterval class to a new
>>>>   class LiveRange, and move the previous LiveRange class (which was just a single
>>>>   interval inside a live range) to LiveRange::Segment.
>>>>   LiveInterval is made a subclass of LiveRange; other code paths like
>>>>   register units liveness use LiveRange instead of LiveInterval now.
>>>> * Introduce a linked list of SubRange objects to the LiveInterval class.
>>>>   A SubRange is a subclass of LiveRange and contains a LaneMask indicating
>>>>   which subregisters are represented.
>>>> * Various algorithms have been adapted to calculate/preserve subregister
>>>>   liveness.
>>>> * The register allocator has been adapted to track interference at the
>>>>   subregister level (LaneMasks are mapped to register units)
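
For illustration only, the class layout described in the list above might look roughly like the following self-contained C++ sketch. The class names follow the text, but every member and method (Segments, liveAt, Lanes, Next, addSubRange, liveLanesAt) is hypothetical and much simpler than the real llvm classes.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

using LaneMask = std::uint32_t;

// A live range is a list of segments.
struct LiveRange {
  struct Segment {
    unsigned Start, End; // half-open [Start, End)
  };
  std::vector<Segment> Segments;

  bool liveAt(unsigned Slot) const {
    for (const Segment &S : Segments)
      if (S.Start <= Slot && Slot < S.End)
        return true;
    return false;
  }
};

// A SubRange is a LiveRange plus the lanes it describes.
struct SubRange : LiveRange {
  LaneMask Lanes = 0;
  std::unique_ptr<SubRange> Next; // singly linked list
};

// A LiveInterval is the main range plus a list of SubRanges.
struct LiveInterval : LiveRange {
  std::unique_ptr<SubRange> SubRanges;

  // Prepends a new, empty SubRange for the given lanes.
  SubRange &addSubRange(LaneMask Lanes) {
    auto SR = std::make_unique<SubRange>();
    SR->Lanes = Lanes;
    SR->Next = std::move(SubRanges);
    SubRanges = std::move(SR);
    return *SubRanges;
  }

  // Union of the lanes whose SubRange is live at Slot.
  LaneMask liveLanesAt(unsigned Slot) const {
    LaneMask Live = 0;
    for (const SubRange *SR = SubRanges.get(); SR; SR = SR->Next.get())
      if (SR->liveAt(Slot))
        Live |= SR->Lanes;
    return Live;
  }
};
```

A query such as liveLanesAt is the kind of information an allocator needs to decide which parts of a wide register are free at a given point. Mapping lanes to register units is not modeled here.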
>>>> 
>>>> Note that subregister liveness tracking has to be explicitly enabled by the
>>>> target architecture, as it does not provide enough benefits for the costs on
>>>> some targets (e.g. having subregister liveness for the lower/upper 8-bit regs
>>>> on x86 provided nearly no benefits in the llvm-testsuite, so you can't justify
>>>> more computations/memory usage for that).
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
