<div dir="ltr">Hi all,<div><br></div><div>I'm working on an LLVM-based back-end. We have a custom solution for sub-register liveness tracking.</div><div><br></div><div>Recently we tried to replace it with LLVM's enableSubRegLiveness and ran into multiple issues.</div><div>I can't share specific IR snippets now, but the failures are related to spilling, undef sub-registers, etc.<br></div><div><br></div><div>Bug 17557 has been open since 2013: Enable subregister liveness for scheduling and register allocation.</div><div><a href="https://llvm.org/bugs/show_bug.cgi?id=17557">https://llvm.org/bugs/show_bug.cgi?id=17557</a></div><div><br></div><div>It has this TODO list:</div><div><pre class="" id="comment_text_0" style="white-space:pre-wrap;width:50em;color:rgb(0,0,0)">"These are the remaining steps to get Matthias' subregister liveness fully integrated:
- Fix LiveRegUnits to correctly handle regmasks.
- Benchmark/tune compile time.
- Enable subreg liveness on x86 for testing purposes.
- Use LiveRegUnits to fix ARM VMOV widening.
- Fix the scheduler's DAG builder to use bundler iterator, not operand index.
- Discard the master live range after coalescing so that LiveInterval updates don't need to preserve it when we reorder subregister defs.
- Enable subreg scheduling on all targets that enable MI scheduler."</pre></div><div><div><br></div><div>Is someone still working on bug fixes and enhancements? Are there any pending patches?</div></div><div><br></div><div>The R600 back-end is using sub-reg liveness, e.g. r238999 - R600: Re-enable sub-reg liveness (June 2015).</div><div>But its requirements and use cases seem to be GPU-specific.<br></div><div><br></div><div>Does anyone use sub-reg liveness for RISC/CISC+SIMD targets?</div><div><br></div><div>Thanks,<br></div><div>Sergey</div><div><br></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Oct 9, 2013 at 11:03 PM, Matthias Braun <span dir="ltr"><<a href="mailto:mbraun@apple.com" target="_blank">mbraun@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word"><br><div><span><div>On Oct 8, 2013, at 2:06 PM, Akira Hatanaka <<a href="mailto:ahatanak@gmail.com" target="_blank">ahatanak@gmail.com</a>> wrote:</div><br><blockquote type="cite"><div dir="ltr"><div>What I didn't mention in r<span style="font-size:13px;font-family:arial,sans-serif">192119 is that mthi/lo clobbers the other sub-register only if the contents of hi and lo are produced by mult or other arithmetic instructions (div, madd, etc.). It doesn't have this side effect if the contents are produced by another mthi/lo. So I don't think making mthi/lo clobber the other half would work.</span></div></div></blockquote></span><div>Uh, that is indeed nasty, and I think it can’t really be expressed like that in the current RA framework.</div><span><br><blockquote type="cite"><div dir="ltr">
<div><span style="font-size:13px;font-family:arial,sans-serif"><br></span></div><div>For example, this is an illegal sequence of instructions, where instruction 3 makes $hi unpredictable:</div><div><br></div><div>1. mult $lo<def>, $hi<def>, $2, $3 // $lo<def>, $hi<def> = $2 * $3</div>
<div><div>2. mflo $4, $lo<use> // $4 <- $lo<br></div><div>3. mtlo $lo<def>, $6 // $lo <- $6. effectively clobbers $hi too.<br></div></div><div><div>4. mfhi $5, $hi<use> // $5 <- $hi</div><div>5. mthi $hi<def>, $7 // $hi <- $7</div>
</div><div><div>6. madd $lo<def>, $hi<def>, $8, $9, $lo<use>, $hi<use> // $lo<def>, $hi<def> = $8 * $9 + (lo,hi) </div><div><br></div><div>Unlike the mtlo instruction in the example above, instruction 5 in the next example does not clobber $hi:</div>
<div><br></div><div><div>1. mult $lo<def>, $hi<def>, $2, $3 // $lo<def>, $hi<def> = $2 * $3</div><div><div>2. mflo $4, $lo<use> // $4 <- $lo<br></div><div>3. mfhi $5, $hi<use> // $5 <- $hi<br>
</div></div><div>4. mthi $hi<def>, $7 // $hi <- $7.<br></div><div><div>5. mtlo $lo<def>, $6 // $lo <- $6. This does not clobber $hi.<br></div><div></div><div>6. madd $lo<def>, $hi<def>, $8, $9, $lo<use>, $hi<use> // $lo<def>, $hi<def> = $8 * $9 + (lo,hi) </div>
<div><br></div><div>Probably I can define a pseudo instruction "mthilo" that defines both lo and hi and expands to mthi and mtlo after register allocation, which will force the register allocator to spill/restore the whole register in most cases (the only exception I can think of is the inline-assembly constraint 'l' for the 'lo' register).</div></div></div></div></div></blockquote></span><div>That is probably the cleanest solution, with the only downside being that the scheduler can’t place instructions between the mthi and mtlo anymore.</div><div><br></div><div>Greetings</div><span><font color="#888888"><div><span style="white-space:pre-wrap">	</span>Matthias</div></font></span><div><div><br><blockquote type="cite"><div dir="ltr"><div><div><div>
</div></div></div><div><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Tue, Oct 8, 2013 at 1:04 PM, Matthias Braun <span dir="ltr"><<a href="mailto:matze@braunis.de" target="_blank">matze@braunis.de</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div><div>
<div bgcolor="#FFFFFF" text="#000000">
<br>
<div>
              <div>Currently it will always spill /
                restore the whole vreg, but spilling only the parts that are
                actually live would be a nice addition in the future.<br>
                <br>
                Looking at r192119: if “mtlo” writes to $LO and sets $HI to an
                unpredictable value, then it should just have an additional
                (dead) def operand for $hi, shouldn’t it?<br>
<br>
Greetings<br>
Matthias<br>
<br>
                On 10/8/13, 11:03 AM, Akira Hatanaka wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>Hi,</div>
<div><br>
</div>
<div>I have a question about the way sub-registers are spilled
and restored that is related to the changes I made in
r192119.</div>
<div><br>
</div>
<div>Suppose I have the following piece of code with four
instructions. %vreg0 and %vreg1 consist of two sub-registers
indexed by sub_lo and sub_hi.</div>
<div><br>
</div>
instr0 %vreg0<def>
<div>instr1 %vreg1:sub_lo<def<span style="font-family:arial,sans-serif;font-size:13px">,read-undef</span>><br>
</div>
<div>instr2 %vreg0<use></div>
<div>
<div>instr3 %vreg1:sub_hi<def></div>
</div>
<div><br>
</div>
                  <div>If the register allocator decides to insert spill and restore
instructions for %vreg0, will it spill the whole register
that includes sub-registers lo and hi?</div>
<div><br>
</div>
<div>instr0 %vreg0<def></div>
<div>spill0 %vreg0<br>
<div>instr1 %vreg1:sub_lo<def<span style="font-family:arial,sans-serif;font-size:13px">,read-undef</span>><br>
</div>
spill1 %vreg1:sub_lo<br>
restore0 %vreg0<br>
<div>instr2 %vreg0<use></div>
restore1 %vreg1:sub_lo<br>
<div>instr3 %vreg1:sub_hi<def></div>
</div>
<div><br>
</div>
<div>Or will it spill just the lo sub-register?</div>
<div><br>
</div>
<div>
<div>instr0 %vreg0<def></div>
<div>spill0 %vreg0:sub_lo<br>
<div>instr1 %vreg1:sub_lo<def<span style="font-family:arial,sans-serif;font-size:13px">,read-undef</span>><br>
</div>
spill1 %vreg1:sub_lo<br>
restore0 %vreg0:sub_lo<br>
<div>instr2 %vreg0<use></div>
restore1 %vreg1:sub_lo<br>
<div>instr3 %vreg1:sub_hi<def></div>
</div>
</div>
<div><br>
</div>
<div>If it spills the whole register (both sub-registers lo
and hi), the changes I made should be fine. Otherwise, I
will have to find another way to prevent the problems I
mentioned in r192119's commit log.</div>
<div><br>
</div>
</div>
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">On Mon, Oct 7, 2013 at 1:11 PM,
Matthias Braun <span dir="ltr"><<a href="mailto:matze@braunis.de" target="_blank">matze@braunis.de</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">I've
been working on patches to improve subregister liveness
tracking on llvm and I wanted to inform the llvm community
                        about the overall design/motivation for them. I will send
the patches to llvm-commits later today.<br>
<br>
Greetings<br>
Matthias Braun<br>
<br>
<br>
Subregisters in llvm<br>
====================<br>
<br>
Some targets can access registers in different ways
resulting in wider or<br>
narrower accesses. For example on ARM NEON one of the
single precision<br>
                        floating point registers is called 'S0'. You may also
                        access 'D0' on ARM, which<br>
                        is the combination of 'S0' and 'S1' and can store a double
                        precision number or<br>
2 single precision floats. 'Q0' is the combination of
'S0', 'S1', 'S2' and<br>
'S3' (or 'D0' and 'D1') and so on.<br>
<br>
                        Before register allocation, LLVM machine code accesses
                        values through virtual<br>
                        registers; these get assigned to physical registers later.
                        Each virtual<br>
register has an assigned register class which is a set of
physical registers.<br>
So for example on ARM you have a register class containing
all the 'SXX'<br>
registers and another one containing all the 'DXX'
registers, ...<br>
<br>
But sometimes you want to mix narrow and wide accesses to
values. Like loading<br>
the 'D0' register but later reading the 'S0' and 'S1'
components separately.<br>
This is modeled with subregister operands which specify
that only parts of a<br>
wider value are accessed. For example the register class
of the 'DXX'<br>
                        registers supports subregisters called 'ssub_0' and
'ssub_1' which would<br>
result in 'S4' and 'S5' getting used if 'D2' is assigned
to the virtual<br>
register later.<br>
<br>
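                        To make the index-to-register mapping concrete, here is a minimal Python sketch. It assumes the ARM layout described above, where register Dn overlaps S(2n) and S(2n+1); that holds for D0 to D15. The helper name 'resolve_subreg' is made up for this illustration and is not an LLVM API.<br>
<pre>
```python
# Hypothetical helper for illustration only; this is not an LLVM API.
# Assumption: Dn overlaps S(2n) and S(2n+1), which holds for D0..D15.
SUBREG_OFFSET = {"ssub_0": 0, "ssub_1": 1}


def resolve_subreg(d_reg, subreg_index):
    """Return the 'S' register named by a 'D' register plus a subregister index."""
    if not d_reg.startswith("D") or not d_reg[1:].isdigit():
        raise ValueError("expected a register such as 'D2', got %r" % d_reg)
    n = int(d_reg[1:])
    if n > 15:
        raise ValueError("only D0..D15 are composed of 'S' registers")
    return "S%d" % (2 * n + SUBREG_OFFSET[subreg_index])


print(resolve_subreg("D2", "ssub_0"))  # S4
print(resolve_subreg("D2", "ssub_1"))  # S5
```
</pre>
                        <br>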
Typical operations are decomposing wider values or
composing wide values with<br>
multiple smaller defs:<br>
<br>
Decomposing:<br>
%vreg1<def> = produce a 'D' value<br>
= use 'S' value %vreg1:ssub_0<br>
= use 'S' value %vreg1:ssub_1<br>
<br>
Composing:<br>
%vreg1:ssub_0<def,read-undef> = produce an 'S' value<br>
%vreg1:ssub_1<def> = produce an 'S' value<br>
= use a 'D' value %vreg1<br>
<br>
Problems / Motivation<br>
=====================<br>
<br>
Currently the llvm register allocator tracks liveness for
whole virtual<br>
registers. This can lead to suboptimal code:<br>
<br>
%vreg0:ssub_0<def,read-undef> = produce an 'S' value<br>
%vreg0:ssub_1<def> = produce an 'S' value<br>
= use a 'D' value %vreg0<br>
%vreg1 = produce an 'S' value<br>
= use an 'S' value %vreg1<br>
= use an 'S' value %vreg0:ssub_0<br>
<br>
The current code will realize that vreg0 and vreg1
interfere and assign them<br>
to different registers like D0+S2 aka S0+S1+S2; while in
reality after the<br>
                        full use of %vreg0 only %vreg0:ssub_0 must remain in a
register while the<br>
subregister used for %vreg0:ssub_1 can be reassigned to
%vreg1. An ideal<br>
assignment would be D0+S1 aka S0+S1.<br>
<br>
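                        The difference can be sketched in a few lines of Python. This is a simplified model rather than LLVM code: the six instructions above are numbered 0 to 5 in program order, a live range is a half-open (start, end) pair running from the def to the last use, and 'overlaps' is a made-up helper name.<br>
<pre>
```python
# Simplified model of the six-instruction example; not LLVM code.
# Instructions are numbered 0..5 in program order. A live range is the
# half-open interval (def position, last use position).


def overlaps(a, b):
    """True if two half-open intervals share at least one point."""
    return max(a[0], b[0]) < min(a[1], b[1])


# Whole-register tracking: %vreg0 is one range from its first def (0)
# to its last use (5), even though only ssub_0 is read at 5.
vreg0_whole = (0, 5)

# Subregister tracking: one range per lane.
vreg0_ssub_0 = (0, 5)  # defined at 0, last read at 5
vreg0_ssub_1 = (1, 2)  # defined at 1, last read by the 'D' use at 2

vreg1 = (3, 4)  # defined at 3, read at 4

print(overlaps(vreg0_whole, vreg1))   # True: vreg1 needs a third 'S' register
print(overlaps(vreg0_ssub_0, vreg1))  # True: vreg1 cannot share with ssub_0
print(overlaps(vreg0_ssub_1, vreg1))  # False: vreg1 may reuse the ssub_1 register
```
</pre>
                        <br>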
                        An even more pressing problem is the artificial dependencies
                        in the schedule<br>
                        graph. This is a side effect of LLVM's live range
                        information being represented<br>
                        in a static-single-assignment-like fashion: every
                        definition of a vreg starts<br>
                        a new interval with a new value number. This means that
                        partial register<br>
                        writes must be modeled as an implicit use of the unwritten
                        parts of a register<br>
                        and force the creation of a new value number. This in turn
                        leads to artificial<br>
dependencies in the schedule graph for code like the
following where all defs<br>
should be independent:<br>
<br>
%vreg0:ssub_0<def,read-undef> = produce an 'S' value<br>
%vreg0:ssub_1<def> = produce an 'S' value<br>
%vreg0:ssub_2<def> = produce an 'S' value<br>
%vreg0:ssub_3<def> = produce an 'S' value<br>
<br>
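                        As a rough illustration of where the artificial edges come from, the Python sketch below models each def only by the lanes it writes. It is not the scheduler's DAG builder: the lane bit values, the 'DEFS' list and the 'dependency_edges' function are assumptions made for this example. Without lane tracking every partial write is treated as touching the whole register, which chains the four defs together; with lane tracking no edges remain.<br>
<pre>
```python
# Simplified dependency model; not the LLVM scheduler's DAG builder.
# Each def is given as the lane mask it writes. The bit values below are
# made up for this sketch (ssub_0..ssub_3 as bits 0..3).
DEFS = [0x1, 0x2, 0x4, 0x8]


def dependency_edges(defs, track_lanes):
    """Return (earlier, later) pairs where a def depends on a previous def."""
    all_lanes = 0
    for mask in defs:
        all_lanes |= mask

    last_writer = {}  # lane bit -> index of the latest def touching it
    edges = []
    for index, mask in enumerate(defs):
        # Without lane tracking a partial write counts as a read-modify-write
        # of the whole register, so it touches every lane.
        touched = mask if track_lanes else all_lanes
        preds = set()
        for shift in range(touched.bit_length()):
            bit = 2 ** shift
            if touched & bit:
                if bit in last_writer:
                    preds.add(last_writer[bit])
                last_writer[bit] = index
        edges.extend((pred, index) for pred in sorted(preds))
    return edges


print(dependency_edges(DEFS, track_lanes=False))  # [(0, 1), (1, 2), (2, 3)]
print(dependency_edges(DEFS, track_lanes=True))   # []
```
</pre>
                        <br>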
<br>
                        Subregister liveness tracking<br>
                        =============================<br>
<br>
I developed a set of patches which enable liveness
tracking on the subregister<br>
level, to overcome the problems mentioned above. After
these changes you can<br>
have separate live ranges for subregisters of a virtual
register. With these<br>
patches the following code:<br>
<br>
16B %vreg0:ssub_0<def,read-undef> = ...<br>
32B %vreg0:ssub_1<def> = ...<br>
48B = %vreg0<br>
64B = %vreg0:ssub_0<br>
80B %vreg0 = ...<br>
96B = %vreg0:ssub_1<br>
<br>
will be represented as the following live range(s):<br>
<br>
                        Common LiveRange: [16r,32r)[32r,64r)[80r,96r)<br>
SubRange with Mask 0x0004 (=ssub_0): [16r,64r)[80r,80d)<br>
SubRange with Mask 0x0008 (=ssub_1): [32r,48r)[80r,96r)<br>
<br>
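                        The two SubRange lines can be reproduced with a small Python model. This is only a sketch under stated assumptions, not the LiveInterval code: the slot numbers and the lane masks 0x0004 and 0x0008 are taken from the listing above, while the tuple encoding, 'PROGRAM' and 'lane_segments' are invented for the example. It ignores the r/d slot suffixes, shows a dead def as a zero-length segment such as (80, 80), and does not compute the common live range.<br>
<pre>
```python
# Simplified model of per-lane live ranges; not the LiveInterval code.
SSUB_0 = 0x0004
SSUB_1 = 0x0008
BOTH = SSUB_0 | SSUB_1

# (slot index, lanes defined, lanes used) for the listing above.
PROGRAM = [
    (16, SSUB_0, 0),
    (32, SSUB_1, 0),
    (48, 0, BOTH),
    (64, 0, SSUB_0),
    (80, BOTH, 0),
    (96, 0, SSUB_1),
]


def lane_segments(program, lane):
    """Return (start, end) segments for one lane; a def with no later use
    becomes a zero-length (dead) segment."""
    segments = []
    start = None
    end = None
    for slot, defs, uses in program:
        # An instruction reads its operands before it writes its results.
        if uses & lane and start is not None:
            end = slot
        if defs & lane:
            if start is not None:
                segments.append((start, start if end is None else end))
            start, end = slot, None
    if start is not None:
        segments.append((start, start if end is None else end))
    return segments


print(lane_segments(PROGRAM, SSUB_0))  # [(16, 64), (80, 80)]
print(lane_segments(PROGRAM, SSUB_1))  # [(32, 48), (80, 96)]
```
</pre>
                        <br>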
Patches/Changes:<br>
                        * Move live range management code in the LiveInterval
class to a new<br>
class LiveRange, move the previous LiveRange class
(which was just a single<br>
interval inside a live range) to LiveRange::Segment.<br>
LiveInterval is made a subclass of LiveRange, other code
paths like<br>
register units liveness use LiveRange instead of
LiveInterval now.<br>
* Introduce a linked list of SubRange objects to the
LiveInterval class.<br>
A SubRange is a subclass of LiveRange and contains a
LaneMask indicating<br>
which subregisters are represented.<br>
* Various algorithms have been adapted to
calculate/preserve subregister<br>
liveness.<br>
* The register allocator has been adapted to track
interference at the<br>
subregister level (LaneMasks are mapped to register
units)<br>
<br>
                        Note that subregister liveness tracking has to be
                        explicitly enabled by the<br>
                        target architecture, as it does not provide enough
                        benefit for the cost on<br>
                        some targets (e.g. having subregister liveness for the
                        lower/upper 8-bit regs<br>
                        on x86 provided nearly no benefit in the llvm-testsuite,
                        so the extra<br>
                        computation and memory usage cannot be justified there).<br>
_______________________________________________<br>
LLVM Developers mailing list<br>
<a href="mailto:LLVMdev@cs.uiuc.edu" target="_blank">LLVMdev@cs.uiuc.edu</a>
<a href="http://llvm.cs.uiuc.edu/" target="_blank">http://llvm.cs.uiuc.edu</a><br>
<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>
</blockquote>
</div>
<br>
</div>
<br>
</blockquote>
<br>
<br>
</div>
<br>
</div>
</div></div><br>
<br></blockquote></div><br></div>
</blockquote></div></div></div><br></div><br>
<br></blockquote></div><br></div></div>