[LLVMdev] Subregister liveness tracking
Matthias Braun
matze at braunis.de
Mon Oct 7 13:11:23 PDT 2013
I've been working on patches to improve subregister liveness tracking on
llvm and I wanted to inform the llvm community about the overal
design/motivation for them. I will send the patches to llvm-commits
later today.
Greetings
Matthias Braun
Subregisters in llvm
====================
Some targets can access registers in different ways resulting in wider or
narrower accesses. For example on ARM NEON one of the single precision
floating point registers is called 'S0'. You may also access 'D0' on arm
which
is the combination of 'S0' and 'S1' and can store a double prevision
number or
2 single precision floats. 'Q0' is the combination of 'S0', 'S1', 'S2' and
'S3' (or 'D0' and 'D1') and so on.
Before register allocation llvm machine code accesses values through
virtual
registers, these get assigned to physical registers later. Each virtual
register has an assigned register class which is a set of physical
registers.
So for example on ARM you have a register class containing all the 'SXX'
registers and another one containing all the 'DXX' registers, ...
But sometimes you want to mix narrow and wide accesses to values. Like
loading
the 'D0' register but later reading the 'S0' and 'S1' components
separately.
This is modeled with subregister operands which specify that only parts
of a
wider value are accessed. For example the register class of the 'DXX'
registers supports subregisters calls 'ssub_0' and 'ssub_1' which would
result in 'S4' and 'S5' getting used if 'D2' is assigned to the virtual
register later.
Typical operations are decomposing wider values or composing wide values
with
multiple smaller defs:
Decomposing:
%vreg1<def> = produce a 'D' value
= use 'S' value %vreg1:ssub_0
= use 'S' value %vreg1:ssub_1
Composing:
%vreg1:ssub_0<def,read-undef> = produce an 'S' value
%vreg1:ssub_1<def> = produce an 'S' value
= use a 'D' value %vreg1
Problems / Motivation
=====================
Currently the llvm register allocator tracks liveness for whole virtual
registers. This can lead to suboptimal code:
%vreg0:ssub_0<def,read-undef> = produce an 'S' value
%vreg0:ssub_1<def> = produce an 'S' value
= use a 'D' value %vreg0
%vreg1 = produce an 'S' value
= use an 'S' value %vreg1
= use an 'S' value %vreg0:ssub_0
The current code will realize that vreg0 and vreg1 interfere and assign
them
to different registers like D0+S2 aka S0+S1+S2; while in reality after the
full use of %vreg0 only %vreg0::ssub_0 must remain in a register while the
subregister used for %vreg0:ssub_1 can be reassigned to %vreg1. An ideal
assignment would be D0+S1 aka S0+S1.
A even more pressing problem are artificial dependencies in the schedule
graph. This is a side effect of llvms live range information being
represented
in a static single assignment like fashion: Every definition of a vreg
starts
a new interval with a new value number. This means that partial register
writes must be modeled as an implicit use of the unwritten parts of a
register
and force the creating of a new value number. This in turn leads to
artificial
dependencies in the schedule graph for code like the following where all
defs
should be independent:
%vreg0:ssub_0<def,read-undef> = produce an 'S' value
%vreg0:ssub_1<def> = produce an 'S' value
%vreg0:ssub_2<def> = produce an 'S' value
%vreg0:ssub_3<def> = produce an 'S' value
Subegister liveness tracking
============================
I developed a set of patches which enable liveness tracking on the
subregister
level, to overcome the problems mentioned above. After these changes you
can
have separate live ranges for subregisters of a virtual register. With
these
patches the following code:
16B %vreg0:ssub_0<def,read-undef> = ...
32B %vreg0:ssub_1<def> = ...
48B = %vreg0
64B = %vreg0:ssub_0
80B %vreg0 = ...
96B = %vreg0:ssub_1
will be represented as the following live range(s):
Common LiveRange: [16r,32r)[32r,64r),[80r,96r)
SubRange with Mask 0x0004 (=ssub_0): [16r,64r)[80r,80d)
SubRange with Mask 0x0008 (=ssub_1): [32r,48r)[80r,96r)
Patches/Changes:
* Moves live range management code in the LiveInterval class to a new
class LiveRange, move the previous LiveRange class (which was just a
single
interval inside a live range) to LiveRange::Segment.
LiveInterval is made a subclass of LiveRange, other code paths like
register units liveness use LiveRange instead of LiveInterval now.
* Introduce a linked list of SubRange objects to the LiveInterval class.
A SubRange is a subclass of LiveRange and contains a LaneMask indicating
which subregisters are represented.
* Various algorithms have been adapted to calculate/preserve subregister
liveness.
* The register allocator has been adapted to track interference at the
subregister level (LaneMasks are mapped to register units)
Note that SubRegister liveness tracking has to be explicitely enabled by
the
target architecture, as it does not provide enough benefits for the
costs on
some targets (e.g. having subregister liveness for the lower/upper 8bit
regs
on x86 provided nearly no benefits in the llvm-testsuite, so you can't
justify
more computations/memory usage for that.
More information about the llvm-dev
mailing list