[LLVMdev] Subregister liveness tracking

Mon Oct 7 13:11:23 PDT 2013

I've been working on patches to improve subregister liveness tracking in
llvm and I wanted to inform the llvm community about the overall
design/motivation for them. I will send the patches to llvm-commits
later today.

Greetings
     Matthias Braun

Subregisters in llvm
====================

Some targets can access registers in different ways resulting in wider or
narrower accesses. For example on ARM NEON one of the single precision
floating point registers is called 'S0'. You may also access 'D0' on ARM,
which is the combination of 'S0' and 'S1' and can store a double precision
number or 2 single precision floats. 'Q0' is the combination of 'S0', 'S1',
'S2' and 'S3' (or 'D0' and 'D1') and so on.

Before register allocation llvm machine code accesses values through virtual
registers; these get assigned to physical registers later. Each virtual
register has an assigned register class which is a set of physical registers.
So for example on ARM you have a register class containing all the 'SXX'
registers and another one containing all the 'DXX' registers, ...

But sometimes you want to mix narrow and wide accesses to values, like
loading the 'D0' register but later reading the 'S0' and 'S1' components
separately. This is modeled with subregister operands which specify that
only parts of a wider value are accessed. For example the register class of
the 'DXX' registers supports subregisters called 'ssub_0' and 'ssub_1' which
would result in 'S4' and 'S5' getting used if 'D2' is assigned to the virtual
register later.
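
The way a subregister index picks a physical register once the wide register
is known can be sketched as a small lookup. This is only a toy model in
Python, not llvm's generated register tables: the function name
resolve_subreg is hypothetical, and the numbering assumes nothing beyond the
'S'/'D'/'Q' overlap described above.

```python
# Toy model of the ARM 'S'/'D'/'Q' overlap described in the text.
# Number of 'S' registers covered by one register of each kind.
S_PER_REG = {"S": 1, "D": 2, "Q": 4}


def resolve_subreg(phys_reg: str, sub_idx: str) -> str:
    """Return the 'S' register selected by ssub_<k> of phys_reg.

    Hypothetical helper for illustration; not an llvm API.
    """
    kind, number = phys_reg[0], int(phys_reg[1:])
    prefix, _, k_text = sub_idx.partition("_")
    if prefix != "ssub" or not k_text.isdigit():
        raise ValueError(f"unknown subregister index {sub_idx}")
    k = int(k_text)
    width = S_PER_REG[kind]
    if k >= width:
        raise ValueError(f"{phys_reg} has no subregister {sub_idx}")
    # D<n> covers S<2n>, S<2n+1>; Q<n> covers S<4n> .. S<4n+3>.
    return f"S{number * width + k}"


if __name__ == "__main__":
    for idx in ("ssub_0", "ssub_1"):
        print("D2", idx, "->", resolve_subreg("D2", idx))
```

With 'D2' assigned, 'ssub_0' and 'ssub_1' resolve to 'S4' and 'S5', matching
the example in the text.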

Typical operations are decomposing wider values or composing wide values
with multiple smaller defs:

Decomposing:
%vreg1<def> = produce a 'D' value
             = use 'S' value %vreg1:ssub_0
             = use 'S' value %vreg1:ssub_1

Composing:
%vreg1:ssub_0<def,read-undef> = produce an 'S' value
%vreg1:ssub_1<def>            = produce an 'S' value
            = use a 'D' value %vreg1

Problems / Motivation
=====================

Currently the llvm register allocator tracks liveness for whole virtual
registers. This can lead to suboptimal code:

%vreg0:ssub_0<def,read-undef> = produce an 'S' value
%vreg0:ssub_1<def> = produce an 'S' value
        = use a 'D' value %vreg0
%vreg1 = produce an 'S' value
        = use an 'S' value %vreg1
        = use an 'S' value %vreg0:ssub_0

The current code will realize that vreg0 and vreg1 interfere and assign them
to different registers like D0+S2 aka S0+S1+S2, while in reality after the
full use of %vreg0 only %vreg0:ssub_0 must remain in a register while the
subregister used for %vreg0:ssub_1 can be reassigned to %vreg1. An ideal
assignment would be D0+S1 aka S0+S1.
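
The difference between the two views can be reproduced with a simplified
backward liveness scan over the example. This is a Python sketch under
assumptions, not the allocator's actual algorithm: the program encoding, the
lane name "s" for the 'S'-class %vreg1 and the helper names are all made up
for illustration, and only straight-line code is handled.

```python
# Straight-line program from the example: (defs, uses) per instruction,
# each entry a (vreg, lane) pair. The lane name "s" for %vreg1 is an
# assumption of this sketch.
V0_LANES = {"ssub_0", "ssub_1"}
PROGRAM = [
    ({("vreg0", "ssub_0")}, set()),
    ({("vreg0", "ssub_1")}, set()),
    (set(), {("vreg0", "ssub_0"), ("vreg0", "ssub_1")}),
    ({("vreg1", "s")}, set()),
    (set(), {("vreg1", "s")}),
    (set(), {("vreg0", "ssub_0")}),
]


def live_after(program):
    """Backward scan: (vreg, lane) pairs live right after each instruction."""
    live, result = set(), []
    for defs, uses in reversed(program):
        result.append(set(live))
        live = (live - defs) | uses
    return result[::-1]


def lanes_blocked_for(program, vreg, other, other_lanes, per_lane):
    """Lanes of `other` that `vreg` may not share a register with."""
    blocked = set()
    for live in live_after(program):
        if any(v == vreg for v, _ in live):
            lanes = {lane for v, lane in live if v == other}
            if lanes and not per_lane:
                # Whole-register tracking: any live lane blocks all lanes.
                lanes = set(other_lanes)
            blocked |= lanes
    return blocked


if __name__ == "__main__":
    for per_lane in (False, True):
        print(per_lane, sorted(
            lanes_blocked_for(PROGRAM, "vreg1", "vreg0", V0_LANES, per_lane)))
```

In this model whole-register tracking blocks both lanes of %vreg0 for
%vreg1, while per-lane tracking blocks only ssub_0, which leaves the register
behind ssub_1 available.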

An even more pressing problem is artificial dependencies in the schedule
graph. This is a side effect of llvm's live range information being
represented in a static single assignment like fashion: every definition of
a vreg starts a new interval with a new value number. This means that partial
register writes must be modeled as an implicit use of the unwritten parts of
a register and force the creation of a new value number. This in turn leads
to artificial dependencies in the schedule graph for code like the following
where all defs should be independent:

%vreg0:ssub_0<def,read-undef> = produce an 'S' value
%vreg0:ssub_1<def>            = produce an 'S' value
%vreg0:ssub_2<def>            = produce an 'S' value
%vreg0:ssub_3<def>            = produce an 'S' value
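
A rough way to see the effect is to count the dependency edges between these
four defs under both models. The following Python sketch is an assumption
laden simplification, not the scheduler's dependency builder: it only chains
each def to the previous def of the same tracked unit (the whole vreg, or a
single lane), and the function name is hypothetical.

```python
# Lanes written by the four defs of %vreg0, in program order.
DEFS = ["ssub_0", "ssub_1", "ssub_2", "ssub_3"]


def dependency_edges(defs, per_lane):
    """Return (earlier, later) index pairs that the schedule must respect.

    Whole-register tracking treats every def as touching the same unit,
    so each def depends on the previous one. Per-lane tracking only
    orders defs that write the same lane.
    """
    edges = []
    last_def = {}  # tracked unit -> index of its most recent def
    for i, lane in enumerate(defs):
        key = lane if per_lane else "whole"
        if key in last_def:
            edges.append((last_def[key], i))
        last_def[key] = i
    return edges


if __name__ == "__main__":
    print("whole register:", dependency_edges(DEFS, per_lane=False))
    print("per lane:      ", dependency_edges(DEFS, per_lane=True))
```

Under whole-register tracking the four defs form a chain of three edges; with
per-lane tracking no edges remain in this model, so the defs can be scheduled
in any order.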

Subregister liveness tracking
=============================

I developed a set of patches which enable liveness tracking on the
subregister level, to overcome the problems mentioned above. After these
changes you can have separate live ranges for subregisters of a virtual
register. With these patches the following code:

   16B  %vreg0:ssub_0<def,read-undef> = ...
   32B  %vreg0:ssub_1<def>            = ...
   48B               = %vreg0
   64B               = %vreg0:ssub_0
   80B  %vreg0 = ...
   96B         = %vreg0:ssub_1

will be represented as the following live range(s):

   Common LiveRange: [16r,32r)[32r,64r)[80r,96r)
   SubRange with Mask 0x0004 (=ssub_0): [16r,64r)[80r,80d)
   SubRange with Mask 0x0008 (=ssub_1): [32r,48r)[80r,96r)
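
The two subranges can be rederived from the instruction listing with a
simple forward walk per lane. This Python sketch is only an illustration for
this straight-line example, not the algorithm in the patches: the encoding of
the instructions and the helper names are assumptions, the mask values are
copied from the dump above, and a segment whose end equals its start stands
for a dead def.

```python
# Lane masks as printed in the dump above.
LANE_MASK = {"ssub_0": 0x0004, "ssub_1": 0x0008}

# (slot index, lanes defined, lanes used) for the %vreg0 example.
INSTRS = [
    (16, {"ssub_0"}, set()),
    (32, {"ssub_1"}, set()),
    (48, set(), {"ssub_0", "ssub_1"}),
    (64, set(), {"ssub_0"}),
    (80, {"ssub_0", "ssub_1"}, set()),  # full def of %vreg0
    (96, set(), {"ssub_1"}),
]


def subranges(instrs, lanes):
    """Per-lane list of (start, end); end == start marks a dead def."""
    result = {lane: [] for lane in lanes}
    for slot, defs, uses in instrs:
        for lane in lanes:
            segs = result[lane]
            if lane in uses and segs:
                segs[-1] = (segs[-1][0], slot)  # extend to this use
            if lane in defs:
                segs.append((slot, slot))       # new segment, dead so far
    return result


def show(segs):
    """Format segments in the notation of the dump above."""
    return "".join(
        f"[{s}r,{e}d)" if s == e else f"[{s}r,{e}r)" for s, e in segs
    )


if __name__ == "__main__":
    for lane, segs in subranges(INSTRS, LANE_MASK).items():
        print(f"SubRange with Mask 0x{LANE_MASK[lane]:04x} (={lane}): {show(segs)}")
```

The full def at 80B starts a new segment in both lanes, but only ssub_1 is
read afterwards, so the ssub_0 segment stays a dead def.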

Patches/Changes:
* Move live range management code in the LiveInterval class to a new
   class LiveRange, and move the previous LiveRange class (which was just a
   single interval inside a live range) to LiveRange::Segment.
   LiveInterval is made a subclass of LiveRange; other code paths like
   register unit liveness use LiveRange instead of LiveInterval now.
* Introduce a linked list of SubRange objects to the LiveInterval class.
   A SubRange is a subclass of LiveRange and contains a LaneMask indicating
   which subregisters are represented.
* Various algorithms have been adapted to calculate/preserve subregister
   liveness.
* The register allocator has been adapted to track interference at the
   subregister level (LaneMasks are mapped to register units)
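
The class relationships in the list above can be outlined as follows. This is
a Python sketch of the shape only, assuming nothing about the real C++
members: the field names and the live_lanes_at query are hypothetical, and a
plain Python list stands in for the linked list of SubRange objects.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One interval inside a live range (LiveRange::Segment in the text)."""
    start: int
    end: int  # half-open: live on [start, end)


@dataclass
class LiveRange:
    segments: list = field(default_factory=list)

    def live_at(self, slot: int) -> bool:
        return any(s.start <= slot < s.end for s in self.segments)


@dataclass
class SubRange(LiveRange):
    lane_mask: int = 0  # which subregister lanes this range represents


@dataclass
class LiveInterval(LiveRange):
    reg: str = ""
    subranges: list = field(default_factory=list)  # list of SubRange

    def live_lanes_at(self, slot: int) -> int:
        """Hypothetical query: OR of the masks of all subranges live at slot."""
        mask = 0
        for sub in self.subranges:
            if sub.live_at(slot):
                mask |= sub.lane_mask
        return mask


# The %vreg0 example from the dump above; the dead def of ssub_0 at 80
# has zero length in this model and is left out.
vreg0 = LiveInterval(
    segments=[Segment(16, 32), Segment(32, 64), Segment(80, 96)],
    reg="vreg0",
    subranges=[
        SubRange(segments=[Segment(16, 64)], lane_mask=0x0004),
        SubRange(segments=[Segment(32, 48), Segment(80, 96)], lane_mask=0x0008),
    ],
)

if __name__ == "__main__":
    for slot in (40, 56, 70, 88):
        print(slot, hex(vreg0.live_lanes_at(slot)))
```

An interference check on the subregister level would then compare such lane
masks (mapped to register units) instead of asking only whether the whole
interval is live.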

Note that subregister liveness tracking has to be explicitly enabled by the
target architecture, as it does not provide enough benefits for the costs on
some targets (e.g. having subregister liveness for the lower/upper 8-bit regs
on x86 provided nearly no benefits in the llvm-testsuite, so you can't
justify more computations/memory usage for that).