[llvm-dev] [RFC] Register Rematerialization (remat) Extension

Mon Sep 12 08:51:40 PDT 2016

Hello Developers,

I am working with my other batchmates to improve register remat in LLVM.
We want to remat live ranges made of multiple instruction.

Just to support our proposal here is a simple example that currently remat
does
not cover

$ cat ~/tmp/tl.c
void foo(long);
void bar() {
  for (int i = 0; i < 1600; ++i)
    foo(3494348345984503943);
}

$ clang -O3 -S -o - ~/tmp/tl.c -target powerpc64
...
# BB#0:                                 # %entry
...
        lis 3, 12414
        ori 3, 3, 27470
        sldi 3, 3, 32
        oris 3, 3, 35809
        ori 30, 3, 20615
...
.LBB0_1:                                # %for.body
        mr 3, 30
        bl foo
...

There is a sequence of instructions used to materialize the constant, the
first
one (the lis) is trivially rematerialiable, and the others depend only on
that one,
and have no side effects. If we otherwise needed to spill the constant, we
might
wish to move the entire set of instructions that compute the value into the
loop body.
(Many thanks to Hal Finkel for this example and head start)

We are following very old but effective paper "Rematerialization"
http://dl.acm.org/citation.cfm?id=143143 ------------------------------[1]

This extension will specially improve code quality for RICS backends like
powerpc, MIPS, ARM, AArch64 etc.

Here is a tentative apporach ( after reading the above mentioned paper and
current remat code) that I would like to follow.

Please share your views because this may be totally wrong direction. Also I
will
be happy if this gets into main line LLVM code but if community don't want
to make remat heavy than please guide me for my class project perspective.

1 ) As LLVM MI is already in SSA form before reg allocation so for LLVM I
think it does not require to build SSA graph and converting it back after
optimization completed as mentioned in [1]

2 ) We would like to add a pass similar to SCCP.cpp (Sparse Conditional
Constant
Propagation based on Wegman and Zadeck's work
http://dl.acm.org/citation.cfm?id=103136) as desribed in [1]. This pass
will be scheduled to run before register allocation.

3 ) Output of the pass added in Step 2 will be a Map of def to instructions
pointers (instructions which can be used to remat the given live range).
The map will contain live ranges which is due to single instruction and
multiple instructions.

4 ) The remat APIs defined in LiveRangeEdit.cpp will use analysis from the
Map
when a spill is required for RA.

5 ) The remat transformation APIs like rematerializeAt() will be teached to
remat
live ranges with multiple instructions too.

6 ) A cost analysis will be require to decide between remat and spill. This
should be based on at least two factors register pressure and spill cost

Few points:
--------------
* The analysis pass to be addes as per (2) will use target specific
information
from TargetInstrInfo.cpp as the current remat infrastructure uses.

* This approach will not be on demand as the current approach is (i.e remat
specific
code will be executed only if there is a spill) so the pass in (2) can be an
overhead so we may want it to enable only for higher level of optimization.

* Will it be possible to use existing SCCP.cpp code with few modification
to lattice
and related mathematical operation so that it can serve both purpose?

* No changes in current register allocators or spill framework will be
required
because remat entry point will be LiveRangeEdit.

Any other way with less overhead is always welcomed.
Please help us developing a plan to implement this.

Hoping for comments!

Sincerely,
Vivek
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160912/215d72cf/attachment.html>