[llvm-dev] [RFC] Register Rematerialization (remat) Extension

Tue Sep 13 09:03:29 PDT 2016

On Tue, Sep 13, 2016 at 8:20 PM, Philip Reames <listmail at philipreames.com>
wrote:

>
>
> On 09/12/2016 08:51 AM, vivek pandya via llvm-dev wrote:
>
> Hello Developers,
>
> I am working with my other batchmates to improve register remat in LLVM.
> We want to remat live ranges made of multiple instruction.
>
> Just to support our proposal here is a simple example that currently remat
> does
> not cover
>
> $ cat ~/tmp/tl.c
> void foo(long);
> void bar() {
>   for (int i = 0; i < 1600; ++i)
>     foo(3494348345984503943);
> }
>
> $ clang -O3 -S -o - ~/tmp/tl.c -target powerpc64
> ...
> # BB#0:                                 # %entry
> ...
>         lis 3, 12414
>         ori 3, 3, 27470
>         sldi 3, 3, 32
>         oris 3, 3, 35809
>         ori 30, 3, 20615
> ...
> .LBB0_1:                                # %for.body
>         mr 3, 30
>         bl foo
> ...
>
> OT: Instead of viewing this as a *single* remat problem, can you view this
> as a series of remat problems?  If each instruction can be moved while only
> increasing the live range of a single register and shrinking the live range
> of another, then moving each instruction one by one and iterating gives the
> same effect.  Note that this only (obviously) works for single use defs.
> Extending it to multiple uses would require a more complicated cost model
> as you point out.
>

Hello Philip,

Your idea seems to be effective and can be tried out with few changes. But
correct me if I got this wrong, by multiple uses do you mean a same remat
sequence being remat again? In that case we can run the analysis again to
decide which all instructions are required to be remat or we may add a
simple caching layer which can hold analysis result for once compilation
unit.
Any thoughts ?

>
>
>
> There is a sequence of instructions used to materialize the constant, the
> first
> one (the lis) is trivially rematerialiable, and the others depend only on
> that one,
> and have no side effects. If we otherwise needed to spill the constant, we
> might
> wish to move the entire set of instructions that compute the value into
> the loop body.
> (Many thanks to Hal Finkel for this example and head start)
>
> We are following very old but effective paper "Rematerialization"
> http://dl.acm.org/citation.cfm?id=143143 ------------------------------[1]
>
> This extension will specially improve code quality for RICS backends like
> powerpc, MIPS, ARM, AArch64 etc.
>
> Here is a tentative apporach ( after reading the above mentioned paper and
> current remat code) that I would like to follow.
>
> Please share your views because this may be totally wrong direction. Also
> I will
> be happy if this gets into main line LLVM code but if community don't want
> to make remat heavy than please guide me for my class project perspective.
>
> 1 ) As LLVM MI is already in SSA form before reg allocation so for LLVM I
> think it does not require to build SSA graph and converting it back after
> optimization completed as mentioned in [1]
>
> 2 ) We would like to add a pass similar to SCCP.cpp (Sparse Conditional
> Constant
> Propagation based on Wegman and Zadeck's work http://dl.acm.org/citation.
> cfm?id=103136) as desribed in [1]. This pass will be scheduled to run
> before register allocation.
>
> 3 ) Output of the pass added in Step 2 will be a Map of def to
> instructions pointers (instructions which can be used to remat the given
> live range). The map will contain live ranges which is due to single
> instruction and multiple instructions.
>
> This sounds overly complex.  Can you implement this without needing the
> new side structure?  Maintaining extra state and keeping it up to date is
> expensive.  (From a maintenance and code complexity perspective.)
>
>
> Currently I don't think we have any facility to annotate instructions so
to keep analysis available till register allocation we need an immutable
pass.

> 4 ) The remat APIs defined in LiveRangeEdit.cpp will use analysis from the
> Map
> when a spill is required for RA.
>
> 5 ) The remat transformation APIs like rematerializeAt() will be teached
> to remat
> live ranges with multiple instructions too.
>
> 6 ) A cost analysis will be require to decide between remat and spill.
> This should be based on at least two factors register pressure and spill
> cost
>
> Few points:
> --------------
> * The analysis pass to be addes as per (2) will use target specific
> information
> from TargetInstrInfo.cpp as the current remat infrastructure uses.
>
> * This approach will not be on demand as the current approach is (i.e
> remat specific
> code will be executed only if there is a spill) so the pass in (2) can be
> an
> overhead so we may want it to enable only for higher level of optimization.
>
> This would be unfortunate.  Not fatal, just unfortunate.
>
>
> -Vivek

> * Will it be possible to use existing SCCP.cpp code with few modification
> to lattice
> and related mathematical operation so that it can serve both purpose?
>
> * No changes in current register allocators or spill framework will be
> required
> because remat entry point will be LiveRangeEdit.
>
> Any other way with less overhead is always welcomed.
> Please help us developing a plan to implement this.
>
> Hoping for comments!
>
> Sincerely,
> Vivek
>
>
>
> _______________________________________________
> LLVM Developers mailing listllvm-dev at lists.llvm.orghttp://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160913/6f6a6105/attachment.html>