[llvm-dev] [RFC] Register Rematerialization (remat) Extension

Sat Sep 24 10:39:00 PDT 2016

----- Original Message -----

> From: "Bruce Hoult" <bruce at hoult.org>
> To: "vivek pandya" <vivekvpandya at gmail.com>
> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>, "Hal Finkel"
> <hfinkel at anl.gov>, "Matthias Braun" <matze at braunis.de>
> Sent: Monday, September 19, 2016 9:10:17 AM
> Subject: Re: [llvm-dev] [RFC] Register Rematerialization (remat)
> Extension

> The idea seems sound, but do you really have a CPU in which such a
> complex rematerialization is better than an L1 cache load from the
> stack frame?

> lis 3, 12414
> ori 3, 3, 27470
> sldi 3, 3, 32
> oris 3, 3, 35809
> ori 30, 3, 20615

> I'm not familiar with modern PPC64 but seems like a lose on PPC G5
> and from the docs I quickly found (2 cycle latency on dependent int
> ALU ops) Power8 too.

> OK, maybe (if I didn't screw up the rldimi):

> lis 3, 12414
> lis 30, 35809
> ori 3, 3, 27470
> ori 30, 30, 20615
> rldimi 30, 3, 32, 0

> Or is there something that optimizes such sequences building
> constants?

I don't know about the G5, but I did some experiments on the P8 and I
was unable to distinguish the performance of a load vs. the
materialization sequence. This matches my expectations: the load should
have a load-to-use latency of 3 cycles. The materialization-sequence
instructions can issue together, each instruction has only a 1-cycle
latency with forwarding (and 2 can execute per cycle), and the height of
the dependency chain is 3 instructions. All other things being roughly
equal, keeping pressure off of the memory subsystem tends to trump other
concerns.
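
As a sanity check on the two sequences quoted above, here is a minimal
sketch in C that models just enough of lis, ori, oris, sldi and rldimi
to evaluate them. The helper names (build_serial, build_parallel,
self_check) are made up for illustration, and the rldimi model only
handles the non-wrapping mask case used here. Under those assumptions,
both sequences produce the constant from tl.c. The rldimi variant has a
longest dependency chain of three instructions (lis, ori, rldimi),
whereas the original five instructions are fully serial.

```c
#include <assert.h>
#include <stdint.h>

/* lis: sign-extend the low 16 bits of imm, then shift left by 16. */
static uint64_t lis(uint32_t imm) {
    uint64_t v = imm & 0xFFFFu;
    if (v & 0x8000u)
        v |= ~(uint64_t)0xFFFF;          /* manual sign extension */
    return v << 16;
}

/* ori / oris: OR in a zero-extended 16-bit immediate (oris shifts it by 16). */
static uint64_t ori(uint64_t r, uint32_t imm)  { return r | (uint64_t)(imm & 0xFFFFu); }
static uint64_t oris(uint64_t r, uint32_t imm) { return r | ((uint64_t)(imm & 0xFFFFu) << 16); }

/* sldi: shift left doubleword immediate, n in 0..63. */
static uint64_t sldi(uint64_t r, unsigned n)   { return r << n; }

/* rldimi ra, rs, sh, mb: rotate rs left by sh and insert the result into ra
   under a mask covering IBM-numbered bits mb..63-sh (bit 0 is the MSB).
   Assumption: mb <= 63 - sh, i.e. the mask does not wrap around. */
static uint64_t rldimi(uint64_t ra, uint64_t rs, unsigned sh, unsigned mb) {
    uint64_t rot = sh ? (rs << sh) | (rs >> (64 - sh)) : rs;
    uint64_t mask = 0;
    for (unsigned b = mb; b <= 63 - sh; ++b)
        mask |= (uint64_t)1 << (63 - b);
    return (rot & mask) | (ra & ~mask);
}

/* Original sequence: each instruction depends on the previous one. */
static uint64_t build_serial(void) {
    uint64_t r3 = lis(12414);
    r3 = ori(r3, 27470);
    r3 = sldi(r3, 32);
    r3 = oris(r3, 35809);
    return ori(r3, 20615);                   /* value left in r30 */
}

/* rldimi variant: two independent lis/ori pairs merged by one rldimi. */
static uint64_t build_parallel(void) {
    uint64_t r3  = ori(lis(12414), 27470);   /* high 32 bits */
    uint64_t r30 = ori(lis(35809), 20615);   /* low 32 bits; upper half is sign fill */
    return rldimi(r30, r3, 32, 0);           /* overwrite the upper half with r3 */
}

/* Both sequences should agree with each other. */
static void self_check(void) {
    assert(build_serial() == build_parallel());
}
```

Calling self_check() from a small driver, or comparing either result
against 3494348345984503943, is enough to exercise it.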

-Hal 

> On Mon, Sep 12, 2016 at 6:51 PM, vivek pandya via llvm-dev <
> llvm-dev at lists.llvm.org > wrote:

> > Hello Developers,

> > I am working with my other batchmates to improve register remat in
> > LLVM. We want to remat live ranges made of multiple instructions.

> > Just to support our proposal, here is a simple example that remat
> > currently does not cover:

> > $ cat ~/tmp/tl.c
> > void foo(long);
> > void bar() {
> >   for (int i = 0; i < 1600; ++i)
> >     foo(3494348345984503943);
> > }

> > $ clang -O3 -S -o - ~/tmp/tl.c -target powerpc64
> > ...
> > # BB#0: # %entry
> > ...
> > lis 3, 12414
> > ori 3, 3, 27470
> > sldi 3, 3, 32
> > oris 3, 3, 35809
> > ori 30, 3, 20615
> > ...
> > .LBB0_1: # %for.body
> > mr 3, 30
> > bl foo
> > ...

> > There is a sequence of instructions used to materialize the
> > constant. The first one (the lis) is trivially rematerializable,
> > and the others depend only on that one and have no side effects.
> > If we otherwise needed to spill the constant, we might wish to
> > move the entire set of instructions that compute the value into
> > the loop body. (Many thanks to Hal Finkel for this example and
> > head start.)

> > We are following a very old but effective paper,
> > "Rematerialization":
> > http://dl.acm.org/citation.cfm?id=143143 [1]

> > This extension will especially improve code quality for RISC
> > backends like PowerPC, MIPS, ARM, AArch64, etc.

> > Here is a tentative approach (after reading the above-mentioned
> > paper and the current remat code) that I would like to follow.

> > Please share your views, because this may be a totally wrong
> > direction. I would be happy if this gets into mainline LLVM, but
> > if the community doesn't want to make remat heavier, then please
> > guide me from the perspective of my class project.

> > 1) LLVM MI is already in SSA form before register allocation, so
> > I think LLVM does not need to build the SSA graph and convert it
> > back after the optimization completes, as described in [1].

> > 2) We would like to add a pass similar to SCCP.cpp (Sparse
> > Conditional Constant Propagation, based on Wegman and Zadeck's
> > work, http://dl.acm.org/citation.cfm?id=103136) as described in
> > [1]. This pass will be scheduled to run before register
> > allocation.

> > 3) The output of the pass added in step 2 will be a map from each
> > def to pointers to the instructions that can be used to remat the
> > given live range. The map will cover live ranges defined by a
> > single instruction as well as by multiple instructions.

> > 4) The remat APIs defined in LiveRangeEdit.cpp will use the
> > analysis from the map when a spill is required for RA.

> > 5) The remat transformation APIs such as rematerializeAt() will be
> > taught to remat live ranges with multiple instructions too.

> > 6) A cost analysis will be required to decide between remat and
> > spill. It should be based on at least two factors: register
> > pressure and spill cost.
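
To make steps 2 and 3 concrete, here is a minimal sketch in C of the
kind of information such a map could hold. It assumes a toy SSA form
with no PHIs and at most two register operands per instruction;
ToyInstr, find_def, collect and remat_chain are hypothetical names for
illustration, not LLVM APIs, and the real analysis in [1] is a
lattice-based propagation rather than this simple walk. For a given
def, it gathers the side-effect-free instructions needed to recompute
the value, ordered so that every instruction follows the ones it
depends on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_OPS 2
#define NO_REG  (-1)

/* Toy SSA instruction: defines virtual register `def` from up to two
   virtual-register operands. Immediates are not modelled. */
typedef struct {
    int  def;
    int  ops[MAX_OPS];    /* NO_REG marks an unused operand slot */
    bool side_effects;    /* e.g. a store or a call */
} ToyInstr;

/* Index of the instruction defining `reg`, or -1. SSA: one def per reg. */
static int find_def(const ToyInstr *code, size_t n, int reg) {
    for (size_t i = 0; i < n; ++i)
        if (code[i].def == reg)
            return (int)i;
    return -1;
}

/* Post-order walk over the operands of the def of `reg`, appending each
   needed instruction index to `chain` once. Fails on side effects, on an
   operand with no visible def, on overflow of `cap`, or if the walk gets
   deeper than the instruction count (which would indicate a cycle). */
static bool collect(const ToyInstr *code, size_t n, int reg,
                    int *chain, int *len, int cap, size_t depth) {
    int idx = find_def(code, n, reg);
    if (idx < 0 || code[idx].side_effects || depth > n)
        return false;
    for (int k = 0; k < *len; ++k)
        if (chain[k] == idx)
            return true;                  /* already in the chain */
    for (int j = 0; j < MAX_OPS; ++j) {
        int op = code[idx].ops[j];
        if (op != NO_REG && !collect(code, n, op, chain, len, cap, depth + 1))
            return false;
    }
    if (*len >= cap)
        return false;
    chain[(*len)++] = idx;                /* operands first, then the user */
    return true;
}

/* Returns the number of instructions needed to rematerialize `reg`
   (written to `chain` in dependency order), or -1 if it cannot be
   rematerialized under the toy rules above. */
static int remat_chain(const ToyInstr *code, size_t n, int reg,
                       int *chain, int cap) {
    int len = 0;
    assert(code != NULL && chain != NULL && cap >= 0);
    return collect(code, n, reg, chain, &len, cap, 0) ? len : -1;
}
```

A chain of length 1 corresponds to what remat handles today; longer
chains are the multi-instruction case from the proposal. Deciding
whether a chain is worth re-emitting (step 6) is deliberately left out
of this sketch.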

> > A few points:
> > --------------
> > * The analysis pass to be added as per (2) will use
> > target-specific information from TargetInstrInfo.cpp, as the
> > current remat infrastructure does.

> > * This approach will not be on demand the way the current approach
> > is (i.e., remat-specific code is executed only if there is a
> > spill), so the pass in (2) can be an overhead; we may want to
> > enable it only at higher optimization levels.

> > * Will it be possible to use the existing SCCP.cpp code, with a
> > few modifications to the lattice and the related mathematical
> > operations, so that it can serve both purposes?

> > * No changes to the current register allocators or spill framework
> > will be required, because the remat entry point will be
> > LiveRangeEdit.

> > Any other way with less overhead is always welcome.
> > Please help us develop a plan to implement this.

> > Hoping for comments!

> > Sincerely,
> > Vivek

-- 

Hal Finkel 
Lead, Compiler Technology and Programming Languages 
Leadership Computing Facility 
Argonne National Laboratory 