[llvm-dev] [RFC] Register Rematerialization (remat) Extension
Hal Finkel via llvm-dev
llvm-dev at lists.llvm.org
Sat Sep 24 10:39:00 PDT 2016
----- Original Message -----
> From: "Bruce Hoult" <bruce at hoult.org>
> To: "vivek pandya" <vivekvpandya at gmail.com>
> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>, "Hal Finkel"
> <hfinkel at anl.gov>, "Matthias Braun" <matze at braunis.de>
> Sent: Monday, September 19, 2016 9:10:17 AM
> Subject: Re: [llvm-dev] [RFC] Register Rematerialization (remat)
> Extension
> The idea seems sound, but do you really have a CPU in which such a
> complex rematerialization is better than an L1 cache load from the
> stack frame?
> lis 3, 12414
> ori 3, 3, 27470
> sldi 3, 3, 32
> oris 3, 3, 35809
> ori 30, 3, 20615
> I'm not familiar with modern PPC64 but seems like a lose on PPC G5
> and from the docs I quickly found (2 cycle latency on dependent int
> ALU ops) Power8 too.
> OK, maybe (if I didn't screw up the rldimi):
> lis 3, 12414
> lis 30, 35809
> ori 3, 3, 27470
> ori 30, 30, 20615
> rldimi 30, 3, 32, 0
> Or is there something that optimizes such sequences building
> constants?
I don't know about the G5, but I did some experiments on the P8 and I was unable to distinguish the performance of a load vs. the materialization sequence. This matches my expectations: The load should have a load-to-use latency of 3 cycles. The materialization-sequence instructions can issue together, each instruction has only a 1-cycle latency with forwarding (and 2 can execute per cycle), and the height of the dependency chain is 3 instructions. All other things being roughly equal, keeping pressure off of the memory subsystem tends to trump other concerns.
-Hal
> On Mon, Sep 12, 2016 at 6:51 PM, vivek pandya via llvm-dev <
> llvm-dev at lists.llvm.org > wrote:
> > Hello Developers,
>
> > I am working with my other batchmates to improve register remat in
> > LLVM.
>
> > We want to remat live ranges made of multiple instruction.
>
> > Just to support our proposal here is a simple example that
> > currently
> > remat does
>
> > not cover
>
> > $ cat ~/tmp/tl.c
>
> > void foo(long);
>
> > void bar() {
>
> > for (int i = 0; i < 1600; ++i)
>
> > foo(3494348345984503943);
>
> > }
>
> > $ clang -O3 -S -o - ~/tmp/tl.c -target powerpc64
>
> > ...
>
> > # BB#0: # %entry
>
> > ...
>
> > lis 3, 12414
>
> > ori 3, 3, 27470
>
> > sldi 3, 3, 32
>
> > oris 3, 3, 35809
>
> > ori 30, 3, 20615
>
> > ...
>
> > .LBB0_1: # %for.body
>
> > mr 3, 30
>
> > bl foo
>
> > ...
>
> > There is a sequence of instructions used to materialize the
> > constant,
> > the first
>
> > one (the lis) is trivially rematerialiable, and the others depend
> > only on that one,
>
> > and have no side effects. If we otherwise needed to spill the
> > constant, we might
>
> > wish to move the entire set of instructions that compute the value
> > into the loop body.
>
> > (Many thanks to Hal Finkel for this example and head start)
>
> > We are following very old but effective paper "Rematerialization"
>
> > http://dl.acm.org/citation.cfm?id=143143
> > ------------------------------[1]
>
> > This extension will specially improve code quality for RICS
> > backends
> > like
>
> > powerpc, MIPS, ARM, AArch64 etc.
>
> > Here is a tentative apporach ( after reading the above mentioned
> > paper and current remat code) that I would like to follow.
>
> > Please share your views because this may be totally wrong
> > direction.
> > Also I will
>
> > be happy if this gets into main line LLVM code but if community
> > don't
> > want
>
> > to make remat heavy than please guide me for my class project
> > perspective.
>
> > 1 ) As LLVM MI is already in SSA form before reg allocation so for
> > LLVM I think it does not require to build SSA graph and converting
> > it back after optimization completed as mentioned in [1]
>
> > 2 ) We would like to add a pass similar to SCCP.cpp (Sparse
> > Conditional Constant
>
> > Propagation based on Wegman and Zadeck's work
> > http://dl.acm.org/citation.cfm?id=103136 ) as desribed in [1]. This
> > pass will be scheduled to run before register allocation.
>
> > 3 ) Output of the pass added in Step 2 will be a Map of def to
> > instructions pointers (instructions which can be used to remat the
> > given live range). The map will contain live ranges which is due to
> > single instruction and multiple instructions.
>
> > 4 ) The remat APIs defined in LiveRangeEdit.cpp will use analysis
> > from the Map
>
> > when a spill is required for RA.
>
> > 5 ) The remat transformation APIs like rematerializeAt() will be
> > teached to remat
>
> > live ranges with multiple instructions too.
>
> > 6 ) A cost analysis will be require to decide between remat and
> > spill. This should be based on at least two factors register
> > pressure and spill cost
>
> > Few points:
>
> > --------------
>
> > * The analysis pass to be addes as per (2) will use target specific
> > information
>
> > from TargetInstrInfo.cpp as the current remat infrastructure uses.
>
> > * This approach will not be on demand as the current approach is
> > (i.e
> > remat specific
>
> > code will be executed only if there is a spill) so the pass in (2)
> > can be an
>
> > overhead so we may want it to enable only for higher level of
> > optimization.
>
> > * Will it be possible to use existing SCCP.cpp code with few
> > modification to lattice
>
> > and related mathematical operation so that it can serve both
> > purpose?
>
> > * No changes in current register allocators or spill framework will
> > be required
>
> > because remat entry point will be LiveRangeEdit.
>
> > Any other way with less overhead is always welcomed.
>
> > Please help us developing a plan to implement this.
>
> > Hoping for comments!
>
> > Sincerely,
>
> > Vivek
>
> > _______________________________________________
>
> > LLVM Developers mailing list
>
> > llvm-dev at lists.llvm.org
>
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160924/9fcb559e/attachment.html>
More information about the llvm-dev
mailing list