[llvm-dev] [RFC] Register Rematerialization (remat) Extension

Tue Sep 27 15:00:14 PDT 2016

Hello Quentin,

I am certainly looking to implement best what can be achieved and that will
also require a cost estimation module which can address the mentioned bug.
How ever I am still looking to some simpler (or less heavy)  approach than
what I proposed so that whatever I implement that can be sent to upstream.

Sincerely,
Vivek

On Wed, Sep 28, 2016 at 2:40 AM, Quentin Colombet <qcolombet at apple.com>
wrote:

>
> On Sep 26, 2016, at 10:12 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>
>
> ------------------------------
>
> *From: *"Quentin Colombet via llvm-dev" <llvm-dev at lists.llvm.org>
> *To: *"vivek pandya" <vivekvpandya at gmail.com>
> *Cc: *"llvm-dev" <llvm-dev at lists.llvm.org>, "Nirav Rana" <
> h2015087 at pilani.bits-pilani.ac.in>, "Matthias Braun" <matze at braunis.de>
> *Sent: *Monday, September 19, 2016 1:27:10 PM
> *Subject: *Re: [llvm-dev] [RFC] Register Rematerialization (remat)
> Extension
>
> Hi Vivek,
>
> On Sep 19, 2016, at 11:17 AM, vivek pandya via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>
>
> On Mon, Sep 19, 2016 at 6:21 PM, James Molloy <james at jamesmolloy.co.uk>
> wrote:
>
>> Hi,
>>
>> I've been looking at this myself for ARM, and came up with a much simpler
>> solution: lower immediate materializations to a post-RA pseudo and expand
>> the chain of materialization instructions after register allocation /
>> remat. Remat only sees one instruction with no dependencies.
>>
>> Did you look down this route and discount it?
>>
> No actually I am not much familiar with this topic so I mostly reply on
> research papers available.
> But your idea seems to be simple and good solution but I am not sure if
> this can cover every possible cases.
>
>
> This is the way all targets deal with simple rematerialization involving
> several instructions in LLVM AFAIK.
>
> Basically, the target defines a pseudo instruction that encodes this
> sequence of instructions and expands it after register allocation. This is
> a case by case thing, there is no patch that can be generalized for other
> target.
> For instance, look at the expansion of AArch64::MOVi64imm.
>
> The bottom line is, our rematerialization scheme is currently limited, but
> I am not sure your proposal get us beyond what we already support.
>
> I might have misunderstood the proposal, but why do you say that? The
> problem is not limited to constants (as perhaps evidenced by Ivan Baev's
> talk at the 2014 dev meeting). One basic thing we should get, for example,
> is:
>
>   q = ...;
>   r = ...;
>   for (...) {
>     // complicated stuff
>     foo(q, r, q - r); // should prefer to remat (q-r) here instead of
> spilling.
>
>
> I agree, but this is not what Vivek is aiming at, is he?
>
>   }
>
> Also, this is all perhaps related to https://llvm.org/bugs/show_
> bug.cgi?id=25373#c9
>
> Thanks again,
> Hal
>
>
> Cheers,
> Q.
>
>
> Do you have a patch for this? I can work on this to make it work for other
> architectures for which this will be beneficial.
>
> Sincerely,
> Vivek
>
>>
>> Cheers,
>>
>> James
>>
>> On Wed, 14 Sep 2016 at 02:43 Gerolf Hoflehner via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> On Sep 12, 2016, at 10:14 AM, Andrew Trick via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>
>>> On Sep 12, 2016, at 8:51 AM, vivek pandya via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>
>>> 1 ) As LLVM MI is already in SSA form before reg allocation so for LLVM
>>> I think it does not require to build SSA graph and converting it back after
>>> optimization completed as mentioned in [1]
>>>
>>> 2 ) We would like to add a pass similar to SCCP.cpp (Sparse Conditional
>>> Constant
>>> Propagation based on Wegman and Zadeck's work http://dl.acm.org/
>>> citation.cfm?id=103136) as desribed in [1]. This pass will be scheduled
>>> to run before register allocation.
>>>
>>> 3 ) Output of the pass added in Step 2 will be a Map of def to
>>> instructions pointers (instructions which can be used to remat the given
>>> live range). The map will contain live ranges which is due to single
>>> instruction and multiple instructions.
>>>
>>>
>>> LiveIntervals maintains a quasi-SSA form via VNInfo. It does not allow
>>> efficient def-use queries, but use-def is there, which is all that you
>>> should need.
>>>
>>>
>>> I also only see a narrow and specific remat cost problem in the example:
>>> is it cheaper is issue a  chain of instructions rather than a fill? And for
>>> this a use-def is sufficient.
>>>
>>>
>>> It would be great to have better remat during regalloc, but please try
>>> to avoid building additional state that needs to be maintained.
>>>
>>>
>>> You proposed a fairly complex scheme. The question then is always is it
>>> worth it? To answer that question you would need to investigate and break
>>> down the current remat problems (spills but should remat, remat but should
>>> spill, should remat at a different location, etc) eg. for the llvm test
>>> suite, and show that your algorithms solves the most important ones.
>>>
>>>
>>>
>>>
>>> -Andy
>>>
>>>
>>> 4 ) The remat APIs defined in LiveRangeEdit.cpp will use analysis from
>>> the Map
>>> when a spill is required for RA.
>>>
>>> 5 ) The remat transformation APIs like rematerializeAt() will be teached
>>> to remat
>>> live ranges with multiple instructions too.
>>>
>>> 6 ) A cost analysis will be require to decide between remat and spill.
>>> This should be based on at least two factors register pressure and spill
>>> cost
>>>
>>> Few points:
>>> --------------
>>> * The analysis pass to be addes as per (2) will use target specific
>>> information
>>> from TargetInstrInfo.cpp as the current remat infrastructure uses.
>>>
>>> * This approach will not be on demand as the current approach is (i.e
>>> remat specific
>>> code will be executed only if there is a spill) so the pass in (2) can
>>> be an
>>> overhead so we may want it to enable only for higher level of
>>> optimization.
>>>
>>> * Will it be possible to use existing SCCP.cpp code with few
>>> modification to lattice
>>> and related mathematical operation so that it can serve both purpose?
>>>
>>> * No changes in current register allocators or spill framework will be
>>> required
>>> because remat entry point will be LiveRangeEdit.
>>>
>>> Any other way with less overhead is always welcomed.
>>> Please help us developing a plan to implement this.
>>>
>>> Hoping for comments!
>>>
>>> Sincerely,
>>> Vivek
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160928/ee15f21b/attachment.html>