[llvm-dev] A use of RDF to extend register Remat
vivek pandya via llvm-dev
llvm-dev at lists.llvm.org
Fri Oct 21 21:22:11 PDT 2016
On Sat, Oct 22, 2016 at 2:08 AM, Quentin Colombet <qcolombet at apple.com>
wrote:
> Hi Vivek,
>
> Hello Quentin,
> What is the scope of your extended rematerialization support?
>
> Sorry for staling http://lists.llvm.org/pipermail/llvm-dev/2016-
September/104847.html
but I would like to quote a simple example from previous discussion:
void foo(long);
void bar() {
for (int i = 0; i < 1600; ++i)
foo(3494348345984503943);
}
$ clang -O3 -S -o - ~/tmp/tl.c -target powerpc64
...
# BB#0: # %entry
...
lis 3, 12414
ori 3, 3, 27470
sldi 3, 3, 32
oris 3, 3, 35809
ori 30, 3, 20615
...
So my aim is to remat a value which is constructed with multiple
instructions (mostly on an architecture where complex addressing mode is
not supported)
Indeed, rematerialization is usually a way to avoid spilling, but proposed
> solution sounds more like an optimization to get rid of some load/store
> pairs.
> It seems to me clumsy to extend rematerialization after RA, because that
> means we have to check and recompute extra things to be able to do it. For
> instance, we have to rebuild the live-ranges, check for redefinition etc.
>
I agree.
>
> In my opinion, if you want to stay away of the register allocator, it
> would make more sense to explore a rematerialization algorithm working on
> SSA that uses register pressure to take rematerialization decisions.
>
Do you here mean working on MIR during RA phase itself ? In that case are
you referring to use size of VirtRegMap to get estimation of register
pressure or any other technique?
Also I have studied a patch for GCC for remat problem (how ever this is for
onny SH architecture) https://gcc.gnu.org/ml/gcc-
patches/2003-12/txt00058.txt
in this patch particularly df.c file . In this patch to find out complex
patterns it just travers use-def chains. I think similar thing can be
implemented at MIR level. I also thinks that it would be better to spend
more time to think about cost functions to decide if it will be good to
remat or not. Also try to address related bug https://llvm.org/bugs/show_
bug.cgi?id=25373#c9
Sincerely,
Vivek
Anyhow, whatever is the direction we want to pursue, I believe it is best
> to start with motivating examples where the current framework is not
> sufficient and see what would work best.
>
> Cheers,
> -Quentin
>
> On Oct 18, 2016, at 9:23 AM, vivek pandya via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Dear Community,
>
> I would like to discuss few points to use RDF to extend register remat
> scope. Mr. Krzysztof and I have started discussion this on private mail.
> But I think now it would be better to include community.
> Interested community member kindly previous discussion (at the end of
> mail) before starting here.
>
> After analyzing if RDF can be used for solving Remat, we think that
> problem with RDF is that since it is post-RA, rematable register values
> also will be spilled and all its use will be surrounded with load-spill
> instructions. So identifying rematable sequence of instructions will be
> possible with RDF but its uses will not be associated with these
> instructions with any use-def chain because live-ranges have been split by
> spill-relaod.
>
> One solution that I can think of is that during RA when a spill code is
> inserted , it can add a dummy instruction as a maker that can be used after
> post-RA to identify live-range (reg value) which was spilled and which
> instructions is using it after reloading (i.e where to use remated value).
> The post-RA pass can use RDF to construct remat sequence and use it and
> remove spill/restore as well as the marker instruction (dummy) inserted
> during the spill code generation. Is this possible with out changing much
> of Spill code framework? What all steps are required to add such a dummy
> /debug instruction at MIR ?
>
> In case if RDF graph is not useable for this problem then simply use-def
> chain on MIR can be traversed in bottom up manner to identify remat
> instruction sequence.
>
> Please share your thoughts.
> Vivek
>
> On Tue, Oct 11, 2016 at 12:34 AM, Krzysztof Parzyszek <
> kparzysz at codeaurora.org> wrote:
>
>> Hi Vivek,
>> Yes, RDF would help with that, even if these instructions were not in the
>> same block.
>>
>> Once you have the node corresponding to the use of R3 in the last
>> statement, you can get the node corresponding to the reaching def of that
>> node. This def node would be a member of the "oris" statement. Within that
>> statement you would then look for use nodes and you'd find the use node for
>> R3: "oris r3, *r3*, 35809". From that node, you'd follow the reaching def
>> and this would give you: "sldi *r3*, r3, 32". Then you'd look for use nodes
>> and you'd get "sldi r3, *r3*, 32". Following this traversal would
>> eventually get you to the first "lis".
>>
>> Getting from use-node to def-node is a single step.
>> Visiting other def/use nodes from the same statement, given a def-node is
>> linear in terms of number of def/use nodes in that statement.
>>
>> If you want to rematerialize r30, then the whole sequence would need to
>> use r30 (instead of r3). This is because r3 may not be available at the
>> point where r30 is to be rematerialized. Finding out if r3 is live or not
>> is also possible, but would require extra analysis.
>>
>> Getting an instruction, given an RDF node is easy:
>> - def/use nodes have a pointer to MachineOperand,
>> - statement node has a pointer to MachineInstr,
>> - block node has a pointer to MachineBasicBlok.
>>
>> I don't think there are functions to do the inverse, i.e. given a
>> MachineInstr, find the corresponding node. If necessary, such functions
>> could be added, but existing RDF applications were simply traversing the
>> RDF graph itself (instead of traversing the function and then finding
>> corresponding nodes in the graph).
>>
>> Overall, RDF would definitely provide you with the information you are
>> looking for.
>>
>> You can even try to see what you get without regmasks. Regmasks are used
>> by most targets to provide a whole set of registers in a single machine
>> operand, and they are mostly used in call instructions, where they indicate
>> which registers are clobbered. This is mostly to avoid having tens of
>> operands on these instructions. Hexagon doesn't use regmasks (I guess they
>> were not present when the backend was first added, and we never switched
>> over). R3 is used by PowerPC as a parameter/return value register, so a
>> call should modify it explicitly (and not via regmask). For other
>> registers, a regmask would simply be ignored (right now), which could lead
>> to wrong results.
>>
>> -Krzysztof
>>
>>
>>
>> On 10/10/2016 1:34 PM, vivek pandya wrote:
>>
>>> Hello Sir,
>>>
>>> I have a problem that I think can be solved with RDF. But I am not able
>>> to identify runtime complexity for this.
>>> Consider following powerpc asm :
>>> lis 3, 12414
>>> ori 3, 3, 27470
>>> sldi 3, 3, 32
>>> oris 3, 3, 35809
>>> ori 30, 3, 20615
>>> here what I am looking for is that value in reg 30 is rematerilizable
>>> (can be recalculated) because this whole sequence is using constants to
>>> calculate the value that is in reg 30.
>>> So this is how I think I can use RDFGraph here, for statement
>>> ori 30, 3, 20615
>>> reaching defs for reg 3 can be traced back on RDFGraph if it found to be
>>> (rematerilizable) then continue going backward on RDFGraph untill a live
>>> range of reg 3 does not end (in this example it will be lis 3, 12414)
>>> This is very primitive example and also this may require RDFLiveness
>>> analysis.
>>>
>>> But do you think this problem is feasible to solve with RDF if yes than
>>> can you help me to build a concrete idea? (we may switch to llvm-dev if
>>> this seems a possible path).
>>> Ultimately this can be used to avoid spill by recalculating the value
>>> for a register. This also means that it may require to access RDFGraph
>>> based on the register i.e if RAX is required to be spilled I would like
>>> to check if RAX at statement X is having rematerilizable value or not.
>>> Also this may require to access instruction for a given RDFNode.
>>> Also will it be a liner time algorithm in terms of number of nodes in
>>> RDFGraph?
>>>
>>> Please share your thoughts.
>>>
>>> Sincerely,
>>> Vivek
>>>
>>
>> --
>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted
>> by The Linux Foundation
>>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161022/0c3dacab/attachment-0001.html>
More information about the llvm-dev
mailing list