[LLVMdev] proof of concept for a loop fusion pass
Philip Reames
listmail at philipreames.com
Fri Jan 16 10:55:57 PST 2015
I would strongly suggest we not try to be overly general at this time.
Let's focus on getting a clearly profitably use case for each, and then
worry about generalizing once we have more practical experience and code
has actually landed in tree. General loop optimization is a *hard*
problem.
As a near term goal, having the profitability heuristics well separated
from the actual transforms seems like a reasonable goal. Even this
should happen after we have tests and working examples in tree though.
Philip
On 01/16/2015 01:14 AM, Kristof Beyls wrote:
>
> Hi Ramshankar,
>
> This looks really interesting.
>
> Interestingly, a patch for loop distribution has been posted for
> review recently.
> On a conceptual level loop fusion and loop distribution are each
> other’s inverse, so I would assume that
> they should ideally somehow be combined and use a single cost function
> to decide, out of all legal
> fusions/distributions, which one is most profitable.
>
> Does that make sense, or is there some reason why these fundamentally
> couldn’t be combined into a single
> loop fuse-and-or-distribute transformation?
>
> I haven’t looked closely at either the code of the distribution or the
> fusion patches, but would it be hard to
> combine them into a single transformation – assuming my assumptions
> above do make sense?
>
> If it would prove sensible to somehow combine the distribution and
> fusion transformations – could this
> be the beginning of a more general high-level loop transformation
> framework in LLVM?
>
> Thanks,
>
> Kristof
>
> *From:*llvmdev-bounces at cs.uiuc.edu
> [mailto:llvmdev-bounces at cs.uiuc.edu] *On Behalf Of *Ramanarayanan,
> Ramshankar
> *Sent:* 16 January 2015 00:22
> *To:* llvmdev at cs.uiuc.edu
> *Subject:* [LLVMdev] proof of concept for a loop fusion pass
>
> Hi,
>
> We are proposing a loop fusion pass that tries to proactive fuse loops
> across function call boundaries and arbitrary control flow.
>
> http://reviews.llvm.org/D7008
>
> With this pass, we get 103 loop fusions in SPECCPU INT 2006
> 462.libquantum with rate performance improving close to 2.5X in x86
> (results from AMD A10-6700).
>
> I took some liberties in patching up some of the code in
> ScalarEvolution/DependenceAnalysis/GlobalsModRef/Clang options/ and
> also adjusted the IPO/LTO pass managers. I would need to do a better
> job there but I would like to put this out as WIP for high level
> reviews. At this time standard loop fusion test cases may not work
> (cases where control dep are same or loops are truly adjacent etc). I
> would be doing that as well.
>
> Appreciate suggestions and other help.
>
> The pass initially attempts to inline calls which have loops in them.
> Following inlining, a separate scalar "main" pass tries to fuse loops.
> It is not necessary for loops to be adjacent or have the same control
> dependence. Specifically a control dependence closure is computed and
> loops that share a control dep closure prefix are taken as candidates
> for fusion, in case there are no other loops in a certain path between
> them. A loop graph is built with edges between loops which share the
> control dependence closure prefix. A recursive application of fusion
> follows on the graph from "bottom-up".
>
> During fusion, program slices are taken based on the control flow
> closures of the two loops being fused. Example: Suppose two loops A
> and B are going to be fused. They share the same control dependence
> prefix, but their suffices vary. The suffices are used to duplicate
> Control flow paths that pass through both loops. Specifically paths
> from the first block in the control-dep closure suffix of the first
> loop, through the first loop's exit and following into the suffix of
> the second loop through the second loop's exit on to the common
> post-dominator of either loop's exit. The number of paths grows
> exponentially. At this time some heuristics are used when exploring
> for paths. Using profile based heuristics is a better idea.
>
> Also function unswitching helps by cleaning up control flow.
>
> Example:
>
> if (a & b) {
>
> if (a & c) {
>
> if (t & 0xF) {
>
> for (i=0; i < size; i++) {
>
> if (A[i] & d) {
>
> A[i] |= (1 << t);
>
> }
>
> }
>
> }
>
> }
>
> }
>
> if (b & d) {
>
> if (a & d) {
>
> if (t & 0xA) {
>
> for (i=0; i < size; i++) {
>
> if (A[i] & d) {
>
> A[i] |= (1 << t);
>
> }
>
> }
>
> }
>
> }
>
> }
>
> After fusion we will have:
>
> if (a&b && a&c && t&0xF && b&d && a&d t&0xA) {
>
> for (i=0; i < size; i++) {
>
> if (A[i] & d) {
>
> A[i] |= (1 << t);
>
> }
>
> if (A[i] & d) {
>
> A[i] |= (1 << t);
>
> }
>
> }
>
> } else {
>
> // original code
>
> if (a & b) {
>
> if (a & c) {
>
> if (t & 0xF) {
>
> for (i=0; i < size; i++) {
>
> if (A[i] & d) {
>
> A[i] |= (1 << t);
>
> }
>
> }
>
> }
>
> }
>
> }
>
> if (b & d) {
>
> if (a & d) {
>
> if (t & 0xA) {
>
> for (i=0; i < size; i++) {
>
> if (A[i] & d) {
>
> A[i] |= (1 << t);
>
> }
>
> }
>
> }
>
> }
>
> }
>
> }
>
> Thanks,
>
> Ramshankar
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150116/8d9b6c28/attachment.html>
More information about the llvm-dev
mailing list