[LLVMdev] proof of concept for a loop fusion pass

Philip Reames listmail at philipreames.com
Fri Jan 16 10:55:57 PST 2015


I would strongly suggest we not try to be overly general at this time.  
Let's focus on getting a clearly profitably use case for each, and then 
worry about generalizing once we have more practical experience and code 
has actually landed in tree.  General loop optimization is a *hard* 
problem.

As a near term goal, having the profitability heuristics well separated 
from the actual transforms seems like a reasonable goal. Even this 
should happen after we have tests and working examples in tree though.

Philip

On 01/16/2015 01:14 AM, Kristof Beyls wrote:
>
> Hi Ramshankar,
>
> This looks really interesting.
>
> Interestingly, a patch for loop distribution has been posted for 
> review recently.
> On a conceptual level loop fusion and loop distribution are each 
> other’s inverse, so I would assume that
> they should ideally somehow be combined and use a single cost function 
> to decide, out of all legal
> fusions/distributions, which one is most profitable.
>
> Does that make sense, or is there some reason why these fundamentally 
> couldn’t be combined into a single
> loop fuse-and-or-distribute transformation?
>
> I haven’t looked closely at either the code of the distribution or the 
> fusion patches, but would it be hard to
> combine them into a single transformation – assuming my assumptions 
> above do make sense?
>
> If it would prove sensible to somehow combine the distribution and 
> fusion transformations – could this
> be the beginning of a more general high-level loop transformation 
> framework in LLVM?
>
> Thanks,
>
> Kristof
>
> *From:*llvmdev-bounces at cs.uiuc.edu 
> [mailto:llvmdev-bounces at cs.uiuc.edu] *On Behalf Of *Ramanarayanan, 
> Ramshankar
> *Sent:* 16 January 2015 00:22
> *To:* llvmdev at cs.uiuc.edu
> *Subject:* [LLVMdev] proof of concept for a loop fusion pass
>
> Hi,
>
> We are proposing a loop fusion pass that tries to proactive fuse loops 
> across function call boundaries and arbitrary control flow.
>
> http://reviews.llvm.org/D7008
>
> With this pass, we get 103 loop fusions in SPECCPU INT 2006 
> 462.libquantum with rate performance improving close to 2.5X in x86 
> (results from AMD A10-6700).
>
> I took some liberties in patching up some of the code in 
> ScalarEvolution/DependenceAnalysis/GlobalsModRef/Clang options/ and 
> also adjusted the IPO/LTO pass managers. I would need to do a better 
> job there but I would like to put this out as WIP for high level 
> reviews.  At this time standard loop fusion test cases may not work 
> (cases where control dep are same or loops are truly adjacent etc). I 
> would be doing that as well.
>
> Appreciate suggestions and other help.
>
> The pass initially attempts to inline calls which have loops in them.  
> Following inlining, a separate scalar "main" pass tries to fuse loops. 
> It is not necessary for loops to be adjacent or have the same control 
> dependence. Specifically a control dependence closure is computed and 
> loops that share a control dep closure prefix are taken as candidates 
> for fusion, in case there are no other loops in a certain path between 
> them.  A loop graph is built with edges between loops which share the 
> control dependence closure prefix. A recursive application of fusion 
> follows on the graph from "bottom-up".
>
> During fusion, program slices are taken based on the control flow 
> closures of the two loops being fused.  Example: Suppose two loops A 
> and B are going to be fused.  They share the same control dependence 
> prefix, but their suffices vary.  The suffices are used to duplicate 
> Control flow paths that pass through both loops. Specifically paths 
> from the first block in the control-dep closure suffix of the first 
> loop, through the first loop's exit and following into the suffix of 
> the second loop through the second loop's exit on to the common 
> post-dominator of either loop's exit. The number of paths grows 
> exponentially. At this time some heuristics are used when exploring 
> for paths.  Using profile based heuristics is a better idea.
>
> Also function unswitching helps by cleaning up control flow.
>
> Example:
>
> if (a & b) {
>
>   if (a & c) {
>
>     if (t & 0xF) {
>
>       for (i=0; i < size; i++) {
>
>         if (A[i] & d) {
>
>           A[i] |= (1 << t);
>
>         }
>
>       }
>
>     }
>
>   }
>
> }
>
> if (b & d) {
>
>   if (a & d) {
>
>     if (t & 0xA) {
>
>       for (i=0; i < size; i++) {
>
>         if (A[i] & d) {
>
>           A[i] |= (1 << t);
>
>         }
>
>       }
>
>     }
>
>   }
>
> }
>
> After fusion we will have:
>
> if (a&b && a&c && t&0xF && b&d && a&d t&0xA) {
>
>   for (i=0; i < size; i++) {
>
>     if (A[i] & d) {
>
>       A[i] |= (1 << t);
>
>     }
>
>     if (A[i] & d) {
>
>       A[i] |= (1 << t);
>
>     }
>
>   }
>
> } else {
>
> // original code
>
>   if (a & b) {
>
>     if (a & c) {
>
>       if (t & 0xF) {
>
>         for (i=0; i < size; i++) {
>
>           if (A[i] & d) {
>
>             A[i] |= (1 << t);
>
>           }
>
>         }
>
>       }
>
>     }
>
>   }
>
>   if (b & d) {
>
>     if (a & d) {
>
>       if (t & 0xA) {
>
>         for (i=0; i < size; i++) {
>
>           if (A[i] & d) {
>
>             A[i] |= (1 << t);
>
>           }
>
>         }
>
>       }
>
>     }
>
>   }
>
> }
>
> Thanks,
>
> Ramshankar
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150116/8d9b6c28/attachment.html>


More information about the llvm-dev mailing list