[LLVMdev] proof of concept for a loop fusion pass

Kristof Beyls kristof.beyls at arm.com
Fri Jan 16 01:14:37 PST 2015


Hi Ramshankar,

This looks really interesting.

Interestingly, a patch for loop distribution has been posted for review
recently.
On a conceptual level loop fusion and loop distribution are each other's
inverse, so I would assume that
they should ideally somehow be combined and use a single cost function to
decide, out of all legal
fusions/distributions, which one is most profitable.

Does that make sense, or is there some reason why these fundamentally
couldn't be combined into a single
loop fuse-and-or-distribute transformation?

I haven't looked closely at either the code of the distribution or the
fusion patches, but would it be hard to
combine them into a single transformation - assuming my assumptions above do
make sense?

If it would prove sensible to somehow combine the distribution and fusion
transformations - could this
be the beginning of a more general high-level loop transformation framework
in LLVM?

Thanks,

Kristof

 

 

From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On
Behalf Of Ramanarayanan, Ramshankar
Sent: 16 January 2015 00:22
To: llvmdev at cs.uiuc.edu
Subject: [LLVMdev] proof of concept for a loop fusion pass

 

Hi,

 

We are proposing a loop fusion pass that tries to proactive fuse loops
across function call boundaries and arbitrary control flow.

 

http://reviews.llvm.org/D7008

 

With this pass, we get 103 loop fusions in SPECCPU INT 2006 462.libquantum
with rate performance improving close to 2.5X in x86 (results from AMD
A10-6700).

 

I took some liberties in patching up some of the code in
ScalarEvolution/DependenceAnalysis/GlobalsModRef/Clang options/ and also
adjusted the IPO/LTO pass managers. I would need to do a better job there
but I would like to put this out as WIP for high level reviews.  At this
time standard loop fusion test cases may not work (cases where control dep
are same or loops are truly adjacent etc). I would be doing that as well.

 

Appreciate suggestions and other help. 

 

The pass initially attempts to inline calls which have loops in them.
Following inlining, a separate scalar "main" pass tries to fuse loops. It is
not necessary for loops to be adjacent or have the same control dependence.
Specifically a control dependence closure is computed and loops that share a
control dep closure prefix are taken as candidates for fusion, in case there
are no other loops in a certain path between them.  A loop graph is built
with edges between loops which share the control dependence closure prefix.
A recursive application of fusion follows on the graph from "bottom-up".

 

During fusion, program slices are taken based on the control flow closures
of the two loops being fused.  Example: Suppose two loops A and B are going
to be fused.  They share the same control dependence prefix, but their
suffices vary.  The suffices are used to duplicate Control flow paths that
pass through both loops. Specifically paths from the first block in the
control-dep closure suffix of the first loop, through the first loop's exit
and following into the suffix of the second loop through the second loop's
exit on to the common post-dominator of either loop's exit. The number of
paths grows exponentially. At this time some heuristics are used when
exploring for paths.  Using profile based heuristics is a better idea.

 

Also function unswitching helps by cleaning up control flow.

 

Example:

if (a & b) {

  if (a & c) {

    if (t & 0xF) {

      for (i=0; i < size; i++) {

        if (A[i] & d) {

          A[i] |= (1 << t);

        }

      }

    }

  }

}

 

if (b & d) {

  if (a & d) {

    if (t & 0xA) {

      for (i=0; i < size; i++) {

        if (A[i] & d) {

          A[i] |= (1 << t);

        }

      }

    }

  }

}

 

After fusion we will have:

 

if (a&b && a&c && t&0xF && b&d && a&d t&0xA) {

  for (i=0; i < size; i++) {

    if (A[i] & d) {

      A[i] |= (1 << t);

    }

    if (A[i] & d) {

      A[i] |= (1 << t);

    }

  }

} else {

// original code

  if (a & b) {

    if (a & c) {

      if (t & 0xF) {

        for (i=0; i < size; i++) {

          if (A[i] & d) {

            A[i] |= (1 << t);

          }

        }

      }

    }

  }

 

  if (b & d) {

    if (a & d) {

      if (t & 0xA) {

        for (i=0; i < size; i++) {

          if (A[i] & d) {

            A[i] |= (1 << t);

          }

        }

      }

    }

  }

}

 

Thanks,

Ramshankar
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150116/4222bd56/attachment.html>


More information about the llvm-dev mailing list