[PATCH] D22778: Add Loop Sink pass to reverse the LICM based of basic block frequency.

Mon Oct 17 14:39:55 PDT 2016

On Mon, Oct 17, 2016 at 1:04 PM, Chandler Carruth <chandlerc at gmail.com>
wrote:

> chandlerc added a comment.
>
> Will try to make a full pass through, thanks for the extensive updates
> Dehao! One specific point of discussion below:
>
>
>
> ================
> Comment at: lib/Transforms/Scalar/LoopSink.cpp:148-170
> +  // For every iteration:
> +  //   * Pick the ColdestBB from SortedLoopBBs
> +  //   * Find the set BBsDominatedByColdestBB that satisfy:
> +  //     - BBsDominatedByColdestBB is a subset of BBsToSinkInto
> +  //     - Every BB in BBsDominatedByColdestBB is dominated by ColdestBB
> +  //   * If Freq(ColdestBB) < Freq(BBsDominatedByColdestBB), remove
> +  //     BBsDominatedByColdestBB from BBsToSinkInto, add ColdestBB to
> ----------------
> danielcdh wrote:
> > chandlerc wrote:
> > > This seems really expensive. By my reading this is O(N * L * M) where
> L is the  number of basic blocks in a loop, N is the number of instructions
> we try to sink into that loop, and M is the number of basic blocks within
> the loop that use the instructions. If there is for example one hot basic
> block in the loop and a large number of cold basic blocks and all of the
> uses are in those cold basic blocks, it seems like this could become quite
> large.
> > >
> > > Have you looked at other algorithms? Is there a particular reason to
> go with this one? (I've not thought about the problem very closely yet...)
> > I initially started with an adhoc algorithm which is O(L * M), but Danny
> pointed out it is not optimal, so I changed to this optimal algorithm. The
> lower bound for any sinking algorithm is O(L*M), but if optimal solution is
> desired, O(N*L*M) is the best I can get.
> >
> > Yes, this could be expensive when N is large. I practice, I did not see
> noticeable compile time increase in speccpu2006 benchmarks after applying
> this patch (and enable the pass in frontend).
> >
> > How about we limit the N to be no more than a certain number to avoid
> expensive computation in extreme cases?
> I'm not worried about N being large. I'm worried about the fact that L is
> >= to M and so it is quadratic in the number of basic blocks that use each
> instruction.
>
> The other thing is that if this scales in compile time by N then it scales
> in compile time *by how much effect it is having*. If it scales in compile
> time by M^2, then we pay more and more compile time as loops get larger
> even if we only sink very few instructions.
>
> I would either bound M to a small number, and/or look for some way to not
> have this be quadratic. It seems like a bunch of this should be
> pre-computable for the loop?
>

Updated the patch to add a parameter to control the maximum M.

Thanks,
Dehao

>
>
> https://reviews.llvm.org/D22778
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20161017/acb2d96b/attachment.html>