[llvm-dev] [Proposal][RFC] Cache aware Loop Cost Analysis

Thu Jun 9 09:21:40 PDT 2016

On Wed, Jun 8, 2016 at 10:20 PM, Hal Finkel <hfinkel at anl.gov> wrote:

>
> ------------------------------
>
> *From: *"Vikram TV via llvm-dev" <llvm-dev at lists.llvm.org>
> *To: *"DEV" <llvm-dev at lists.llvm.org>
> *Sent: *Wednesday, June 8, 2016 2:58:17 AM
> *Subject: *[llvm-dev] [Proposal][RFC] Cache aware Loop Cost Analysis
>
> Hi,
>
> This is a proposal about implementing an analysis that calculates loop
> cost based on cache data. The primary motivation for implementing this is
> to write profitability measures for cache related optimizations like
> interchange, fusion, fission, pre-fetching and others.
>
> I have implemented a prototypical version at
> http://reviews.llvm.org/D21124.
>
> I quick comment on your pass: You'll also want to add the ability to get
> the loop trip count from profiling information from BFI when available (by
> dividing the loop header frequency by the loop predecessor-block
> frequencies).
>
Sure. I will add it.

>
> The patch basically creates groups of references that would lie in the
> same cache line. Each group is then analysed with respect to innermost
> loops considering cache lines. Penalty for the reference is:
> a. 1, if the reference is invariant with the innermost loop,
> b. TripCount for non-unit stride access,
> c. TripCount / CacheLineSize for a unit-stride access.
> Loop Cost is then calculated as the sum of the reference penalties times
> the product of the loop bounds of the outer loops. This loop cost can then
> be used as a profitability measure for cache reuse related optimizations.
> This is just a brief description; please refer to [1] for more details.
>
> This sounds like the right approach, and is certainly something we need.
> One question is how the various transformations you name above (loop
> fusion, interchange, etc.) will use this analysis to evaluate the effect of
> the transformation they are considering performing. Specifically, they
> likely need to do this without actually rewriting the IR (speculatively).
>
For Interchange: Loop cost can be used to build a required ordering and
interchange can try to match the nearest possible ordering. Few internal
kernels, matmul are working fine with this approach.
For Fission/Fusion: Loop cost can provide the cost before and after fusion.
Loop cost for a fused loop can be obtained by building reference groups by
assuming a fused loop. Similar case with fission but with partitions.
These require no prior IR changes.

>
> A primary drawback in the above patch is the static use of Cache Line
> Size. I wish to get this data from tblgen/TTI and I am happy to submit
> patches on it.
>
> Yes, this sounds like the right direction. The targets obviously need to
> provide this information.
>
>
> Finally, if the community is interested in the above work, I have the
> following work-plan:
> a. Update loop cost patch to a consistent, commit-able state.
> b. Add cache data information to tblgen/TTI.
> c. Rewrite Loop Interchange profitability to start using Loop Cost.
>
> Great!
>
> This does not affect loop interchange directly, but one thing we need to
> also consider is the number of prefetching streams supported by the
> hardware prefetcher on various platforms. This is generally, per thread,
> some number around 5 to 10. Splitting loops requiring more streams than
> supported, and preventing fusion from requiring more streams than
> supported, is also an important cost metric. Of course, so is ILP and other
> factors.
>
Cache based cost computation is one of the metric and yes, more of other
metrics need to be added.

>
>  -Hal
>
>
> Please share your valuable thoughts.
>
> Thank you.
>
> References:
> [1] [Carr-McKinley-Tseng]
> http://www.cs.utexas.edu/users/mckinley/papers/asplos-1994.pdf
>
> --
>
> Good time...
> Vikram TV
> CompilerTree Technologies
> Mysore, Karnataka, INDIA
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
>

-- 

Good time...
Vikram TV
CompilerTree Technologies
Mysore, Karnataka, INDIA
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160609/b63be38a/attachment.html>