[llvm-dev] [Proposal][RFC] Cache aware Loop Cost Analysis
Vikram TV via llvm-dev
llvm-dev at lists.llvm.org
Thu Jun 23 10:28:56 PDT 2016
On Thu, Jun 23, 2016 at 9:54 AM, Adam Nemet <anemet at apple.com> wrote:
>
> On Jun 9, 2016, at 9:21 AM, Vikram TV via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>
>
> On Wed, Jun 8, 2016 at 10:20 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>
>>
>> ------------------------------
>>
>> *From: *"Vikram TV via llvm-dev" <llvm-dev at lists.llvm.org>
>> *To: *"DEV" <llvm-dev at lists.llvm.org>
>> *Sent: *Wednesday, June 8, 2016 2:58:17 AM
>> *Subject: *[llvm-dev] [Proposal][RFC] Cache aware Loop Cost Analysis
>>
>> Hi,
>>
>> This is a proposal to implement an analysis that calculates loop cost
>> based on cache data. The primary motivation is to provide profitability
>> measures for cache-related optimizations like interchange, fusion,
>> fission, prefetching and others.
>>
>> I have implemented a prototypical version at
>> http://reviews.llvm.org/D21124.
>>
>> A quick comment on your pass: you'll also want to add the ability to get
>> the loop trip count from BFI's profiling information when available (by
>> dividing the loop header frequency by the loop predecessor-block
>> frequencies).
>>
> Sure. I will add it.
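
To make that concrete, roughly what I have in mind is the sketch below
(illustrative only; estimateTripCountFromProfile and its exact placement
are mine, not something already in the patch):

  // Illustrative sketch: estimate a loop's trip count from profile data by
  // dividing the header frequency by the total frequency of the blocks that
  // enter the loop from outside.
  #include "llvm/ADT/Optional.h"
  #include "llvm/Analysis/BlockFrequencyInfo.h"
  #include "llvm/Analysis/LoopInfo.h"
  #include "llvm/IR/CFG.h"
  #include <cstdint>

  using namespace llvm;

  static Optional<uint64_t>
  estimateTripCountFromProfile(const Loop &L, const BlockFrequencyInfo &BFI) {
    const BasicBlock *Header = L.getHeader();
    uint64_t HeaderFreq = BFI.getBlockFreq(Header).getFrequency();

    // Sum the frequencies of the header's predecessors that lie outside the
    // loop, i.e. the edges through which the loop is entered.
    uint64_t EnteringFreq = 0;
    for (const BasicBlock *Pred : predecessors(Header))
      if (!L.contains(Pred))
        EnteringFreq += BFI.getBlockFreq(Pred).getFrequency();

    if (EnteringFreq == 0)
      return None; // No usable profile information.

    // The header runs roughly TripCount times per loop entry, so the ratio
    // approximates the average trip count.
    return HeaderFreq / EnteringFreq;
  }
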
>
>>
>> The patch basically creates groups of references that would lie in the
>> same cache line. Each group is then analysed with respect to the innermost
>> loop, considering cache lines. The penalty for a reference group is:
>> a. 1, if the reference is invariant with respect to the innermost loop,
>> b. TripCount for a non-unit-stride access,
>> c. TripCount / CacheLineSize for a unit-stride access.
>> Loop Cost is then calculated as the sum of the reference penalties times
>> the product of the loop bounds of the outer loops. This loop cost can then
>> be used as a profitability measure for cache-reuse-related optimizations.
>> This is just a brief description; please refer to [1] for more details.
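
To make the above concrete, here is a minimal sketch of the penalty and
loop-cost computation (the type and function names are made up for
exposition and do not match the actual D21124 code):

  // Illustrative sketch of the reference-group penalty and loop cost
  // described above; the identifiers are invented for exposition.
  #include <cstdint>
  #include <vector>

  enum class AccessKind { LoopInvariant, UnitStride, NonUnitStride };

  struct RefGroup {
    AccessKind Kind; // Behaviour of the group w.r.t. the innermost loop.
  };

  // Penalty of one reference group for an innermost loop with trip count TC.
  static uint64_t refGroupPenalty(const RefGroup &G, uint64_t TC,
                                  uint64_t CacheLineSize) {
    switch (G.Kind) {
    case AccessKind::LoopInvariant:
      return 1;                  // Same cache line on every iteration.
    case AccessKind::NonUnitStride:
      return TC;                 // A new cache line on every iteration.
    case AccessKind::UnitStride:
      return TC / CacheLineSize; // A new line every CacheLineSize iterations.
    }
    return TC;
  }

  // Loop cost with this loop innermost: the sum of the group penalties times
  // the product of the trip counts of the outer loops in the nest.
  static uint64_t loopCost(const std::vector<RefGroup> &Groups,
                           uint64_t InnerTripCount,
                           const std::vector<uint64_t> &OuterTripCounts,
                           uint64_t CacheLineSize) {
    uint64_t PenaltySum = 0;
    for (const RefGroup &G : Groups)
      PenaltySum += refGroupPenalty(G, InnerTripCount, CacheLineSize);

    uint64_t OuterProduct = 1;
    for (uint64_t TC : OuterTripCounts)
      OuterProduct *= TC;

    return PenaltySum * OuterProduct;
  }
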
>>
>> This sounds like the right approach, and is certainly something we need.
>> One question is how the various transformations you name above (loop
>> fusion, interchange, etc.) will use this analysis to evaluate the effect of
>> the transformation they are considering performing. Specifically, they
>> likely need to do this without actually rewriting the IR (speculatively).
>>
> For Interchange: Loop cost can be used to build a desired ordering, and
> interchange can try to match the nearest possible ordering. A few internal
> kernels, including matmul, work fine with this approach.
> For Fission/Fusion: Loop cost can provide the cost before and after fusion.
> The loop cost of a fused loop can be obtained by building the reference
> groups as if the loops were already fused; fission is handled similarly,
> but over the candidate partitions.
> These require no prior IR changes.
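
For illustration, a rough sketch of how interchange could derive a desired
ordering from per-loop costs (the CostIfInnermost values are assumed to come
from the analysis above; the names are illustrative, not real interfaces):

  // Illustrative only: order the loops of a nest so that the loop that is
  // cheapest when placed innermost ends up innermost; interchange would then
  // try to get as close to this order as legality allows.
  #include <algorithm>
  #include <cstdint>
  #include <vector>

  struct LoopCostEntry {
    unsigned Depth;           // Position in the original nest, outermost = 0.
    uint64_t CostIfInnermost; // Loop cost computed with this loop innermost.
  };

  static std::vector<unsigned>
  desiredInterchangeOrder(std::vector<LoopCostEntry> Loops) {
    // A higher cost-if-innermost means less cache reuse when innermost, so
    // such loops should be placed further out.
    std::sort(Loops.begin(), Loops.end(),
              [](const LoopCostEntry &A, const LoopCostEntry &B) {
                return A.CostIfInnermost > B.CostIfInnermost;
              });
    std::vector<unsigned> Order;
    for (const LoopCostEntry &L : Loops)
      Order.push_back(L.Depth);
    return Order; // Desired outermost-to-innermost order.
  }
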
>
>
> Hi Vikram,
>
> Is the analysis result specific to a loop nest or to a loop nest together
> with a set of reference groups?
>
The result is specific to each loop in the loop nest and the calculations
are based on the references in the loop nest.
>
> Adam
>
>
>> A primary drawback of the above patch is the static use of the cache line
>> size. I wish to get this data from tblgen/TTI and am happy to submit
>> patches for it.
>>
>> Yes, this sounds like the right direction. The targets obviously need to
>> provide this information.
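
The shape I have in mind is roughly the following (hypothetical only:
TargetTransformInfo has no getCacheLineSize() hook today; it would be added
as part of the tblgen/TTI work in the plan below):

  // Hypothetical sketch: query the cache line size through a yet-to-be-added
  // TTI hook, falling back to a conservative default when the target does
  // not describe its caches.
  #include "llvm/Analysis/TargetTransformInfo.h"

  using namespace llvm;

  static unsigned getCacheLineSizeOrDefault(const TargetTransformInfo &TTI) {
    // Assumed hook (not yet in TTI): returns 0 when the target has no data.
    if (unsigned CLS = TTI.getCacheLineSize())
      return CLS;
    return 64; // Common 64-byte line as a fallback.
  }
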
>>
>>
>> Finally, if the community is interested in the above work, I have the
>> following work-plan:
>> a. Update the loop cost patch to a consistent, committable state.
>> b. Add cache data information to tblgen/TTI.
>> c. Rewrite Loop Interchange profitability to start using Loop Cost.
>>
>> Great!
>>
>> This does not affect loop interchange directly, but one thing we also need
>> to consider is the number of prefetching streams supported by the hardware
>> prefetcher on various platforms. This is generally some number around 5 to
>> 10 per thread. Splitting loops that require more streams than supported,
>> and preventing fusion from requiring more streams than supported, is also
>> an important cost metric. Of course, so are ILP and other factors.
>>
> Cache-based cost computation is one such metric and yes, other metrics
> need to be added.
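
As a rough sketch of the kind of stream check described above (the stream
counts and the per-thread limit are assumed to come from the analysis and
the target respectively; the names are illustrative):

  // Illustrative only: reject fusion when the fused loop would need more
  // concurrent prefetching streams than the hardware prefetcher supports
  // per thread (typically around 5 to 10).
  static bool fusionExceedsStreamBudget(unsigned StreamsLoopA,
                                        unsigned StreamsLoopB,
                                        unsigned SharedStreams,
                                        unsigned MaxStreamsPerThread) {
    // Streams touched by both loops are counted only once after fusion.
    unsigned FusedStreams = StreamsLoopA + StreamsLoopB - SharedStreams;
    return FusedStreams > MaxStreamsPerThread;
  }
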
>
>>
>> -Hal
>>
>>
>> Please share your valuable thoughts.
>>
>> Thank you.
>>
>> References:
>> [1] [Carr-McKinley-Tseng]
>> http://www.cs.utexas.edu/users/mckinley/papers/asplos-1994.pdf
>>
>> --
>>
>> Good time...
>> Vikram TV
>> CompilerTree Technologies
>> Mysore, Karnataka, INDIA
>>
>>
>>
>>
>>
>> --
>> Hal Finkel
>> Assistant Computational Scientist
>> Leadership Computing Facility
>> Argonne National Laboratory
>>
>
>
>
> --
>
> Good time...
> Vikram TV
> CompilerTree Technologies
> Mysore, Karnataka, INDIA
>
>
>
--
Good time...
Vikram TV
CompilerTree Technologies
Mysore, Karnataka, INDIA