[llvm-dev] [Proposal][RFC] Cache aware Loop Cost Analysis

Thu Jun 23 12:06:35 PDT 2016

On Thu, Jun 23, 2016 at 11:34 PM, Adam Nemet <anemet at apple.com> wrote:

>> Hi Vikram,
>>
>> Is the analysis result specific to a loop nest or to a loop nest together
>> with a set of reference groups?
>>
> The result is specific to each loop in the loop nest and the calculations
> are based on the references in the loop nest.
>
>
> Sorry, can you please rephrase or elaborate? I don’t understand what you
> mean.  Analysis results are retained unless a transformation invalidates
> them.  My question is how the reference groups affect all this.
>

Sorry, I now understand that you meant LLVM's analysis result (I thought you
were asking how the analysis calculates the result). The analysis result is
specific to each loop in its loop nest. Reference groups are needed only
during cost computation.

>
> You could probably describe how you envision this analysis would be used
> with something like Loop Fusion.
>

For Loop Interchange, the cost calculated for each loop can provide a
desired ordering of the loops. Loop interchange can then interchange the
loops to match the desired order, provided the interchange legality check
passes. E.g., for the matrix multiplication case:
       for (i = 0 to 5000)
         for (j = 0 to 5000)
           for (k = 0 to 5000)
             C[i,j] = C[i,j] + A[i,k]* B[k,j]
Here, based on the cache line size and the cache penalties for the
references (please refer to the first mail of the thread or the reference
paper for the different penalties), the following costs result:
       Loop i cost: 2.501750e+11
       Loop j cost: 6.256252e+10
       Loop k cost: 1.563688e+11
The loop cost here is simply the total cache penalty due to all references
when that loop is innermost, including the penalties due to the outer
loops. So the loop with the lowest cost is the best fit for the innermost
position because it has the best cache locality. This results in the
desired loop order i, k, j (outermost to innermost), and LoopInterchange
can interchange the 'j' and 'k' loops if the legality check passes (a
nearest ordering that passes the legality check is also an optimal loop
ordering).
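
As an illustrative sketch only (not the proposed LLVM implementation), the
costs above can be reproduced with the per-reference penalties of the
referenced paper: 1 for a loop-invariant reference, trip/(elements per cache
line) for a consecutive reference, and trip for a non-consecutive one. The
trip count of 5001 (inclusive bounds), the 4 elements per cache line, the
row-major layout, and the names `ref_cost`, `loop_cost` and
`desired_loop_order` are all assumptions made here for illustration; they
were chosen because they match the quoted figures.

```python
# Assumptions (not stated in this mail): inclusive bounds give 5001 iterations
# per loop, 4 array elements fit in one cache line, arrays are row-major.
TRIP = 5001
ELEMS_PER_LINE = 4

# Each reference is described by its (row, column) subscript loops.
REFS = {
    "C[i,j]": ("i", "j"),
    "A[i,k]": ("i", "k"),
    "B[k,j]": ("k", "j"),
}

def ref_cost(subscripts, loop):
    """Cache lines touched by one reference when `loop` is innermost."""
    row, col = subscripts
    if loop not in subscripts:
        return 1                      # loop invariant: same line every iteration
    if loop == col:
        return TRIP / ELEMS_PER_LINE  # consecutive: new line every few iterations
    return TRIP                       # non-consecutive: new line every iteration

def loop_cost(loop):
    """Total penalty with `loop` innermost, scaled by the two outer loops."""
    outer_iterations = TRIP ** 2
    return sum(ref_cost(s, loop) for s in REFS.values()) * outer_iterations

def desired_loop_order(costs):
    """Outermost-first order: highest cost outermost, lowest cost innermost."""
    return sorted(costs, key=costs.get, reverse=True)

costs = {loop: loop_cost(loop) for loop in ("i", "j", "k")}
for loop, cost in costs.items():
    print(f"Loop {loop} cost: {cost:e}")
print(desired_loop_order(costs))  # ['i', 'k', 'j']
```

Whether the 'j' and 'k' loops can actually be swapped is still decided by
the legality check, which this sketch does not model.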

For fusion/fission, the loop costs due to cache can be calculated by
assigning cache penalties to the references with respect to the loops
before and after fusion/fission. Comparing the two costs gives a
profitability measure for fusion/fission.
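
A minimal sketch of that comparison, under assumptions made up for
illustration: two hypothetical single loops that both read A[i], 5001
iterations, 4 elements per cache line, and only unit-stride references. The
helper `loop_cost` is hypothetical and charges each reference group once.

```python
# Assumptions for illustration only: 5001 iterations, 4 elements per cache
# line, and every reference below is a unit-stride (consecutive) access.
TRIP = 5001
ELEMS_PER_LINE = 4

def loop_cost(reference_groups):
    """Cache lines touched by one loop: each reference group is charged once."""
    return len(reference_groups) * TRIP / ELEMS_PER_LINE

# Before fusion:
#   for i: B[i] = A[i] + 1
#   for i: C[i] = A[i] * 2
cost_separate = loop_cost({"A[i]", "B[i]"}) + loop_cost({"A[i]", "C[i]"})

# After fusion both uses of A[i] fall into one reference group,
# so A is charged only once.
cost_fused = loop_cost({"A[i]", "B[i]", "C[i]"})

fusion_is_profitable = cost_fused < cost_separate
print(cost_separate, cost_fused, fusion_is_profitable)  # 5001.0 3750.75 True
```

The same comparison run in the other direction would serve as the
profitability measure for fission.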

I think the use case description in my previous mail was too brief, so I
have elaborated this time. More details can also be found in
http://www.cs.utexas.edu/users/mckinley/papers/asplos-1994.pdf.

Thanks

Good time...
Vikram TV
CompilerTree Technologies
Mysore, Karnataka, INDIA