[PATCH] D123400: [LoopCacheAnalysis] Consider dimension depth of the subscript reference when calculating cost

Tue Apr 19 13:34:17 PDT 2022

bmahjour added a comment.

In D123400#3459362 <https://reviews.llvm.org/D123400#3459362>, @congzhe wrote:

> In D123400#3457537 <https://reviews.llvm.org/D123400#3457537>, @bmahjour wrote:
>
>>> What I did is to take the stride into account as a second component, and for each loop take the maximum stride of all reference groups to get the final stride, which presumably could resolve the motivating problem too.
>>
>> Treating stride as a secondary component is what I respectfully objected to, and explained earlier. Not sure if taking the maximum stride would give us what we need. For example consider
>>
>>   for (i)
>>     for (j)
>>       for (k)
>>          ... A[i][j][k]
>>          ... B[i][k][j]
>>          ... C[i][k][j]
>>
>> the maximum stride will be the same for both `i-j-k` and `i-k-j` configurations (despite the second one being more profitable) bringing us back to the original problem.
>
> IMHO for this case the cost of loop-k would be higher than loop-j (remember that we compare the cost first and then stride). So loop cache analysis does output the i-k-j pattern.

Sorry, I made a mistake in the example above. I meant to consider this one:

  for (i)
    for (j)
      for (k)
         ... A[i][j][k]
         ... B[j][i][k]
         ... C[j][i][k]

Here the optimal order is `j-i-k`, but if we take the maximum among all reference groups, we end up with the same value for both the `i-j-k` and `j-i-k` configurations. With this patch, the j-loop has a larger cost than the i-loop, so we get the optimal permutation:

  Loop 'j' has cost = 201000000
  Loop 'i' has cost = 102000000
  Loop 'k' has cost = 90000
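
To make the tie concrete, here is a rough sketch of the stride argument. It is a toy model, not what the patch or LoopCacheAnalysis actually computes: it assumes cubic row-major arrays with a hypothetical extent `N`, measures stride in elements, and the helper names (`stride`, `max_stride`, `sum_stride`) are made up for illustration. The summed variant is only there to show that per-reference information can separate the two loops; it is not meant to reproduce the cost values above.

```python
# Toy model (assumption): N x N x N row-major arrays, stride counted in elements.
N = 100  # hypothetical extent of every dimension

# For each reference, the loop variable used in each subscript,
# outermost dimension first, as in the example above.
REFS = {
    "A": ("i", "j", "k"),  # A[i][j][k]
    "B": ("j", "i", "k"),  # B[j][i][k]
    "C": ("j", "i", "k"),  # C[j][i][k]
}


def stride(subscripts, loop):
    """Element stride of a row-major reference when `loop` advances by one."""
    depth = len(subscripts)
    pos = subscripts.index(loop)
    return N ** (depth - 1 - pos)


def max_stride(loop):
    """Maximum stride over all references (the approach being questioned)."""
    return max(stride(s, loop) for s in REFS.values())


def sum_stride(loop):
    """Illustrative alternative: accumulate the stride of every reference."""
    return sum(stride(s, loop) for s in REFS.values())


for loop in ("i", "j", "k"):
    print(loop, max_stride(loop), sum_stride(loop))
```

In this model the maximum is `N*N` for both the i-loop and the j-loop, so it cannot tell `i-j-k` from `j-i-k`, while the accumulated value is larger for the j-loop because two of the three references use `j` in their outermost dimension.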

>>> After you land this patch, I hope that I could get the test case in D122776 <https://reviews.llvm.org/D122776> merged, since that is really the motivating test for these works. I could update the "CHECK: " lines according to the approach proposed in this patch, and update D122776 <https://reviews.llvm.org/D122776> to a pure NFC patch which includes only that test. I look forward to your thoughts about it :)
>>
>> Isn't `llvm/test/Analysis/LoopCacheAnalysis/PowerPC/single-store.ll` providing the same test coverage? Note that the analysis is not sensitive to the order of the loops within the loop nest, as it considers all permutations regardless of the original order.
>
> The test case in D122776 <https://reviews.llvm.org/D122776> is the one that really shows the impact of our work, which is why I developed that test. Without our work loop cache analysis would fail that test -- it would output the loop vector as  [j, i, k] which is not the optimal access pattern.
>
> The current `llvm/test/Analysis/LoopCacheAnalysis/PowerPC/single-store.ll` (and the tests updated in this patch) do not expose the problem we are working on. Current loop cache analysis does already output the optimal access pattern for those tests, which might make it not clear enough why we want to improve loop cache analysis.

The current loop cache analysis outputs the loops in the correct order by luck (because it maintains the original breadth-first order, and that order just happens to be the optimal one), but it outputs the same cost value for the two outer loops, which is the root problem! By ensuring that a correct and distinguishable cost is associated with each loop, we also ensure that the optimal order is maintained. I do see your point in wanting to make sure that the sort order is correct, but if that's the case we probably want to use `CHECK-NEXT` instead of `CHECK-DAG` for your test, since consecutive `CHECK-DAG` directives can match in any order.
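
The "by luck" point can be sketched as follows. This assumes the loops are ordered by a stable sort on descending cost, which is an assumption made for illustration rather than a statement about the implementation; `order_by_cost` and the `TIED` value are hypothetical, and the distinct costs are simply borrowed from the output quoted earlier in this comment.

```python
from itertools import permutations


def order_by_cost(loops):
    """Order (name, cost) pairs by descending cost.

    Python's sorted() is stable, so loops with equal cost keep their
    relative input order.
    """
    return [name for name, _ in sorted(loops, key=lambda lc: -lc[1])]


TIED = 1_000_000  # hypothetical cost shared by the two outer loops

# With a tie, the result just echoes whatever order the loops came in.
print(order_by_cost([("i", TIED), ("j", TIED), ("k", 90000)]))  # ['i', 'j', 'k']
print(order_by_cost([("j", TIED), ("i", TIED), ("k", 90000)]))  # ['j', 'i', 'k']

# With distinguishable costs, every input order yields the same result.
DISTINCT = [("j", 201000000), ("i", 102000000), ("k", 90000)]
results = {tuple(order_by_cost(list(p))) for p in permutations(DISTINCT)}
print(results)  # {('j', 'i', 'k')}
```

When the outer loops share a cost, the output depends entirely on the input order, so a test passing on it says little about the analysis; once the costs differ, the order is determined by the costs alone.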

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123400/new/

https://reviews.llvm.org/D123400