[PATCH] D123400: [LoopCacheAnalysis] Consider dimension depth of the subscript reference when calculating cost

Tue Apr 19 09:46:47 PDT 2022

congzhe added a comment.

In D123400#3457537 <https://reviews.llvm.org/D123400#3457537>, @bmahjour wrote:

>> What I did is to take the stride into account as a second component, and for each loop take the maximum stride of all reference groups to get the final stride, which presumably could resolve the motivating problem too.
>
> Treating stride as a secondary component is what I respectfully objected to, and explained earlier. Not sure if taking the maximum stride would give us what we need. For example consider
>
>   for (i)
>     for (j)
>       for (k)
>          ... A[i][j][k]
>          ... B[i][k][j]
>          ... C[i][k][j]
>
> the maximum stride will be the same for both `i-j-k` and `i-k-j` configurations (despite the second one being more profitable) bringing us back to the original problem.

IMHO, in this case the cost of loop-k would be higher than that of loop-j (remember that we compare costs first and only then strides), so loop cache analysis does output the i-k-j pattern.

>> After you land this patch, I hope that I could get the test case in D122776 <https://reviews.llvm.org/D122776> merged, since that is really the motivating test for these works. I could update the "CHECK: " lines according to the approach proposed in this patch, and update D122776 <https://reviews.llvm.org/D122776> to a pure NFC patch which includes only that test. I look forward to your thoughts about it :)
>
> Isn't `llvm/test/Analysis/LoopCacheAnalysis/PowerPC/single-store.ll` providing the same test coverage? Note that the analysis is not sensitive to the order of the loops within the loop nest, as it considers all permutations regardless of the original order.

The test case in D122776 <https://reviews.llvm.org/D122776> is the one that really shows the impact of our work, which is why I developed it. Without our work, loop cache analysis would fail that test: it would output the loop vector as [j, i, k], which is not the optimal access pattern.

The existing `llvm/test/Analysis/LoopCacheAnalysis/PowerPC/single-store.ll` (and the tests updated in this patch) do not expose the problem we are working on. Loop cache analysis already outputs the optimal access pattern for those tests, so they do not make it clear why we want to improve it.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D123400