[PATCH] D124984: [LoopCacheAnalysis] Use smaller numbers when calculating the costs (PR55233)

Congzhe Cao via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed May 4 23:19:52 PDT 2022


congzhe created this revision.
congzhe added reviewers: bmahjour, Whitney, Meinersbur, LoopOptWG.
congzhe added projects: LLVM, LoopOptWG.
Herald added subscribers: hiraditya, nemanjai.
Herald added a project: All.
congzhe requested review of this revision.
Herald added a subscriber: llvm-commits.

As documented in PR55233 (https://github.com/llvm/llvm-project/issues/55233), loop cache analysis tends to overflow when dealing with large arrays and deeply nested loopnests. It overflows because the cost calculation sometimes multiplies large numbers together. For example, consider the following loopnest, copied from PR55233:

  void foo(int N, float A[][N][N][N][N], float B[][N][N][N][N], float C[][N][N][N][N]) {
    for (int i1 = 0; i1 < N; i1++)
      for (int i2 = 0; i2 < N; i2++)
        for (int i3 = 0; i3 < N; i3++)
          for (int i4 = 0; i4 < N; i4++)
            for (int i5 = 0; i5 < N; i5++)
              A[i1][i2][i3][i4][i5] += B[i1][i2][i3][i4][i5] + C[i1][i2][i3][i4][i5];
  }

When we calculate the cost of loop-i1, we assume i1 is the innermost loop of the loopnest and first estimate the number of cache lines that loop-i1 accesses. The access is not consecutive, so the estimate is simply the tripcount of loop-i1. We then multiply this tripcount by the tripcounts of loop-i2, loop-i3 and loop-i4, to account for i1 indexing the outermost dimension of the array access. With every loop running N iterations, this product is N^4, which may overflow for large N.
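To make the overflow concrete, here is a minimal C sketch (not code from the patch or from LoopCacheAnalysis.cpp) that multiplies the four tripcounts named above. It assumes, as a simplification, that the cost is held in a signed 64-bit integer; the helper name `tripcount_product` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: multiply the tripcounts tc[0..n-1] (assumed
 * non-negative), e.g. those of loop-i1 .. loop-i4, into *out.
 * Returns false instead of wrapping when the product exceeds INT64_MAX. */
static bool tripcount_product(const int64_t *tc, int n, int64_t *out) {
  int64_t acc = 1;
  for (int k = 0; k < n; ++k) {
    /* acc * tc[k] > INT64_MAX exactly when acc > INT64_MAX / tc[k]. */
    if (tc[k] != 0 && acc > INT64_MAX / tc[k])
      return false;
    acc *= tc[k];
  }
  *out = acc;
  return true;
}
```

With all four tripcounts equal to 1000 the product is 10^12 and fits comfortably; with all four equal to 100000 it would be 10^20, which exceeds INT64_MAX (about 9.2 × 10^18), so the helper reports overflow.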

Following the discussion in the loopopt meeting, this patch instead multiplies the tripcount of loop-i1 by its depth relative to the innermost loop, which is 4 in the example above. This avoids multiplying potentially large numbers together. Similarly, when calculating the cost of loop-i2, we multiply the tripcount of loop-i2 by its relative depth, which is 3, and so on.
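A matching sketch of the proposed rule follows; it is illustrative only and not the patch itself. The names `relative_depth` and `scaled_ref_cost` are hypothetical, loop levels are assumed to be numbered from 1 (loop-i1) to 5 (loop-i5), and costs are again assumed to fit in a signed 64-bit integer. The description gives relative depths of 4 for loop-i1 and 3 for loop-i2; the sketch assumes the pattern continues down to 1 for loop-i4 and does not model the innermost loop.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: depth of a loop relative to the innermost loop,
 * with loop levels numbered from 1 (outermost) to innermost_level. */
static int64_t relative_depth(int64_t loop_level, int64_t innermost_level) {
  return innermost_level - loop_level;
}

/* Sketch of the proposed cost for a non-consecutive access: the loop's own
 * tripcount scaled by its relative depth, rather than by the tripcounts of
 * other loops. Both arguments are assumed non-negative. Returns false only
 * if even this much smaller product would overflow int64_t. */
static bool scaled_ref_cost(int64_t tripcount, int64_t depth, int64_t *out) {
  if (depth != 0 && tripcount > INT64_MAX / depth)
    return false;
  *out = tripcount * depth;
  return true;
}
```

With every tripcount equal to 100000, loop-i1 gets 4 × 100000 = 400000 and loop-i2 gets 3 × 100000 = 300000. The cost now grows linearly in the tripcount instead of as a product of four tripcounts, while loops indexing outer dimensions still receive the larger costs.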


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D124984

Files:
  llvm/lib/Analysis/LoopCacheAnalysis.cpp
  llvm/test/Analysis/LoopCacheAnalysis/PowerPC/LoopnestFixedSize.ll
  llvm/test/Analysis/LoopCacheAnalysis/PowerPC/loads-store.ll
  llvm/test/Analysis/LoopCacheAnalysis/PowerPC/matmul.ll
  llvm/test/Analysis/LoopCacheAnalysis/PowerPC/matvecmul.ll
  llvm/test/Analysis/LoopCacheAnalysis/PowerPC/multi-store.ll
  llvm/test/Analysis/LoopCacheAnalysis/PowerPC/single-store.ll
  llvm/test/Analysis/LoopCacheAnalysis/PowerPC/stencil.ll


