[PATCH] D84779: [AMDGPU] Add amdgpu specific loop threshold metadata

Tue Aug 18 05:23:25 PDT 2020

timcorringham added a comment.

Ignoring the explicit unroll hints (disable, full, count) for now, loop unrolling is controlled by the threshold. Loops are unrolled, either fully or partially, up to the threshold.

The threshold is intended to allow loops to be unrolled in cases where there is likely to be a performance gain, while avoiding unrolling to the extent that performance suffers from code size (or, in our case, from register pressure, which the threshold doesn't directly relate to). So the default threshold value is a compromise: it is set so that some gains are seen with typical code while the bad cases are avoided. The default value is adjusted by heuristics, which increase it in cases where a higher value is likely to be beneficial. Currently the default threshold is 150 for compute, but 700 for graphics (set using the function attribute). The value of 700 was determined to be the best compromise by testing a (large) representative sample of graphics apps. It isn't the best value for a lot of shaders, but it avoids the worst effects of excessive register pressure. For some specific applications different default threshold values are passed in (again using the function attribute).
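
To make the role of the threshold concrete, here is a deliberately simplified model of the decision described above. It is not LLVM's unroller: the function name, the constant names, and the cost model (body size times trip count) are assumptions made for illustration, and the heuristics that raise the threshold are ignored.

```cpp
// Simplified model only; the names and the cost model are hypothetical.
// Default thresholds quoted in this comment.
constexpr unsigned kComputeThreshold = 150;
constexpr unsigned kGraphicsThreshold = 700;

// Returns how many copies of the loop body to emit:
//   tripCount -> full unroll, 1 -> no unrolling, otherwise partial unroll.
constexpr unsigned chooseUnrollCount(unsigned bodySize, unsigned tripCount,
                                     unsigned threshold) {
  if (bodySize == 0 || tripCount == 0)
    return 1;
  // Full unroll if the fully unrolled size stays within the threshold.
  if (static_cast<unsigned long long>(bodySize) * tripCount <= threshold)
    return tripCount;
  // Otherwise unroll partially, as far as the threshold allows.
  unsigned count = threshold / bodySize;
  return count > 1 ? count : 1;
}
```

In this model a loop with a body size of 20 and a trip count of 8 is fully unrolled under the graphics threshold but only partially unrolled under the compute threshold.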

Returning to the explicit unroll hints, it has been observed that for a lot of graphics applications the hints are not optimal for amdgpu. In particular, a number of applications benefit (significantly) from unrolling some of the loops that have unroll.disable hints. However, allowing all of them to be unrolled results in excessive register pressure in some cases. Unrolling only the small loops achieves most of the gains while avoiding the bad cases, and that can be achieved by using a lower threshold for such loops (250 seems to be a good compromise value).

There have been several attempts at performing analysis in the front-end to set the unroll count as a more precise way of controlling the amount of unrolling. In the general case these have not worked as well as simply using the threshold, and they increased compile time significantly.

So that brings us back to this change, which provides a simple and cheap mechanism to specify a different threshold for each loop. There is no impact on anything that doesn't use the threshold metadata, but it offers some flexibility controlled by the front-end in cases where it is beneficial. It isn't a perfect solution, just a pragmatic feature.
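
As a rough sketch of the intended behaviour, assuming that a per-loop value simply replaces the function-level default when it is present: the struct and function names below are hypothetical and do not correspond to the patch's actual metadata or code.

```cpp
// Hypothetical model of a per-loop threshold supplied by the front-end.
struct LoopThresholdHint {
  bool present;    // true if the loop carries a threshold of its own
  unsigned value;  // only meaningful when present is true
};

// Assumption: a per-loop threshold overrides the function-level default;
// loops without one behave exactly as before.
constexpr unsigned effectiveThreshold(unsigned functionThreshold,
                                      LoopThresholdHint hint) {
  return hint.present ? hint.value : functionThreshold;
}
```

For example, in a graphics function with the default of 700, a loop that the front-end tags with 250 is limited to 250, while untagged loops still see 700.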

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D84779/new/

https://reviews.llvm.org/D84779