[PATCH] D79792: [AMDGPU] New SIInsertHardClauses pass

Jay Foad via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu May 14 03:10:18 PDT 2020


foad added a comment.

In D79792#2034306 <https://reviews.llvm.org/D79792#2034306>, @rampitec wrote:

> > shouldClusterMemOps is a heuristic to decide whether loads should be claused for performance. We have to call it from SIInsertHardClauses, so it can tell us not to bother clausing loads that have a different base address.
>
> Actually a very small limit set by shouldClusterMemOps is driven by the register pressure. There seems to be no reason to call it here at all because this is past RA.
>  Note that SIFormMemoryClauses does not call it but uses RP tracker to maintain occupancy instead.


shouldClusterMemOps decides whether it is beneficial for performance to cluster two loads. For example: two VMEM instructions with the same base address should be clustered, unless one has a sampler and the other doesn't, etc etc. All that sort of logic belongs in shouldClusterMemOps, and we have to call it here because we don't want to insert s_clause instructions if it is not going to improve performance.

Yes, shouldClusterMemOps also imposes a limit on the length of the cluster. If that is *only* to help with register pressure, then perhaps I can bypass that check by always calling it with NumLoads=2 instead of NumLoads=CI.Size+1. What do you think?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79792/new/

https://reviews.llvm.org/D79792





More information about the llvm-commits mailing list