[PATCH] D72325: [AMDGPU] Fix cluster size threshold calculation

Thu Jan 9 07:05:57 PST 2020

foad added a comment.

In D72325#1811961 <https://reviews.llvm.org/D72325#1811961>, @nhaehnle wrote:

> Don't we *want* clusters that large, and even larger?

Maybe :-)

> Consider some code that loads an array-of-structures (AoS). We really want to cluster that as aggressively as possible, to increase the chance of lowest-level cache hits on successive instructions? I would say the method is *very* inexact :)
> 
> The comment talks about not wanting to drive register pressure up too much. That's a legitimate concern, but this approach here seems to be quite wrong to me. The scheduler ought to track register pressure properly, and that's where the knowledge about whether to break clusters based on register pressure should be.

`shouldClusterMemOps` runs as part of a DAG mutation to insert "cluster" edges in the DAG, before we try to schedule the DAG. So yes you could argue that `shouldClusterMemOps` should aspire to cluster as much as possible, and it should be up to the the scheduler proper to worry about register pressure, and decide whether or not to schedule those mem ops contiguously. The scheduler does already track register pressure, so it should be able to make this kind of decision, but I don't know how well it works in practice.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72325/new/

https://reviews.llvm.org/D72325