[PATCH] D79792: [AMDGPU] New SIInsertHardClauses pass

Jay Foad via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed May 13 02:39:18 PDT 2020


foad added a comment.

In D79792#2033325 <https://reviews.llvm.org/D79792#2033325>, @rampitec wrote:

> In D79792#2033282 <https://reviews.llvm.org/D79792#2033282>, @foad wrote:
>
> > In D79792#2033138 <https://reviews.llvm.org/D79792#2033138>, @rampitec wrote:
> >
> > > Actually depending on shouldCluster is a problem. First soft clauses are formed and then you turn them into hard clauses. Soft clauses impose higher register pressure but then you simply break them here, essentially wasting registers. It is either both passes should be guided by the same heuristic or none.
> >
> >
> > I don't understand what you're suggesting. In this pass we have to impose a limit of 64 instructions for correctness, not performance, because that's how the s_clause instruction's operand is encoded.
> >
> > Perhaps we should teach shouldCluster not to bother clausing more than 64 loads, because the s_clause won't be able to honor it, but it already has a much lower limit than that, so I don't think there's any need to change it.
>
>
> In fact it seems clauses are broken every 4 instructions. Which means there is no reason to form up to 15 instructions into a soft clause which will then be broken here. So what I propose: either do not call shallClustetMemOps here, or also call it in SIFormMemoryClauses. I do no really have a preference here, but disagreement results in registers used by soft clauses were used for nothing. Does it make sense? I.e. consider you also have xnack enabled.


Using gfx10 terminology: SIFormMemoryClauses deals with restartable //groups// but SIInsertHardClauses deals with //hard clauses//. They are similar but not the same. That's what I tried to explain in the big comment at the top of SIInsertHardClauses.cpp. Hard clauses are all about performance. Groups are all about correctness in the presence of XNACK.

shouldClusterMemOps is a heuristic to decide whether loads should be claused for performance. We have to call it from SIInsertHardClauses, so it can tell us not to bother clausing loads that have a different base address.

I'm not at all sure that calling shouldClusterMemOps from SIFormMemoryClauses is a good idea, but in any case I think that should be a separate patch with its own discussion.

> JBTW what happens to counters if hard clause exceeds 16 instructions? Every instruction increases the counter and there are only 4 bits available. Do I get it wrong?

I don't know, sorry. Maybe it's one of the conditions that can cause the hardware to break a hard clause?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79792/new/

https://reviews.llvm.org/D79792





More information about the llvm-commits mailing list