[PATCH] D72737: [AMDGPU] Bundle loads before post-RA scheduler

Wed Jan 15 08:16:30 PST 2020

rampitec added a comment.

In D72737#1821378 <https://reviews.llvm.org/D72737#1821378>, @foad wrote:

> There are nice changes in a bunch of tests, where we're preserving clusters instead of breaking them apart.
>
> But there are also strange changes in some other tests, where the clustering hasn't changed, but some instructions that use the result of a load have moved around. Does this mean we're getting the latency of the load wrong now? (Or were we getting it wrong before?) For example:
>  insert_vector_elt
>  llvm.maxnum.f16.ll
>  saddo.ll
>  sign_extend.ll

We have moved uses of loaded values further from their loads, which is good. As far as I understand, these changes are induced by the removal of the artificial edges that were created by MemOpClusterMutation. These edges linked the successors of any load to all the nodes in a cluster, which restricted the scheduling.
In sign_extend.ll the change is due to store clustering: we have moved the v_ashrrev_i32_e32 producing v2 past the v_ashrrev_i32_e32 producing v3 because the store cluster uses them in this order. Previously this was harder to do because of the artificial edges linking all predecessors to all stores.
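
To illustrate the general point about artificial edges restricting the scheduler, here is a toy sketch. It is not LLVM code and does not model the actual ScheduleDAG or this patch. The node names (load0, load1, use0, use1) and the helper legal_orders are hypothetical. It only counts the legal orderings of a four-node graph with and without the extra edges from each load's successor to the other load in the cluster.

```python
from itertools import permutations


def legal_orders(nodes, edges):
    """Return every ordering of nodes in which each edge (a, b) places a before b."""
    orders = []
    for perm in permutations(nodes):
        pos = {n: i for i, n in enumerate(perm)}
        if all(pos[a] < pos[b] for a, b in edges):
            orders.append(perm)
    return orders


# Hypothetical toy graph: two clustered loads and one use of each result.
nodes = ["load0", "load1", "use0", "use1"]

# True data dependencies: each use needs its own load.
data_edges = [("load0", "use0"), ("load1", "use1")]

# Cluster edge keeping the two loads in order.
cluster_edge = [("load0", "load1")]

# Assumed artificial edges: every load in the cluster must precede
# every successor of any load in the cluster.
artificial_edges = [("load1", "use0"), ("load0", "use1")]

without_artificial = legal_orders(nodes, data_edges + cluster_edge)
with_artificial = legal_orders(nodes, data_edges + cluster_edge + artificial_edges)

print(len(without_artificial), "legal orders without artificial edges")
print(len(with_artificial), "legal orders with artificial edges")
```

In this toy graph the ordering load0, use0, load1, use1 is legal only when the artificial edges are absent, so dropping them gives a scheduler more orderings to choose from.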

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72737/new/

https://reviews.llvm.org/D72737