[llvm] [RISCV] Enable load clustering by default (PR #73789)

Wed Dec 6 06:27:15 PST 2023

asb wrote:

> Looking at the test diffs, two observations:
> 
>     1. This appears to be increasing register pressure in some cases.  I can't find any case where it causes a spill it didn't before, but we do have cases where more registers are live at the same time.
> 
>     2. Clustering doesn't appear to be sorting by offset, and instead appears to be a somewhat arbitrary reordering of already adjacent loads.  (When they happen to already be adjacent.)
> 
> 
> Does this work if we try doing this post RA? That would cut down on the potential register pressure concern. To be clear, this is just a question, I could easily be convinced that doing this pre-RA is the right answer.
> 
> Your heuristic looks reasonable to me. I don't have data one way or another to indicate a preference for it, or anything else.

Digging into the ordering behaviour, it is indeed a bit confusing. BaseMemOpClusterMutation::apply will collect the MemOpInfo (calling getMemOperandsWithOffsetWidth for this), then they're put into group (commonly one big group, unless over a certain size threshold), and then the array of MemOpInfo is sorted into ascending order based on base+offset. clusterNeighbouringMemOps then performs clustering using that order __except__ there's logic that avoids reordering that was added in <https://reviews.llvm.org/D72706> with the argument that the reordering "leads to test case churn and 'goes against the machine schedulers "do no harm" philosophy'. I'm not sure I follow how the reordering could be harmful, and it does seem unfortunate because it seems to me the sdag scheduler does do this reordering so there's even more churn than you'd think when trying to check the incremental benefit of adding sdag load clustering in addition.

I've added a new parameter to the pass that controls this behaviour, and enabled the reordering for RISC-V. If people are happy with this direction, I'd intend to split this out as a patch that this one depends on.

Trying to enable this for the post-ra scheduler it seems to fail to make any changes (i.e. doesn't work), though it's possible I made an error.

Thanks @wangpc-pp for the suggestion re querying cache line size. I've not added that in this iteration, but will do so in the next rev.

https://github.com/llvm/llvm-project/pull/73789