[llvm] [RISCV] Add load/store clustering in post machine schedule (PR #111504)
Pengcheng Wang via llvm-commits
llvm-commits at lists.llvm.org
Thu Oct 10 20:54:19 PDT 2024
wangpc-pp wrote:
> > > > > createGenericSchedLive
> > > >
> > > >
> > > > I think the reason is you are using pre-ra scheduler here.
> > >
> > >
> > > Thank you - the problem was indeed me being too eager with copy and paste and failing to switch to `createGenericSchedPostRA`. Now I've done that though, I see zero codegen changes within the in-tree unit tests.
> >
> >
> > Thanks! I will ask @BoyaoWang430 to add some MIR tests. And thanks in advance if you can help to evaluate/review this PR. :-)
>
> Thanks! And more generally, any note you have on how/if it affects codegen on external codebases very welcome. e.g. is this something that kicks in a lot in real-world code but we just don't trigger in our tests, or is it fairly rare it makes a difference (but of course worth addressing for the cases it helps)
There is no codegen change in the in-tree tests because the post-RA scheduler is not enabled by default. To enable it, we need to use a CPU that has `FeaturePostRAScheduler` or add `+use-postra-scheduler` to `-mattr`.
The problem is very common; maybe I didn't make that clear in the sync-up meeting because I didn't explain it well :-(.
* We have added load/store clustering mutations to the pre-RA scheduler, so during pre-RA scheduling we can add dependencies between load/store instructions and keep them together.
* But if post-RA scheduling is enabled and the load/store clustering mutations are not added to the post-RA scheduler, there is no strong dependency between these load/store instructions, and the scheduler may separate them again.
For example:
```asm
// Before pre-ra scheduling
ld vreg0, 0(rs)
addi vreg1, vreg0, 1
ld vreg2, 8(rs)
addi vreg3, vreg2, 1
// After pre-ra scheduling
ld vreg0, 0(rs)
ld vreg2, 8(rs)
addi vreg1, vreg0, 1
addi vreg3, vreg2, 1
// After post-ra scheduling (possible current result)
ld a1, 0(a0)
addi a3, a1, 1
ld a2, 8(a0)
addi a4, a2, 1
// After post-ra scheduling (what we want)
ld a1, 0(a0)
ld a2, 8(a0)
addi a3, a1, 1
addi a4, a2, 1
```
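To make the difference between the two post-RA results above concrete, here is a toy C++ model. It is only a sketch, not LLVM code: `Inst`, `isClusterCandidate`, `loadsStayClustered`, and the "same base register" criterion are all hypothetical names and assumptions made for illustration, and the real mutation works on the scheduling DAG rather than on a flat instruction list.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy instruction record (an assumption for illustration, not a MachineInstr).
struct Inst {
  std::string Opcode; // e.g. "ld" or "addi"
  std::string Base;   // base register for loads; empty otherwise
  long Offset;        // memory offset for loads; unused otherwise
};

// Hypothetical clustering criterion: two loads from the same base register.
static bool isClusterCandidate(const Inst &A, const Inst &B) {
  return A.Opcode == "ld" && B.Opcode == "ld" && A.Base == B.Base;
}

// Returns true if all loads sharing a base register form one contiguous run.
bool loadsStayClustered(const std::vector<Inst> &Seq) {
  for (std::size_t I = 0; I < Seq.size(); ++I) {
    if (Seq[I].Opcode != "ld")
      continue;
    // Find the last later load that shares this base register.
    std::size_t Last = I;
    for (std::size_t J = I + 1; J < Seq.size(); ++J)
      if (isClusterCandidate(Seq[I], Seq[J]))
        Last = J;
    // Everything between the two must also be such a load.
    for (std::size_t K = I + 1; K < Last; ++K)
      if (!isClusterCandidate(Seq[I], Seq[K]))
        return false;
  }
  return true;
}

// "After post-ra scheduling (possible current result)" from the example.
std::vector<Inst> separatedExample() {
  return {{"ld", "a0", 0}, {"addi", "", 0}, {"ld", "a0", 8}, {"addi", "", 0}};
}

// "After post-ra scheduling (what we want)" from the example.
std::vector<Inst> clusteredExample() {
  return {{"ld", "a0", 0}, {"ld", "a0", 8}, {"addi", "", 0}, {"addi", "", 0}};
}
```

In this model `loadsStayClustered(separatedExample())` is false and `loadsStayClustered(clusteredExample())` is true, which is the property the post-RA clustering mutation is meant to preserve.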
I think this problem is common across targets and, as you said, PPC has already done this.
https://github.com/llvm/llvm-project/pull/111504