[llvm] [RISCV][PoC] Schedule RVV instructions with same type first (PR #95924)

Fri Jun 21 22:51:15 PDT 2024

BeMg wrote:

> > > ~What do you think about the following idea:~ ~1. RISCVMachineScheduler does `RISCVMachineScheduler::pickNodeFromQueue`, and its only job is to group RVV instructions according to same vtype~ ~2. Run RISCVVSETVLIInsertion~ ~3. Run MachineScheduler, whose job is to put the instructions in a good order for register allocation, also taking into account latencies and processor resources~ ~4. Run register allocation~ ~5. If the subtarget enables PostMachineScheduler, run it.~ ~This approach would keep the RISCVMachineScheduler simple, since it could ignore register pressure, latencies, and processor resource usage. By running the normal MachineScheduler after VSETVLI insertion, the hope is that we have more freedom on scheduling, since fewer `vsetvli` instructions means fewer instruction dependencies. At this point, we are accounting for register pressure, latencies, and processor resources.~
> > > EDIT: I forgot RISCVVSETVLIInsertion is after RA, so you can ignore this idea. You probably need to balance vtype grouping, latencies, register pressure, and resource usage at the same time; otherwise, with an individual-pass approach, later passes will undo the changes made in the first pass.
> > 
> > 
> > We could use a mutation to constrain instructions with the same vtype into a group, instead of relying on vsetvli insertion to create barriers between instructions. Based on my experience, this approach still disrupts some patterns in step 3. At best, it eliminates some vsetvli instructions; at worst, it introduces additional spills and reloads. This doesn't seem ideal.
> 
> I have thought about the mutation approach before, but I never tried it. I think it could be another feasible approach. Do you have a prototype that can be evaluated?

Sure. I have two prototypes to share.

1. https://github.com/BeMg/llvm-project/commit/a5c4dfb6517ff6eabc784b95e1c76253b8bb4f63 Use a mutation to create `cluster` dependencies between instructions that share the same vsetvl configuration.
2. https://github.com/BeMg/llvm-project/commit/3e63dc17f59c430d68f56bbe217794283d20923e Override `tryCandidate` to make the machine scheduler vsetvl-aware.

Both reuse `VSETVLIInfo` from the vsetvli insertion pass to check whether two instructions have the same configuration.
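To illustrate why grouping by configuration matters, here is a minimal, self-contained sketch. It is not code from either prototype and does not use LLVM types: `VConfig`, `Inst`, `countVsetvli`, and `groupByConfig` are hypothetical names, the configuration is reduced to SEW and LMUL, and data dependencies are ignored entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the vtype part of a vsetvli configuration.
struct VConfig {
  unsigned SEW;  // element width in bits
  unsigned LMUL; // register group multiplier
  bool operator==(const VConfig &O) const {
    return SEW == O.SEW && LMUL == O.LMUL;
  }
};

struct Inst {
  std::string Name;
  VConfig Cfg;
};

// Counts the vsetvli instructions a naive insertion would need:
// one before the first instruction, plus one at every configuration change.
unsigned countVsetvli(const std::vector<Inst> &Seq) {
  unsigned N = 0;
  for (std::size_t I = 0; I < Seq.size(); ++I)
    if (I == 0 || !(Seq[I].Cfg == Seq[I - 1].Cfg))
      ++N;
  return N;
}

// Groups instructions with equal configurations. Groups appear in first-seen
// order and keep the original order inside each group. A real scheduler must
// also respect dependencies, which this toy ignores.
std::vector<Inst> groupByConfig(const std::vector<Inst> &Seq) {
  std::vector<Inst> Out;
  std::vector<bool> Taken(Seq.size(), false);
  for (std::size_t I = 0; I < Seq.size(); ++I) {
    if (Taken[I])
      continue;
    for (std::size_t J = I; J < Seq.size(); ++J)
      if (!Taken[J] && Seq[J].Cfg == Seq[I].Cfg) {
        Out.push_back(Seq[J]);
        Taken[J] = true;
      }
  }
  assert(Out.size() == Seq.size() && "every instruction is emitted once");
  return Out;
}
```

For an interleaved sequence of e32/m1 and e64/m2 instructions, `countVsetvli` drops from one per instruction to one per group after `groupByConfig`.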

The second one is closer to the approach in this patch, but it modifies `tryCandidate` to change the heuristic. It is a way to implement the tie-break between vtypes, latencies, register pressure, and resource usage.
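As a rough illustration of such a tie-break, the sketch below compares two candidates one criterion at a time and stops at the first criterion that differs. It is a standalone toy, not the prototype's code and not LLVM's `tryCandidate` signature: the `Candidate` fields, the `Reason` enum, and in particular the order of the checks (register pressure, then vtype, then latency) are assumptions chosen for the example.

```cpp
#include <cassert>

// Hypothetical per-candidate summary; a real scheduler derives these values
// from the scheduling DAG and the register pressure tracker.
struct Candidate {
  int PressureDelta; // change in register pressure if picked (lower is better)
  bool SameVType;    // matches the configuration of the last scheduled RVV op
  unsigned Latency;  // remaining critical-path height (higher goes first)
};

enum class Reason { None, RegPressure, VType, Latency };

// Returns true if Try should replace Cand as the best candidate so far.
// Why records which criterion decided the comparison.
bool tryCandidate(const Candidate &Cand, const Candidate &Try, Reason &Why) {
  if (Try.PressureDelta != Cand.PressureDelta) {
    Why = Reason::RegPressure;
    return Try.PressureDelta < Cand.PressureDelta;
  }
  if (Try.SameVType != Cand.SameVType) {
    Why = Reason::VType;
    return Try.SameVType;
  }
  if (Try.Latency != Cand.Latency) {
    Why = Reason::Latency;
    return Try.Latency > Cand.Latency;
  }
  Why = Reason::None; // full tie: keep the current candidate
  return false;
}
```

Moving the vtype check earlier or later in this chain changes how aggressively the scheduler trades spills and latency for fewer vsetvli instructions.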

I think these PoCs still have room for improvement, but they are good enough to be evaluated and to produce data for comparison. (For example, the mutation could use a `weak` dependence instead of `cluster`, and the custom scheduling strategy could change the priority of the vsetvli-aware check.)

If you find anything useful in these prototypes, feel free to integrate it into this patch.

## Mutation SPEC2k17 data

Benchmark | Before Spills | After Spills | Spills Diff | Before Reloads | After Reloads | Reloads Diff | Before vsetvl | After vsetvl | vsetvl Diff
-- | -- | -- | -- | -- | -- | -- | -- | -- | --
500.perlbench_r | 4307 | 4307 | 0 | 10658 | 10658 | 0 | 529 | 530 | -1
502.gcc_r | 13239 | 13242 | -3 | 29697 | 29701 | -4 | 2857 | 2793 | 64
505.mcf_r | 119 | 119 | 0 | 330 | 330 | 0 | 38 | 38 | 0
520.omnetpp_r | 796 | 904 | -108 | 1552 | 1855 | -303 | 1005 | 1016 | -11
523.xalancbmk_r | 1835 | 1835 | 0 | 2563 | 2563 | 0 | 2863 | 2854 | 9
525.x264_r | 4131 | 4135 | -4 | 8033 | 8051 | -18 | 3035 | 2908 | 127
531.deepsjeng_r | 328 | 343 | -15 | 677 | 695 | -18 | 313 | 328 | -15
541.leela_r | 365 | 365 | 0 | 591 | 591 | 0 | 142 | 133 | 9
548.exchange2_r | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
557.xz_r | 347 | 362 | -15 | 697 | 711 | -14 | 252 | 259 | -7

## Custom Sched SPEC2k17 data

Benchmark | Before Spills | After Spills | Spills Diff | Before Reloads | After Reloads | Reloads Diff | Before vsetvl | After vsetvl | vsetvl Diff
-- | -- | -- | -- | -- | -- | -- | -- | -- | --
500.perlbench_r | 4307 | 4307 | 0 | 10658 | 10658 | 0 | 529 | 529 | 0
502.gcc_r | 13239 | 13236 | 3 | 29697 | 29694 | 3 | 2857 | 2843 | 14
505.mcf_r | 119 | 119 | 0 | 330 | 330 | 0 | 38 | 38 | 0
520.omnetpp_r | 796 | 796 | 0 | 1552 | 1552 | 0 | 1005 | 1003 | 2
523.xalancbmk_r | 1835 | 1838 | -3 | 2563 | 2557 | 6 | 2863 | 2851 | 12
525.x264_r | 4131 | 4133 | -2 | 8033 | 8037 | -4 | 3035 | 2950 | 85
531.deepsjeng_r | 328 | 328 | 0 | 677 | 677 | 0 | 313 | 312 | 1
541.leela_r | 365 | 365 | 0 | 591 | 591 | 0 | 142 | 142 | 0
548.exchange2_r | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
557.xz_r | 347 | 347 | 0 | 697 | 697 | 0 | 252 | 253 | -1
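
For a rough aggregate comparison, the diff columns of the two tables can be summed. The small sketch below does that. The values are copied from the tables in benchmark order; each diff is Before minus After, so a positive total means fewer spills, reloads, or vsetvl instructions after the change. The names `Column` and `total` are only for illustration.

```cpp
#include <array>
#include <cassert>
#include <numeric>

using Column = std::array<int, 10>; // one entry per benchmark, in table order

// Sums one diff column.
int total(const Column &C) { return std::accumulate(C.begin(), C.end(), 0); }

// Diff columns from the Mutation table.
const Column MutationSpills = {0, -3, 0, -108, 0, -4, -15, 0, 0, -15};
const Column MutationReloads = {0, -4, 0, -303, 0, -18, -18, 0, 0, -14};
const Column MutationVsetvl = {-1, 64, 0, -11, 9, 127, -15, 9, 0, -7};

// Diff columns from the Custom Sched table.
const Column CustomSpills = {0, 3, 0, 0, -3, -2, 0, 0, 0, 0};
const Column CustomReloads = {0, 3, 0, 0, 6, -4, 0, 0, 0, 0};
const Column CustomVsetvl = {0, 14, 0, 2, 12, 85, 1, 0, 0, -1};
```

Summed this way, the mutation prototype removes 175 vsetvl instructions but adds 145 spills and 357 reloads, while the custom scheduler removes 113 vsetvl instructions, adds 2 spills, and removes 5 reloads.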

https://github.com/llvm/llvm-project/pull/95924