[llvm] [RISCV] Move RISCVVMV0Elimination past pre-ra scheduling (PR #132057)

Luke Lau via llvm-commits llvm-commits at lists.llvm.org
Wed Mar 26 01:51:44 PDT 2025


lukel97 wrote:

Oh, I think I see the underlying cause of the x264 regression: it's #107532. That is, the machine scheduler is now freer to reschedule masked pseudos, which results in a lot of vector spills in x264_pixel_satd_16x16.

It happens under `-flto -O3` **without a scheduling model**. Either applying #126608 or using `-mtune=generic-ooo` from #120712 fixes it, since both set MicroOpBufferSize=1, which makes the machine scheduler take register pressure into account.
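To make the MicroOpBufferSize point more concrete, here is a toy Python list scheduler. It is a hypothetical sketch, not LLVM's GenericScheduler. The `latency` policy loosely mimics MicroOpBufferSize=0: instructions whose operands aren't ready yet are skipped, so latency dominates. The `pressure` policy loosely mimics MicroOpBufferSize=1: every available instruction is a candidate, and stalls are traded against register pressure. The DAG shape, the latencies, the policy names and the `barrier` ordering edge (a stand-in for instructions that pin the order of independent groups) are all invented for illustration. Live values serve as a proxy for register pressure; nothing here models a real register file or spilling.

```python
from typing import Dict, List, Tuple

# Toy DAG: name -> (value operands, ordering-only predecessors, latency).
# Every node defines one value; a value counts as live from its def until its
# last user has been issued. All names and latencies are made up.
Dag = Dict[str, Tuple[List[str], List[str], int]]


def make_dag(pairs: int, barrier: bool = False) -> Dag:
    """Independent load/load/add/store groups, like an unrolled kernel."""
    dag: Dag = {}
    for i in range(pairs):
        # Hypothetical ordering edge: group i's loads may not move above group
        # i-1's store, standing in for instructions that pin the group order.
        preds = [f"st{i - 1}"] if barrier and i > 0 else []
        dag[f"ld{i}a"] = ([], preds, 3)
        dag[f"ld{i}b"] = ([], preds, 3)
        dag[f"add{i}"] = ([f"ld{i}a", f"ld{i}b"], [], 1)
        dag[f"st{i}"] = ([f"add{i}"], [], 1)  # result never used, never live
    return dag


def schedule(dag: Dag, policy: str) -> Tuple[List[str], int, int]:
    """Top-down list scheduling with one issue per cycle.

    Returns (issue order, max simultaneously live values, total cycles).
    """
    users: Dict[str, List[str]] = {n: [] for n in dag}
    for n, (ops, _, _) in dag.items():
        for op in ops:
            users[op].append(n)

    scheduled = set()
    ready_at: Dict[str, int] = {}  # cycle at which each result becomes available
    issued: List[str] = []
    cycle = max_live = 0
    while len(issued) < len(dag):
        # Instructions whose value operands and ordering preds are all issued.
        avail = [n for n in dag if n not in scheduled
                 and all(p in scheduled for p in dag[n][0] + dag[n][1])]
        if policy == "latency":
            # Loosely like MicroOpBufferSize=0: only instructions whose operands
            # are ready this cycle are candidates; otherwise stall a cycle.
            cands = [n for n in avail
                     if all(ready_at[op] <= cycle for op in dag[n][0])]
            if not cands:
                cycle += 1
                continue
            pick = cands[0]  # source order as the tie-break
        elif policy == "pressure":
            # Loosely like MicroOpBufferSize=1: every available instruction is a
            # candidate; prefer the smallest net change in live values, then
            # wait for its operands if they are not ready yet.
            def delta(n: str) -> int:
                kills = sum(1 for op in dag[n][0]
                            if all(u in scheduled or u == n for u in users[op]))
                return (1 if users[n] else 0) - kills

            pick = min(avail, key=delta)  # min() keeps the first tie (source order)
            cycle = max([cycle] + [ready_at[op] for op in dag[pick][0]])
        else:
            raise ValueError(f"unknown policy: {policy}")

        scheduled.add(pick)
        issued.append(pick)
        ready_at[pick] = cycle + dag[pick][2]
        live = sum(1 for v in scheduled
                   if any(u not in scheduled for u in users[v]))
        max_live = max(max_live, live)
        cycle += 1
    return issued, max_live, cycle


if __name__ == "__main__":
    for barrier in (False, True):
        for policy in ("latency", "pressure"):
            _, live, cycles = schedule(make_dag(4, barrier), policy)
            print(f"barrier={barrier} {policy}: max live={live}, cycles={cycles}")
```

With four groups, the latency-first policy issues without stalls in 16 cycles but keeps up to 4 values live. The pressure-aware policy needs 24 cycles and never has more than 2 live. With `barrier=True`, even the latency-first policy is held to 2 live values. That is the kind of barrier effect suspected of the masked instructions here.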

Specifically, x264_pixel_satd_16x16 is completely inlined, and it contains a few masked `vslidedown.vi   v9, v8, 0x2, v0.t` instructions. Notably, they share the same mask, so previously the scheduler couldn't have moved them past each other.

These must have been acting as barriers that prevented the aggressive rescheduling and the resulting spills:

```asm
    2290: 3ed134d7      vslidedown.vi   v9, v13, 0x2
    2294: 3c8134d7      vslidedown.vi   v9, v8, 0x2, v0.t
    2298: c900f057      vsetivli        zero, 0x1, e32, m1, tu, ma
    229c: 5e068457      vmv.v.v v8, v13
    22a0: cd027057      vsetivli        zero, 0x4, e32, m1, ta, ma
    22a4: 029406d7      vadd.vv v13, v9, v8
    22a8: 0a848457      vsub.vv v8, v8, v9
    22ac: 3a8136d7      vslideup.vi     v13, v8, 0x2
    22b0: 020a8407      vle8.v  v8, (s5)
    22b4: 020b0487      vle8.v  v9, (s6)
    22b8: 0c607057      vsetvli zero, zero, e8, mf4, ta, ma
    22bc: ca84a7d7      vwsubu.vv       v15, v8, v9
    22c0: 0d007057      vsetvli zero, zero, e32, m1, ta, ma
    22c4: 4af32457      vzext.vf2       v8, v15
    22c8: 96883457      vsll.vi v8, v8, 0x10
    22cc: 02850457      vadd.vv v8, v8, v10
    22d0: cd817057      vsetivli        zero, 0x2, e64, m1, ta, ma
    22d4: a28544d7      vsrl.vx v9, v8, a0
    22d8: 96854557      vsll.vx v10, v8, a0
    22dc: 2aa484d7      vor.vv  v9, v10, v9
    22e0: c5027057      vsetivli        zero, 0x4, e32, m1, ta, mu
    22e4: 02940557      vadd.vv v10, v9, v8
    22e8: 0a940457      vsub.vv v8, v9, v8
    22ec: 3ea134d7      vslidedown.vi   v9, v10, 0x2
    22f0: 3c8134d7      vslidedown.vi   v9, v8, 0x2, v0.t
```

I think before this can land we need to either enable MicroOpBufferSize=1 by relanding #126608 (perhaps the two need to land in tandem?), or choose a scheduling model by default (we might need to add a generic in-order model).

https://github.com/llvm/llvm-project/pull/132057

