[all-commits] [llvm/llvm-project] 7c812e: [RISCV] Avoid vl toggles when lowering vector_spli...

Tue Jul 8 20:09:36 PDT 2025

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 7c812ea01a2d11545033bbed8f5094c4a4763124
      https://github.com/llvm/llvm-project/commit/7c812ea01a2d11545033bbed8f5094c4a4763124
  Author: Luke Lau <luke at igalia.com>
  Date:   2025-07-09 (Wed, 09 Jul 2025)

  Changed paths:
    M llvm/lib/Target/RISCV/RISCVFeatures.td
    M llvm/lib/Target/RISCV/RISCVISelLowering.cpp
    M llvm/lib/Target/RISCV/RISCVProcessors.td
    M llvm/test/CodeGen/RISCV/features-info.ll
    M llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vp-splice.ll
    M llvm/test/CodeGen/RISCV/rvv/vector-splice.ll
    M llvm/test/CodeGen/RISCV/rvv/vp-splice-mask-fixed-vectors.ll
    M llvm/test/CodeGen/RISCV/rvv/vp-splice-mask-vectors.ll
    M llvm/test/CodeGen/RISCV/rvv/vp-splice.ll

  Log Message:
  -----------
  [RISCV] Avoid vl toggles when lowering vector_splice/experimental_vp_splice and add +vl-dependent-latency tuning feature (#146746)

When vectorizing a loop with a fixed-order recurrence we use a splice,
which gets lowered to a vslidedown and vslideup pair.

However with the way we lower it today we end up with extra vl toggles
in the loop, especially with EVL tail folding, e.g:

    .LBB0_5:                                # %vector.body
# =>This Inner Loop Header: Depth=1
    	sub	a5, a2, a3
    	sh2add	a6, a3, a1
    	zext.w	a7, a4
    	vsetvli	a4, a5, e8, mf2, ta, ma
    	vle32.v	v10, (a6)
    	addi	a7, a7, -1
    	vsetivli	zero, 1, e32, m2, ta, ma
    	vslidedown.vx	v8, v8, a7
    	sh2add	a6, a3, a0
    	vsetvli	zero, a5, e32, m2, ta, ma
    	vslideup.vi	v8, v10, 1
    	vadd.vv	v8, v10, v8
    	add	a3, a3, a4
    	vse32.v	v8, (a6)
    	vmv2r.v	v8, v10
    	bne	a3, a2, .LBB0_5

Because the vslideup overwrites all but UpOffset elements from the
vslidedown, we currently set the vslidedown's AVL to said offset.

But in the vslideup we use either VLMAX or the EVL which causes a
toggle.

This increases the AVL of the vslidedown so it matches vslideup, even if
the extra elements are overridden, to avoid the toggle.

A new tuning feature +vl-dependent-latency has been added which keeps
the old behaviour for microarchitectures that dynamically dispatch uops
based on vl, e.g. sifive-x280.

+vl-dependent-latency can be reused for the recently proposed Ovlt
optimization directive if/when it's ratified:
https://lists.riscv.org/g/tech-privileged/message/2487

If we wanted to aggressively optimise for vl at the expense of
introducing more toggles we could probably look at doing this in
RISCVVLOptimizer.

To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications