[llvm] [RISCV] Account for factor in interleave memory op costs (PR #111511)

Fri Oct 11 05:32:30 PDT 2024

lukel97 wrote:

Something worth pointing out from the interleaved-accesses.ll test diff: with +zvl128b we now vectorize a factor 3 interleave pattern of i64s as three vlse64.v loads and three vsse64.v stores instead of a single vlseg3e64.v/vsseg3e64.v pair. This actually turns out to be a slightly profitable change on the Banana Pi F3, most likely because the cost of strided accesses scales with VLMAX, which is smaller at e64.
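For context, a factor 3 interleave of i64s is a loop that reads and writes three adjacent 64-bit fields of an array of records on every iteration. The loop below is a hypothetical source-level example (the name `add_fields` is illustrative and not taken from interleaved-accesses.ll) that does the same work as the assembly kernels that follow: add 1, 2 and 3 to fields 0, 1 and 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: p holds n records of three interleaved i64 fields.
 * The accesses p[3*i+0], p[3*i+1], p[3*i+2] form the kind of factor 3
 * interleave group the loop vectorizer can lower either to segment
 * accesses (vlseg3e64.v/vsseg3e64.v) or to three strided accesses. */
void add_fields(int64_t *p, size_t n) {
  assert(p != NULL || n == 0);
  for (size_t i = 0; i < n; i++) {
    p[3 * i + 0] += 1;
    p[3 * i + 1] += 2;
    p[3 * i + 2] += 3;
  }
}
```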

With vlseg3e64.v + vsseg3e64.v we get 65.59 cycles/iteration:
```asm
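        vsetvli t0, zero, e64, m2, ta, ma   # vl = VLMAX, SEW = 64, LMUL = 2
loop:
        vlseg3e64.v v8, (a0)                # fields 0/1/2 -> v8/v10/v12
        vadd.vi v8, v8, 1
        vadd.vi v10, v10, 2
        vadd.vi v12, v12, 3
        vsseg3e64.v v8, (a0)                # re-interleave and store
        addi a1, a1, 1                      # a0 is not advanced: same records each iteration
        blt a1, a2, loop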
```
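A rough C model of what `vlseg3e64.v v8, (a0)` does under the `vsetvli ... e64, m2` above: it reads vl consecutive 3-element records and deinterleaves field 0 into the v8 group, field 1 into v10 and field 2 into v12. VLMAX is LMUL × VLEN / SEW. The function names are hypothetical, and the VLEN passed in is an assumption, since zvl128b only guarantees VLEN ≥ 128.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* VLMAX = LMUL * VLEN / SEW (this sketch handles integer LMUL only). */
size_t vlmax(size_t vlen_bits, size_t sew_bits, size_t lmul) {
  assert(sew_bits != 0 && vlen_bits % sew_bits == 0);
  return lmul * (vlen_bits / sew_bits);
}

/* Hypothetical model of vlseg3e64.v: element i of field f is mem[3*i + f].
 * fields[0], fields[1], fields[2] stand in for the v8, v10, v12 groups. */
void vlseg3e64_model(const int64_t *mem, size_t vl, int64_t *fields[3]) {
  for (size_t i = 0; i < vl; i++)
    for (size_t f = 0; f < 3; f++)
      fields[f][i] = mem[3 * i + f];
}
```

With VLEN = 128 (the zvl128b minimum) this gives VLMAX = 4 records per iteration at e64/m2; a larger VLEN scales it proportionally.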

With 3 x vlse64.v + 3 x vsse64.v we get 62.07 cycles/iteration:
```asm
        li a3, 24                           # stride = 3 fields * 8 bytes
        addi a5, a0, 8                      # base of field 1
        addi a6, a0, 16                     # base of field 2
        vsetvli t0, zero, e64, m2, ta, ma
loop:
        vlse64.v v8, (a0), a3
        vlse64.v v10, (a5), a3
        vlse64.v v12, (a6), a3
        vadd.vi v8, v8, 1
        vadd.vi v10, v10, 2
        vadd.vi v12, v12, 3
        vsse64.v v8, (a0), a3
        vsse64.v v10, (a5), a3
        vsse64.v v12, (a6), a3
        addi a1, a1, 1
        blt a1, a2, loop
```
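The strided version gets the same deinterleaving by giving each field its own base address and a byte stride of one record: 3 fields × 8 bytes = 24 (`li a3, 24`), with field f starting at byte offset 8·f from a0 (`a5 = a0 + 8`, `a6 = a0 + 16`). A minimal C sketch of `vlse64.v` address generation under those assumptions; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of vlse64.v vd, (base), stride: element i is the
 * 8-byte value at base + i * stride_bytes. memcpy keeps the byte-level
 * address arithmetic free of alignment and aliasing issues. */
void vlse64_model(const unsigned char *base, ptrdiff_t stride_bytes,
                  size_t vl, int64_t *vd) {
  assert(base != NULL || vl == 0);
  for (size_t i = 0; i < vl; i++)
    memcpy(&vd[i], base + (ptrdiff_t)i * stride_bytes, sizeof(int64_t));
}
```

Calling it three times with bases a0, a0 + 8 and a0 + 16 and a stride of 24 yields the same three field vectors as the segment load, but each access only has to generate vl addresses for a single field.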

https://github.com/llvm/llvm-project/pull/111511