[llvm] [RISCV] Account for factor in interleave memory op costs (PR #111511)
Luke Lau via llvm-commits
llvm-commits at lists.llvm.org
Fri Oct 11 05:32:30 PDT 2024
lukel97 wrote:
Something worth pointing out from the interleaved-accesses.ll test diff is that we now vectorize a factor 3 interleave pattern of i64s as three vlse64.v/vsse64.v (with +zvl128b) instead of a single vlseg3e64.v/vsseg3e64.v. This actually turns out to be a slightly profitable change on the Banana Pi F3, most likely because the cost of strided accesses scales with VLMAX, which is smaller at e64.
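For context, a scalar loop of roughly the following shape is the kind of source that produces a factor 3 interleaved i64 access. This is only a sketch: the `triple` struct and the `bump` function are hypothetical names made up for illustration, not the actual test case in interleaved-accesses.ll.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: three i64 fields packed back to back, so each
 * element is 24 bytes and the fields sit at byte offsets 0, 8 and 16. */
struct triple {
    int64_t x;
    int64_t y;
    int64_t z;
};

/* Each iteration loads all three fields of one element, updates them and
 * stores them back. A vectorizer sees three interleaved streams (factor 3)
 * and may lower them either as one segment load/store pair or as three
 * strided loads/stores with a 24-byte stride. */
void bump(struct triple *p, size_t n) {
    for (size_t i = 0; i < n; i++) {
        p[i].x += 1;
        p[i].y += 2;
        p[i].z += 3;
    }
}
```

The 24-byte element size and the 8/16-byte field offsets are what the stride and base-pointer adjustments in the strided lowering correspond to.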
With vlseg3e64.v + vsseg3e64.v we get 65.59 cycles/iteration
```asm
vsetvli t0, zero, e64, m2, ta, ma
loop:
vlseg3e64.v v8, (a0)
vadd.vi v8, v8, 1
vadd.vi v10, v10, 2
vadd.vi v12, v12, 3
vsseg3e64.v v8, (a0)
addi a1, a1, 1
blt a1, a2, loop
```
With 3 x vlse64.v/vsse64.v we get 62.07 cycles/iteration
```asm
li a3, 24
addi a5, a0, 8
addi a6, a0, 16
vsetvli t0, zero, e64, m2, ta, ma
loop:
vlse64.v v8, (a0), a3
vlse64.v v10, (a5), a3
vlse64.v v12, (a6), a3
vadd.vi v8, v8, 1
vadd.vi v10, v10, 2
vadd.vi v12, v12, 3
vsse64.v v8, (a0), a3
vsse64.v v10, (a5), a3
vsse64.v v12, (a6), a3
addi a1, a1, 1
blt a1, a2, loop
```
https://github.com/llvm/llvm-project/pull/111511