[llvm] [RISCV] Account for factor in interleave memory op costs (PR #111511)
Luke Lau via llvm-commits
llvm-commits at lists.llvm.org
Wed Oct 9 01:17:30 PDT 2024
lukel97 wrote:
> I suspect a better model is performing a wide load and then performing some kind of additional shuffle uop.
I think you're right. I did some more benchmarking on the Banana Pi, and the throughput looks proportional to something like `Wide load + Factor * LMUL`. These are the cycle counts for various segmented loads, without any stores:
```
vlseg2e8 M1: 1.26B  2 + 2 * 1 =  4 ops (0.315 cycles/op)
vlseg2e8 M2: 2.52B  4 + 2 * 2 =  8 ops (0.315 cycles/op)
vlseg2e8 M4: 5.04B  8 + 2 * 4 = 16 ops (0.315 cycles/op)
vlseg3e8 M1: 2.10B  4 + 3 * 1 =  7 ops (0.30 cycles/op)
vlseg3e8 M2: 4.20B  8 + 3 * 2 = 14 ops (0.30 cycles/op)
vlseg4e8 M1: 2.52B  4 + 4 * 1 =  8 ops (0.315 cycles/op)
vlseg4e8 M2: 5.04B  8 + 4 * 2 = 16 ops (0.315 cycles/op)
```
I'm not sure what the exact formula for the NF=3 loads is, but it seems close enough. I'll update this PR anyway.
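For anyone who wants to check the arithmetic, here is a rough Python sketch that recomputes the op counts and cycles/op from the numbers above. One caveat: the wide-load term is modelled as `Factor * LMUL` rounded up to a power of two. That is only a guess that happens to reproduce the 4 and 8 used for the NF=3 rows, not a confirmed property of the hardware. The names `next_pow2`, `modeled_ops` and `MEASURED` are made up for this illustration and are not part of the PR.

```python
def next_pow2(n: int) -> int:
    """Smallest power of two >= n (for n >= 1)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def modeled_ops(factor: int, lmul: int) -> int:
    """Modelled op count: wide load + Factor * LMUL.

    ASSUMPTION: the wide-load term is Factor * LMUL rounded up to a
    power of two. This matches every row in the table above, but it is
    a guess, especially for the NF=3 cases.
    """
    wide_load = next_pow2(factor * lmul)
    shuffle = factor * lmul
    return wide_load + shuffle


# (factor, LMUL) -> measured cycles in billions, copied from the table above.
MEASURED = {
    (2, 1): 1.26,
    (2, 2): 2.52,
    (2, 4): 5.04,
    (3, 1): 2.10,
    (3, 2): 4.20,
    (4, 1): 2.52,
    (4, 2): 5.04,
}

if __name__ == "__main__":
    for (factor, lmul), cycles in MEASURED.items():
        ops = modeled_ops(factor, lmul)
        print(f"vlseg{factor}e8 M{lmul}: {cycles:.2f}B / {ops} ops "
              f"= {cycles / ops:.3f} cycles/op")
```

If the model is reasonable, the last column should stay roughly constant across rows, which is what the table shows (0.30 to 0.315).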
https://github.com/llvm/llvm-project/pull/111511