[llvm] [RISCV] Use vsetvli instead of vlenb in Prologue/Epilogue (PR #113756)
Luke Lau via llvm-commits
llvm-commits at lists.llvm.org
Sat Oct 26 17:42:13 PDT 2024
lukel97 wrote:
> > Is this still profitable even for cases when there's no shift needed, e.g. would a vsetvli be better than a single csrr?
>
> CSR reads in general are serializing. Vlenb needs to be special cased in the microarchitecture. Some versions of SiFive cores missed this optimization. Have we checked BananaPi?
I did some experimenting and it looks like the Banana Pi F3 is also missing the optimisation.
csrr.S:
```asm
.global _start
_start:
li a0, 0
li a1, 10485760
loop:
csrr t0, COUNTER
addi a0, a0, 1
blt a0, a1, loop
exit:
li a7, 93
ecall
```
vsetvli.s:
```asm
.global _start
_start:
li a0, 0
li a1, 10485760
loop:
vsetvli t0, zero, e8, m1, ta, ma
addi a0, a0, 1
blt a0, a1, loop
exit:
li a7, 93
ecall
```
```
$ clang csrr.S -nostdlib -DCOUNTER=vlenb
$ perf stat ./a.out
...
105,898,187 cycles:u
$ clang csrr.S -nostdlib -DCOUNTER=fflags
$ perf stat ./a.out
...
105,894,862 cycles:u
$ clang vsetvli.s -nostdlib
$ perf stat ./a.out
...
53,484,388 cycles:u
```
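For a rough per-iteration view of those numbers, the totals can be divided by the loop trip count (10485760). This is only a sketch: it assumes the cycle counts are dominated by the loop and ignores process startup and exit overhead, and the figures include the `addi` and `blt` in each iteration.

```python
# Loop trip count from the `li a1, 10485760` in both test programs.
ITERATIONS = 10_485_760

# User-mode cycle totals reported by perf stat above.
cycles = {
    "csrr vlenb": 105_898_187,
    "csrr fflags": 105_894_862,
    "vsetvli": 53_484_388,
}

# Approximate cycles per loop iteration (startup overhead not subtracted).
per_iter = {name: total / ITERATIONS for name, total in cycles.items()}

for name, value in per_iter.items():
    print(f"{name:12s} {value:.1f} cycles/iteration")
```

That works out to roughly 10 cycles per iteration for either `csrr` variant and roughly 5 for `vsetvli`. Since `vlenb` and `fflags` come out the same, the `vlenb` read does not appear to be special-cased on this core.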
I guess this could also introduce VL/VTYPE toggles, but I don't have any data as to whether or not that would be an issue in practice given that this is restricted to prologues and epilogues. I also don't know how expensive a VL/VTYPE toggle is in comparison to a CSR read.
https://github.com/llvm/llvm-project/pull/113756