[llvm] [RISCV][TTI] Reduce cost of a build_vector pattern (PR #108419)
Luke Lau via llvm-commits
llvm-commits at lists.llvm.org
Thu Sep 19 23:28:59 PDT 2024
lukel97 wrote:
Here are the benchmark diffs with LTO: https://lnt.lukelau.me/db_default/v4/nts/8?show_delta=yes&show_previous=yes&show_stddev=yes&show_mad=yes&show_all=yes&show_all_samples=yes&show_sample_counts=yes&show_small_diff=yes&num_comparison_runs=0&test_filter=&test_min_value_filter=&aggregation_fn=min&MW_confidence_lv=0.05&compare_to=7
The povray regression is gone; the only noticeable change left is a 2.08% regression in 541.leela_r. In the hottest function, `FastState::play_random_move()`, there are a few more places where we now vectorize to a vredor.vs:
```asm
vsetivli zero, 0x1, e64, m1, ta, ma
vmv.s.x v8, a2
vsetivli zero, 0x4, e16, mf4, ta, ma
vmseq.vi v0, v8, 0x1
slliw a2, t0, 0x8
vsetvli zero, zero, e32, mf2, ta, mu
vmv.v.i v8, 0x0
ld a4, 0x38(sp)
vle32.v v8, (a4), v0.t
slliw a4, a6, 0x6
slliw a5, t1, 0x4
slliw a3, a3, 0x2
vredor.vs v8, v8, v8
vmv.x.s s1, v8
```
We're doing a packed e16 build vector which is then converted to a mask vector.
I think we can avoid the vmv.v.i if we move the masking from the vle32.v to the vredor.vs. But that's orthogonal to this PR and I don't think the vectorized code is bad per se, so I think this is fine.
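To illustrate why moving the mask should be value-preserving, here is a rough lane-level model in Python (not RVV code and not part of the PR). It compares the current shape, a zero passthru followed by a masked load and an unmasked OR reduction, against an unmasked load followed by a masked OR reduction. The function names are made up for this sketch, and it assumes that the reduction starts from zero and that every lane is safe to load unmasked.

```python
from functools import reduce
from itertools import product
from operator import or_


def masked_load_then_or(memory, mask):
    """Model of the current sequence: zero passthru, masked load, OR of all lanes."""
    lanes = [0] * len(mask)              # stands in for vmv.v.i v8, 0x0
    for i, active in enumerate(mask):
        if active:                       # stands in for vle32.v ..., v0.t
            lanes[i] = memory[i]
    return reduce(or_, lanes, 0)         # unmasked OR reduction


def load_then_masked_or(memory, mask):
    """Model of the alternative: unmasked load, OR over active lanes only."""
    lanes = list(memory[:len(mask)])     # assumes all lanes are readable
    active_lanes = (x for x, active in zip(lanes, mask) if active)
    return reduce(or_, active_lanes, 0)  # assumes a zero start value


def forms_agree(memory, num_lanes=4):
    """Check both forms produce the same scalar for every possible mask."""
    return all(
        masked_load_then_or(memory, m) == load_then_masked_or(memory, m)
        for m in product([False, True], repeat=num_lanes)
    )
```

In this model the two forms agree because inactive lanes contribute zero either way, and zero is the identity for OR. Whether the real sequence ends up shorter depends on how the reduction's start value is materialized, which the model does not capture.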
https://github.com/llvm/llvm-project/pull/108419