[PATCH] D93080: [RISCV] Use tail agnostic policy for vsetvli instruction emitted in the custom inserter
Kuan Hsu Chen (Zakk) via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Dec 10 17:57:51 PST 2020
khchen added a comment.
Hi @craig.topper
I think defaulting to tail undisturbed might be friendlier and more intuitive for programmers and for the vectorizer in the reduction case.
Please see the example below:
// scalar
float sum = 0;
for (int i = 0; i < n; ++i) {
  sum += src1[i] * src2[i];
}
return sum;
// vector
float foo(float *src1, float *src2, size_t n) {
  size_t len;
  vsetvlmax_e32m8();
  vfloat32m8_t v16 = vfmv_v_f_f32m8(0.0); // accumulator, all elements zero
  vsetvl_e32m1();
  vfloat32m1_t v24 = vfmv_s_f_f32m1(vundefined_f32m1(), 0.0); // scalar seed for the reduction
  for (; (len = vl_extract(vsetvl_e32m8(n))) > 0; n -= len) {
    vfloat32m8_t v0 = vle32_v_f32m8(src1);
    vfloat32m8_t v8 = vle32_v_f32m8(src2);
#if 0
    if maxvl = 2, n = 3;
    src1 = [1, 2, 3]
    src2 = [2, 3, 4]
    1st iteration, vl=2, input v16 = [0, 0], result v16 = [2, 6]
    2nd iteration, vl=1, input v16 = [2, 6], result v16 = [14, 6] // tail is still 6 because of tail undisturbed.
#endif
    v16 = vfmacc_vv_f32m8(v16, v0, v8); // v16 += v0 * v8 on the first vl elements
    src1 += len;
    src2 += len;
  }
  vsetvlmax_e32m8();
  // input v16 = [14, 6], result = [20, ?]
  vfloat32m1_t result = vfredosum_vs_f32m8_f32m1(v16, v24);
  return vfmv_f_s_f32m1_f32(result);
}
The community also discussed this difference earlier, in issue <https://github.com/riscv/riscv-v-spec/issues/157#issuecomment-527104675>.
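To make the maxvl = 2, n = 3 walkthrough concrete, here is a minimal scalar model in plain C. It is only a sketch, not RVV intrinsics code: `VLMAX`, `tail_policy`, `model_vfmacc` and `model_dot` are hypothetical names made up for illustration. It assumes that a tail-agnostic implementation takes the "overwrite the tail with all 1s" option (it may also legally keep the old value), and it stands in for the all-ones float32 pattern with `NAN`.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

#define VLMAX 2 /* assumed tiny register length, matching maxvl = 2 above */

typedef enum { TAIL_UNDISTURBED, TAIL_AGNOSTIC } tail_policy;

/* Model of acc[i] += a[i] * b[i] for the first vl elements.
 * Tail elements (index >= vl) are kept under TAIL_UNDISTURBED and are
 * clobbered with NaN under TAIL_AGNOSTIC (one permitted outcome). */
static void model_vfmacc(float acc[VLMAX], const float *a, const float *b,
                         size_t vl, tail_policy policy) {
  assert(vl <= VLMAX);
  for (size_t i = 0; i < vl; ++i)
    acc[i] += a[i] * b[i];
  if (policy == TAIL_AGNOSTIC)
    for (size_t i = vl; i < VLMAX; ++i)
      acc[i] = NAN; /* stand-in for the all-ones bit pattern */
}

/* Strip-mined dot product that reduces the whole accumulator at the end,
 * mirroring the shape of foo() above. */
static float model_dot(const float *src1, const float *src2, size_t n,
                       tail_policy policy) {
  float acc[VLMAX] = {0.0f, 0.0f};
  while (n > 0) {
    size_t vl = n < VLMAX ? n : VLMAX; /* what vsetvl would grant */
    model_vfmacc(acc, src1, src2, vl, policy);
    src1 += vl;
    src2 += vl;
    n -= vl;
  }
  float sum = 0.0f;
  for (size_t i = 0; i < VLMAX; ++i) /* reduction over VLMAX elements */
    sum += acc[i];
  return sum;
}
```

Calling `model_dot` on src1 = [1, 2, 3] and src2 = [2, 3, 4] with `TAIL_UNDISTURBED` reproduces the [14, 6] accumulator and the sum of 20. With `TAIL_AGNOSTIC`, the second iteration clobbers the element holding 6, so the final full-length reduction no longer matches the scalar loop.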
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D93080/new/
https://reviews.llvm.org/D93080