[PATCH] D93080: [RISCV] Use tail agnostic policy for vsetvli instruction emitted in the custom inserter

Thu Dec 10 19:52:27 PST 2020

craig.topper added a comment.

In D93080#2447590 <https://reviews.llvm.org/D93080#2447590>, @khchen wrote:

> Hi @craig.topper
> I think defaulting to tail undisturbed might be friendlier and more intuitive for programmers and for the vectorizer in the reduction case.
> Please see the example below:
>
>   //scalar
>   float sum=0;
>   for(int i=0;i<n;++i) {
>     sum += src1[i]*src2[i];
>   }
>   return sum;
>
>   float foo(float *src1, float *src2, size_t n) {
>     size_t len;
>     vsetvlmax_e32m8();
>     vfloat32m8_t v16 = vfmv_v_f_f32m8(0.0);
>     vsetvl_e32m1();
>     vfloat32m1_t v24 = vfmv_s_f_f32m1(vundefined_f32m1(), 0.0);
>     for (; (len = vl_extract(vsetvl_e32m8(n))) > 0; n -= len) {
>       vfloat32m8_t v0 = vle32_v_f32m8(src1);
>       vfloat32m8_t v8 = vle32_v_f32m8(src2);
>   #if 0
>       if maxvl = 2, n = 3;
>       src1 = [1, 2, 3]
>       src2 = [2, 3, 4]
>       1st iteration, vl=2, input v16 = [0, 0], result v16 = [2, 6]
>       2nd iteration, vl=1, input v16 = [2, 6], result v16 = [14, 6] // tail is still 6 because of tail undisturbed.
>   #endif
>       v16 = vfmacc_vv_f32m8(v16, v0, v8);
>       src1 += len;
>       src2 += len;
>     }
>     vsetvlmax_e32m8();
>     // input v16 = [14, 6], result = [20, ?]
>     vfloat32m1_t result = vfredosum_vs_f32m8_f32m1(v16, v24);
>     return vfmv_f_s_f32m1_f32(result);
>   }
>
> The community also discussed this difference before, in issue <https://github.com/riscv/riscv-v-spec/issues/157#issuecomment-527104675>.

Maybe we should use tail undisturbed for instructions that have something like 'let Constraints = "$rd = $rs3"'?
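
To make the quoted example concrete, here is a minimal scalar C sketch of the strip-mined accumulation. It is not RVV code and not what the patch emits. The names sim_dot, tail_policy and SIM_VLMAX_CAP are hypothetical. It assumes the worst case for tail agnostic: the spec lets tail elements either keep their value or be overwritten with all 1s, which is a NaN bit pattern for float, so the sketch always writes NaN there.

```c
#include <assert.h>
#include <math.h>    /* NAN */
#include <stddef.h>

#define SIM_VLMAX_CAP 64 /* hypothetical cap on the simulated VLMAX */

enum tail_policy { TAIL_UNDISTURBED, TAIL_AGNOSTIC };

/* Scalar model of the quoted loop: accumulate src1[i]*src2[i] into a
 * vlmax-wide accumulator in chunks of vl, then sum all vlmax lanes. */
float sim_dot(const float *src1, const float *src2, size_t n,
              size_t vlmax, enum tail_policy policy)
{
    float acc[SIM_VLMAX_CAP] = {0};
    assert(vlmax > 0 && vlmax <= SIM_VLMAX_CAP);

    while (n > 0) {
        size_t vl = n < vlmax ? n : vlmax;

        /* Body elements: lanes [0, vl) are updated by the multiply-add. */
        for (size_t i = 0; i < vl; ++i)
            acc[i] += src1[i] * src2[i];

        /* Tail elements: lanes [vl, vlmax). Undisturbed keeps them.
         * Agnostic MAY clobber them; assume the worst case here. */
        if (policy == TAIL_AGNOSTIC)
            for (size_t i = vl; i < vlmax; ++i)
                acc[i] = NAN;

        src1 += vl;
        src2 += vl;
        n -= vl;
    }

    /* Final reduction over all vlmax lanes, as after vsetvlmax. */
    float sum = 0.0f;
    for (size_t i = 0; i < vlmax; ++i)
        sum += acc[i];
    return sum;
}
```

With src1 = {1, 2, 3}, src2 = {2, 3, 4}, n = 3 and vlmax = 2, the undisturbed policy keeps the 6 in lane 1 across the second iteration and the reduction returns 20. Under the worst-case agnostic model, lane 1 is clobbered in the second iteration and the partial sum is lost.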

https://reviews.llvm.org/D93080