[PATCH] D93080: [RISCV] Use tail agnostic policy for vsetvli instruction emitted in the custom inserter

Thu Dec 10 17:57:51 PST 2020

khchen added a comment.

Hi @craig.topper,
I think defaulting to tail undisturbed would be friendlier and more intuitive for programmers and for the vectorizer in the reduction case.
Please see the example below:

  //scalar
  float sum=0;
  for(int i=0;i<n;++i) {
    sum += src1[i]*src2[i];
  }
  return sum;

  float foo(float *src1, float *src2, size_t n) {
    size_t len;
    vsetvlmax_e32m8();
    vfloat32m8_t v16 = vfmv_v_f_f32m8(0.0);
    vsetvl_e32m1();
    vfloat32m1_t v24 = vfmv_s_f_f32m1(vundefined_f32m1(), 0.0);
    for (; (len = vl_extract(vsetvl_e32m8(n))) > 0; n -= len) {
      vfloat32m8_t v0 = vle32_v_f32m8(src1);
      vfloat32m8_t v8 = vle32_v_f32m8(src2);
  #if 0
      Assume maxvl = 2 and n = 3:
      src1 = [1, 2, 3]
      src2 = [2, 3, 4]
      1st iteration, vl=2, input v16 = [0, 0], result v16 = [2, 6]
      2nd iteration, vl=1, input v16 = [2, 6], result v16 = [14, 6] // the tail element is still 6 because the tail is undisturbed
  #endif
      v16 = vfmacc_vv_f32m8(v16, v0, v8);
      src1 += len;
      src2 += len;
    }
    vsetvlmax_e32m8();
    // input v16 = [14, 6], result = [20, ?]
    vfloat32m1_t result = vfredosum_vs_f32m8_f32m1(v16, v24);
    return vfmv_f_s_f32m1_f32(result);
  }
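
The trace in the #if 0 block above can be checked with a small scalar model in plain C. This is only a sketch, not the intrinsic code: dot_model, VLMAX, and the tail_agnostic flag are hypothetical names, and the agnostic tail is modeled by writing NaN (an all-ones bit pattern is a NaN for float), which is only one of the behaviours an implementation may choose under the agnostic policy.

```c
#include <math.h>
#include <stddef.h>

#define VLMAX 2 /* assumed maximum vector length, as in the example above */

/* Hypothetical scalar model of the strip-mined dot product.
 * tail_agnostic == 0: elements past vl keep their old value (tail undisturbed).
 * tail_agnostic != 0: elements past vl are overwritten; modeled here with NaN. */
static float dot_model(const float *a, const float *b, size_t n,
                       int tail_agnostic) {
    float acc[VLMAX] = {0.0f, 0.0f};

    while (n > 0) {
        size_t vl = n < VLMAX ? n : VLMAX;

        /* Body elements: multiply-accumulate, like vfmacc. */
        for (size_t i = 0; i < vl; ++i)
            acc[i] += a[i] * b[i];

        /* Tail elements: left alone, or clobbered under the agnostic model. */
        if (tail_agnostic)
            for (size_t i = vl; i < VLMAX; ++i)
                acc[i] = NAN;

        a += vl;
        b += vl;
        n -= vl;
    }

    /* Final reduction over all VLMAX elements, like the vsetvlmax + vfredosum step. */
    float sum = 0.0f;
    for (size_t i = 0; i < VLMAX; ++i)
        sum += acc[i];
    return sum;
}
```

With src1 = [1, 2, 3] and src2 = [2, 3, 4], dot_model(src1, src2, 3, 0) follows the trace above and returns 20, while dot_model(src1, src2, 3, 1) loses the partial sum held in the tail element during the last iteration and returns NaN in this model.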

The community has also discussed this difference before, in issue <https://github.com/riscv/riscv-v-spec/issues/157#issuecomment-527104675>.

https://reviews.llvm.org/D93080