[PATCH] D45336: Apply accumulator to fadd/fmul experimental vector reductions (PR36734)

Fri Apr 6 09:08:58 PDT 2018

aemerson added a comment.

In https://reviews.llvm.org/D45336#1059781, @gnzlbg wrote:

> @aemerson
>
> > I think I missed out a detail when I wrote the langref, original motivation of the scalar accumulator argument was for the use in strictly ordered FP reductions only. I.e. when the intrinsic call has no FMF flags attached then the accumulator argument is used, otherwise, if FMF flags are present, the argument is meant to be ignored.
>
> Why do we need the accumulator for this case? That is, why can't we just do:
>
>   result = vector[0];
>   for i in [1, vector.len) {
>       result = binary_op(result, vector[i]);
>   }
>   return result;
>
>  ---
>
> I also wonder whether requiring fast-math to allow tree reductions is overkill. Tree reductions can be implemented reasonably efficiently on many architectures, while linearly ordered reductions appear to me to be more of a niche. Therefore, I wonder if it wouldn't make more sense to add llvm.experimental.vector.reduce.tree.{add,mul} that perform tree reductions without requiring fast math, and to just call those from here if fast-math is enabled.

Because not every architecture has a statically known vector length, expressing that loop may require generating IR loops unless you use these intrinsics.
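The difference between the strictly ordered reduction (the no-FMF case, seeded with the scalar accumulator) and a tree reduction can be made concrete with a small sketch. This is plain Python, not LLVM IR and not the intrinsics' actual lowering; `ordered_reduce` and `tree_reduce` are hypothetical names. Because floating-point addition is not associative, the two orders can give different results for the same input.

```python
import operator
from typing import Callable, Sequence

BinOp = Callable[[float, float], float]

def ordered_reduce(acc: float, vec: Sequence[float], op: BinOp = operator.add) -> float:
    """Strictly in-order reduction seeded with acc: (((acc op v0) op v1) op ...)."""
    result = acc
    for x in vec:
        result = op(result, x)
    return result

def tree_reduce(vec: Sequence[float], op: BinOp = operator.add) -> float:
    """Pairwise (tree) reduction: combine adjacent pairs, level by level."""
    level = list(vec)
    if not level:
        raise ValueError("tree_reduce needs at least one element")
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # carry an odd trailing element to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]

vec = [1e16, 1.0, -1e16, 1.0]
print(ordered_reduce(0.0, vec))  # 1.0
print(tree_reduce(vec))          # 0.0
```

Here the ordered pass absorbs the first `1.0` into `1e16` but keeps the last one, while the tree pairs each `1.0` with a large value and loses both.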
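One way to see why a scalar accumulator is useful for ordered reductions is a loop that processes an array in chunks whose width is only known at run time. The sketch below is plain Python under that assumption; `vl` is a hypothetical stand-in for a runtime vector length, and none of the names are LLVM APIs. Threading the running scalar into each chunk's ordered reduction reproduces a single left-to-right pass exactly. Reducing each chunk from scratch and then adding the partial results reassociates the additions. Seeding with the identity (-0.0 for IEEE addition) makes a seeded reduction of one vector equal to the loop quoted above that starts from vector[0].

```python
from typing import Sequence

def ordered_fadd(acc: float, chunk: Sequence[float]) -> float:
    """Ordered reduction seeded with acc: (((acc + c0) + c1) + ...)."""
    for x in chunk:
        acc = acc + x
    return acc

def chunked_ordered_sum(data: Sequence[float], vl: int) -> float:
    """Reduce vl elements at a time, carrying the accumulator between chunks."""
    if vl < 1:
        raise ValueError("vector length must be positive")
    acc = -0.0  # -0.0 is the identity for IEEE floating-point addition
    for start in range(0, len(data), vl):
        acc = ordered_fadd(acc, data[start:start + vl])
    return acc

def chunked_unseeded_sum(data: Sequence[float], vl: int) -> float:
    """Reduce each chunk from scratch, then add the partials (reassociates)."""
    if vl < 1:
        raise ValueError("vector length must be positive")
    total = -0.0
    for start in range(0, len(data), vl):
        total = total + ordered_fadd(-0.0, data[start:start + vl])
    return total

data = [1e16, 1.0, 1.0, 1.0]
print(chunked_ordered_sum(data, 2) == ordered_fadd(-0.0, data))  # True
print(chunked_unseeded_sum(data, 2) == ordered_fadd(-0.0, data))  # False
```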

Repository:
  rL LLVM

https://reviews.llvm.org/D45336