[clang] [Clang] Add float type support to __builtin_reduce_add and __builtin_reduce_multipy (PR #120367)

Mon Jan 6 12:58:16 PST 2025

Il-Capitano wrote:

Just to clarify: `@llvm.vector.reduce.fadd` performs a sequential (ordered) reduction by default, so I don't see the point of doing that manually in the frontend. Your previous implementation had the same behaviour.

The inconsistency with the sequential approach is that Clang defines the `__builtin_reduce_*` operations to do a recursive even-odd pairwise reduction (i.e. `(v[0] + v[1]) + (v[2] + v[3])` instead of `((v[0] + v[1]) + v[2]) + v[3]`). Since floating-point addition is not associative, these two orders can produce different results.

My suggestion was not to reuse the existing `__builtin_reduce_add` and `__builtin_reduce_mul` builtins, but to define two new ones. The first would be defined to do a sequential reduction, matching the behaviour of `@llvm.vector.reduce.fadd`. The second would be unordered, i.e. `@llvm.vector.reduce.fadd` with the `reassoc` fast-math flag set. In practice the unordered variant will end up doing the even-odd pairwise reduction, but the generated code is better when the backend does that than when the frontend does it: https://godbolt.org/z/a4rd44Eza.

You can see the difference in generated code for `@llvm.vector.reduce.fadd` with and without the `reassoc` flag here: https://godbolt.org/z/zeWjxrxo5.

https://github.com/llvm/llvm-project/pull/120367