[cfe-dev] RFC: Add New Set of Vector Math Builtins
Florian Hahn via cfe-dev
cfe-dev at lists.llvm.org
Tue Sep 28 02:10:16 PDT 2021
> On Sep 27, 2021, at 23:54, Craig Topper <craig.topper at gmail.com> wrote:
> Hi Florian,
> I have a few questions about the reduction builtins.
Thanks for taking a look!
> llvm.reduce.fadd is currently defined as ordered unless the reassociate fast math flag is present. Are you proposing to change that to make it pairwise?
That’s a good point, and I forgot to call this out explicitly! Unfortunately, the LLVM reduction intrinsic cannot express pairwise reductions, and the reassociate flag would be too permissive. An initial lowering in Clang could just generate the pairwise reduction tree directly, but down the road I anticipate extending the reduction intrinsic so that it can express pairwise reductions. This would probably also help the parts of the middle-end that currently emit pairwise reduction trees manually (e.g. in the epilogue of vector loops with reductions).
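To make the distinction concrete, here is a minimal scalar sketch in C of the two evaluation orders. The function names are hypothetical, and the pairing shown (recursively adding the lower and upper halves) is only one possible reduction tree, assumed for illustration; this message does not state which pairing the proposal intends.

```c
#include <stddef.h>

/* Ordered reduction: (((start + v[0]) + v[1]) + v[2]) + ...
 * This is the strictly sequential order described for the intrinsic
 * when the reassociate flag is absent. */
float reduce_add_ordered(float start, const float *v, size_t n) {
    float acc = start;
    for (size_t i = 0; i < n; ++i)
        acc += v[i];
    return acc;
}

/* Pairwise reduction (hypothetical pairing, assumed for illustration):
 * split the range in two, reduce each half, then add the two results.
 * For n == 4 this evaluates (v[0] + v[1]) + (v[2] + v[3]).
 * n must be at least 1. */
float reduce_add_pairwise(const float *v, size_t n) {
    if (n == 1)
        return v[0];
    size_t half = n / 2;
    return reduce_add_pairwise(v, half) +
           reduce_add_pairwise(v + half, n - half);
}
```

Both orders are fully determined, unlike a reassociated reduction. They agree on exactly representable sums such as {1, 2, 3, 4}, but they can round differently when large values cancel, which is why the ordered intrinsic cannot simply stand in for a pairwise one.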
> llvm.reduce.fmin/fmax change behavior based on the nonans fast math flag. And I think they always imply no signed zeros regardless of whether the fast math flag is present. The vectorizers check the fast math flags before creating the intrinsics today. What are the semantics of the proposed builtin?
I tried to specify NaN handling in the `Special Values` section. At the moment it says "If exactly one argument is a NaN, return the other argument. If both arguments are NaNs, return a NaN". This should match the NaN handling of both llvm.minnum and libm’s fmin(f). Note that in the original email, the Special Values section still includes a mention of fmax. That reference should be removed.
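As a rough scalar model of the quoted rule, the following C sketch may help. The name `min_special` is hypothetical, and this is not the proposed builtin itself, only an illustration of the NaN cases.

```c
#include <math.h>

/* Scalar model of the quoted Special Values rule for min:
 *  - exactly one argument is a NaN -> return the other argument
 *  - both arguments are NaNs       -> return a NaN
 *  - otherwise                     -> return the smaller value */
double min_special(double a, double b) {
    if (isnan(a))
        return b; /* b is the non-NaN argument, or a NaN if both are NaNs */
    if (isnan(b))
        return a;
    return a < b ? a : b;
}
```

For example, `min_special(NAN, 1.0)` and `min_special(1.0, NAN)` both return 1.0, while `min_special(NAN, NAN)` returns a NaN.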
The current proposal does not specifically talk about signed zeros, but I am not sure we have to. The proposal defines min/max as returning the smaller/larger value. -0 and +0 compare equal, so either can be returned. I think this again matches the behavior of libm’s fmin(f) and llvm.minnum, although llvm.minnum’s definition calls this out explicitly by stating what happens when it is called with equal arguments. Should the proposed definitions also spell that out?
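The signed-zero point can be seen in a small C sketch. Both function names are hypothetical, and `min_by_less` is just one plausible comparison-based lowering, assumed for illustration: because the two zeros compare equal, which one comes back depends only on argument order.

```c
#include <math.h>
#include <stdbool.h>

/* True when a and b compare equal but differ in sign,
 * which for finite IEEE values only happens for the pair -0.0 / +0.0. */
bool is_signed_zero_pair(double a, double b) {
    return a == b && (signbit(a) != 0) != (signbit(b) != 0);
}

/* A comparison-based min (assumed lowering, not the proposed builtin):
 * when the arguments compare equal, the second one is returned. */
double min_by_less(double a, double b) {
    return a < b ? a : b;
}
```

With this lowering, `min_by_less(-0.0, 0.0)` yields +0.0 and `min_by_less(0.0, -0.0)` yields -0.0. Both results satisfy a definition that only asks for "the smaller value", which is the ambiguity an explicit statement about equal arguments would address.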