[llvm-dev] Vectorization with fast-math on irregular ISA sub-sets

Tue Feb 9 01:38:20 PST 2016

On 9 February 2016 at 03:48, Hal Finkel <hfinkel at anl.gov> wrote:
> Yes, and generically speaking, it does for FP loops as well (except, as has been noted, when there are FP reductions).

Right, and I think that's the problem, since a series of FP inductions
could converge to a different value on NEON than on VFP, basically
acting like an n-wise reduction. Since we can't (yet?) prove there isn't
a series of operations on the same data, we have to treat them as
unsafe for non-IEEE FP operations.
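
To make the concern concrete, here is a minimal sketch in plain C. It
assumes the non-IEEE behaviour in question is flush-to-zero of
subnormal results, and it only models that in software. The helper
names (flush_subnormal, halve_then_double) are made up for
illustration, and real hardware may also flush subnormal inputs, which
this model does not do.

```c
#include <float.h>

/* Hypothetical model of a flush-to-zero unit: any subnormal value is
   replaced by a zero of the same sign. Normal values, zeros, NaNs and
   infinities pass through unchanged. */
static float flush_subnormal(float x) {
    if (x != 0.0f && x > -FLT_MIN && x < FLT_MIN)
        return x < 0.0f ? -0.0f : 0.0f;
    return x;
}

/* Halve x `steps` times, then double it `steps` times. With
   flush_to_zero == 0 the host's default arithmetic is used (assumed
   to be IEEE with gradual underflow); with flush_to_zero != 0 every
   intermediate result goes through the model above. */
static float halve_then_double(float x, int steps, int flush_to_zero) {
    for (int i = 0; i < steps; ++i) {
        x *= 0.5f;
        if (flush_to_zero)
            x = flush_subnormal(x);
    }
    for (int i = 0; i < steps; ++i) {
        x *= 2.0f;
        if (flush_to_zero)
            x = flush_subnormal(x);
    }
    return x;
}
```

Starting from FLT_MIN, the IEEE path dips into the subnormal range and
comes back to FLT_MIN, while the flushing path collapses to zero on the
first halving and never recovers. The same chain of operations on the
same data ends up at two different values depending on which unit
executes it.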

> It seems like we need two things here:
>
>  1. Use our backend fast-math flags during instruction selection to scalarize vector instructions that don't have the right allowances (on targets where that's necessary)
>  2. Update the TTI cost model interfaces to take fast-math flags so that all vectorizers can make appropriate decisions

I think this is exactly the opposite of what James is saying, and I
have to agree with him, since this would scalarise everything.

If the scalarisation is done in IR, then any NEON intrinsic in C code
will get wrongly scalarised. Builtins can be lowered to either IR
operations or builtin calls, and the back-end has no way of knowing
the origin.

If the scalarisation is done lower down, then we also risk changing
inline ASM snippets, which is even worse.

James' idea on this one is to have an additional flag to *enable* such
scalarisation when the user cares enough about it, which I also think
is a better idea than making that the default behaviour.

cheers,
--renato