[llvm-dev] Vectorization with fast-math on irregular ISA sub-sets

Tue Feb 9 12:29:26 PST 2016

----- Original Message -----
> From: "Renato Golin" <renato.golin at linaro.org>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "James Molloy" <James.Molloy at arm.com>, "Nadav Rotem" <nrotem at apple.com>, "Arnold Schwaighofer"
> <aschwaighofer at apple.com>, "LLVM Dev" <llvm-dev at lists.llvm.org>, "nd" <nd at arm.com>
> Sent: Tuesday, February 9, 2016 3:38:20 AM
> Subject: Re: Vectorization with fast-math on irregular ISA sub-sets
> 
> On 9 February 2016 at 03:48, Hal Finkel <hfinkel at anl.gov> wrote:
> > Yes, and generically speaking, it does for FP loops as well
> > (except, as has been noted, when there are FP reductions).
> 
> Right, and I think that's the problem, since a series of FP inductions
> could converge to a different value in NEON or VFP, basically acting
> like an n-wise reduction. Since we can't (yet?) prove there isn't a
> series of operations with the same data, we have to treat them as
> unsafe for non-IEEE FP operations.
> 
> 
> > It seems like we need two things here:
> >
> >  1. Use our backend fast-math flags during instruction selection to
> >  scalarize vector instructions that don't have the right
> >  allowances (on targets where that's necessary)
> >  2. Update the TTI cost model interfaces to take fast-math flags so
> >  that all vectorizers can make appropriate decisions
> 
> I think this is exactly the opposite of what James is saying, and I
> have to agree with him, since this would scalarise everything.

No, it just means that the intrinsics need to set the appropriate fast-math flags on the instructions they generate. This might require some frontend enablement work, but so be it.

There might be a slight issue with legacy IR bitcode, but if that's going to be a problem in practice, we can design some scheme to let auto-upgrade do the right thing.

> 
> If the scalarisation is in IR, then any NEON intrinsic in C code will
> get wrongly scalarised. Builtins can be lowered to either IR
> operations or builtins, and the back-end has no way of knowing the
> origin.
> 
> If the scalarization is lower down, then we risk also changing inline
> ASM snippets, which is even worse.

Yes, but we don't do that, so that's not a practical concern.

> 
> James' idea on this one is to have an additional flag to *enable* such
> scalarisation when the user cares too much about it, which I also
> think is a better idea than making that the default behaviour.

The --stop-pretending-to-be-IEEE-compliant-when-not-really flag? ;) I don't think that's a good idea.

To be fair, our IR language reference does not actually say that our floating-point arithmetic is IEEE compliant, but it is implied, and frontends depend on this fact. We really should not change the IR floating-point semantics contract over this. It might require some user education, but that's much better than producing subtly-wrong results.
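
As an illustration of what "subtly wrong" can look like, here is a minimal Python sketch of flush-to-zero arithmetic, one of the ways a vector unit can deviate from IEEE 754. It assumes (this is not stated above) that the vector unit flushes subnormal inputs and results to zero, as ARMv7 NEON does. The helper names are hypothetical, and the code only models the arithmetic; it is not what any backend emits.

```python
import math
import struct

FLT_MIN_NORMAL = 2.0 ** -126  # smallest positive normal binary32 value


def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def flush(x):
    """Hypothetical flush-to-zero model: subnormals become a signed zero."""
    if 0.0 < abs(x) < FLT_MIN_NORMAL:
        return math.copysign(0.0, x)
    return x


def mul_ieee(a, b):
    """binary32 multiply with gradual underflow (IEEE 754 behaviour)."""
    return to_f32(a * b)


def mul_ftz(a, b):
    """binary32 multiply with subnormal inputs and results flushed to zero."""
    return flush(to_f32(flush(a) * flush(b)))


def halve_then_double(mul, x):
    """Compute (x * 0.5) * 2.0 with the supplied multiply."""
    return mul(mul(x, 0.5), 2.0)


x = 2.0 ** -126
print(halve_then_double(mul_ieee, x) == x)  # IEEE: the value survives the round trip
print(halve_then_double(mul_ftz, x))        # flush-to-zero: the value is lost
```

Under this model the same two multiplies return the original value with IEEE semantics and zero with flush-to-zero, which is the kind of difference the IR contract should not allow to appear silently.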

We have a pass-feedback mechanism. I think it would be very useful if compiling with -Rpass-missed=loop-vectorize and/or -Rpass-analysis=loop-vectorize helpfully informed users that compiling with -ffast-math, and/or with -ffinite-math-only and -fno-signed-zeros, would allow the loop to be vectorized for the targeted hardware.
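
To illustrate why an FP reduction needs the user's permission before it can be vectorized: a vectorized sum keeps one partial sum per lane, which reassociates the additions. The following is a minimal Python sketch of that effect. The function names and the two-lane width are made up for illustration, and it models only the order of the additions, not the code any LLVM pass generates.

```python
def sum_in_order(values):
    """Add the values strictly left to right, as the scalar loop is written."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_in_lanes(values, width):
    """Keep one partial sum per lane, then combine the lanes at the end."""
    lanes = [0.0] * width
    for i, v in enumerate(values):
        lanes[i % width] += v
    total = 0.0
    for partial in lanes:
        total += partial
    return total


values = [1e16, 1.0, -1e16, 1.0]
print(sum_in_order(values))     # 1.0: the first 1.0 is absorbed by 1e16
print(sum_in_lanes(values, 2))  # 2.0: the large terms cancel within one lane
```

Neither order is wrong in itself, but the two results differ, so the transformation is only legal once the user has allowed reassociation (for example via -ffast-math).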

 -Hal

> 
> cheers,
> --renato
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory