[llvm-dev] Vectorization with fast-math on irregular ISA sub-sets

Thu Feb 11 01:53:15 PST 2016

On 11 February 2016 at 01:15, Hal Finkel <hfinkel at anl.gov> wrote:
> Rather, the user expects very specific (non-IEEE) behavior.

Precisely! :)

> I think we have two options here:
>
>  1. Lower these intrinsics into target-level intrinsics

That's not an option for the reasons you outline (performance), but
also because it would explode the number of intrinsics we have to
handle, making the IR *very* opaque and hard to work with.

>  2. Add flags (or something like that) that indicate the alternate non-IEEE semantics that ARM actually provides.

That's my idea, but I want to think about it only when we really need
to. Adding new flags always leads to hard choices, and backwards
compatibility will be a problem here.

> We'd need to pass the fast-math flags to the cost model so that we'd get costs back that depended on whether or not we could actually use the vector instructions.

Indeed, that's the only way. But I foresee the cost model at least
doubling its complexity for those unfortunate targets. Right now, we
use heuristics to map the costs of casts, shuffles and memory
operations that normally disappear, but once loops can mix NEON, VFP
and scalar code in the same object, how the back-end will emit those
pseudo-operations will be anyone's guess.
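
To make the idea concrete, here is a minimal sketch of a cost query
that depends on the fast-math flags. Every name in it (FPUnit,
FastMathFlags, allowNonIEEE, fpArithCost) is hypothetical and the cost
numbers are made up; it is not the LLVM cost model API, only an
illustration of a flag-dependent answer.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; none of these are LLVM APIs.
enum class FPUnit { ScalarVFP, VectorNEON };

struct FastMathFlags {        // NOT llvm::FastMathFlags
  bool allowNonIEEE = false;  // assumption: user accepts non-IEEE results
};

struct FPCost {
  FPUnit unit;    // which unit the operation is assumed to run on
  unsigned cost;  // made-up relative cost
};

// Cost of one FP arithmetic operation over `width` lanes.
FPCost fpArithCost(unsigned width, FastMathFlags fmf) {
  assert(width >= 1 && "need at least one lane");
  // The vector unit is only usable when non-IEEE behaviour is allowed.
  if (width > 1 && fmf.allowNonIEEE)
    return {FPUnit::VectorNEON, 1};   // one vector op covers all lanes
  return {FPUnit::ScalarVFP, width};  // one scalar op per lane
}
```

With these made-up numbers, a 4-lane operation costs 4 under strict
semantics and 1 when the flag is set, so the vectorizer would only see
a profitable vector cost when the flags permit it.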

In that sense, James' suggestion to create a flag for strict IEEE
semantics, locking SIMD FP out of the question entirely, is an easy
intermediate step.

cheers,
--renato