[llvm-dev] Vectorization with fast-math on irregular ISA sub-sets

Thu Feb 11 04:24:30 PST 2016

This is fine, Renato.

I worked around the local issue by using an instruction intrinsic so that the pattern would be invisible to this optimisation; my thoughts on raising this to the TargetTransformInfo level are still not well formed.  I was actually quite impressed with the new optimisation: it cleverly handled the situation perfectly.

A coarse-grained solution is fine, and it is always possible to handle this in custom lowering for ISD::FABS, which could check a target-specific flag to see if it should do the "safe thing" or the "fast thing".
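
As a rough illustration of such a flag-driven choice, here is a minimal standalone sketch; it does not use the real LLVM lowering API. The names TargetFlags, NativeFAbsHandlesDenormals and chooseFAbsLowering are hypothetical, and treating an integer sign-bit mask as the "safe thing" is an assumption, since the thread does not say what either option is on this target. The mask itself only clears bit 31 of an IEEE-754 binary32 value, so denormal inputs pass through bit-for-bit.

```cpp
#include <cstdint>
#include <cstring>

// Clear the sign bit of an IEEE-754 binary32 bit pattern. No FP arithmetic is
// involved, so denormal payloads are preserved exactly.
std::uint32_t fabsBits(std::uint32_t Bits) { return Bits & 0x7FFFFFFFu; }

// fabs implemented through the integer mask above.
float fabsViaMask(float X) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");
  std::uint32_t Bits;
  std::memcpy(&Bits, &X, sizeof Bits);
  Bits = fabsBits(Bits);
  std::memcpy(&X, &Bits, sizeof X);
  return X;
}

// Hypothetical target-specific flag consulted during custom lowering.
struct TargetFlags {
  bool NativeFAbsHandlesDenormals;
};

enum class FAbsLowering { IntegerMask, NativeInstruction };

// Assumption: the mask is the "safe thing", the native instruction the
// "fast thing".
FAbsLowering chooseFAbsLowering(const TargetFlags &F) {
  return F.NativeFAbsHandlesDenormals ? FAbsLowering::NativeInstruction
                                      : FAbsLowering::IntegerMask;
}
```

In a real backend the decision would sit in the target's custom lowering hook for ISD::FABS rather than in a free function like this.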

Thanks for the feedback,

	MartinO

-----Original Message-----
From: Renato Golin [mailto:renato.golin at linaro.org] 
Sent: 11 February 2016 11:50
To: Martin.ORiordan at movidius.com
Cc: Hal Finkel <hfinkel at anl.gov>; LLVM Developers <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] Vectorization with fast-math on irregular ISA sub-sets

On 11 February 2016 at 11:23, Martin J. O'Riordan <martin.oriordan at movidius.com> wrote:
> But the v3.8 FP optimiser now recognises this pattern and replaces it with an ISD::FABS node, which broke our workaround :-)  It's a great optimisation and I have no problem with its correctness, but I was thinking that I might look at where to extend the target information interface so that a target can say it does not support denormals, allowing this and possibly other optimisations to be suppressed in a target-dependent way.

Hi Martin,

So, I have a patch that right now is a big hammer:
 * Targets can mark their SIMD unit as IEEE compliant or not (rather than choosing at a fine-grained level which parts are).
 * Any FP arithmetic / cast operation with UnsafeAlgebra will trigger a "potentially unsafe" flag in the vectorizer.
 * In the end, if the SIMD unit is not IEEE compliant and there are any potentially unsafe operations, the vectorizer avoids that loop.
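
The three points above can be sketched as a small standalone model. This is not the patch itself: Inst and TargetInfo are hypothetical stand-ins for the real LLVM classes, only the name isSIMDIEEE() comes from this thread, and the condition follows the wording of the list literally.

```cpp
#include <vector>

// Hypothetical stand-in for an IR instruction; not llvm::Instruction.
struct Inst {
  bool IsFPArithOrCast;  // FP arithmetic or cast operation
  bool HasUnsafeAlgebra; // the UnsafeAlgebra fast-math flag is set
};

// Hypothetical stand-in for the target query.
struct TargetInfo {
  bool SIMDIsIEEE;
  bool isSIMDIEEE() const { return SIMDIsIEEE; }
};

// Coarse-grained check: one qualifying operation is enough to flag the loop
// as "potentially unsafe", and a flagged loop is skipped when the SIMD unit
// is not IEEE compliant.
bool canVectorizeLoop(const std::vector<Inst> &Loop, const TargetInfo &TTI) {
  bool PotentiallyUnsafe = false;
  for (const Inst &I : Loop)
    if (I.IsFPArithOrCast && I.HasUnsafeAlgebra)
      PotentiallyUnsafe = true;
  return TTI.isSIMDIEEE() || !PotentiallyUnsafe;
}
```

The decision is made per loop rather than per instruction, which is what makes this a "big hammer".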

I just need to create some more tests to submit.

The problems I can see in your case are:
 * Both scalar and vector units have problems with denormals, so my isSIMDIEEE() is not enough.
   - To fix this, you can add isVFPIEEE(), but we may find a better solution?
 * Your optimisation is basic-block based, not loop based, so we'd have to add the same check to SLP.
   - SLP deals with both SIMD and VFP units, so we would need the additional flag anyway.
   - This will be my next step.
 * Other passes already have access to the TTI, so they can use those flags to avoid strength reduction, combine, etc. in those cases.
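
A minimal sketch of how a pass might consult one flag per unit, assuming the two queries named above. TargetFPInfo, FPUnit and allowFPTransform are hypothetical names for illustration and are not part of the real TargetTransformInfo interface.

```cpp
// Which FP unit the transformed code would execute on.
enum class FPUnit { Scalar, Vector };

// Hypothetical target description carrying both flags discussed above.
struct TargetFPInfo {
  bool SIMDIsIEEE;
  bool VFPIsIEEE;
  bool isSIMDIEEE() const { return SIMDIsIEEE; }
  bool isVFPIEEE() const { return VFPIsIEEE; }
  // Pick the flag that matches the unit in question.
  bool isIEEE(FPUnit U) const {
    return U == FPUnit::Vector ? isSIMDIEEE() : isVFPIEEE();
  }
};

// A pass (loop vectorizer, SLP, a combine) would skip the transform when the
// code is potentially unsafe and the unit it targets is not IEEE compliant.
bool allowFPTransform(const TargetFPInfo &TTI, FPUnit Unit,
                      bool PotentiallyUnsafe) {
  return TTI.isIEEE(Unit) || !PotentiallyUnsafe;
}
```

For a target where both units mishandle denormals, both flags would be false, so every potentially unsafe transform is suppressed regardless of the unit.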

I don't think we need to create a fine grained solution right now, since we don't have examples with different behaviour.

Would that work for you?

cheers,
--renato