[LLVMdev] NEON vector instructions and the fast math IR flags

Arnold Schwaighofer aschwaighofer at apple.com
Fri Jun 7 10:08:46 PDT 2013


Renato, I think we agree.

On Jun 7, 2013, at 11:53 AM, Renato Golin <renato.golin at linaro.org> wrote:

> On 7 June 2013 15:41, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> We don’t want to encode backend knowledge into the vectorizer (i.e. don’t vectorize type X because the backend does not support it).
> 
> We already do, via the cost table. This case is no different. It might not be the best choice, but it is how the cost table has been built over the last few months.
> 
> 

Using the cost model to communicate that the backend will generate wrong code is an abuse (in my opinion, that is not what the cost model is for). This is what I meant by encoding backend knowledge. Of course, we use the cost model to tell us how expensive an operation might be, but we should not use it as an indicator of how wrong it will be ;). (Which is what we would do if we gave a v4f32 operation a high cost because the backend generates instructions that flush denormals to zero.)

What I wanted to say is that even if you give v4f32 a high cost, you still have to solve the real problem in the ARM backend.
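For concreteness, here is a tiny example of the divergence in question (mine, not from earlier in the thread); assume the loop gets vectorized to <4 x float> operations and lowered to NEON:

  #include <cstdio>

  // Illustrative only: if the vectorizer turns this loop into
  // <4 x float> operations and the backend lowers them to NEON,
  // the subnormal inputs are flushed to zero (NEON always runs
  // flush-to-zero), whereas scalar VFP code preserves them.
  void halve(float *out, const float *in, int n) {
    for (int i = 0; i < n; ++i)
      out[i] = in[i] * 0.5f;
  }

  int main() {
    float in[8], out[8];
    for (int i = 0; i < 8; ++i)
      in[i] = 1e-41f; // a subnormal single-precision value
    halve(out, in, 8);
    printf("%g\n", (double)out[0]); // ~5e-42 via scalar VFP, 0 under NEON FTZ
    return 0;
  }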

> The only way to get this result is indirectly via the cost model but the backend must still support vectorized IR (it is part of the language) via scalarization.
> 
> Absolutely! There are two problems to solve: increase the cost for SPFP when UseNEONForSinglePrecisionFP is false, so that vectorizers don't generate such code, and legalize correctly in the backend, for vector code that does not respect that flag.
> 
> 
> (You can of course assign UMAX cost for all floating point vector types in the cost model for ARM and get the desired result - this won’t solve the problem if somebody else writes vectorized LLVM IR, though)
> 
> I wouldn't use UMAX, since the idea is not to forbid, but to tell how expensive it is. But it would be a big number, yes. ;)

I was referring to the case where you abuse the cost model to forbid vectorized v4f32 IR (which I thought you were proposing).

What I am suggesting is that (if you care about denormals):

* the ARM backend has to be fixed to scalarize floating-point vector operations (behind a flag)
* the ARM target transform model has to correctly reflect that (see the sketch below)
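Roughly, I am thinking of something like the sketch below. Signatures are simplified, "ScalarizeFPVectors" is a made-up stand-in for whatever flag we end up with, and the cost constants are placeholders:

  // --- In the ARM lowering setup, behind the flag: ---
  // Expand single-precision FP vector ops so each lane is lowered to a
  // scalar VFP instruction (which honors denormals) instead of NEON.
  if (ScalarizeFPVectors) {
    setOperationAction(ISD::FADD, MVT::v4f32, Expand);
    setOperationAction(ISD::FSUB, MVT::v4f32, Expand);
    setOperationAction(ISD::FMUL, MVT::v4f32, Expand);
    // ... and so on for v2f32 and the remaining FP opcodes.
  }

  // --- In the ARM cost model (simplified signature): ---
  // Report the true, scalarized cost so the vectorizer concludes on
  // its own that <4 x float> arithmetic is not profitable.
  unsigned getFPArithmeticInstrCost(Type *Ty) {
    if (ScalarizeFPVectors && Ty->isVectorTy() &&
        Ty->getScalarType()->isFloatTy()) {
      unsigned NumElts = Ty->getVectorNumElements();
      return NumElts * 3; // one scalar op + insert/extract per lane
    }
    return 1; // placeholder: the usual cost logic goes here
  }

That way the cost model stays an honest profitability estimate, and correctness is enforced where it belongs, in lowering.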

What one could also do (but I don't think it is a good idea) is to just give floating-point vector operations a max cost. You might run into unforeseen problems, including other clients that generate vectorized LLVM IR.

(This makes me wonder whether we clamp the cost computation at TYPE_MAX :)
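If we don't, something like this hypothetical helper would do; just an illustration, not existing code:

  #include <climits>

  // Saturate instead of wrapping when summing per-instruction costs.
  static unsigned addCostSaturating(unsigned Acc, unsigned Cost) {
    return (Acc > UINT_MAX - Cost) ? UINT_MAX : Acc + Cost;
  }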

> 
> cheers,
> --renato




