[LLVMdev] NEON vector instructions and the fast math IR flags

Fri Jun 7 07:22:46 PDT 2013

On 7 June 2013 14:49, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:

> It is not the vectorizer that is the issue, it is the ARM backend that
> currently translates vectorized floating point IR to NEON instructions (it
> should scalarize it if desired to do so - i.e. if people care about
> denormals).
>

Hi Arnold,

Couldn't the vectorizer simply avoid generating v4f32 vectors in the first
place when that flag is disabled?

> To fix this issue one would have to fix the backend: i.e not declare v4f32
> et al as legal (under a flag). As to making this predicated on fast math
> flags on operations (something like no-denormals - i don’t think we have
> that in the IR yet - we only have no nan, no infinite, no signed zeros,
> etc) I believe this would be a lot harder because I suspect you would have
> to custom lower all the operations.
>

This is one way of solving it, and maybe we will have to implement it
anyway (for hand-coded IR or external front-ends).

However, that still doesn't solve the original issue. When the vectorizer
analyses the cost of the new loop, it assumes that one v4f32 operation does
the work of four scalar ones, which is clearly profitable. But if we know
that the back-end will scalarize it, then it's no longer profitable, and it
can quite possibly hurt performance.
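To make the cost argument concrete, here is a toy cost model in Python. It is only a sketch: the names (`op_cost`, `lane_move_cost`, `legal`) and the assumption of three lane moves per element (two operand extracts, one result insert) are hypothetical, and are not LLVM's actual cost tables or TargetTransformInfo API. It shows how the same v4f32 loop flips from profitable to unprofitable once the back-end scalarizes it.

```python
# Toy vectorizer cost model (hypothetical costs, not LLVM's real tables).
VF = 4  # vectorization factor: a v4f32 holds four floats


def scalar_cost_per_element(op_cost: float) -> float:
    """Scalar loop: one scalar fadd per element."""
    return op_cost


def vector_cost_per_element(op_cost: float, legal: bool,
                            lane_move_cost: float) -> float:
    """Cost per element of one v4f32 operation.

    legal=True: the back-end emits a single NEON instruction.
    legal=False: the back-end scalarizes it into VF scalar ops, plus
    (assumed) two operand extracts and one result insert per lane.
    """
    if legal:
        total = op_cost
    else:
        total = VF * op_cost + VF * 3 * lane_move_cost
    return total / VF


def should_vectorize(op_cost: float, legal: bool,
                     lane_move_cost: float = 1.0) -> bool:
    """Vectorize only if the per-element vector cost beats the scalar one."""
    vec = vector_cost_per_element(op_cost, legal, lane_move_cost)
    return vec < scalar_cost_per_element(op_cost)


if __name__ == "__main__":
    print(should_vectorize(1.0, legal=True))   # single NEON op
    print(should_vectorize(1.0, legal=False))  # back-end scalarizes
```

With unit costs, the legal case costs 0.25 per element against 1.0 for the scalar loop, while the scalarized case costs 4.0 per element. A cost model that assumes `legal=True` while the back-end actually scalarizes would therefore choose the slower loop.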

I think we need both solutions.

cheers,
--renato