[llvm-dev] Vectorization with fast-math on irregular ISA sub-sets

Mon Feb 8 11:15:09 PST 2016

On 8 February 2016 at 16:33, James Molloy <James.Molloy at arm.com> wrote:
> The loop vectorizer does indeed require -ffast-math, but the IEEE-nonconformant transforms it does are far greater than using an ISA which may FTZ. It needs -ffast-math because any FP reductions necessarily have their execution order shuffled, due to executing some of them in parallel and reducing to scalar at the end. Therefore the LV doesn’t need to be changed - it will only work when “fast” is given and will only emit “fast” vector instructions.
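
To illustrate the reordering described above, here is a minimal scalar sketch of what a 4-wide vectorized reduction does to the order of additions. The function names and the sample data are hypothetical, chosen for illustration only. Nothing here is LLVM API, and it assumes a target that evaluates float arithmetic in IEEE single precision.

```cpp
#include <cassert>
#include <cstddef>

// Strict left-to-right sum: the evaluation order the scalar loop specifies.
float sum_in_order(const float* a, std::size_t n) {
    assert(a != nullptr || n == 0);
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) acc += a[i];
    return acc;
}

// Scalar model of a 4-wide vectorized reduction: four independent partial
// sums (one per lane), combined into a single scalar after the loop.
float sum_four_lanes(const float* a, std::size_t n) {
    assert(a != nullptr || n == 0);
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        for (std::size_t l = 0; l < 4; ++l) lane[l] += a[i + l];
    float acc = ((lane[0] + lane[1]) + lane[2]) + lane[3];
    for (; i < n; ++i) acc += a[i];  // scalar tail for leftover elements
    return acc;
}

// Hypothetical input chosen so that rounding makes the two orders disagree:
// 1e8f is exactly representable, but 1e8f + 1.0f rounds back to 1e8f.
const float kSample[8] = {1e8f, 1.0f, 1.0f, 1.0f, -1e8f, 1.0f, 1.0f, 1.0f};
```

Under those assumptions, sum_in_order(kSample, 8) gives 3.0f while sum_four_lanes(kSample, 8) gives 6.0f, which is why an FP reduction cannot be vectorized without permission to reassociate.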

Good point. This seems to be a much more rigorous definition in the
new 2008 standard. Right now, the loop vectorizer produces vector code
without -ffast-math. Are you saying we should disable it altogether
for all architectures that claim to follow the new standard?

Inner loops can be "vectorized" by SLP using only VFP instructions.

The implementation seems to have moved to Inst->hasUnsafeAlgebra(), so
we may need to return false in the legalization phase if the flag is
omitted and any instruction has unsafe algebra.

> The SLP vectoriser however should theoretically take non-fast scalars and produce non-fast vectors. Similarly people will hand-write vector IR, or generate it from other frontends.

We can't guarantee the semantics of the unsafe-math flag in any IR
that was not generated by a front-end which knows about it. So, it
follows that we'll stop vectorizing their basic blocks, and there
could be some outcry. We need general consensus on whether that's what
people want. I don't think it is.

> Because of this, I think it’s important that we shouldn’t change the semantics of the IR currently. Making vector IR targeting ARM produce scalar instructions unless a modifier is given will undoubtedly cause problems down the line with frontends being out of sync or not being updated. Even worse, the symptom of this would just be “LLVM produces poor code for ARM” / “LLVM’s vector codegen is terrible for ARM” - performance errata and not conformance. That’s why I think changing to a full-strict-by-default approach would be bad for the project.
> It would also violate the principle of least surprise - I wrote vector instructions and picked a vector ISA… but they’re being scalarized?

Right, this argues against marking instructions as unsafe by
default (i.e. my second option). If that's so, I agree with you that
it's not trivial and may create more problems than it solves.

Hand-written IR, inline asm and intrinsics should be left as they
are. So 16274 is probably a "won't fix"?

> My experience is that the number of people who care about full IEEE compatibility on ARMv7 hardware is limited, and the set of people who care about exact ULP constraints even more limited. I think we absolutely should make a solution that solves PR16274, but I think it would have to be opt-in, not opt-out.

And I'm guessing this is related to SLP and others. If so, I agree.

So,

For 16275, the fix is to disable loop vectorization when -ffast-math is
absent and any instruction hasUnsafeAlgebra.
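
As a rough sketch of that rule, read literally: the Inst struct and may_vectorize_loop below are hypothetical stand-ins, not LLVM's classes or the vectorizer's actual legality code. The unsafe_algebra field only models what Inst->hasUnsafeAlgebra() would report.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an IR instruction; not LLVM's Instruction class.
struct Inst {
    bool unsafe_algebra;  // models the answer of Inst->hasUnsafeAlgebra()
};

// Literal reading of the proposed check: refuse to vectorize the loop when
// the fast-math flag is absent and some instruction reports unsafe algebra.
bool may_vectorize_loop(const std::vector<Inst>& body, bool fast_math_flag) {
    if (fast_math_flag) return true;
    for (const Inst& inst : body)
        if (inst.unsafe_algebra) return false;
    return true;
}

// Small self-check of the three cases the rule distinguishes.
void check_may_vectorize_loop() {
    const std::vector<Inst> with_unsafe{Inst{false}, Inst{true}};
    const std::vector<Inst> without_unsafe{Inst{false}};
    assert(!may_vectorize_loop(with_unsafe, false));
    assert(may_vectorize_loop(with_unsafe, true));
    assert(may_vectorize_loop(without_unsafe, false));
}
```

Loops without any such instruction would still be vectorized without the flag under this reading.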

For 16274, disabling NEON emission in SLP would be one way, but we
must avoid any fiddling with inline asm and intrinsics, so I don't
think we should be doing that in any generic way. Certainly not
related to the example, from IR to instruction.
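
For context on the FTZ (flush-to-zero) behaviour mentioned earlier in the thread, here is a minimal software model of how a flushing unit and an IEEE-conformant unit can disagree. The function names are made up for illustration, this is not LLVM or ARM API, and it assumes the host itself runs with IEEE 754 gradual underflow enabled.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Software model of flush-to-zero: a subnormal value becomes a zero of the
// same sign; every other value passes through unchanged.
float flush_to_zero(float x) {
    return std::fpclassify(x) == FP_SUBNORMAL ? std::copysign(0.0f, x) : x;
}

// Halving with IEEE 754 gradual underflow (assumed host default).
float halve_ieee(float x) { return x * 0.5f; }

// Halving under the FTZ model: both the input and the result are flushed.
float halve_ftz(float x) { return flush_to_zero(flush_to_zero(x) * 0.5f); }

// True if the two models disagree when halving the smallest normal float.
bool ftz_changes_result() {
    const float tiny = std::numeric_limits<float>::min();
    assert(std::fpclassify(tiny) == FP_NORMAL);
    return halve_ieee(tiny) != halve_ftz(tiny);
}
```

Halving the smallest normal float gives a nonzero subnormal in the IEEE model and exactly zero in the FTZ model, so the same source can produce different results depending on which unit executes it.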

Makes sense?

--renato