[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

Tue Feb 12 03:08:53 PST 2013

On 12 February 2013 10:25, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote:

> Same architecture, different micro-architecture (implementation). Could it
> be that vmlx-forwarding makes sense for Swift but not for the ARM
> Cortex-A9 implementation? It is enabled by default when -mcpu=cortex-a9 is
> used, but the tests we have run show significant improvements when it is
> disabled for Cortex-A9 (STEricsson Nova platform).
>

Hi Sebastien,

The optimization does make sense for Cortex-A9. I remember reviewing the
patch myself, and the A9 documentation clearly states the delays involved
between VMLAs and that this was a solution.

However, due to micro-architecture differences (as David explained), it may
interfere with other non-Swift steps (or the lack of Swift steps) and
produce worse code. It's not uncommon to see "if (isSwift())" guards around
the code generation or optimization passes.
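
As a rough sketch of what such a guard amounts to, the snippet below gates a
VMLx-forwarding decision on the CPU. Everything in it is hypothetical and for
illustration only: ToySubtarget, its fields, and useVMLxForwarding are made-up
names, not LLVM's actual classes or API.

```cpp
#include <string>

// Hypothetical stand-in for a subtarget description (not LLVM's ARMSubtarget).
struct ToySubtarget {
    std::string cpu;          // e.g. "swift" or "cortex-a9"
    bool hasVMLxForwarding;   // assumed feature bit derived from -mcpu / -mattr

    bool isSwift() const { return cpu == "swift"; }
};

// Decide whether the VMLx-forwarding-aware handling should be applied.
// When swiftOnlyDefault is true, the feature is honoured only on Swift,
// which models the "enable by default only on Swift" policy.
bool useVMLxForwarding(const ToySubtarget &st, bool swiftOnlyDefault) {
    if (!st.hasVMLxForwarding)
        return false;          // feature switched off: never apply
    if (swiftOnlyDefault)
        return st.isSwift();   // guarded: apply on Swift only
    return true;               // unguarded: apply wherever the feature is set
}
```

With swiftOnlyDefault set, a Cortex-A9 subtarget that still carries the feature
bit would skip the handling, while Swift keeps it.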

I haven't done any benchmarking on that particular issue, but if you can
show that the performance regression occurs on more than one Cortex-A9 core
(ST, TI), then I'd be inclined to suggest enabling VMLx-forwarding by
default only on Swift.

cheers,
--renato