[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

Tue Feb 12 08:56:00 PST 2013

If this helps taking your decision, there are at least two benchmarks for which disabling vmlx-forwarding makes a significant difference.
If I get lucky I may be able to run on a panda board by next week and have more info to share
Best Regards
Seb

________________________________________
De : Evan Cheng [evan.cheng at apple.com]
Date d'envoi : mardi 12 février 2013 16:47
À : Renato Golin
Cc : Sebastien DELDON-GNB; llvmdev at cs.uiuc.edu
Objet : Re: [LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

I did the initial work on vmla formation. The default settings for cortex-a8 / a9 due to micro-architecture difference (i believe a8 TRM talks about vmla hazards) and extensive testing. That said, given the limitation of the current pre-RA scheduling pass, it's likely the use of vmla can caused regressions.

Im not opposed to changing the setting for a9. However, it's not a good idea to base the decision on one benchmark. I'd like to see minimally performance data of the entire llvm test suite.

Evan

Sent from my iPad

On Feb 12, 2013, at 3:08 AM, Renato Golin <renato.golin at linaro.org<mailto:renato.golin at linaro.org>> wrote:

On 12 February 2013 10:25, Sebastien DELDON-GNB <sebastien.deldon at st.com<mailto:sebastien.deldon at st.com>> wrote:
Same architecture, different micro-arch (implementation). Could this be the case that vmlx-forwarding make senses for SWIFT and not for ARM Cortex-A9 implementation ? It is enabled by default when –mcpu=cortex-a9 is used but test have made show significant improvements when disabled for cortex-A9 (STEricsson Nova platform).

Hi Sebastien,

The optimization does make sense for cortex-a9, I remember to have reviewed the patch myself and the A9 document clearly states the delays involved between VMLAs and that this was a solution.

However, due to micro-architecture differences (as David explained), it may interfere with other non-Swift steps (or the lack of Swift steps) and produce worse code. It's not uncommon to see "is (isSwift())" around the code generation or optimization passes.

I haven't done any benchmarking on that particular issue, but if you can show that the performance regression occur on more than one cortex-A9 core (ST, TI), than I'd be inclined to suggest only enable VMLx-forward by default on Swift.

cheers,
--renato
_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu<mailto:LLVMdev at cs.uiuc.edu>         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev