[PATCH] D25020: [ARM] Fix 26% performance regression on Cortex-A9 caused by not using VMLA/VMLS

Tue Oct 4 10:16:00 PDT 2016

rovka added a comment.

Hi Evgeny,

Thanks for working on this.

Your patch looks good in general, but I have a few comments:

- You mention that there's a performance regression, but you don't mention where - is it in a well-known benchmark or in proprietary code? If it's in proprietary code, I think it's customary to also get the results on the test-suite/SPEC/something to show that it doesn't break anything the community cares about.
- The commit message keeps mentioning Cortex-A8 and Cortex-A9 together, but the MLx expansion pass is only enabled for Cortex-A9. If you have some performance numbers that look good for Cortex-A8, it would be a good idea to enable the pass for it too (just add FeatureExpandMLx to it in ARM.td).
- I think the commit message is a bit TL;DR, could you condense it a bit? (e.g. keep only the relevant assembly snippets instead of the whole output)
- Thanks for switching to using the subtarget feature, that would've been my first comment otherwise :)

Regards,
Diana

> ARMISelDAGToDAG.cpp:423
>  
> +/// Checks if VMLS/VMLA are used instead of N they can be issued back to back
> +/// to the previous instruction. E.g., Cortex-A9 NEON MPE uses a special multiplier

I don't understand the first part of the comment, it's either incomplete or could use some rephrasing.

> ARMISelDAGToDAG.cpp:462
> +  if (!Acc.getValueType().isFloatingPoint()
> +      || Acc.getValueType().isVector())
> +    return false;

Why return false for vectors? AFAICT the Cortex-A9 manual says accumulator forwarding should apply for VQDMLA/VQDMLS. I see this assumption everywhere in the patch - maybe there should be a comment somewhere explaining it?

> ARMISelDAGToDAG.cpp:470
>  /// node. VFP / NEON fp VMLA / VMLS instructions have special RAW hazards (at
> -/// least on current ARM implementations) which should be avoidded.
> +/// least on current ARM implementations) which should be avoided.
>  bool ARMDAGToDAGISel::hasNoVMLxHazardUse(SDNode *N) const {

This isn't related, and since it's a typo fix you can just push it without review.

> MLxExpansionPass.cpp:69
> +    bool canUseVMLxForwarding(MachineInstr *MI,
> +      MachineInstr *AccDef) const;
>      bool FindMLxHazard(MachineInstr *MI);

Did you run clang-format on this?

> MLxExpansionPass.cpp:235
> +  if (!MI->getParent()->getParent()
> +      ->getSubtarget<ARMSubtarget>().hasVMLxForwarding())
> +    return false;

This looks a bit awkward and it doesn't seem to be on a very cold path either - maybe add a private flag and set it in runOnMachineFunction?

> fml.ll:2
> +; RUN: llc -mtriple=arm-eabi -mcpu=cortex-a9 %s -o - | FileCheck %s -check-prefix=FPSCALAR
> +; RUN: llc -mtriple=arm-eabi -mcpu=cortex-a9 %s -o - | FileCheck %s -check-prefix=FPVECTOR
> +

Could you add some run lines for cores that don't have accumulator forwarding, so we can test the other behavior as well? Also, can you rename the file to something more representative?

> fml.ll:4
> +
> +define double @test1(double %a, double %b, double %c, double %d, double %e, double %f) {
> +  %1 = fmul double %a, %c

You should add a CHECK-LABEL directive for each function to make sure you're matching the expected instructions and not others appearing further down (doesn't seem likely now, but people may append tests to this file in the future).

> fml.ll:19
> +  %10 = fadd double %6, %7
> +  %11 = fmul double %9, %10
> +

Shouldn't this snippet have an expectation too? (Ditto in the other functions)

https://reviews.llvm.org/D25020