[PATCH] D142656: [SVE][codegen] Add pattern for SVE multiply-add accumulate

Tue Jan 31 08:32:55 PST 2023

david-arm added inline comments.

================
Comment at: llvm/lib/Target/AArch64/SVEInstrFormats.td:483

+// Not sure why this pattern doesn't kick in with value<9
+let AddedComplexity = 9 in {
----------------
I think this is probably because isel chooses the patterns with the highest complexity that it can find, believing that this is a bigger win, although I could be wrong? So if there are other patterns that also match the sequence of DAG nodes they will be chosen first. This suggests there are multiple matching patterns for the cases you are testing.

================
Comment at: llvm/test/CodeGen/AArch64/sve-gep.ll:230
 ; CHECK-NEXT:    mov z2.d, x8
-; CHECK-NEXT:    mla z0.d, p0/m, z1.d, z2.d
+; CHECK-NEXT:    mad z1.d, p0/m, z2.d, z0.d
+; CHECK-NEXT:    mov z0.d, z1.d
----------------
This particular case looks like a regression, which is a shame.

================
Comment at: llvm/test/CodeGen/AArch64/sve-multiply-add-accumulate.ll:2
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=aarch64-unknown-linux-gnu -mattr=+sve < %s | FileCheck %s
+
----------------
It would be really nice if you could pre-commit these tests without any code changes so that we can see the diff in this patch?

================
Comment at: llvm/test/CodeGen/AArch64/sve-multiply-add-accumulate.ll:80
+; MAD result can be computed in register for 'a'
+define <vscale x 8 x i16> @muladd_i16_test4(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
+; CHECK-LABEL: muladd_i16_test4:
----------------
Without seeing the code for these tests before your patch it's hard to know what's changed and whether this is an improvement or not. I copied these tests and ran them manually without your patch and to be honest the code for this function isn't obviously better with the mad:

  muladd_i16_test4:                       // @muladd_i16_test4
        ptrue   p0.h
        mul     z0.h, p0/m, z0.h, z1.h
        add     z0.h, z0.h, #200                // =0xc8
        mul     z0.h, p0/m, z0.h, z1.h
        ret

I think it's because the add is able to combine the constant into the instruction, whereas mad cannot. Having said that, I can believe that for non-constant cases this might be a win. It might be a good idea to include some tests for non-constants too?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142656/new/

https://reviews.llvm.org/D142656