[PATCH] D142656: [SVE][codegen] Add pattern for SVE multiply-add accumulate

Mon Feb 20 21:43:32 PST 2023

sushgokh added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td:405
+// pattern for generating pseudo for MLA_ZPmZZ/MAD_ZPmZZ
+def AArch64mlad_pseudo : PatFrags<(ops node:$pred, node:$op1, node:$op2, node:$op3),
+                             [(add node:$op1, (AArch64mul_p_oneuse node:$pred, node:$op2, node:$op3)),
----------------
paulwalker-arm wrote:
> We have a naming convention here that follows the instruction's name with a letter suffix.  In this instance this should be `AArch64mla_p`.
_p is slightly confusing and can confused with using 'predicated version'. To be specific, I used 'pseudo'. But to be consistent at this point, I will change it as you say

================
Comment at: llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td:493
+
+  defm MLA_ZPZZZ : sve_int_3op_pred_pseudo<AArch64mlad_pseudo> ;

----------------
paulwalker-arm wrote:
> Given you've added all the plumbing for MLS/MSB pseudo nodes please can you add support for `AArch64mls_p` to this patch as well.
I was reluctant to do since that's unrelated for this patch. I will take this in next patch if you agree

================
Comment at: llvm/lib/Target/AArch64/SVEInstrFormats.td:499

+let AddedComplexity = 9 in {
+class SVE_4_Op_Psuedo_Pat<ValueType vtd, SDPatternOperator op, ValueType vt1,
----------------
paulwalker-arm wrote:
> We don't normally use `AddedComplexity` like this unless the containing class has something about it that makes the complexity relevant.  This is how it's used above bit for your usage it's better to just reuse `SVE_4_Op_Pat` and set `AddedComplexity` specifically for the important pattern.
> 
> With all that said for this first patch I'd rather not set `AddedComplexity` at all.  This goes hand in hand with my comment relating to the tests.  I'd prefer to get the base MLA/MLS pseudo support added thats solves the main MLA/MLS vs MAD/MSB problem and then have follow on work to solve the more nuanced issue of whether to emit "mul followed by add_imm" or "mov_imm followed by mla".
Two points:
1. As said above, will make changes for MLS/MSB in the next patch if you agree.
2. The main motive of this patch was getting 'mad/mla' generated for 

```
void
imul(uint8_t * restrict dst, 
     uint32_t * restrict src1, 
     uint32_t * restrict src2, 
     int n)
{
  for (int i = 0; i < n; i++) {
    dst[i] = ( src1[i] * src2[i] + 1 ) ;
  }
} 

```
now, coming to complexity issue, I have created custom pattern with increased complexity so that
    a) mad/mla is generated for above test case
    a) it has no side-effects for instructions using 'SVE_4_Op_Pat'

Having said all this, if you say, I am ready to go by following sequence:
1. Convert MLA/MLS/MAD/MSB to pseudo
2. Looking at which sequence to generate: mad/mla or mul+add/sub 

Let me know if you still suggest the above sequence.

================
Comment at: llvm/test/CodeGen/AArch64/sve-multiply-add-accumulate.ll:4-6
+define <vscale x 2 x i64> @muladd_i64_positiveAddend(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
+; CHECK-LABEL: muladd_i64_positiveAddend:
+; CHECK:       // %bb.0:
----------------
paulwalker-arm wrote:
> For this patch I'd rather see a set of much simpler patterns that just exercise the new isel, whereby the mla/mad choice is controlled by the function's operand order (i.e. %a is the multiplicand -> mad, %a is the addend -> mla).  You'll see example of this for the equivalent fmla/fmad testing.  This should mean all test ( 16 = 4 instructions * 4 element types) where each only emits a single instruction (ignoring the ret).
> 
> The more nuanced cases should form part of a different patch along with the code changes to improve them.
will wait for your replies on  previous comments

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142656/new/

https://reviews.llvm.org/D142656