[PATCH] D142656: [SVE][codegen] Add pattern for SVE multiply-add accumulate
Sushant Gokhale via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Feb 20 21:43:32 PST 2023
sushgokh added inline comments.
================
Comment at: llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td:405
+// pattern for generating pseudo for MLA_ZPmZZ/MAD_ZPmZZ
+def AArch64mlad_pseudo : PatFrags<(ops node:$pred, node:$op1, node:$op2, node:$op3),
+ [(add node:$op1, (AArch64mul_p_oneuse node:$pred, node:$op2, node:$op3)),
----------------
paulwalker-arm wrote:
> We have a naming convention here that follows the instruction's name with a letter suffix. In this instance this should be `AArch64mla_p`.
_p is slightly confusing and can confused with using 'predicated version'. To be specific, I used 'pseudo'. But to be consistent at this point, I will change it as you say
================
Comment at: llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td:493
+
+ defm MLA_ZPZZZ : sve_int_3op_pred_pseudo<AArch64mlad_pseudo> ;
----------------
paulwalker-arm wrote:
> Given you've added all the plumbing for MLS/MSB pseudo nodes please can you add support for `AArch64mls_p` to this patch as well.
I was reluctant to do since that's unrelated for this patch. I will take this in next patch if you agree
================
Comment at: llvm/lib/Target/AArch64/SVEInstrFormats.td:499
+let AddedComplexity = 9 in {
+class SVE_4_Op_Psuedo_Pat<ValueType vtd, SDPatternOperator op, ValueType vt1,
----------------
paulwalker-arm wrote:
> We don't normally use `AddedComplexity` like this unless the containing class has something about it that makes the complexity relevant. This is how it's used above bit for your usage it's better to just reuse `SVE_4_Op_Pat` and set `AddedComplexity` specifically for the important pattern.
>
> With all that said for this first patch I'd rather not set `AddedComplexity` at all. This goes hand in hand with my comment relating to the tests. I'd prefer to get the base MLA/MLS pseudo support added thats solves the main MLA/MLS vs MAD/MSB problem and then have follow on work to solve the more nuanced issue of whether to emit "mul followed by add_imm" or "mov_imm followed by mla".
Two points:
1. As said above, will make changes for MLS/MSB in the next patch if you agree.
2. The main motive of this patch was getting 'mad/mla' generated for
```
void
imul(uint8_t * restrict dst,
uint32_t * restrict src1,
uint32_t * restrict src2,
int n)
{
for (int i = 0; i < n; i++) {
dst[i] = ( src1[i] * src2[i] + 1 ) ;
}
}
```
now, coming to complexity issue, I have created custom pattern with increased complexity so that
a) mad/mla is generated for above test case
a) it has no side-effects for instructions using 'SVE_4_Op_Pat'
Having said all this, if you say, I am ready to go by following sequence:
1. Convert MLA/MLS/MAD/MSB to pseudo
2. Looking at which sequence to generate: mad/mla or mul+add/sub
Let me know if you still suggest the above sequence.
================
Comment at: llvm/test/CodeGen/AArch64/sve-multiply-add-accumulate.ll:4-6
+define <vscale x 2 x i64> @muladd_i64_positiveAddend(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
+; CHECK-LABEL: muladd_i64_positiveAddend:
+; CHECK: // %bb.0:
----------------
paulwalker-arm wrote:
> For this patch I'd rather see a set of much simpler patterns that just exercise the new isel, whereby the mla/mad choice is controlled by the function's operand order (i.e. %a is the multiplicand -> mad, %a is the addend -> mla). You'll see example of this for the equivalent fmla/fmad testing. This should mean all test ( 16 = 4 instructions * 4 element types) where each only emits a single instruction (ignoring the ret).
>
> The more nuanced cases should form part of a different patch along with the code changes to improve them.
will wait for your replies on previous comments
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D142656/new/
https://reviews.llvm.org/D142656
More information about the llvm-commits
mailing list