[PATCH] D147619: [SVE] Add patterns to delete redundant sel instructions

Paul Walker via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Apr 6 05:57:19 PDT 2023


paulwalker-arm added a comment.

Hi @lizhijin, I've added comments relating to the correctness of the patch but I'm a little concerned about the relevance of the patch's intent.

What real-world use case are you targeting?  I ask because at first glance it seems like you care about instances where users simulate `svadd_m`-like builtins via a combination of `svadd_x` and `svsel` builtins.  If this is the case then I think a cleaner implementation would be to handle this as an instcombine.  However, if you're using the `_u` intrinsics purely as a proxy, then I'm wondering what the real IR looks like.
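For reference, this is the kind of IR I imagine such builtin usage producing (a hand-written sketch, function name hypothetical, not taken from your patch):
```
; Sketch: svadd_x followed by svsel, expressed as intrinsic calls.
define <vscale x 2 x double> @add_then_sel(<vscale x 2 x i1> %pg, <vscale x 2 x double> %a, <vscale x 2 x double> %b) {
  %add = call <vscale x 2 x double> @llvm.aarch64.sve.fadd.u.nxv2f64(<vscale x 2 x i1> %pg, <vscale x 2 x double> %a, <vscale x 2 x double> %b)
  %sel = call <vscale x 2 x double> @llvm.aarch64.sve.sel.nxv2f64(<vscale x 2 x i1> %pg, <vscale x 2 x double> %add, <vscale x 2 x double> %a)
  ret <vscale x 2 x double> %sel
}
; An instcombine could fold the pair into a single call to the merging
; intrinsic: @llvm.aarch64.sve.fadd.nxv2f64(%pg, %a, %b).
```
If that is the shape of the input then folding it early keeps isel untouched.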

If you look at `AArch64fadd_m1` you'll see we already have some support for the style of merging operations we expect to get during auto-vectorisation.  There's sure to be benefit in extending this to cover more of the binops, as your patch does, but the patterns themselves are slightly different (i.e. the binop typically takes an all-active predicate).

None of this necessarily blocks this patch, but I'm worried about unnecessarily polluting isel.



================
Comment at: llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td:442
+             (vselect node:$pg, (sdnode node:$pg, node:$op1, node:$op2), node:$op1),
+             (vselect node:$pg, (sdnode node:$pg, node:$op1, node:$op2), node:$op2)]>;
+
----------------
This looks wrong and is the likely reason for the incorrect test output.  It doesn't make sense for the same operation to match both of these `vselect` patterns because their expected results differ.

If you follow the advice attached to `AArch64fadd_p1` below, you'll not need this class.  For reference though, you'd need only the first of these `vselect` patterns for non-commutative operations (e.g. sub, fdiv), with the other pattern being useful for the `AArch64fsubr_m1` case if you want to cover that as well.

For the commutative operations the second `vselect` pattern probably wants to be something like
```
(vselect node:$pg, (sdnode node:$pg, node:$op2, node:$op1), node:$op1)
```


================
Comment at: llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td:449
+
+def AArch64fadd_p1 : fp_bin_patfrags<AArch64fadd_p>;
+def AArch64fsub_p1 : fp_bin_patfrags<AArch64fsub_p>;
----------------
We already have a naming scheme for such operations, which in this case is `AArch64fadd_m1`, and that PatFrags already exists.  You should be able to extend the existing PatFrags, and then the existing patterns will just work.
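To illustrate the kind of extension I mean (an untested sketch; the exact intrinsic spelling and existing pattern list may differ from what's in tree):
```
def AArch64fadd_m1 : PatFrags<(ops node:$pg, node:$op1, node:$op2),
                              [(int_aarch64_sve_fadd node:$pg, node:$op1, node:$op2),
                               (vselect node:$pg, (AArch64fadd_p (SVEAllActive), node:$op1, node:$op2), node:$op1),
                               // New entry covering the "_u intrinsic plus select" idiom.
                               (vselect node:$pg, (int_aarch64_sve_fadd_u node:$pg, node:$op1, node:$op2), node:$op1)]>;
```
That way the instructions selected from `AArch64fadd_m1` pick up the new idiom for free.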


================
Comment at: llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td:458-459
+def AArch64fdiv_p1 : fp_bin_patfrags<AArch64fdiv_p>;
+def AArch64fneg_mt1 : fp_unary_patfrags<AArch64fneg_mt>;
+def AArch64fabs_mt1 : fp_unary_patfrags<AArch64fabs_mt>;
+
----------------
Given we already have AArch64ISD nodes for the unary passthru operations, I think it would be better to remove the select via a DAG combine.  If you agree then I'd prefer this to be done as a separate patch.
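Roughly what I have in mind, as an untested sketch (function name hypothetical; I'm assuming the passthru nodes take operands in predicate, source, passthru order):
```
// Sketch: fold  vselect(pg, fneg_mt(pg, x, _), y)  ->  fneg_mt(pg, x, y)
// so the select disappears and the passthru supplies the inactive lanes.
static SDValue combineSelectOfPassthruOp(SDNode *N, SelectionDAG &DAG) {
  SDValue Pg = N->getOperand(0);
  SDValue Op = N->getOperand(1);
  if (Op.getOpcode() != AArch64ISD::FNEG_MERGE_PASSTHRU ||
      Op.getOperand(0) != Pg)
    return SDValue();
  // Reuse the select's false operand as the new passthru.
  return DAG.getNode(Op.getOpcode(), SDLoc(N), N->getValueType(0), Pg,
                     Op.getOperand(1), N->getOperand(2));
}
```
The same shape would cover `fabs` and the other merge-passthru unary nodes.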


================
Comment at: llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td:651-660
+  defm FADD_ZPZZ   : sve_fp_bin_pred_hfd<AArch64fadd_p1>;
+  defm FSUB_ZPZZ   : sve_fp_bin_pred_hfd<AArch64fsub_p1>;
+  defm FMUL_ZPZZ   : sve_fp_bin_pred_hfd<AArch64fmul_p1>;
+  defm FMAXNM_ZPZZ : sve_fp_bin_pred_hfd<AArch64fmaxnm_p1>;
+  defm FMINNM_ZPZZ : sve_fp_bin_pred_hfd<AArch64fminnm_p1>;
+  defm FMAX_ZPZZ   : sve_fp_bin_pred_hfd<AArch64fmax_p1>;
+  defm FMIN_ZPZZ   : sve_fp_bin_pred_hfd<AArch64fmin_p1>;
----------------
These pseudo instructions intentionally have no requirement on the results for the inactive lanes and thus do not represent the behaviour you expect from your `_p1` nodes.  If you follow the advice above you'll not need to change this block of code.


================
Comment at: llvm/test/CodeGen/AArch64/sve-sel-instruction-undef-predicate.ll:2
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
+
----------------
Please remove the `-mtriple` parameter because the triple is already set within the IR.


================
Comment at: llvm/test/CodeGen/AArch64/sve-sel-instruction-undef-predicate.ll:11
+; CHECK-NEXT:    ret
+entry:
+  %r = tail call <vscale x 2 x double> @llvm.aarch64.sve.fadd.u.nxv2f64(<vscale x 2 x i1> %pg, <vscale x 2 x double> %a, <vscale x 2 x double> %b)
----------------
Please can you remove all of the `entry:` labels, because the tests shouldn't need them.


================
Comment at: llvm/test/CodeGen/AArch64/sve-sel-instruction-undef-predicate.ll:40-43
+; CHECK-LABEL: fadd_rev_f64:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    fadd z0.d, p0/m, z0.d, z1.d
+; CHECK-NEXT:    ret
----------------
This output is identical to `fadd_f64`, which is incorrect because this test wants the results for inactive lanes to come from %b (i.e. z1).  I'd expect the output to be `fadd z1.d, p0/m, z1.d, z0.d`.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147619/new/

https://reviews.llvm.org/D147619
