[PATCH] D80260: [WIP][SVE] Prototype for general merging MOVPRFX support.

Sander de Smalen via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu May 21 15:12:00 PDT 2020


Hi @cameron.mcinally thanks for this patch! The approach makes sense and nicely extends the mechanism we have for the zeroing forms. We can use similar pseudos for the zeroing case as well. Today I played around a bit with your patch and shared some changes for your reference in D80410 <https://reviews.llvm.org/D80410>. I'm not really planning to land it, but it at least highlights the bug for the zeroing-pseudos that currently exists in master. Feel free to use for reference or ignore/discard if you've already worked on something similar.

We should probably think about where we want to write down the design for this mechanism, as we use these pseudos to solve multiple problems.



================
Comment at: llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp:382
 ///
 /// Note that this can only be done for _ZERO or _UNDEF variants where
 /// we can guarantee the false lanes to be zeroed (by implementing this)
----------------
This comment suggests that this is only possible for _ZERO and _UNDEF variants, but I'm not sure if that comment is still correct.


================
Comment at: llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp:402
   bool UseRev = false;
-  unsigned PredIdx, DOPIdx, SrcIdx;
+  unsigned PredIdx, DOPIdx, SrcIdx, PassIdx;
   switch (DType) {
----------------
Can you update the description of `expand_DestructiveOp` to describe the new style of pseudos like `FSUB_ZPZZ_MERGE_B`?
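
Perhaps something along these lines, in the same style as the existing mapping comments (just a sketch; the operand order and the passthru being tied to the result are my assumptions based on this patch):

  // FSUB_ZPZZ_MERGE Zd, Pg, Zs1, Zs2, Zpt  ==>  MOVPRFX Zd, Pg/m, Zs1;  FSUB Zd, Pg/m, Zd, Zs2   (with Zd = Zpt)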


================
Comment at: llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp:408
       // FSUB Zd, Pg, Zs1, Zd  ==> FSUBR   Zd, Pg/m, Zd, Zs1
-      std::tie(PredIdx, DOPIdx, SrcIdx) = std::make_tuple(1, 3, 2);
+      std::tie(PredIdx, DOPIdx, SrcIdx, PassIdx) = std::make_tuple(1, 3, 2, 4);
       UseRev = true;
----------------
nit: `s/PassIdx/PassthruIdx/` ?


================
Comment at: llvm/lib/Target/AArch64/SVEInstrFormats.td:471
+    let FalseLanes = flags;
+    let Constraints = "$Zd = $Zpt";
+  }
----------------
I'm happy to keep the constraint for now, but I don't think it is strictly necessary, because if `$Zd != $Zpt` it is still possible to generate a valid instruction sequence.

For:

  define <vscale x 4 x float> @foo(<vscale x 4 x i1> %p, <vscale x 4 x float> %z0, <vscale x 4 x float> %z1, <vscale x 4 x float> %passthru) {
    %z0_in = select <vscale x 4 x i1> %p, <vscale x 4 x float> %z0, <vscale x 4 x float> %passthru
    %sub = call <vscale x 4 x float> @llvm.aarch64.sve.fsub.nxv4f32(<vscale x 4 x i1> %p, <vscale x 4 x float> %z0_in, <vscale x 4 x float> %z0)
    ret <vscale x 4 x float> %sub
  }

LLVM will generate:

  movprfx z2.s, p0/m, z0.s
  fsub    z2.s, p0/m, z2.s, z0.s
  mov     z0.d, z2.d
  ret

The last `mov` is needed because the register allocator chose to allocate the pseudo as:

  z2 = FSUB_PSEUDO p0, z0, z0, z2

with the result operand `z2` tied to the merge value. If the register allocator didn't have the tied-operand restriction, we could alternatively fall back on a `sel` instruction and instead generate:

  sel z0.s, p0, z0.s, z2.s
  fsub z0.s, p0/m, z0.s, z0.s
  ret



================
Comment at: llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-merging.ll:62
+; CHECK-NEXT: ret
+  %a_z = select <vscale x 16 x i1> %pg, <vscale x 16 x i8> %a, <vscale x 16 x i8> %passthru
+  %out = call <vscale x 16 x i8> @llvm.aarch64.sve.add.nxv16i8(<vscale x 16 x i1> %pg,
----------------
nit: the `%a_z` name implies that the false lanes are zeroed; perhaps `%a_m` is more appropriate.


================
Comment at: llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith-merging.ll:76
+; CHECK-NEXT: ret
+  %a_z = select <vscale x 8 x i1> %pg, <vscale x 8 x i16> %a, <vscale x 8 x i16> %passthru
+  %out = call <vscale x 8 x i16> @llvm.aarch64.sve.add.nxv8i16(<vscale x 8 x i1> %pg,
----------------
I think we should add tests where the `%passthru` value is `%b`, as that should cause it to use the reverse instruction, e.g. `sub -> subr` (or swap the operands in the case of `add`). Something like the sketch below.
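
A rough sketch of what I mean (untested; the function name is just a suggestion and the expected CHECK lines would still need to be filled in):

  define <vscale x 16 x i8> @sub_i8_m_rev(<vscale x 16 x i1> %pg, <vscale x 16 x i8> %a, <vscale x 16 x i8> %b) {
    %a_m = select <vscale x 16 x i1> %pg, <vscale x 16 x i8> %a, <vscale x 16 x i8> %b
    %out = call <vscale x 16 x i8> @llvm.aarch64.sve.sub.nxv16i8(<vscale x 16 x i1> %pg,
                                                                 <vscale x 16 x i8> %a_m,
                                                                 <vscale x 16 x i8> %b)
    ret <vscale x 16 x i8> %out
  }

Here the inactive lanes of the result should be `%b`, so the expansion should be able to pick `subr` with the destination holding `%b` rather than prefixing a regular `sub` with a `movprfx`.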


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80260/new/

https://reviews.llvm.org/D80260




