[PATCH] D71432: [AArch64][SVE] Proposal to use op+select to match scalable predicated operations

Fri Jan 3 11:50:45 PST 2020

cameron.mcinally added a comment.

In D71432#1788807 <https://reviews.llvm.org/D71432#1788807>, @sdesmalen wrote:

> While we don't support the generic case in our downstream compiler, we do have special support for the cases where the false lanes are zeroed or `undef`. Using the predicated MOVPRFX instruction, the false lanes can be zeroed relatively cheaply using e.g.:
>
>   movprfx z0.s, p0/z, z1.s
>   fsub z0.s, p0/m, z2.s
>
> This avoids having to emit an explicit sequence of a `splat` and `select / predicated mov` to zero the false lanes. We match the `operation + select` into a Pseudo instruction (e.g. `FSUB_ZERO` or `FSUB_UNDEF`), that is expanded after register allocation (in the AArch64ExpandPseudoInsts pass) into the appropriate instructions.
>
> Even if we don't care about selecting a passthru value for the false lanes, there is still value in creating the Pseudo. The lack of a tied-operand constraint for the Pseudo gives the register allocator more freedom to come up with a better allocation. Combined with the commutative property of some instructions or by expanding to their reversed variants (like SUBR vs SUB), we can avoid a number of unnecessary register moves.
>
> We've been thinking about some ideas on how to make this support more generic to allow supporting the general use-case of:
>
>   %Res = FSUB_PSEUDO(%Pred, %Op1, %Op2, %Passthru)
>
> Depending on the value for `%Passthru`, this can be expanded to use a `movprfx` or in the worst case an explicit `select`.
>
> Ideally we'd use a Pseudo for most operations so that we can use this as a generic mechanism that natively supports the `passthru` value and benefits from better register allocation.
>
> A bit of prototyping would be required though, as our downstream compiler only covers a limited use-case. We've also had to deal with some corner-cases, but I'd need to refresh my memory on the details of those before I can comment on those. I'll try to dig up some more details!

Sorry for the slow reply... holidays.

I just checked out the PseudoOps and they're interesting. If we could make generic `XXX_PSEUDO(..., %passthru)` PseudoOps, that would be a good path forward. I don't foresee any problems adding an extra passthru operand to the existing patterns, but maybe I'm missing something. If you have any insight, it would be appreciated.
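
To make sure I'm reading the proposed semantics right, here is a minimal lane-level sketch in Python. It is only a model, not the downstream implementation: the helper names (`fsub_pseudo`, `expand_zero_passthru`, `expand_generic_passthru`) are made up for illustration, and the two expansions are my assumption of what "movprfx, or in the worst case an explicit select" would look like.

```python
# Lane-level model only. All names and the two expansion strategies below are
# illustrative assumptions, not code from the downstream compiler.

def fsub_pseudo(pred, op1, op2, passthru):
    """Reference semantics: active lanes get op1 - op2, inactive lanes get passthru."""
    return [a - b if p else t for p, a, b, t in zip(pred, op1, op2, passthru)]

def movprfx_zeroing(pred, src):
    """Models movprfx zd, pg/z, zn: copy active lanes, zero inactive lanes."""
    return [a if p else 0.0 for p, a in zip(pred, src)]

def fsub_merging(pred, dst, op2):
    """Models fsub zd, pg/m, zd, zm: active lanes get dst - op2, inactive lanes keep dst."""
    return [a - b if p else a for p, a, b in zip(pred, dst, op2)]

def select(pred, on_true, on_false):
    """Per-lane select: on_true where the predicate is set, on_false elsewhere."""
    return [a if p else b for p, a, b in zip(pred, on_true, on_false)]

def expand_zero_passthru(pred, op1, op2):
    """Assumed expansion for a zero passthru: zeroing movprfx, then merging op."""
    return fsub_merging(pred, movprfx_zeroing(pred, op1), op2)

def expand_generic_passthru(pred, op1, op2, passthru):
    """Assumed worst-case expansion: merging op, then an explicit select."""
    return select(pred, fsub_merging(pred, op1, op2), passthru)

pred = [True, False, True, False]
op1 = [5.0, 6.0, 7.0, 8.0]
op2 = [1.0, 2.0, 3.0, 4.0]
zeros = [0.0] * 4
other = [9.0] * 4

zero_case = expand_zero_passthru(pred, op1, op2)
generic_case = expand_generic_passthru(pred, op1, op2, other)
```

In this model both expansions agree with `fsub_pseudo` lane for lane, which is the property a generic `XXX_PSEUDO(..., %passthru)` would need to preserve whatever sequence the expansion picks.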

I agree that if these PseudoOps land upstream, then the op+select solution isn't the right way to go. I'll see if I can build the PseudoOps out a bit.

That said, the current implementation is a little weird. Here's the class that something like an FADD_ZERO would use:

  class SVE_3_Op_Pat_SelZero<ValueType vtd, SDPatternOperator op, ValueType vt1,
                     ValueType vt2, ValueType vt3, Instruction inst>
  : Pat<(vtd (vtd (op vt1:$Op1, (vselect vt1:$Op1, vt2:$Op2, (SVEDup0)), vt3:$Op3))),
        (inst $Op1, $Op2, $Op3)>;

The select+op DAG reads like we're zeroing the inactive input elements, not the inactive output elements. I see that this directly models the movprfx+op hardware instructions, and that Op2 is an input reg as well as the destination reg, but it still seems counter-intuitive. I don't feel strongly that this needs to change though. So if I'm the only one that thinks it's weird, I'll let it drop.
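
To be concrete about the two readings, here is a small Python sketch. It assumes the predicated op is the merging form (inactive lanes keep the first data operand) and ignores anything beyond lane values, such as floating-point exceptions raised on inactive lanes. The function names are hypothetical.

```python
# Lane-level model only. Function names are hypothetical, and the predicated op
# is assumed to be the merging form (inactive lanes keep the first data operand).

def select(pred, on_true, on_false):
    """Per-lane select: on_true where the predicate is set, on_false elsewhere."""
    return [a if p else b for p, a, b in zip(pred, on_true, on_false)]

def fsub_merging(pred, op1, op2):
    """Merging predicated subtract: active lanes get op1 - op2, inactive lanes keep op1."""
    return [a - b if p else a for p, a, b in zip(pred, op1, op2)]

def zero_inputs_then_op(pred, op1, op2):
    """Shape of the pattern above: op(pred, select(pred, op1, 0), op2)."""
    zeros = [0.0] * len(op1)
    return fsub_merging(pred, select(pred, op1, zeros), op2)

def op_then_zero_outputs(pred, op1, op2):
    """Shape of the op+select proposal: select(pred, op(op1, op2), 0)."""
    zeros = [0.0] * len(op1)
    unpredicated = [a - b for a, b in zip(op1, op2)]
    return select(pred, unpredicated, zeros)

pred = [True, False, True, False]
z1 = [5.0, 6.0, 7.0, 8.0]
z2 = [1.0, 2.0, 3.0, 4.0]

via_inputs = zero_inputs_then_op(pred, z1, z2)
via_outputs = op_then_zero_outputs(pred, z1, z2)
```

Under those assumptions the two forms produce the same lanes, so the difference is in how the DAG reads rather than in the result: zeroing the tied input works only because the merging op passes that input through on inactive lanes.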

https://reviews.llvm.org/D71432