[PATCH] D80260: [WIP][SVE] Prototype for general merging MOVPRFX support.

Sander de Smalen via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jun 8 09:56:47 PDT 2020


sdesmalen added a comment.

Hi @cameron.mcinally, sorry for my delayed reponse, I was OoO part of last week and didn't have much bandwidth to reply.

> The general passthru merging case is largely uninteresting. Unless I've made a mistake, register allocation seems optimal for it (in the cases I've come across, at least).

Yes, that seems a fair assessment.

> The zeroing case is a bit trickier. We start with a pattern like this:
> 
>   class SVE_3_Op_Pat_SelZero_Passthru<ValueType vtd, SDPatternOperator op, ValueType vt1,
>                      ValueType vt2, ValueType vt3, Instruction inst>
>   : Pat<(vtd (vtd (op vt1:$Op1, (vselect vt1:$Op1, vt2:$Op2, (vt2 SVEDup0:$Dup)), vt3:$Op3))),
>         (inst $Op1, $Op2, $Op3, $Dup)>;
> 
> 
> The SVEDup0 is the problem. It's accounted for in the Pseudo generated, but the original DUP also hangs around. In cases where the DUP only has one use, it can be removed. That's the hang up, i.e. can we safely remove that superfluous DUP when expanding the Pseudo.

I don't think the compiler will always be able to remove the superfluous DUP (it may have been hoisted out of the loop for example), but I'm not sure how much of a problem that is in practice. Maybe this is fine for a first implementation. Did you get a chance to try this out on real code?

> You'll find the proposed solution under the `FalseLanes == AArch64::FalseLanesZero` block in this Diff. It's not an ideal solution, but I'm not convinced that it's overly risky either. The sequence we're looking for is fairly constrained, *I think*.

While it may seem trivial to remove that instruction this way, I'd rather leave this to a dedicated pass like `DeadMachineInstructionElim` that could be run directly after the pseudo-expand pass. There's details and special cases to take into account, such as defs/uses on subregisters (instead of the full register) or the use being a DEBUG instruction. The former could lead to the DUP being removed while it shouldn't, and the latter can prevent it from being removed while it should. The dedicated pass is probably better suited to handle those.

> We definitely need an extra register available in some of the MOVPRFX cases. Scavenging that reg probably isn't a good fix. So I was thinking it would be easier to generate the MOVPRFX when we're still at the virtual register phase. It's not clear to me how to make this work though, so I might be way off.

When passing `dup(0)` to the pseudo, the compiler can always fall back on the explicit merge (predicated mov) and there is no longer a need to scavenge a register. I agree it would be better to do this earlier so we can avoid this altogether though.

> @paulwalker-arm mentioned something at last week's Sync-up meeting about generating the MOVPRFX directly during DAGCombine or similar. That sounds promising, but I'm not sure if I see that path clearly yet.

What Paul meant is that we ideally want to add a new constraints to TableGen that allow us to model the constraints for using MOVPRFX more precisely. For the pseudo:

  Zd = FSUB_PSEUDO_ZEROING Pg, Zs1, Zs2

Instead of forcing `"Zd = Zs1"`, we want to allow any register allocation as long as one of the registers is unique, because the ExpandPseudo pass can always handle that case with movprfx directly or additionally with a reverse/commutative instruction. By adding a new TableGen constraint like `"one_of_is_unique(Zd, Zs1, Zs2)"` and teaching the register allocator to implement that constraint, we can avoid more ugly solutions like the conditional early-clobber pass (that only does half a job because it forces Zd to be unique, which may not necessarily be the best choice) or having to pass the `dup(0)` to the instruction and then trying to remove the instruction afterwards (which it may fail to remove). We could instead model the constraint as we want it. Other pseudos (e.g. for ternary instructions) may require slightly different constraints, but the `one_of_is_unique()` probably solves 90-95% of the cases that need solving.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80260/new/

https://reviews.llvm.org/D80260





More information about the llvm-commits mailing list