[llvm] [RISCV] Move performCombineVMergeAndVOps into RISCVFoldMasks (PR #71764)

Philip Reames via llvm-commits llvm-commits at lists.llvm.org
Tue Nov 14 10:56:39 PST 2023


================
@@ -88,6 +98,304 @@ bool RISCVFoldMasks::isAllOnesMask(MachineInstr *MaskCopy) {
   }
 }
 
+static unsigned getVMSetForLMul(RISCVII::VLMUL LMUL) {
+  switch (LMUL) {
+  case RISCVII::LMUL_F8:
+    return RISCV::PseudoVMSET_M_B1;
+  case RISCVII::LMUL_F4:
+    return RISCV::PseudoVMSET_M_B2;
+  case RISCVII::LMUL_F2:
+    return RISCV::PseudoVMSET_M_B4;
+  case RISCVII::LMUL_1:
+    return RISCV::PseudoVMSET_M_B8;
+  case RISCVII::LMUL_2:
+    return RISCV::PseudoVMSET_M_B16;
+  case RISCVII::LMUL_4:
+    return RISCV::PseudoVMSET_M_B32;
+  case RISCVII::LMUL_8:
+    return RISCV::PseudoVMSET_M_B64;
+  case RISCVII::LMUL_RESERVED:
+    llvm_unreachable("Unexpected LMUL");
+  }
+  llvm_unreachable("Unknown VLMUL enum");
+}
+
+// Try to sink From to before To, also sinking any instructions between From and
+// To where there is a write-after-read dependency on a physical register.
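+// For example (illustrative): if From reads a physical register such as $v0
+// and an instruction between From and To redefines it, that instruction must
+// be sunk along with From (staying after it), or From would read the
+// redefined value once moved.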
+static bool sinkInstructionAndDeps(MachineInstr &From, MachineInstr &To) {
+  assert(From.getParent() == To.getParent());
+  SmallVector<MachineInstr *> Worklist, ToSink;
+  Worklist.push_back(&From);
+
+  // Rather than compute whether or not we saw a store for every instruction,
+  // just compute it once even if it's more conservative.
+  bool SawStore = false;
+  for (MachineBasicBlock::instr_iterator II = From.getIterator();
+       II != To.getIterator(); II++) {
+    if (II->mayStore()) {
+      SawStore = true;
+      break;
+    }
+  }
+
+  while (!Worklist.empty()) {
+    MachineInstr *MI = Worklist.pop_back_val();
+
+    if (!MI->isSafeToMove(nullptr, SawStore))
+      return false;
+
+    SmallSet<Register, 8> Defs, Uses;
+    for (MachineOperand &Def : MI->all_defs())
+      Defs.insert(Def.getReg());
+    for (MachineOperand &Use : MI->all_uses())
+      Uses.insert(Use.getReg());
+
+    // If anything from [MI, To] uses a definition of MI, we can't sink it.
+    for (MachineBasicBlock::instr_iterator II = MI->getIterator();
+         II != To.getIterator(); II++) {
+      for (MachineOperand &Use : II->all_uses()) {
+        if (Defs.contains(Use.getReg()))
+          return false;
+      }
+    }
+
+    // If MI uses any physical registers, we need to sink any instructions after
+    // it where there might be a write-after-read dependency.
+    for (MachineBasicBlock::instr_iterator II = MI->getIterator();
+         II != To.getIterator(); II++) {
+      bool NeedsSink = any_of(II->all_defs(), [&Uses](MachineOperand &Def) {
+        return Def.getReg().isPhysical() && Uses.contains(Def.getReg());
+      });
+      if (NeedsSink)
+        Worklist.push_back(&*II);
+    }
+
+    ToSink.push_back(MI);
+  }
+
+  for (MachineInstr *MI : ToSink)
+    MI->moveBefore(&To);
+  return true;
+}
+
+// Returns true if LHS is the same register as RHS, or if LHS is undefined.
+bool RISCVFoldMasks::isOpSameAs(const MachineOperand &LHS,
+                                const MachineOperand &RHS) {
+  if (LHS.getReg() == RISCV::NoRegister)
+    return true;
+  if (RHS.getReg() == RISCV::NoRegister)
+    return false;
+  return TRI->lookThruCopyLike(LHS.getReg(), MRI) ==
+         TRI->lookThruCopyLike(RHS.getReg(), MRI);
+}
+
+// Try to fold away VMERGE_VVM instructions. We handle these cases:
+// -A masked TU VMERGE_VVM combined with an unmasked TA instruction folds to a
+//  masked TU instruction. The VMERGE_VVM must have its merge operand the same
+//  as its false operand.
+// -A masked TA VMERGE_VVM combined with an unmasked TA instruction folds to a
+//  masked TA instruction.
+// -An unmasked TU VMERGE_VVM combined with a masked MU TA instruction folds
+//  to a masked TU instruction. Both instructions must have the same merge
+//  operand, and the VMERGE_VVM must have its merge operand the same as its
+//  false operand.
+// Note: The VMERGE_VVM forms above (TA and TU) refer to the policy implied,
+// not the pseudo name.  That is, a TA VMERGE_VVM can be either the _TU pseudo
+// form with an IMPLICIT_DEF passthrough operand or the unsuffixed (TA) pseudo
+// form.
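+//
+// For example (operands simplified and hypothetical, for illustration only):
+//   %true = PseudoVADD_VV_M1 $noreg, %a, %b, %avl, sew
+//   %x    = PseudoVMERGE_VVM_M1 %false, %false, %true, $v0, %avl, sew
+// could be rewritten, when the checks below succeed, as:
+//   %x    = PseudoVADD_VV_M1_MASK %false, %a, %b, $v0, %avl, sew, policy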
+bool RISCVFoldMasks::foldVMergeIntoOps(MachineInstr &MI,
+                                       MachineInstr *MaskDef) {
+  MachineOperand *True;
+  MachineOperand *Merge;
+  MachineOperand *False;
+
+  const unsigned BaseOpc = RISCV::getRVVMCOpcode(MI.getOpcode());
+  // A vmv.v.v is equivalent to a vmerge with an all-ones mask.
+  if (BaseOpc == RISCV::VMV_V_V) {
+    Merge = &MI.getOperand(1);
+    False = &MI.getOperand(1);
+    True = &MI.getOperand(2);
+  } else if (BaseOpc == RISCV::VMERGE_VVM) {
+    Merge = &MI.getOperand(1);
+    False = &MI.getOperand(2);
+    True = &MI.getOperand(3);
+  } else
+    return false;
+
+  MachineInstr &TrueMI = *MRI->getVRegDef(True->getReg());
+
+  // We require that the merge and false operands are the same, or that the
+  // merge operand is undefined.
+  if (!isOpSameAs(*Merge, *False))
+    return false;
+
+  // MI must be the only user of True.
+  if (!MRI->hasOneUse(True->getReg()))
+    return false;
+
+  unsigned TrueOpc = TrueMI.getOpcode();
+  const MCInstrDesc &TrueMCID = TrueMI.getDesc();
+  bool HasTiedDest = RISCVII::isFirstDefTiedToFirstUse(TrueMCID);
+
+  bool IsMasked = false;
+  const RISCV::RISCVMaskedPseudoInfo *Info =
+      RISCV::lookupMaskedIntrinsicByUnmasked(TrueOpc);
+  if (!Info && HasTiedDest) {
+    Info = RISCV::getMaskedPseudoInfo(TrueOpc);
+    IsMasked = true;
+  }
+
+  if (!Info)
+    return false;
+
+  // If the mask is not all ones, this transformation is illegal for some
+  // operations whose results are affected by the mask, like viota.m.
+  if (Info->MaskAffectsResult && BaseOpc == RISCV::VMERGE_VVM &&
+      !isAllOnesMask(MaskDef))
+    return false;
+
+  MachineOperand &TrueMergeOp = TrueMI.getOperand(1);
+  if (HasTiedDest && TrueMergeOp.getReg() != RISCV::NoRegister) {
+    // The vmerge instruction must be TU.
+    // FIXME: This could be relaxed, but we need to handle the policy for the
+    // resulting op correctly.
+    if (Merge->getReg() == RISCV::NoRegister)
+      return false;
+    // Both the vmerge instruction and the True instruction must have the same
+    // merge operand.
+    if (!isOpSameAs(TrueMergeOp, *False))
+      return false;
+  }
+
+  if (IsMasked) {
+    assert(HasTiedDest && "Expected tied dest");
+    // The vmerge instruction must be TU.
+    if (Merge->getReg() == RISCV::NoRegister)
+      return false;
+    // The vmerge instruction must have an all 1s mask since we're going to keep
+    // the mask from the True instruction.
+    // FIXME: Support mask agnostic True instruction which would have an
+    // undef merge operand.
+    if (BaseOpc == RISCV::VMERGE_VVM && !isAllOnesMask(MaskDef))
+      return false;
+  }
+
+  // Skip if True has side effect.
+  // TODO: Support vleff and vlsegff.
+  if (TII->get(TrueOpc).hasUnmodeledSideEffects())
+    return false;
+
+  // The vector policy operand may be present for masked intrinsics.
+  const MachineOperand &TrueVL =
+      TrueMI.getOperand(RISCVII::getVLOpNum(TrueMCID));
+
+  auto GetMinVL =
+      [](const MachineOperand &LHS,
+         const MachineOperand &RHS) -> std::optional<MachineOperand> {
+    if (LHS.isReg() && RHS.isReg() && LHS.getReg().isVirtual() &&
+        LHS.getReg() == RHS.getReg())
+      return LHS;
+    if (LHS.isImm() && LHS.getImm() == RISCV::VLMaxSentinel)
+      return RHS;
+    if (RHS.isImm() && RHS.getImm() == RISCV::VLMaxSentinel)
+      return LHS;
+    if (!LHS.isImm() || !RHS.isImm())
+      return std::nullopt;
+    return LHS.getImm() <= RHS.getImm() ? LHS : RHS;
+  };
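+  // Illustrative results (hypothetical values): GetMinVL(VLMAX, 4) is 4,
+  // GetMinVL(2, 8) is 2, GetMinVL(%vl, %vl) is %vl, and GetMinVL of two
+  // distinct virtual registers is std::nullopt, since neither is provably
+  // smaller.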
+
+  // Because MI and True must have the same merge operand (or True's operand is
+  // implicit_def), the "effective" body is the minimum of their VLs.
+  const MachineOperand VL = MI.getOperand(RISCVII::getVLOpNum(MI.getDesc()));
+  auto MinVL = GetMinVL(TrueVL, VL);
+  if (!MinVL)
+    return false;
+  bool VLChanged = !MinVL->isIdenticalTo(VL);
+
+  // If we end up changing the VL or mask of True, then we need to make sure it
+  // doesn't raise any observable fp exceptions, since changing the active
+  // elements will affect how fflags is set.
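+  // For example (illustrative): shrinking the VL of a vfadd.vv could drop a
+  // lane whose addition would have raised an exception flag.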
+  if (VLChanged || !IsMasked)
+    if (TrueMCID.mayRaiseFPException() &&
+        !TrueMI.getFlag(MachineInstr::MIFlag::NoFPExcept))
+      return false;
+
+  unsigned MaskedOpc = Info->MaskedPseudo;
+  const MCInstrDesc &MaskedMCID = TII->get(MaskedOpc);
+#ifndef NDEBUG
+  assert(RISCVII::hasVecPolicyOp(MaskedMCID.TSFlags) &&
+         "Expected instructions with mask have policy operand.");
+  assert(MaskedMCID.getOperandConstraint(MaskedMCID.getNumDefs(),
+                                         MCOI::TIED_TO) == 0 &&
+         "Expected instructions with mask have a tied dest.");
+#endif
+
+  // Sink True down to MI so that it can access MI's operands.
+  if (!sinkInstructionAndDeps(TrueMI, MI))
----------------
preames wrote:

I think you might be making this more complicated than it needs to be.   My mental model of this transform is that we're mutating TrueMI so as to make the merge operation a nop.  I don't think we need to (or want to) move TrueMI's operands at all.  Given the preconditions we check, I believe it is the case that False and MinVL must be operands of TrueMI.  The only remaining operand we *add* is the mask itself.  

For the mask, we have two options.
1) If the definition of V0 dominates TrueMI, then the mask is available.  In some cases, we may be able to move both the definition of the mask vreg and the copy to V0 before TrueMI.  This is a speculation problem.
2) We can sink TrueMI itself to the masked op.  We know that we *don't* need to sink any of its operands.  All we need to prove is that sinking TrueMI is sound (i.e. if TrueMI can fault in an observable way, we can't sink it past a branch or another potentially faulting instruction).  Note that this condition is basically the inverse of proving the mask is safe to move upwards.  (And the dual of the glue reasoning in SDAG.)

That last bit about being the dual of the glue reasoning may also be important.  The SDAG version isn't infinitely powerful, so we may not need to worry about moving around anything which "could" fault just to preserve equal power.

If I were writing this patch, I'd start by writing the simple version which tries to hoist the mask in trivial cases, and see if that was good enough.
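
Roughly, I'm picturing something like the following for the trivial case.  (Untested sketch written against the names in this patch; "MaskSrcMI" is a name I'm introducing here, and a real version would also need to check that MaskSrcMI's own operands are available at TrueMI.)

  // Hoist the mask producer and the copy to $v0 above TrueMI so the masked
  // pseudo can read $v0 directly.  MaskDef is the COPY to $v0 feeding MI;
  // MaskSrcMI is the instruction defining the mask vreg.
  MachineInstr *MaskSrcMI = MRI->getVRegDef(MaskDef->getOperand(1).getReg());
  bool DontMoveAcrossStore = true;
  if (MaskSrcMI->isSafeToMove(nullptr, DontMoveAcrossStore)) {
    MaskSrcMI->moveBefore(&TrueMI);
    MaskDef->moveBefore(&TrueMI);
    // ... then mutate TrueMI into Info->MaskedPseudo as below, with no
    // sinking of TrueMI or its dependencies required ...
  }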

One thing I can't help but think - all of this would be *much* easier without the copies to V0.  Any idea why we have those?  It seems like the pseudos could have a vreg operand with a register class which only contained V0 and have the same effect.  Why do we need to manually insert the copies so early?  I'm probably missing something, but what?  (@topperc Any context I'm missing here?)

https://github.com/llvm/llvm-project/pull/71764

