[PATCH] D104293: [AMDGPU] Set more flags on Real instructions

Patrick Holland via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jun 15 11:31:46 PDT 2021


holland11 added a comment.

You may be completely on top of it already, but here is the function that I would like to be able to more or less copy within the AMDGPUCustomBehaviour class. (You don't have to read through the function; I summarize all the flags that get checked below.)

  void SIInsertWaitcnts::updateEventWaitcntAfter(MachineInstr &Inst,
                                                 WaitcntBrackets *ScoreBrackets) {
    // Now look at the instruction opcode. If it is a memory access
    // instruction, update the upper-bound of the appropriate counter's
    // bracket and the destination operand scores.
    // TODO: Use the (TSFlags & SIInstrFlags::LGKM_CNT) property everywhere.
    if (TII->isDS(Inst) && TII->usesLGKM_CNT(Inst)) {
      if (TII->isAlwaysGDS(Inst.getOpcode()) ||
          TII->hasModifiersSet(Inst, AMDGPU::OpName::gds)) {
        ScoreBrackets->updateByEvent(TII, TRI, MRI, GDS_ACCESS, Inst);
        ScoreBrackets->updateByEvent(TII, TRI, MRI, GDS_GPR_LOCK, Inst);
      } else {
        ScoreBrackets->updateByEvent(TII, TRI, MRI, LDS_ACCESS, Inst);
      }
    } else if (TII->isFLAT(Inst)) {
      assert(Inst.mayLoadOrStore());
  
      int FlatASCount = 0;
  
      if (mayAccessVMEMThroughFlat(Inst)) {
        ++FlatASCount;
        if (!ST->hasVscnt())
          ScoreBrackets->updateByEvent(TII, TRI, MRI, VMEM_ACCESS, Inst);
        else if (Inst.mayLoad() && !SIInstrInfo::isAtomicNoRet(Inst))
          ScoreBrackets->updateByEvent(TII, TRI, MRI, VMEM_READ_ACCESS, Inst);
        else
          ScoreBrackets->updateByEvent(TII, TRI, MRI, VMEM_WRITE_ACCESS, Inst);
      }
  
      if (mayAccessLDSThroughFlat(Inst)) {
        ++FlatASCount;
        ScoreBrackets->updateByEvent(TII, TRI, MRI, LDS_ACCESS, Inst);
      }
  
      // A Flat memory operation must access at least one address space.
      assert(FlatASCount);
  
      // This is a flat memory operation that accesses both VMEM and LDS, so note it
      // - it will require that both the VM and LGKM be flushed to zero if it is
      // pending when a VM or LGKM dependency occurs.
      if (FlatASCount > 1)
        ScoreBrackets->setPendingFlat();
    } else if (SIInstrInfo::isVMEM(Inst) &&
               // TODO: get a better carve out.
               Inst.getOpcode() != AMDGPU::BUFFER_WBINVL1 &&
               Inst.getOpcode() != AMDGPU::BUFFER_WBINVL1_SC &&
               Inst.getOpcode() != AMDGPU::BUFFER_WBINVL1_VOL &&
               Inst.getOpcode() != AMDGPU::BUFFER_GL0_INV &&
               Inst.getOpcode() != AMDGPU::BUFFER_GL1_INV) {
      if (!ST->hasVscnt())
        ScoreBrackets->updateByEvent(TII, TRI, MRI, VMEM_ACCESS, Inst);
      else if ((Inst.mayLoad() && !SIInstrInfo::isAtomicNoRet(Inst)) ||
               /* IMAGE_GET_RESINFO / IMAGE_GET_LOD */
               (TII->isMIMG(Inst) && !Inst.mayLoad() && !Inst.mayStore()))
        ScoreBrackets->updateByEvent(TII, TRI, MRI, VMEM_READ_ACCESS, Inst);
      else if (Inst.mayStore())
        ScoreBrackets->updateByEvent(TII, TRI, MRI, VMEM_WRITE_ACCESS, Inst);
  
      if (ST->vmemWriteNeedsExpWaitcnt() &&
          (Inst.mayStore() || SIInstrInfo::isAtomicRet(Inst))) {
        ScoreBrackets->updateByEvent(TII, TRI, MRI, VMW_GPR_LOCK, Inst);
      }
    } else if (TII->isSMRD(Inst)) {
      ScoreBrackets->updateByEvent(TII, TRI, MRI, SMEM_ACCESS, Inst);
    } else if (Inst.isCall()) {
      if (callWaitsOnFunctionReturn(Inst)) {
        // Act as a wait on everything
        ScoreBrackets->applyWaitcnt(AMDGPU::Waitcnt::allZero(ST->hasVscnt()));
      } else {
        // May need to wait for anything.
        ScoreBrackets->applyWaitcnt(AMDGPU::Waitcnt());
      }
    } else if (SIInstrInfo::isEXP(Inst)) {
      unsigned Imm = TII->getNamedOperand(Inst, AMDGPU::OpName::tgt)->getImm();
      if (Imm >= AMDGPU::Exp::ET_PARAM0 && Imm <= AMDGPU::Exp::ET_PARAM31)
        ScoreBrackets->updateByEvent(TII, TRI, MRI, EXP_PARAM_ACCESS, Inst);
      else if (Imm >= AMDGPU::Exp::ET_POS0 && Imm <= AMDGPU::Exp::ET_POS_LAST)
        ScoreBrackets->updateByEvent(TII, TRI, MRI, EXP_POS_ACCESS, Inst);
      else
        ScoreBrackets->updateByEvent(TII, TRI, MRI, EXP_GPR_LOCK, Inst);
    } else {
      switch (Inst.getOpcode()) {
      case AMDGPU::S_SENDMSG:
      case AMDGPU::S_SENDMSGHALT:
        ScoreBrackets->updateByEvent(TII, TRI, MRI, SQ_MESSAGE, Inst);
        break;
      case AMDGPU::S_MEMTIME:
      case AMDGPU::S_MEMREALTIME:
        ScoreBrackets->updateByEvent(TII, TRI, MRI, SMEM_ACCESS, Inst);
        break;
      }
    }
  }

So the list of flags:

  mayLoad
  mayStore
  MI.getDesc().TSFlags & SIInstrFlags::DS
  MI.getDesc().TSFlags & SIInstrFlags::LGKM_CNT
  MI.getDesc().TSFlags & SIInstrFlags::FLAT
  MI.getDesc().TSFlags & SIInstrFlags::IsAtomicNoRet
  MI.getDesc().TSFlags & SIInstrFlags::IsAtomicRet
  MI.getDesc().TSFlags & SIInstrFlags::MUBUF
  MI.getDesc().TSFlags & SIInstrFlags::MTBUF
  MI.getDesc().TSFlags & SIInstrFlags::MIMG
  MI.getDesc().TSFlags & SIInstrFlags::SMRD
  MI.getDesc().TSFlags & SIInstrFlags::EXP

So basically just the TSFlags (assuming it's easier to copy them all at once rather than just the individual ones needed), mayLoad, and mayStore.

It seems as though the TSFlags were already being copied properly, but I'm fairly certain that the `ds_read` instruction was failing the following check within CB last week. So there may be a class of instructions whose TSFlags weren't being copied? (But the copy happens within `InstSI`, so I'm not sure how that could be. Maybe the flags aren't being set properly on the `ds_read` pseudo instruction itself?)

  if ((MCID.TSFlags & SIInstrFlags::DS) &&
      (MCID.TSFlags & SIInstrFlags::LGKM_CNT)) {

If there's anything you want me to do or test, let me know.

One function that gets used within the logic is:

  // This is a flat memory operation. Check to see if it has memory tokens other
  // than LDS. Other address spaces supported by flat memory operations involve
  // global memory.
  bool SIInsertWaitcnts::mayAccessVMEMThroughFlat(const MachineInstr &MI) const {
    assert(TII->isFLAT(MI));
  
    // All flat instructions use the VMEM counter.
    assert(TII->usesVM_CNT(MI));
  
    // If there are no memory operands then conservatively assume the flat
    // operation may access VMEM.
    if (MI.memoperands_empty())
      return true;
  
    // See if any memory operand specifies an address space that involves VMEM.
    // Flat operations only support FLAT, LOCAL (LDS), or address spaces
    // involving VMEM such as GLOBAL, CONSTANT, PRIVATE (SCRATCH), etc. The REGION
    // (GDS) address space is not supported by flat operations. Therefore, simply
    // return true unless only the LDS address space is found.
    for (const MachineMemOperand *Memop : MI.memoperands()) {
      unsigned AS = Memop->getAddrSpace();
      assert(AS != AMDGPUAS::REGION_ADDRESS);
      if (AS != AMDGPUAS::LOCAL_ADDRESS)
        return true;
    }
  
    return false;
  }

(`SIInsertWaitcnts::mayAccessLDSThroughFlat` is another function in the same boat.) I'm not positive whether I'll be able to recreate this from within llvm-mca, because I'm not sure whether an `MCInst` can be queried for `memoperands`. (It is a `MachineInstr` being used in this function, not an `MCInst`.) I don't think there's anything you can do to correct this, though. Maybe the `MCInst` can be queried for `memoperands`; if not, I may have to make a conservative assumption. Either way, this is something I can explore and discuss further when I submit the patch for AMDGPUCustomBehaviour.

If you're curious about this memoperand issue, I just looked into it a bit more. The `MCInstrDesc` class contains a list of `MCOperandInfo` entries. That class has a member `OperandType`, which refers to an enum where one of the elements is `OPERAND_MEMORY`. I'm not sure whether this is comparable to the `MachineInstr::memoperands()` function, though, since that one does things a bit strangely:

  /// Access to memory operands of the instruction. If there are none, that does
  /// not imply anything about whether the function accesses memory. Instead,
  /// the caller must behave conservatively.
  ArrayRef<MachineMemOperand *> memoperands() const {
    if (!Info)
      return {};
  
    if (Info.is<EIIK_MMO>())
      return makeArrayRef(Info.getAddrOfZeroTagPointer(), 1);
  
    if (ExtraInfo *EI = Info.get<EIIK_OutOfLine>())
      return EI->getMMOs();
  
    return {};
  }

Anyway, if this isn't something that you have any familiarity or insight into, then I can bring it back up when I submit the patch for AMDGPUCB.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104293/new/

https://reviews.llvm.org/D104293


