[PATCH] D104149: [MCA] Adding the CustomBehaviour class to llvm-mca

Sat Jun 12 06:31:44 PDT 2021

andreadb added a comment.

Thanks for the patch.

I like the idea of adding a CustomBehaviour class in llvm-mca to customise/post-process dependencies and model extra delays due to unsimulated structural hazards.

I had a quick look at the patch, and so far, I like the general idea/design. There are a couple of things which must be done differently, or that needs to be improved. But I think that hopefully all those points can be easily addressed.

Speaking about the AMDGPU specific changes. It may be worthy to split this patch into two patches at least.
Specifically, the scheduling model changes related to the `RetireOOO` flags (as well as the new tests) should be committed as a separate patch, and possibly reviewed by @foad (and/or whoever else is interested in supporting llvm-mca for AMDGPU).

Thanks,
Andrea

================
Comment at: llvm/docs/CommandGuide/llvm-mca.rst:989-994
+Some instructions aren't modeled properly by :program:`llvm-mca`. This is
+usually due to the instruction not being expressed perfectly within the
+scheduling model. Modifying the scheduling models isn't always a viable
+option though (maybe because the instruction is modeled incorrectly on
+purpose or the instruction's behaviour is quite complex). This is where the
+CustomBehaviour class can be utilized.
----------------
I think this paragraph should be more explicit on what is the goal of the CustomBehaviour class.

The first two sentences should probably be merged together.
The focus on those two sentences is the "lack of expressiveness" (of scheduling models and machine instructions). That lack of expressivenes is what eventually leads to poor simulations.

Starting the entire paragraph with that first sentence is not great; in the absence of context, it sounds more like there are bugs in llvm-mca itself.

I also noticed that you don't explain what is the main goal of this new CustomBehaviour class. You have done a much better job at introducing it in the summary of this bug. At this point, you should at least drop a hint about what is the most common goal for it:
"it allows targets to customise data dependencies, and model delays which cannot be normally predicted just by simply evaluating register defs/uses". Something like that... Maybe you can be more accurate, but keep in mind that the description should not be too low level in this document.

================
Comment at: llvm/include/llvm/MCA/Instruction.h:501
+  // List of operands which can be used by mca::CustomBehaviour
+  SmallVector<MCAOperand, 8> Operands;
+
----------------
Can this vector be fully optional?

I am thinking about using a simple std::vector here, rather than a SmallVector, which would force us to consume space for a default number of inline elements (8 in this case).

For the (majority) of targets that don't require/implement a custom behaviour class, that field is literally wasted space. In general, I am already concerned about the `sizeof` mca::instruction. Those objects tend to be often too big, and this change has the potential of making them even bigger.

================
Comment at: llvm/lib/MCA/InstrBuilder.cpp:619-638
+  std::unique_ptr<Instruction> NewIS =
+      std::make_unique<Instruction>(D, MCI.getOpcode());
+
+  // Build the list of operands to be used by mca::CustomBehaviour
+  for (int Idx = 0, N = MCI.size(); Idx < N; Idx++) {
+    MCAOperand Op;
+    const MCOperand &MCOp = MCI.getOperand(Idx);
----------------
Three questions:

1. Is it necessary to add MCAOperands to every single instruction?
2. Is it possible to do this as a post-processing step, if requested by the simulator?
3. Can we limit the number of MCAOperand objects that we store in an mca::Instruction?

Ideally, this logic should be split from the normal `InstrBuilder::createInstruction` and moved to a separate `InstrBuilder::postProcessInstruction` step. That post-processing step would then be performed only if the so-called "custom behaviour" is requested.

In order to address points 1. and 3., it may be useful to implement a new "MCA Lowering Context" class, which knows about:
 a) which instructions require a list of MCAOperand, and
 b) which operands must be lowered to MCAOperand for the custom hazard check to work.

The MCA lowering context would know about which instructions/operands require a custom post-processing step, and skip the post-processing step for those instructions that can be safely ignored.

Essentially, have a context class that filters opcodes and rules out which operands are important to keep as MCAOperand, and which are not important for the custom behaviour class.

To give you an idea: if we know that the custom behaviour class is only interested in the first operand of a specific load opcode, then we should only create MCAOperand objects for that load, and just for its first operand.

In most cases (correct me if I am wrong), I suspect that we may not need to store MCAOperands for every single instruction out there. So, blindly lowering every MCOperand into MCAOperand - while it work in general - is a bit extreme in my opinion. Keep in mind that most targets won't implement any custom behaviour for their subtargets, and for those targets, translating to MCAOperand would be unnecessary, and it would simply waste space.

Essentially, if a custom behaviour is not requested/implemented, then we should ignore this extra post-processing step entirely, and not populate the vector of MCAOperands (this is also why it may be worthy to make the new MCAOperand vector field a std::vector).

I hope it makes sense.

================
Comment at: llvm/tools/llvm-mca/Views/DispatchStatistics.cpp:80
   printStalls(SS, HWStalls[HWStallEvent::DispatchGroupStall], NumCycles);
+  SS << "\nCB      - Custom Behaviour stall:                    ";
+  printStalls(SS, HWStalls[HWStallEvent::CustomBehaviourStall], NumCycles);
----------------
Please just use something like "Uncategorised Structural Hazards stalls" (or something like that).

================
Comment at: llvm/tools/llvm-mca/lib/AMDGPU/AMDGPUCustomBehaviour.cpp:43-44
+  case AMDGPU::S_WAITCNT_VSCNT:
+    assert(false &&
+           "These are pseudo instructions and should never appear in asm.");
+    return 0;
----------------
I am not familiar with those pseudos. However, I would be careful about asserting here. One day, people might want to drive an mca pipeline from a backend pass, and even after regalloc there might still be MachineInstr pseudos around.
Just merge these with the "default" case for now (and maybe add a small code comment).

================
Comment at: llvm/tools/llvm-mca/lib/AMDGPU/AMDGPUCustomBehaviour.cpp:47-51
+    // Can't find any documentation in the ISAs about what this
+    // instruction does so I'm unsure of how to model it.
+    WithColor::warning() << "Instruction s_waitcnt_depctr "
+                         << "currently unsupported and will not "
+                         << "trigger a wait.\n";
----------------
I don't think it is a good idea to generate a warning here. Instead, just raise a bug for it once this patch is committed in main. For now, please convert it into a TODO comment.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104149/new/

https://reviews.llvm.org/D104149