[PATCH] D94928: [llvm-mca] Add support for in-order CPUs

Andrea Di Biagio via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jan 19 08:19:47 PST 2021


andreadb added a comment.

Thanks for working on this feature!

I went through the patch and overall the approach seems sound.

There are, however, a few questions and concerns about the overall design.
That being said, it shouldn't be difficult to address most of my comments.

Speaking about the design:

Your model assumes an unbounded queue of instructions (something like a rudimentary reservation station) in which dispatched instructions are stored.

Correct me if I am wrong, but in-order processors don't use a reservation station.
In the absence of structural hazards, if data dependencies are met, then uOPs are issued directly to the underlying execution units.
So the dispatch event is not decoupled from the issue event.

The fact that your patch adds an unbounded queue sounds a bit strange to me. I am not sure what @dmgreen thinks about it, but this basically means that dispatch and issue are different events.
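
To make the distinction concrete, here is a minimal, self-contained C++ sketch (not actual llvm-mca code; all names and the 32-register file are hypothetical) of a coupled dispatch/issue loop for an in-order core: the front instruction either issues immediately or stalls the whole pipeline, so no queue of dispatched-but-not-issued uOPs is needed:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One instruction: the register it writes, the register it reads
// (-1 for none), and its result latency in cycles.
struct Inst {
  int Def;
  int Use;
  unsigned Latency;
};

// In-order issue with no reservation station: dispatch and issue are the
// same event. Each instruction issues as soon as (a) its source register
// is ready and (b) the per-cycle issue budget allows it; until then it
// stalls, and every younger instruction stalls behind it.
std::vector<unsigned> simulate(const std::vector<Inst> &Prog,
                               unsigned IssueWidth) {
  std::vector<unsigned> ReadyAt(32, 0); // cycle each register becomes ready
  std::vector<unsigned> IssueCycle(Prog.size());
  unsigned Cycle = 0, Issued = 0;
  for (std::size_t I = 0; I < Prog.size(); ++I) {
    const Inst &IS = Prog[I];
    unsigned Earliest = (IS.Use >= 0) ? ReadyAt[IS.Use] : Cycle;
    if (Earliest > Cycle || Issued == IssueWidth) {
      // RAW hazard or issue width exhausted: advance the clock.
      Cycle = std::max(Cycle + (Issued == IssueWidth ? 1u : 0u), Earliest);
      Issued = 0;
    }
    IssueCycle[I] = Cycle;
    ++Issued;
    if (IS.Def >= 0)
      ReadyAt[IS.Def] = Cycle + IS.Latency;
  }
  return IssueCycle;
}
```

In this model a stalled front instruction never buffers behind a queue: it simply holds the cycle counter (and everything younger) until its operand is ready.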

I also noticed how there are no checks on `NumMicroOps`. Is there a reason why you don't check for it?
In one of the tests, the target is dual-issue. However, there are cycles where three opcodes are dispatched.
See for example the test where two loads are dispatched in the same cycle (with the first load decoded into two uOPs).
Note also that the presence of '=' characters in the timeline represents cycles an instruction spends waiting in a scheduler before being issued.
To be honest, I was not expecting to see those characters at all.

Btw. There is also one particular case where two instructions seem to execute out of order.

You can find other comments below.

Thanks,
-Andrea



================
Comment at: llvm/include/llvm/MCA/Stages/InOrderIssueStage.h:41-51
+  int NumExecuted;
+
+  /// Instructions that must be issued during the next cycle. If the front
+  /// instruction cannot execute due to an unmet register or resource
+  /// dependency, the whole queue is stalled for StallCyclesLeft.
+  std::queue<InstRef> InstQueue;
+  int StallCyclesLeft;
----------------
Please use unsigned quantities instead of signed integers.


================
Comment at: llvm/lib/MCA/Context.cpp:31-33
 std::unique_ptr<Pipeline>
 Context::createDefaultPipeline(const PipelineOptions &Opts, SourceMgr &SrcMgr) {
   const MCSchedModel &SM = STI.getSchedModel();
----------------
andreadb wrote:
> For readability reasons, I suggest to split this method into two methods.
> 
> For example, something like `if (!SM.isOutOfOrder()) createInOrderPipeline(opts, srcMgr);`.
> 
> You then need to move all the pipeline composition logic for in-order processors inside `createInOrderPipeline()`.
> 
> I think it would be much more readable (just my opinion though).
I suggest moving the support for in-order pipeline composition into a separate function; I think it would help in terms of readability.


================
Comment at: llvm/lib/MCA/Context.cpp:31-40
 std::unique_ptr<Pipeline>
 Context::createDefaultPipeline(const PipelineOptions &Opts, SourceMgr &SrcMgr) {
   const MCSchedModel &SM = STI.getSchedModel();
 
   // Create the hardware units defining the backend.
-  auto RCU = std::make_unique<RetireControlUnit>(SM);
+  std::unique_ptr<RetireControlUnit> RCU;
+  if (SM.isOutOfOrder())
----------------
For readability reasons, I suggest splitting this method in two.

For example, something like `if (!SM.isOutOfOrder()) return createInOrderPipeline(Opts, SrcMgr);`.

You would then need to move all the pipeline composition logic for in-order processors inside `createInOrderPipeline()`.

I think it would be much more readable (just my opinion though).


================
Comment at: llvm/lib/MCA/Stages/InOrderIssueStage.cpp:34-37
+bool InOrderIssueStage::isAvailable(const InstRef &IR) const {
+  return Bandwidth > 0;
+}
+
----------------
We also need to check whether IR "begins a group".
Instructions that begin a group must always be the first instruction dispatched in a cycle. See how the `isAvailable()` check is implemented by the DispatchStage.
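
A simplified, self-contained sketch of that check (hypothetical names, not the actual DispatchStage code): an instruction that begins a group is only available when nothing has been dispatched yet in the current cycle.

```cpp
struct InstrDesc {
  bool BeginGroup;      // must be the first instruction of its cycle
  unsigned NumMicroOps; // uOPs consumed on dispatch
};

// Bandwidth starts each cycle at IssueWidth and shrinks as uOPs are
// dispatched. A group-starting instruction is only available while the
// budget is untouched, i.e. at the very start of a cycle.
bool isAvailable(const InstrDesc &Desc, unsigned Bandwidth,
                 unsigned IssueWidth) {
  if (Desc.BeginGroup)
    return Bandwidth == IssueWidth;
  return Bandwidth >= Desc.NumMicroOps;
}
```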



================
Comment at: llvm/lib/MCA/Stages/InOrderIssueStage.cpp:35
+bool InOrderIssueStage::isAvailable(const InstRef &IR) const {
+  return Bandwidth > 0;
+}
----------------
Shouldn't we also add checks on `NumMicroOps` somewhere?
One of the tests reports two loads dispatched within the same cycle. However, one of the loads is 2 uOPs, so - correct me if I am wrong - it shouldn't be possible to dispatch two of those in the same cycle.
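
One way to account for it (again a hypothetical sketch, not the patch's code): dispatch consumes `NumMicroOps` units of the per-cycle budget rather than one, so a 2-uOP load on a dual-issue core uses the whole cycle.

```cpp
// Try to dispatch an instruction; returns false (leaving the budget
// untouched) when its uOPs no longer fit in the current cycle.
bool tryDispatch(unsigned NumMicroOps, unsigned &Bandwidth) {
  if (NumMicroOps > Bandwidth)
    return false;
  Bandwidth -= NumMicroOps;
  return true;
}
```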

Out of curiosity:

```
ldr	w4, [x2], #4
```

is the resource consumption info correct for that instruction?


================
Comment at: llvm/lib/MCA/Stages/InOrderIssueStage.cpp:49
+/// instructions are met.
+static int checkRegisterHazard(const RegisterFile &PRF, const MCSchedModel &SM,
+                               const MCSubtargetInfo &STI, const InstRef &IR) {
----------------
As mentioned in another comment, we should use unsigned quantities whenever possible. So, int quantities that only describe zero or more cycles should really be converted to unsigned.



================
Comment at: llvm/lib/MCA/Stages/InOrderIssueStage.cpp:107-114
+  if (IS.isEliminated())
+    return;
+
+  for (ReadState &RS : IS.getUses())
+    PRF.addRegisterRead(RS, STI);
+
+  for (WriteState &WS : IS.getDefs())
----------------
I expect register renaming to only occur for out-of-order processors. So move elimination shouldn't ever happen in this context.

That being said, instructions that are fully eliminated at register renaming stage still have their writes propagated to the register file. So, if you want to correctly handle that case, then you cannot early exit; you still need to add writes.

Note that the only instructions that can be eliminated at register renaming stage are register moves.  See the definition and description of tablegen class `RegisterFile` in 
https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Target/TargetSchedule.td

I don't expect any of this to really affect in-order targets. So, personally, I suggest replacing the initial if-stmt with an assert (i.e. check that IS is NOT eliminated).


================
Comment at: llvm/lib/MCA/Stages/InOrderIssueStage.cpp:142
+
+  IS.dispatch(0);
+
----------------
Maybe add a comment here explaining why we are passing 0 as the token ID.
Since this is an in-order processor, the retire stage (and therefore the token ID) is not really required in practice. We don't expect the retire stage to do anything in particular.



================
Comment at: llvm/lib/MCA/Stages/InOrderIssueStage.cpp:148-150
+
+  --Bandwidth;
+  InstQueue.push(IR);
----------------
`Bandwidth` should be set to zero if IR `ends a group`.

You probably need something semantically equivalent to what method `DispatchStage::dispatch(InstRef IR)` does:
 
```
// Check if this instruction ends the dispatch group.
if (Desc.EndGroup)
  AvailableEntries = 0;
```


================
Comment at: llvm/test/tools/llvm-mca/AArch64/Cortex/A55-all-views.s:114-126
+# CHECK:      [0,0]     DeeeER    .    .    .   ldr	w4, [x2], #4
+# CHECK-NEXT: [0,1]     D=eeeER   .    .    .   ldr	w5, [x3]
+# CHECK-NEXT: [0,2]     .D===eeeeER    .    .   madd	w0, w5, w4, w0
+# CHECK-NEXT: [0,3]     .   DeeeER.    .    .   add	x3, x3, x13
+# CHECK-NEXT: [0,4]     .    DeeeER    .    .   subs	x1, x1, #1
+# CHECK-NEXT: [0,5]     .    D==eeeeER .    .   str	w0, [x21, x18, lsl #2]
+# CHECK-NEXT: [1,0]     .    . DeeeER  .    .   ldr	w4, [x2], #4
----------------
Interesting.
According to this design, the "dispatch" event is still decoupled from the "issue" event. Is that expected?
I am asking because, in my mind, at least for in-order processors the two events should coincide. Basically there is no reservation station, and uOPs are issued directly to the underlying execution units at a maximum rate of IssueWidth per cycle.


================
Comment at: llvm/test/tools/llvm-mca/AArch64/Cortex/A55-all-views.s:116-117
+# CHECK-NEXT: [0,1]     D=eeeER   .    .    .   ldr	w5, [x3]
+# CHECK-NEXT: [0,2]     .D===eeeeER    .    .   madd	w0, w5, w4, w0
+# CHECK-NEXT: [0,3]     .   DeeeER.    .    .   add	x3, x3, x13
+# CHECK-NEXT: [0,4]     .    DeeeER    .    .   subs	x1, x1, #1
----------------
Why are these two executing out of order?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D94928/new/

https://reviews.llvm.org/D94928


