[PATCH] D98604: [MCA] Ensure that writes occur in-order

Mon Mar 15 04:33:43 PDT 2021

andreadb added inline comments.

================
Comment at: llvm/include/llvm/MCA/Stages/InOrderIssueStage.h:41
   SmallVector<InstRef, 4> IssuedInst;
+  InstRef LastIssuedInst;

----------------
Why not just storing a number of cycles left?

You can easily compute that latency once, and then decrease it on every cycle (saturating to zero).

When you check for write-order hazards, you simply compare the longest write latency against that quantity. You won't need to recompute the `cycles left` from scratch starting from a `LastIssuedInst`.

NOTE: it should not always reference the "last issued instruction". In case of OoO execution (see the comment below) the slowest write may not have been contributed by the last issued instruction. That quantity should always be the MAX write latency in general.

================
Comment at: llvm/lib/MCA/Stages/InOrderIssueStage.cpp:152-158
+  if (LastIssuedInst && !LastIssuedInst.getInstruction()->getDesc().RetireOOO) {
+    // Delay the instruction to ensure that writes occur in program
+    // order
+    if (unsigned StallWritesOrder = checkWritesOrder(LastIssuedInst, IR)) {
+      *StallCycles = StallWritesOrder;
+    }
+  }
----------------
There is still a bit of confusion around the meaning of flag `RetireOOO`, and what it actually means in practice for llvm-mca.

Over the last few days, I did some searches on "out-of-order commit" and what it actually means in practice. It is not to be confused with "out-of-order execution". Our RCU knows how to commit "in-order", but it doesn't know how to simulate "out-of-order" commit.

I don't have access to a cortex-a55, so I cannot run experiments on it.
However, I think that for now it is safe to just assume that `RetireOOO` means: out-of-order execution, plus out-of-order commit.

For out-of-order commit, we may not need to simulate an RCU. So I think your change on the RCU tokens is fine.

The question however is: what happens if you alternate instructions with in-order/out-of-order execution?
Example: `FDIV, ADD, FDIV, ADD, ...`

Your `LastIssuedInst` currently points to the instruction that reaches write-back stage last. However, you still have to account for the FDIV write latency when checking if the following ADD `canIssue`. ADDs must still be issued in-order. So it needs to wait for the completion of FDIV. 

`RetireOOO` would become just a way to bypass the `checkWritesOrder` check. 

Could you also add a test for sequences of {FDIV,ADD} pairs?

================
Comment at: llvm/test/tools/llvm-mca/AArch64/Cortex/A55-in-order-retire.s:106-108
+# CHECK:      [0,0]     DeeeeeeeER.    .   .   sdiv	w12, w21, w0
+# CHECK-NEXT: [0,1]     .    DeeER.    .   .   add	w8, w8, #1
+# CHECK-NEXT: [0,2]     .    DeeER.    .   .   add	w1, w2, w0
----------------
tl;dr: not an inssue introduced by this patch. However, in future we may need to restrict the number of read/write ports available to prevent that too many instructions end up updating the PRF at the same time.

All those instructions will reach write-back stage at the same time. I don't know how many write ports there are in the PRF of this processor. If it allows up to 2 writes per cycle, then the issue of the last instruction should be artificially delayed by one cycle.

More in general, limits in the number of read/write ports in the register file are not currently modelled. It would be nice to add support for those in the future (assuming that this may become an important souce of inaccuracy especially when simulating in-order processors).
The register file descriptor in the "extra processor info" struct is mainly a way to:
 - limit the amount of phys registers for renaming
 - enable/disable move elimination
 - enable/disable the matching of zero-idioms.

None of that is probably useful for in-order processors. We may instead want to add extra fields for the number of read/write ports. So that, when we check if an instruction can execute, we check the availability of read ports on issue, and we check the availability of write ports on write-back. Otherwise we stall the instruction.

This would be a future development.

How many write ports there are in the register file? 

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98604/new/

https://reviews.llvm.org/D98604