[PATCH] D98604: [MCA] Ensure that writes occur in-order

Mon Mar 15 11:15:56 PDT 2021

asavonic added inline comments.

================
Comment at: llvm/lib/MCA/Stages/InOrderIssueStage.cpp:152-158
+  if (LastIssuedInst && !LastIssuedInst.getInstruction()->getDesc().RetireOOO) {
+    // Delay the instruction to ensure that writes occur in program
+    // order
+    if (unsigned StallWritesOrder = checkWritesOrder(LastIssuedInst, IR)) {
+      *StallCycles = StallWritesOrder;
+    }
+  }
----------------
andreadb wrote:
> There is still a bit of confusion around the meaning of flag `RetireOOO`, and what it actually means in practice for llvm-mca.
> 
> Over the last few days, I did some searches on "out-of-order commit" and what it actually means in practice. It is not to be confused with "out-of-order execution". Our RCU knows how to commit "in-order", but it doesn't know how to simulate "out-of-order" commit.
> 
> I don't have access to a cortex-a55, so I cannot run experiments on it.
> However, I think that for now it is safe to just assume that `RetireOOO` means: out-of-order execution, plus out-of-order commit.
> 
> For out-of-order commit, we may not need to simulate an RCU. So I think your change on the RCU tokens is fine.
> 
> The question however is: what happens if you alternate instructions with in-order/out-of-order execution?
> Example: `FDIV, ADD, FDIV, ADD, ...`
> 
> Your `LastIssuedInst` currently points to the instruction that reaches write-back stage last. However, you still have to account for the FDIV write latency when checking if the following ADD `canIssue`. ADDs must still be issued in-order. So it needs to wait for the completion of FDIV. 
> 
> `RetireOOO` would become just a way to bypass the `checkWritesOrder` check. 
> 
> Could you also add a test for sequences of {FDIV,ADD} pairs?
> 
> There is still a bit of confusion around the meaning of flag `RetireOOO`, and what it actually means in practice for llvm-mca.
> 
> Over the last few days, I did some searches on "out-of-order commit" and what it actually means in practice. It is not to be confused with "out-of-order execution". Our RCU knows how to commit "in-order", but it doesn't know how to simulate "out-of-order" commit.

I think the intention is to allow some instructions to retire naturally, even if the order of retire does not match the program order.
In that case, RCU can do nothing and allow an instruction to retire as soon as it finishes execution.

> 
> I don't have access to a cortex-a55, so I cannot run experiments on it.
> However, I think that for now it is safe to just assume that `RetireOOO` means: out-of-order execution, plus out-of-order commit.
> 
> For out-of-order commit, we may not need to simulate an RCU. So I think your change on the RCU tokens is fine.
> 
> The question however is: what happens if you alternate instructions with in-order/out-of-order execution?
> Example: `FDIV, ADD, FDIV, ADD, ...`

I don't think that Cortex-A55 can execute instructions out-of-order. Some FP instructions can commit/retire out-of-order,
but they use a different set of registers and their timing characteristics are aligned (according to the documentation).

The model we have in MCA now seems to match the hardware (at least by IPC):
    [0,0]     DeeeeeeeeeeeER .    .    .   fdiv	s1, s2, s3
    [0,1]     DeeER.    .    .    .    .   add	w8, w8, #1
    [0,2]     .DeeER    .    .    .    .   add	w1, w2, w0
    [0,3]     .DeeER    .    .    .    .   add	w3, w4, #1
    [0,4]     . DeeER   .    .    .    .   add	w5, w6, w0
    [0,5]     .    .  DeeeeeeeeeeeER   .   fdiv	s4, s5, s6
    [0,6]     .    .  DeeER  .    .    .   add	w7, w9, w0
    [0,7]     .    .   DeeER .    .    .   add	w3, w1, w3
    [0,8]     .    .   DeeER .    .    .   add	w7, w2, w4
    [0,9]     .    .    DeeER.    .    .   add	w5, w5, w9

Note that I had to fix the FDIV latency/resource-cycles in LLVM scheduler model. 
Original values match the ARM documentation, but I get different timings just by running FDIV in a loop.

The following sequence also matches the hardware:

    [0,0]     DeeeeeeeeeeeER .    .    .    .    .   fdiv       s1, s2, s3
    [0,1]     DeeER.    .    .    .    .    .    .   add        w8, w8, #1
    [0,2]     .    .    . DeeeeeeeeeeeER    .    .   fdiv       s4, s2, s1
    [0,3]     .    .    . DeeER   .    .    .    .   add        w1, w8, #1
    [1,0]     .    .    .    .    DeeeeeeeeeeeER .   fdiv       s1, s2, s3
    [1,1]     .    .    .    .    DeeER.    .    .   add        w8, w8, #1

> 
> Your `LastIssuedInst` currently points to the instruction that reaches write-back stage last. However, you still have to account for the FDIV write latency when checking if the following ADD `canIssue`. ADDs must still be issued in-order. So it needs to wait for the completion of FDIV. 
> 
> `RetireOOO` would become just a way to bypass the `checkWritesOrder` check. 

Can you elaborate why ADD needs to wait for the completion of FDIV? 
If FDIV is allowed to retire out-of-order, then it can retire after ADD, so ADD doesn't have to wait.

> 
> Could you also add a test for sequences of {FDIV,ADD} pairs?
> 

There is A55-out-of-order-retire.s test with FDIV and ADD sequence. 
I can add the second example above as a LIT test as well.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98604/new/

https://reviews.llvm.org/D98604