[PATCH] D94928: [llvm-mca] Add support for in-order CPUs
Andrew Savonichev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jan 20 09:38:25 PST 2021
asavonic added a comment.
Thanks for the review, Andrea!
In D94928#2506972 <https://reviews.llvm.org/D94928#2506972>, @andreadb wrote:
> Your model assumes an unbounded queue of instructions (something like rudimental reservation station) where to store dispatched instructions.
If you mean `InstQueue`, then it is bounded by the `Bandwidth` variable: the
maximum number of instructions that can be issued in the next cycle.
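To illustrate the idea (a minimal sketch, not the actual llvm-mca code; the names `issue_cycle` and `can_execute` are hypothetical), the queue never holds more than the issue bandwidth's worth of work per cycle, and an in-order core stops issuing at the first stalled instruction:

```python
from collections import deque

def issue_cycle(inst_queue, bandwidth, can_execute):
    """Issue up to `bandwidth` instructions from the front of the queue.

    `can_execute` reports whether an instruction's operands are ready.
    On the first hazard we stop: an in-order pipeline cannot issue a
    younger instruction past a stalled older one.
    """
    issued = []
    while inst_queue and len(issued) < bandwidth:
        if not can_execute(inst_queue[0]):
            break  # data/structural hazard stalls the whole front
        issued.append(inst_queue.popleft())
    return issued

queue = deque(["add", "mul", "sub"])
print(issue_cycle(queue, 2, lambda i: i != "sub"))  # ['add', 'mul']
```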
> Correct me if I am wrong, but in-order processor don't use a reservation station.
> In the absence of structural hazards, if data dependencies are met, then uOPs are directly issued to the underlying execution units.
> So the dispatch event is not decoupled from the issue event.
>
> The fact that your patch adds an unbounded queue sounds a bit strange to me. Not sure what @dmgreen
> thinks about it. But this basically means that dispatch and issue are different events.
That is true. However, the problem here is that the MCA timeline view counts
stalls as the number of cycles between the dispatch and issue events. If
dispatch and issue always happen in the same cycle, no stalls are displayed:
[0,3] . DeeER . . add w13, w30, #1
[0,4] . DeeeER . . smulh x30, x29, x28
[0,5] . DeeeER . smulh x27, x30, x28
[0,6] . DeeeER. smulh xzr, x27, x26
[0,7] . . DeeeER umulh x30, x29, x28
To avoid this, the implementation emits a dispatch event for instructions that
should be executed in the next cycle. If an instruction is unable to execute due
to a hazard, it is delayed and a stall is displayed starting from the dispatch
event:
[0,3] . DeeeER . . add w13, w30, #4095, lsl #12
[0,4] . DeeeeER . . smulh x30, x29, x28
[0,5] . D==eeeeER . smulh x27, x30, x28
[0,6] . D=====eeeeER. smulh xzr, x27, x26
[0,7] . . D=eeeeER umulh x30, x29, x28
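The effect of decoupling the two events can be sketched with a hypothetical helper (this is an illustration of how the timeline legend works, not mca's actual renderer; `timeline_entry` and its parameters are made up for the example):

```python
def timeline_entry(dispatch_cycle, issue_cycle, latency):
    """Render a simplified timeline row: 'D' at dispatch, one '=' per
    stall cycle between dispatch and issue, 'e' for each execution
    cycle with 'E' on the last, then 'R' for retirement (assumed to
    immediately follow write-back in this sketch)."""
    stalls = issue_cycle - dispatch_cycle - 1
    return "D" + "=" * stalls + "e" * (latency - 1) + "E" + "R"

# With issue in the cycle right after dispatch, no stalls are visible:
print(timeline_entry(0, 1, 4))  # DeeeER
# Delaying issue by two cycles makes the stall show up as '==':
print(timeline_entry(0, 3, 4))  # D==eeeER
```

This is why the second trace above shows `D==eeeeER` where the first only shows `DeeeER`: the stall cycles only become visible once the dispatch event is emitted ahead of the issue event.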
I remember doing this intentionally, but now I'm not convinced that the
difference is worth the extra complexity. Let me know what you think.
> I also noticed how there are no checks on `NumMicroOps`. Is there a reason why you don't check for it?
Good point, I will fix that.
> In one of the tests, the target is dual-issue. However, there are cycles where three opcodes are dispatched.
> See for example the test where two loads are dispatched in a same cycle (with the first load decoded into two uOPs).
I think this should not happen. I will add a check for `NumMicroOps`.
================
Comment at: llvm/test/tools/llvm-mca/AArch64/Cortex/A55-all-views.s:116-117
+# CHECK-NEXT: [0,1] D=eeeER . . . ldr w5, [x3]
+# CHECK-NEXT: [0,2] .D===eeeeER . . madd w0, w5, w4, w0
+# CHECK-NEXT: [0,3] . DeeeER. . . add x3, x3, x13
+# CHECK-NEXT: [0,4] . DeeeER . . subs x1, x1, #1
----------------
andreadb wrote:
> Why are these two executing out of order?
`madd` and `add` are issued in the same cycle; `subs` is issued in the next.
However, they should not retire out of order. Some instructions can retire
out-of-order, but not these.
I have to look into this. Probably a retire control unit (RCU) is actually
needed for the in-order pipeline.
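The constraint an RCU-like structure would enforce can be sketched as follows (a toy model under the assumption that retirement is strictly in program order; `retire_in_order` is a made-up name, not an llvm-mca API):

```python
def retire_in_order(finish_cycles):
    """Given the cycle each instruction finishes execution (listed in
    program order), compute the cycle each one retires when retirement
    must be in order: an instruction cannot retire before any older
    instruction, so a late older instruction holds younger ones back."""
    retire_cycles = []
    earliest = 0
    for finish in finish_cycles:
        earliest = max(earliest, finish)  # wait for all older insts
        retire_cycles.append(earliest)
    return retire_cycles

# An older instruction finishing at cycle 9 holds back a younger one
# that finished at cycle 7:
print(retire_in_order([9, 7]))  # [9, 9]
```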
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D94928/new/
https://reviews.llvm.org/D94928