[PATCH] D94928: [llvm-mca] Add support for in-order CPUs

Wed Jan 20 10:37:44 PST 2021

andreadb added a comment.

In D94928#2510059 <https://reviews.llvm.org/D94928#2510059>, @asavonic wrote:

> Thanks for the review Andrea!
>
> In D94928#2506972 <https://reviews.llvm.org/D94928#2506972>, @andreadb wrote:
>
>> Your model assumes an unbounded queue of instructions (something like rudimental reservation station) where to store dispatched instructions.
>
> If you mean `InstQueue`, then it is bounded by `Bandwidth` variable - the
> maximum number of instructions that can be issued in the next cycle.
>
>> Correct me if I am wrong, but in-order processor don't use a reservation station. 
>> In the absence of structural hazards, if data dependencies are met, then uOPs are directly issued to the underlying execution units.
>> So the dispatch event is not decoupled from the issue event.
>>
>> The fact that your patch adds an unbounded queue sounds a bit strange to me. Not sure what @dmgreen 
>>  thinks about it. But this basically means that dispatch and issue are different events.
>
> That is true. However, the problem here is that MCA timeline view counts stalls
> as a number of cycles between dispatch and issue events. If dispatch and issue
> always happen in the same cycle, stalls are not displayed:
>
>   [0,3]     .  DeeER  .    .    add	w13, w30, #1
>   [0,4]     .  DeeeER .    .    smulh	x30, x29, x28
>   [0,5]     .     DeeeER   .    smulh	x27, x30, x28
>   [0,6]     .        DeeeER.    smulh	xzr, x27, x26
>   [0,7]     .    .    DeeeER    umulh	x30, x29, x28
>
> To avoid this, the implementation emits a dispatch event for instructions that
> should be executed in the next cycle. If an instruction is unable to execute due
> to a hazard, it is delayed and a stall is displayed starting from the dispatch
> event:
>
>   [0,3]     . DeeeER  .    .    add	w13, w30, #4095, lsl #12
>   [0,4]     . DeeeeER .    .    smulh	x30, x29, x28
>   [0,5]     .  D==eeeeER   .    smulh	x27, x30, x28
>   [0,6]     .  D=====eeeeER.    smulh	xzr, x27, x26
>   [0,7]     .    .  D=eeeeER    umulh	x30, x29, x28
>
> I remember that I did this intentionally, but now I'm not really convinced that
> this difference is worth extra complexity. Let me know what you think about
> this.

Ok, I understand now.
It was done intentionally to emphasize issue stalls.

Personally I prefer if the 'dispatch' event coincides with the 'issue' event.
>From a theoretical point of view, those events should really be the same.

Conceptually, "dispatch" is only really useful when describing the lifetime of an instruction in an out-of-order processor.
For those processors it makes sense to distinguish between cycles where uOPs are picked from the decoder's queue, from cycles where uOPs are actually sent by the scheduler(s) to the underlying pipeline(s).

In future, we may introduce a new option for the timeline to toggle the generation of extra chars for dispatch stalls (possibly using a different character from = to avoid confusions).
That might be contributed by a separate patch in future. For now, if you don't mind I'd rather keep your patch simple.

>> I also noticed how there are no checks on `NumMicroOps`. Is there a reason why you don't check for it?
>
> Good point, I will fix that.
>
>> In one of the tests, the target is dual-issue. However, there are cycles where three opcodes are dispatched.
>> See for example the test where two loads are dispatched in a same cycle (with the first load decoded into two uOPs).
>
> I think this should not happen. I will add a check for NumMicroOps.

Thanks!

================
Comment at: llvm/test/tools/llvm-mca/AArch64/Cortex/A55-all-views.s:116-117
+# CHECK-NEXT: [0,1]     D=eeeER   .    .    .   ldr	w5, [x3]
+# CHECK-NEXT: [0,2]     .D===eeeeER    .    .   madd	w0, w5, w4, w0
+# CHECK-NEXT: [0,3]     .   DeeeER.    .    .   add	x3, x3, x13
+# CHECK-NEXT: [0,4]     .    DeeeER    .    .   subs	x1, x1, #1
----------------
asavonic wrote:
> andreadb wrote:
> > Why are these two executing out of order?
> Madd and add are issued in the same cycle, subs is issued next.
> However, they should not retire out-of-order. Some instructions can
> retire out-of-order, but not these.
> 
> I have to look into this. Probably an RCU is actually needed for the
> in-order pipeline.
> 
In theory, younger instructions should not be allowed to reach the write-back stage before older instructions because that would lead to out-of-order execution. 
In this case I was expecting a compulsory stall to artificially delay the issue of the `add` so that it can write-back in-order w.r.t. the madd.
What are those cases where it is allowed to write-back instructions out of order? Shouldn't architectural commits always happen in-order?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D94928/new/

https://reviews.llvm.org/D94928