[PATCH] D94604: [CodeGen] Allow parallel uses of a resource

Tue Mar 9 03:15:10 PST 2021

andreadb added a comment.

In D94604#2506907 <https://reviews.llvm.org/D94604#2506907>, @dpenry wrote:

> <snip>
>
> Perhaps I should state explicitly what it is that needs to be modeled for the Cortex-M7 scheduler to make sure we're on the same page:
>
> 1. Some instructions require the entire FP datapath
> 2. Other instructions require half of the FP datapath
> 3. It is possible to dual-issue two instructions each requiring half of the FP datapath
> 4. It is not possible to dual-issue instructions requiring the entire FP datapath with instructions requiring the entire FP datapath or half of the FP datapath.
>
> I would love it if there was a way to just make this work out of the box.  However, stating that a resource is used twice (that's what's in the current code) or that there's a resource group with two parts (as suggested) doesn't do the trick.  Nor did trying to define VPort0 and VPort1 as sub-units of VPort.

What you have described is a classic scenario for processor resource groups. You can have a group for the entire FP datapath, and then model each half of the FP datapath separately with a resource unit. If this doesn't work for you, then it is a bug in the MachineScheduler (or at least in the logic that does the bookkeeping of resource cycles for groups).
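
To make the intended behaviour concrete, here is a toy model of the four requirements, assuming a group with two units. It is only a sketch of what the model should express, not how MachineScheduler does its bookkeeping; the names UNITS, NEEDS and can_dual_issue are made up for this illustration.

```python
# Hypothetical unit names; they stand in for the two halves of the FP datapath.
UNITS = ("VPort0", "VPort1")

# Assumed encoding: how many units of the group each kind of instruction needs.
NEEDS = {"half": 1, "full": 2}


def can_dual_issue(first, second):
    """Return True if both instructions can claim their units in the same cycle."""
    free = set(UNITS)
    for kind in (first, second):
        need = NEEDS[kind]
        if need > len(free):
            # Not enough free halves left in this cycle.
            return False
        for _ in range(need):
            free.pop()  # claim any free unit; which one does not matter here
    return True


for pair in (("half", "half"), ("full", "half"), ("half", "full"), ("full", "full")):
    print(pair, can_dual_issue(*pair))
```

Only the ("half", "half") pair can be dual-issued in this model, which matches points 3 and 4 above.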

More generally, an algorithm cannot ignore the resource cycle contributions of individual units to a group. Otherwise, group latencies are computed incorrectly.

> I do get that resource consumption begins immediately, but the scheduling model certainly does allow a resource to be occupied for multiple cycles.  And the MachineScheduler doesn't seem to care about how it came to be that way -- whether through resource groups, subclasses, or using the same resource twice in the InstRW.  What it cares about is the list of resources and cycles in the WriteProcResTable.

The number of cycles reported in WriteProcResTable is not a problem here. In fact, it is actually correct and it should be 2.
For groups, the number of resource cycles reported by WriteProcResTable doesn't necessarily translate to actual latency. Some (if not all) of the resource cycles consumed by a group may map to the same runtime cycle. That is because each individual unit starts consumption at relative cycle #0, so there is clearly an overlap. It implies that resource cycles for groups can be consumed in parallel.

In WriteProcResTable, group resource cycles are computed by summing all the individual contributions from all the resource units (that, plus any extra cycles explicitly declared for the group). That's how you end up with 2cy for M7UnitVPort, and that is correct.
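
As a rough sketch of that summation (the function name and the dictionary layout are invented for this example and are not the actual TableGen backend code):

```python
def group_resource_cycles(unit_cycles, extra_group_cycles=0):
    """Sum the resource cycles of every unit in the group, plus any cycles
    declared explicitly for the group itself."""
    return sum(unit_cycles.values()) + extra_group_cycles


# A write that uses both halves of the FP datapath for one resource cycle each.
full_fp_write = {"M7UnitVPort0": 1, "M7UnitVPort1": 1}

print(group_resource_cycles(full_fp_write))  # 2
```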

If MachineScheduler believes that those 2 resource cycles translate to a 2cy latency, then that's a bug in MachineScheduler.

In your case, group M7UnitVPort is consumed for 2 "resource cycles". However, it doesn't mean that the group will only be available every other cycle. Those two resource cycles are contributed by M7UnitVPort0 and M7UnitVPort1 (one resource cycle each), and the resource unit consumption always happens at relative cycle #0. In reality, those two resource cycles are effectively the same cycle (i.e. units are consumed in parallel for 1cy).
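
The difference between "resource cycles" and the cycles during which the group is actually busy can be sketched as follows. This assumes, as described above, that every unit starts consumption at relative cycle #0; the helper name is hypothetical and this is not a description of what MachineScheduler currently computes.

```python
def busy_runtime_cycles(unit_cycles):
    """Return the set of relative runtime cycles in which at least one unit is busy.

    Every unit is assumed to start consumption at relative cycle 0, so a unit
    consumed for n resource cycles is busy during cycles 0 .. n-1.
    """
    busy = set()
    for cycles in unit_cycles.values():
        busy.update(range(cycles))
    return busy


full_fp_write = {"M7UnitVPort0": 1, "M7UnitVPort1": 1}

resource_cycles = sum(full_fp_write.values())       # what the table reports
runtime_cycles = len(busy_runtime_cycles(full_fp_write))  # what the pipeline sees

print(resource_cycles, runtime_cycles)  # 2 1
```

The two resource cycles overlap on relative cycle 0, so in this model the group is free again in the next cycle.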

Again, apologies if I still don't get the full picture. But I strongly believe at this point that there might be a wrong assumption in the MachineScheduler on how resource cycles are set for groups.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D94604/new/

https://reviews.llvm.org/D94604