[PATCH] D94604: [CodeGen] Allow parallel uses of a resource

Wed Mar 10 18:26:26 PST 2021

dpenry added a comment.

In D94604#2613522 <https://reviews.llvm.org/D94604#2613522>, @andreadb wrote:

> In D94604#2506907 <https://reviews.llvm.org/D94604#2506907>, @dpenry wrote:
>
>> <snip>
>>
>> Perhaps I should state explicitly what it is that needs to be modeled for the Cortex-M7 scheduler to make sure we're on the same page:
>>
>> 1. Some instructions require the entire FP datapath
>> 2. Other instructions require half of the FP datapath
>> 3. It is possible to dual-issue two instructions each requiring half of the FP datapath
>> 4. It is not possible to dual-issue instructions requiring the entire FP datapath with instructions requiring the entire FP datapath or half of the FP datapath.
>>
>> I would love it if there was a way to just make this work out of the box.  However, stating that a resource is used twice (that's what's in the current code) or that there's a resource group with two parts (as suggested) doesn't do the trick.  Nor did trying to define VPort0 and VPort1 as sub-units of VPort.
>
> What you have described is a classic scenario for processor resource groups. You can have a group for the entire FP datapath, and then model each half of the FP datapath separately with a resource unit. If this doesn't work for you, then it is a bug in the MachineScheduler (at least, in the logic that does the bookkeeping of resource cycles for groups).
>
> More generally, an algorithm cannot ignore the resource cycle contributions of individual units to a group. Otherwise, group latencies are computed incorrectly.

That is certainly what I would have expected resource groups to do.
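For reference, the four Cortex-M7 constraints quoted above reduce to a counting rule if the FP datapath is viewed as two halves: a full-datapath instruction needs both halves, a half-datapath instruction needs one, and two instructions can dual-issue only if their combined demand fits in two halves. The C++ sketch below is a minimal, hypothetical model of that rule (`FPUse`, `halvesNeeded`, and `canDualIssue` are illustrative names, not LLVM APIs). It ignores every hazard other than the FP datapath and assumes a full-datapath instruction may still pair with a non-FP instruction, which the list does not forbid.

```cpp
#include <cassert>

// Hypothetical model of the Cortex-M7 FP datapath rules (not an LLVM API).
// The datapath is treated as two halves; an instruction needs zero, one,
// or both halves in its issue cycle.
enum class FPUse { None = 0, Half = 1, Full = 2 };

// Number of datapath halves an instruction occupies in its issue cycle.
constexpr int halvesNeeded(FPUse U) { return static_cast<int>(U); }

// Returns true if two instructions may be dual-issued in the same cycle
// as far as the FP datapath is concerned (all other hazards are ignored).
bool canDualIssue(FPUse A, FPUse B) {
  const int Total = halvesNeeded(A) + halvesNeeded(B);
  assert(Total >= 0 && Total <= 4 && "each instruction needs 0..2 halves");
  return Total <= 2; // only two halves exist per cycle
}
```

For example, `canDualIssue(FPUse::Half, FPUse::Half)` is true (rule 3), while `canDualIssue(FPUse::Full, FPUse::Half)` and `canDualIssue(FPUse::Full, FPUse::Full)` are false (rule 4).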

>> I do get that resource consumption begins immediately, but the scheduling model certainly does allow a resource to be occupied for multiple cycles.  And the MachineScheduler doesn't seem to care about how it came to be that way -- whether through resource groups, subclasses, or using the same resource twice in the InstRW.  What it cares about is the list of resources and cycles in the WriteProcResTable.
>
> The number of cycles reported in WriteProcResTable is not a problem here. In fact, it is actually correct and it should be 2.
> For groups, the number of resource cycles reported by WriteProcResTable doesn't necessarily translate to actual latency. Some (if not all) of the resource cycles consumed by a group may map to the same runtime cycle. That is because each individual unit starts consumption at relative cycle #0, so there is clearly an overlap. This implies that resource cycles for groups can be consumed in parallel.
>
> In WriteProcResTable, group resource cycles are computed by summing all the individual contributions from all the resource units (plus any extra cycles explicitly declared for the group). That's how you end up with 2cy for M7UnitVPort, and that is correct.

That is the computation I'm seeing.
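The WriteProcResTable arithmetic described above can be sketched as follows. `UnitUse` and `groupResourceCycles` are hypothetical stand-ins, not LLVM's `MCWriteProcResEntry` or the TableGen code that emits the table; the sketch only shows the arithmetic, under the simplifying assumption that units are matched to group members by name: a group's recorded cycles are the sum of its member units' contributions plus any cycles declared on the group itself.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified record of one instruction's use of a unit
// (this is NOT LLVM's MCWriteProcResEntry).
struct UnitUse {
  std::string Unit; // e.g. "M7UnitVPort0"
  unsigned Cycles;  // resource cycles consumed on that unit
};

// Resource cycles recorded for a group: the sum of the contributions of
// the group's member units, plus any cycles declared on the group itself.
unsigned groupResourceCycles(const std::vector<UnitUse> &Uses,
                             const std::vector<std::string> &Members,
                             unsigned ExtraGroupCycles) {
  assert(!Members.empty() && "a group needs at least one member unit");
  unsigned Total = ExtraGroupCycles;
  for (const UnitUse &U : Uses)
    for (const std::string &M : Members)
      if (U.Unit == M)
        Total += U.Cycles;
  return Total;
}
```

With one cycle each on `M7UnitVPort0` and `M7UnitVPort1` and no extra group cycles, this yields 2 for `M7UnitVPort`, the figure under discussion. Note that the sum says nothing about *when* those cycles are consumed.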

> If MachineScheduler believes that those 2 resource cycles translate to a 2cy latency, then that's a bug in MachineScheduler.
>
> In your case, group M7UnitVPort is consumed for 2 "resource cycles". However, it doesn't mean that the group will only be available every other cycle. Those two resource cycles are contributed by M7UnitVPort0 and M7UnitVPort1 (one resource cycle each), and the resource unit consumption always happens at relative cycle #0. In reality, those two resource cycles are effectively the same cycle (i.e. units are consumed in parallel for 1cy).

That's what I don't see happening in MachineScheduler.  It sees the group as if it were a separate resource which is consumed for two cycles and doesn't try to find parallel units within the group to provide those two cycles of resource consumption.  As far as I can tell, the concept of groups does not exist at all in MachineScheduler.
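The disagreement comes down to how the summed group cycles are read. The hypothetical helpers below contrast the two readings for a single issue: treating the group as one opaque resource busy for the sum of its unit cycles, versus letting every member unit start consumption at relative cycle #0 so the contributions overlap. Neither function is taken from MachineScheduler; they are a minimal sketch assuming each unit is held for its listed cycles starting at the issue cycle.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helpers contrasting two readings of a group's resource
// cycles. UnitCycles holds the cycles each member unit contributes,
// e.g. {1, 1} for M7UnitVPort0 and M7UnitVPort1.

// Serial reading: the group is one opaque resource that stays busy for
// the sum of its units' cycles after issue.
unsigned groupFreeAtSerial(unsigned IssueCycle,
                           const std::vector<unsigned> &UnitCycles) {
  unsigned Sum = 0;
  for (unsigned C : UnitCycles)
    Sum += C;
  return IssueCycle + Sum;
}

// Parallel reading: every unit starts consumption at relative cycle #0,
// so the contributions overlap and the group is fully free again once
// the longest-held unit is released.
unsigned groupFreeAtParallel(unsigned IssueCycle,
                             const std::vector<unsigned> &UnitCycles) {
  assert(!UnitCycles.empty() && "a group has at least one member unit");
  return IssueCycle +
         *std::max_element(UnitCycles.begin(), UnitCycles.end());
}
```

For `M7UnitVPort` with unit cycles `{1, 1}` issued at cycle 0, the serial reading keeps the group busy until cycle 2, while the parallel reading frees it at cycle 1.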

> Again, apologies if I still don't get the full picture. But I strongly believe at this point that there might be a wrong assumption in the MachineScheduler on how resource cycles are set for groups.

I think where we're getting to is that MachineScheduler has never been made to work "as expected" with groups.  If groups are the accepted way to specify this sort of resource usage, then making MachineScheduler use group information would seem to be preferable to adding a new annotation.  However, I do have one reservation.  Groups are used fairly widely at present -- I see them in PowerPC, X86, AArch64, and ARM.  I am not at all sanguine about changing something which would have such widespread effects without more people chiming in.  Any idea who else should be part of this discussion?

Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D94604