[PATCH] D94604: [CodeGen] Allow parallel uses of a resource

Tue Jan 19 07:55:29 PST 2021

dpenry added a comment.

In D94604#2506534 <https://reviews.llvm.org/D94604#2506534>, @andreadb wrote:

> In D94604#2505268 <https://reviews.llvm.org/D94604#2505268>, @dpenry wrote:
>
>> In D94604#2504464 <https://reviews.llvm.org/D94604#2504464>, @andreadb wrote:
>>
>>> I have only skimmed through this patch once, however I think that you can fix the problem in https://reviews.llvm.org/D94605 without introducing your new field `ResourceUses`.
>>>
>>> The "problematic" resource is M7UnitVPort
>>>
>>>   def M7UnitVPort  : ProcResource<2> { let BufferSize = 0; }
>>>
>>> In your case, you want to allow the consumption of both resource units from a single write.
>>> You can do that if you convert M7UnitVPort into a group (see example below)
>>>
>>>   def M7UnitVPort0 : ProcResource<1> { let BufferSize = 0; }
>>>   def M7UnitVPort1 : ProcResource<1> { let BufferSize = 0; }
>>>   
>>>   def M7UnitVPort : ProcResGroup<[M7UnitVPort0, M7UnitVPort1]>;
>>>
>>> At that point, you simply enumerate the resource units in the list of consumed resources. So, something like this:
>>>
>>> Example - before:
>>>
>>>   def : WriteRes<WriteFPMAC64, [M7UnitVFP, M7UnitVPort, M7UnitVPort]>
>>>
>>> Example - after:
>>>
>>>   def : WriteRes<WriteFPMAC64, [M7UnitVFP, M7UnitVPort0, M7UnitVPort1]>
>>>
>>> In conclusion, if the goal is to be able to do something like that, then I think the syntax is already expressive enough.
>>> The obvious downside is that currently you need to declare multiple resources to do what you want to do.
>>
>> Unfortunately, I have tried doing this with a resource group with no success.  ExpandProcResources ends up marking the resource group as used for multiple cycles:
>>
>> From CortexM7ModelSchedClasses:
>>
>>   {DBGFIELD("IIC_fpFMAC64_WriteFPMAC64_ReadFPMAC_ReadFPMUL_ReadFPMUL") 1, true, false, 161, 4, 795, 1, 132, 3}, // #136
>>
>> From ARMWriteProcResTable:
>>
>>   { 9,  1}, // #161
>>   {10,  2}, // #162
>>   {11,  1}, // #163
>>   {12,  1}, // #164
>>
>> From CortexM7ModelProcResources:
>>
>>   {"M7UnitVFP",       1, 0, 0, nullptr}, // #9
>>   {"M7UnitVPort",     2, 0, 0, CortexM7ModelProcResourceSubUnits + 1}, // #10
>>   {"M7UnitVPort0",    1, 0, 0, nullptr}, // #11
>>   {"M7UnitVPort1",    1, 0, 0, nullptr}, // #12
>>
> >> In the end, the test in lines 1139-1140 of SubtargetEmitter.cpp forces multiple uses of a resource -- whether they be explicitly stated in an InstRW, implied by using different resources in a resource group, or hierarchically stated as using subunits of the resource -- to take multiple cycles.  That test seems so fundamental to the way that current schedule descriptions work that it seemed better to introduce the additional Uses notation than to change it.
>
> Maybe I am missing some context here (apologies in case), but why is that a problem in practice?
>
> This is how I see it:
>
> Resource-cycles are there to limit the resource throughput. The write from your example can only be issued when both ports (M7UnitVPort0 and M7UnitVPort1) are available. If group M7UnitVPort is partially or fully used, then your write needs to be delayed until both ports become available. The model assumes that micro-opcodes are all dispatched at the same cycle. We cannot currently model "delayed consumption of resources", so resource consumption starts immediately at the beginning of the issue cycle.
> In practice, what that means is that ports are "consumed" during the entire duration of the issue cycle. The two resource cycles set by ExpandProcResources for group M7UnitVPort are in practice contributed by the underlying units (i.e. 1 cycle of M7UnitVPort0, and 1 cycle by M7UnitVPort1). So the group doesn't need to be consumed for any extra cycles.
> That write alone is enough to maximise the throughput of M7UnitVPort; no other write that uses M7UnitVPort0 and/or M7UnitVPort1 can issue during that same cycle.

Perhaps I should state explicitly what it is that needs to be modeled for the Cortex-M7 scheduler to make sure we're on the same page:

1. Some instructions require the entire FP datapath
2. Other instructions require half of the FP datapath
3. It is possible to dual-issue two instructions each requiring half of the FP datapath
4. It is not possible to dual-issue instructions requiring the entire FP datapath with instructions requiring the entire FP datapath or half of the FP datapath.
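
The four requirements above can be written down as a single check. This is only a minimal sketch and not LLVM code: `FPUse`, `kDatapathHalves`, and `canDualIssue` are hypothetical names, and it assumes the FP datapath can be treated as two interchangeable halves.

```cpp
#include <cassert>

// Hypothetical model (not part of LLVM): an instruction needs either one
// half of the FP datapath or the entire datapath for its issue cycle.
enum class FPUse { Half = 1, Full = 2 };

// Assumption: the datapath consists of two interchangeable halves.
constexpr int kDatapathHalves = 2;

// Two instructions may issue in the same cycle only if their combined
// demand fits into the datapath.
constexpr bool canDualIssue(FPUse A, FPUse B) {
  return static_cast<int>(A) + static_cast<int>(B) <= kDatapathHalves;
}
```

Under this model only the Half/Half pairing can dual-issue; any pairing that involves a Full user cannot.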

I would love it if there was a way to just make this work out of the box.  However, stating that a resource is used twice (that's what's in the current code) or that there's a resource group with two parts (as suggested) doesn't do the trick.  Nor did trying to define VPort0 and VPort1 as sub-units of VPort.

I do get that resource consumption begins immediately, but the scheduling model certainly does allow a resource to be occupied for multiple cycles.  And the MachineScheduler doesn't seem to care about how it came to be that way -- whether through resource groups, sub-units, or using the same resource twice in the InstRW.  What it cares about is the list of resources and cycles in the WriteProcResTable.

So what appears to be happening when the resource group method is used is that MachineScheduler, when scheduling one of these instructions that uses both M7UnitVPort0 and M7UnitVPort1, marks M7UnitVPort0 and M7UnitVPort1 as being occupied until the next cycle and **one** M7UnitVPort as occupied until **two** cycles from now (SchedBoundary::bumpNode, line 2427 for the top-down scheduling side).  Similarly, the top-down check for scheduling this instruction looks for the first cycle in which all three of M7UnitVPort0, M7UnitVPort1, and M7UnitVPort are available, and doesn't try to find two of M7UnitVPort (SchedBoundary::getNextResourceCycle).  (Note that bottom-up scheduling takes the cycle count into account by adding it to the first available cycle and not by incorporating it into the recorded occupancy, with essentially the same effect.)  This does two unwanted things:

1. It allows another M7UnitVPort user (e.g., a VLDR) to simultaneously issue in this cycle, which is not wanted.
2. It prevents two M7UnitVPort users (e.g., VLDR.F32) from simultaneously issuing in the next cycle, when they should be able to.

It does succeed in preventing two dual-M7UnitVPort users from simultaneously issuing due to the limitations in M7UnitVPort0 and M7UnitVPort1.
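
The bookkeeping described above can be reproduced with a toy model. This is a sketch of the behaviour as I understand it, not the MachineScheduler code: `ToyBoundary`, `ResUse`, `fmac64Uses`, `vldrUses`, and `afterFMAC64` are hypothetical names, the FMAC64 cycle counts are taken from the ARMWriteProcResTable entries #161-#164 quoted earlier, and the VLDR is assumed to need only one cycle of M7UnitVPort.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One (resource, cycles) pair, analogous to a WriteProcResTable row.
struct ResUse {
  std::string Name;
  unsigned Cycles;
};

// Toy stand-in for the top-down side of a scheduling boundary.
class ToyBoundary {
  // For each resource, the cycle at which each of its units becomes free.
  std::map<std::string, std::vector<unsigned>> FreeAt;

public:
  void addResource(const std::string &Name, unsigned NumUnits) {
    FreeAt[Name].assign(NumUnits, 0);
  }

  // Earliest cycle >= Cycle at which every listed resource has ONE free
  // unit. It never looks for two free units of the same resource.
  unsigned earliestIssue(const std::vector<ResUse> &Uses,
                         unsigned Cycle) const {
    for (const ResUse &U : Uses) {
      const std::vector<unsigned> &Units = FreeAt.at(U.Name);
      Cycle = std::max(Cycle, *std::min_element(Units.begin(), Units.end()));
    }
    return Cycle;
  }

  // Reserve one unit of each listed resource until Cycle + Cycles.
  void issue(const std::vector<ResUse> &Uses, unsigned Cycle) {
    for (const ResUse &U : Uses) {
      std::vector<unsigned> &Units = FreeAt.at(U.Name);
      *std::min_element(Units.begin(), Units.end()) = Cycle + U.Cycles;
    }
  }
};

// Resource usage of the FMAC64 write when the group method is used.
inline std::vector<ResUse> fmac64Uses() {
  return {{"M7UnitVFP", 1u},
          {"M7UnitVPort", 2u},
          {"M7UnitVPort0", 1u},
          {"M7UnitVPort1", 1u}};
}

// Assumption: a VLDR needs a single cycle of M7UnitVPort and nothing else.
inline std::vector<ResUse> vldrUses() { return {{"M7UnitVPort", 1u}}; }

// State of the boundary after one FMAC64 has been issued at cycle 0.
inline ToyBoundary afterFMAC64() {
  ToyBoundary B;
  B.addResource("M7UnitVFP", 1);
  B.addResource("M7UnitVPort", 2);
  B.addResource("M7UnitVPort0", 1);
  B.addResource("M7UnitVPort1", 1);
  B.issue(fmac64Uses(), 0);
  return B;
}
```

In this model, `afterFMAC64().earliestIssue(vldrUses(), 0)` is 0, so a VLDR can still issue alongside the FMAC64. After one VLDR is issued at cycle 1, the earliest cycle for a second VLDR is 2, because one M7UnitVPort unit is still reserved. A second FMAC64 cannot issue before cycle 1.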

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D94604