[PATCH] D98976: [CodeGen] Use ProcResGroup information in SchedBoundary

Tue Apr 13 08:36:43 PDT 2021

dpenry added a comment.

In D98976#2685193 <https://reviews.llvm.org/D98976#2685193>, @andreadb wrote:

> In D98976#2684092 <https://reviews.llvm.org/D98976#2684092>, @dpenry wrote:
>
>> In D98976#2682694 <https://reviews.llvm.org/D98976#2682694>, @RKSimon wrote:
>>
>>> I'm not sure this avoids the criticisms from D94604 <https://reviews.llvm.org/D94604> - we have resource groups and the child resource units, and we can control the width of both of these to model usage each cycle.
>>>
>>> But for starters, I'd like to see tests of some kind (llvm-mca timeline or whatever you prefer) that clearly shows the issue you think you're encountering.
>>
>> The issue comes when using resource groups with BufferSize = 0 -- MachineScheduler just doesn't care about the group indicating a parallel use of resources, though from the discussion in D94604 <https://reviews.llvm.org/D94604>, it seems clear that this is what resource groups are intended to model.  It hasn't shown up as an issue before because resource groups with BufferSize = 0 haven't been used before D98977 <https://reviews.llvm.org/D98977> (which tries to use resource groups in the CortexM7 scheduling model instead of directly using multiple copies of a resource, as code comments indicate that using a resource twice for an in-order model is supposed to mean using it over multiple cycles.)
>>
>> Some sort of test showing the change in scheduling outcomes is a good idea and I'll come up with something.
>
> This patch doesn't look right to me. I share the same concerns as Simon.
>
> How resource cycles are contributed by individual units of a group has nothing to do with the value of BufferSize.
> I don't understand why you think it should make a difference.
>
> A BufferSize of zero simply means that the dispatch event must coincide with the issue event. In the absence of a buffer, opcodes are forced to be sent immediately for execution to the underlying resource units. Basically, there is no delayed execution, and resource consumption starts immediately. It doesn't say anything about how and which units are selected for execution.

In MachineScheduler, BufferSize = 0 means more than just dispatch = issue.  I don't know if this is what was intended originally, but it's certainly what's in the code. Only resources with BufferSize = 0 are "reserved" resources.  ReservedCycles is only updated for these resources  Thus, if BufferSize <> 0, the consumption is *not* tracked on a cycle-by-cycle basis.  This is not anything I have changed; if it doesn't make sense, it hasn't made sense for a long time.

But this actually does make sense, I think.  When BufferSize = 0, the structural hazard stalls both issue and dispatch; the resource consumption can be assigned to a particular cycle.  When it isn't, it's stalling dispatch, but not issue, and the resource consumption can 't really be assigned to a particular cycle.  Thus, the current approach of not tracking it exactly but using it to determine overall resource pressure is a valid one when BufferSize <> 0.

TableGen doesn't allow a resource to be repeated in the list of processor resources for an instruction, stating in the comments that it's important to use them serially for in-order processors.  (I'm not sure I agree with that, but that's what it says and does.) Thus uses of two sub-resources of a resource group, two uses of the same resource, or uses of two resources with the same parent all end up with some resource being used for 2 cycles instead of two uses of 1 cycle.  That's perfectly fine if BufferSize <> 0 for the resource, as the amount of resource pressure is the same for two uses of one cycle and one use of two cycles.  But for BufferSize = 0, you get weird scheduling results instead.  D94604 <https://reviews.llvm.org/D94604> attempted to get around this by allowing an explicit statement in the scheduling model of how many parallel uses of a resource were intended and carrying that information in the WriteProcRes table.  As that was deemed not necessary because uses of resources within resource groups are supposed to imply parallel uses, this patch looks to see if the unbuffered resource at hand is a resource group and adjusts its cycle use as if the uses are intended to be in parallel.

If this all seems surprising, it is... who would have thought that it was impossible to schedule multiple uses of a resource by an in-order processor correctly?  But it turns out that no processor scheduling model had tried to use BufferSize = 0 with a processor resource group before.  The CortexM7 model uses BufferSize = 0 and multiple uses of the same resource, which doesn't produce good results; debugging that is what led to these patches.  One use of two cycles and two uses of one cycle are **not** the same thing for an in-order processor.

I will work up a test that shows what is happening currently, and hopefully that will make it more clear.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98976/new/

https://reviews.llvm.org/D98976