[PATCH] D98976: [CodeGen] Use ProcResGroup information in SchedBoundary

Tue Apr 13 13:31:37 PDT 2021

andreadb added a comment.

I had another look at the patch.

The resource cycle computation doesn't look correct. I think that there is a bit of confusion on how resource cycles of a group are allocated to the underlying sub-units. See my comments below.

================
Comment at: llvm/lib/CodeGen/MachineScheduler.cpp:2061-2062
+             u != ue; ++u) {
+          ResourceGroupIDs[SchedModel->getProcResource(i)
+                               ->SubUnitsIdxBegin[u]] = i;
+        }
----------------
This doesn't look correct.

A same processor resource may belong to multiple groups, not just one.

Example:
```
def X : ProcResource<1>;
def Y : ProcResource<1>;
def Z : ProcResource<1>;

def A : ProcResGroup<[X, Y]>;
def B : ProcResGroup<[X, Z]>;
def C : ProcResGroup<[X, Y, Z]>;
```

Here X is included in A, B and C.

Your logic is only able to track a single group identifier. This is not enough.
It may be uncommon or non-sensical for your particular model. However the syntax let's you define this particular nesting of groups.

================
Comment at: llvm/lib/CodeGen/MachineScheduler.cpp:2116-2124
+    for (unsigned I = StartIndex, End = StartIndex + NumberOfInstances; I < End;
+         ++I) {
+      unsigned NextUnreserved = getNextResourceCycleByInstance(I, Cycles);
+      if (MinNextUnreserved > NextUnreserved) {
+        InstanceIdx = I;
+        MinNextUnreserved = NextUnreserved;
+      }
----------------
Remove tabs.

================
Comment at: llvm/lib/CodeGen/MachineScheduler.cpp:2125
+    return std::make_pair(MinNextUnreserved, InstanceIdx);
+  } else {
+    SmallVector<std::pair<unsigned, unsigned>, 8> NextResCycles;
----------------
unnecessary else after return.

================
Comment at: llvm/lib/CodeGen/MachineScheduler.cpp:2189
+  PR.clear();
+  if (HasUnbufferedResourceGroup) {
+    SmallVector<uint16_t, 32> GroupAdjustments;
----------------
Early exit to reduce indentation if `HasUnbufferedResourceGroup` is false.

================
Comment at: llvm/lib/CodeGen/MachineScheduler.cpp:2199
+      if (ResGrpID && !SchedModel->getProcResource(ResGrpID)->BufferSize) {
+        GroupAdjustments[ResGrpID]++;
+        HasResGroup = true;
----------------
Why do you need to keep track of the number of uses?

The number of users of a group isn't important.
The only thing that matters for the resource latency computation is how many cycles are contributed by the sub-units. See my comment below.

================
Comment at: llvm/lib/CodeGen/MachineScheduler.cpp:2204
+    if (HasResGroup) {
+      PR.clear();
+      for (TargetSchedModel::ProcResIter
----------------
This seems redundant. PR is already cleared on entry to this method.

================
Comment at: llvm/lib/CodeGen/MachineScheduler.cpp:2212
+              PI->ProcResourceIdx,
+              uint16_t(PI->Cycles / GroupAdjustments[PI->ProcResourceIdx]),
+              GroupAdjustments[PI->ProcResourceIdx]});
----------------
This doesn't look right.

Two problems:
1. Different sub-units may be consumed for a different number of resource cycles.
2. The number of resource cycles consumed by all the sub-units of a group may not match the value of `PI->Cycles`.

About 1.
Example:

```
def X : ProcResource<1>;
def Y : ProcResource<1>;
def A : ProcResGroup<[X, Y]>;

def MyWrite : SchedWriteRes<[X,Y]> {
  let Latency = 3;
  let ResourceCycles = [1, 2];
}
```

MyWrite is a write that consumes X for 1 cycle, and Y for 2 cycles.
Both X and Y are part of group A, so the subtarget emitter will generate the following profile:

```
X -> 1 resource cycle
Y ->  2  resource cycles
A -> 3  resource cycles
```

In this case, the resource cycles of group A are equivalent to the sum of the resource cycles of X and the resource cycles of Y.

Group A is not consumed for any extra cycles, so the resource latency of A is `MAX(ResourceCycle_of(X), ResourceCycle_of(Y)) = 2cy`. X is consumed for 1cy. Y is consumed for 2cy.

As far as I understand, your code assumes that X and Y will be consumed for the same number of cycles. This is not true in general.
Also, the number of group cycles may not match the total cycles contributed by the sub-units (see the following example).

About point 2.

```
def MyWrite : SchedWriteRes<[X,Y,A]> {
  let Latency = 3;
  let ResourceCycles = [1, 2, 1];
}
```

This is similar to the previous example. However , A is consumed for an extra cycle.
the profile generated by the subtarget emitter is:

```
X -> 1 resource cycle
Y ->  2  resource cycles
A -> 4  resource cycles
```

This situation is more uncommon, but still possible. The model doesn't prevent users from "overbooking" group resources.

If we subtract the resource cycles from X and Y, we find out that A is still consumed for an extra cycle. In this case, conservatively assume that A is consumed for `MAX(ResourceCycle_of(X), ResourceCycle_of(Y)) + 1 -> 3cy`.
So, 3cy instead of 2cy.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98976/new/

https://reviews.llvm.org/D98976