[PATCH] D39805: [Power9] Set MicroOpBufferSize for Power 9

Stefan Pintilie via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Nov 22 11:50:45 PST 2017


stefanp added a comment.

Hi Eric,

When I was looking at the scheduler code I came across the CurrCycle variable. As I understand it, this variable tracks the cycle that we are currently scheduling for. For example, an instruction that depends on no other instruction is scheduled at cycle 0, since it can be dispatched immediately. If instruction B depends on instruction A, then B cannot be dispatched until A completes. If A takes two cycles, then B will not be scheduled until CurrCycle is at least 2. With MicroOpBufferSize=1 this is generally how things work: the scheduler dispatches everything it can at cycle 0, then increases the cycle count to allow the dispatched instructions to complete, and then dispatches more.
However, if I set MicroOpBufferSize to a value greater than 1, CurrCycle is not updated in the same way. Take the scenario above, where B depends on the result of the two-cycle operation A. In this case we may dispatch both A and B when CurrCycle=0. I have seen situations where an instruction was dispatched at a given CurrCycle even though I knew that computing its inputs would require more cycles than that.

The issue seems to come from here:

  switch (SchedModel->getMicroOpBufferSize()) {
    case 0:
      assert(ReadyCycle <= CurrCycle && "Broken PendingQueue");
      break;
    case 1:
      if (ReadyCycle > NextCycle) {
        NextCycle = ReadyCycle;
        DEBUG(dbgs() << "  *** Stall until: " << ReadyCycle << "\n");
      }
      break;
    default:
      // We don't currently model the OOO reorder buffer, so consider all
      // scheduled MOps to be "retired". We do loosely model in-order resource
      // latency. If this instruction uses an in-order resource, account for any
      // likely stall cycles.
      if (SU->isUnbuffered && ReadyCycle > NextCycle)
        NextCycle = ReadyCycle;
      break;
  }

When MicroOpBufferSize=1, NextCycle is bumped up to ReadyCycle whenever we need to wait for things to finish. However, when MicroOpBufferSize>1, NextCycle is bumped up only when the instruction uses a reserved or in-order resource (SU->isUnbuffered). Based on the comment in the default case, I believe this is how the scheduler was designed to run. I don't think this is a bug or something that needs to be fixed in `MachineScheduler.cpp`.
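
To make the difference concrete, here is a minimal standalone sketch of the NextCycle update. It is not LLVM code: `ToyUnit` and `nextCycleAfter` are hypothetical names made up for illustration, and only the three-way branch is meant to mirror the switch quoted above.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the two SUnit properties that matter here.
struct ToyUnit {
  unsigned ReadyCycle; // earliest cycle at which all inputs are available
  bool IsUnbuffered;   // true if the unit uses a reserved/in-order resource
};

// Returns the NextCycle value after scheduling SU, using the same split on
// MicroOpBufferSize as the quoted switch statement.
unsigned nextCycleAfter(unsigned MicroOpBufferSize, unsigned CurrCycle,
                        unsigned NextCycle, const ToyUnit &SU) {
  (void)CurrCycle; // only read by the assert below
  switch (MicroOpBufferSize) {
  case 0:
    // The unit must already be ready when it is scheduled.
    assert(SU.ReadyCycle <= CurrCycle && "Broken PendingQueue");
    break;
  case 1:
    // Stall until the inputs are ready.
    NextCycle = std::max(NextCycle, SU.ReadyCycle);
    break;
  default:
    // Only a reserved/in-order resource causes a stall.
    if (SU.IsUnbuffered)
      NextCycle = std::max(NextCycle, SU.ReadyCycle);
    break;
  }
  return NextCycle;
}
```

For the A/B example above (B has ReadyCycle=2 and is scheduled while CurrCycle=0 and NextCycle=0), this sketch returns 2 with a buffer size of 1 but leaves NextCycle at 0 with a larger buffer size, unless B is marked as unbuffered.
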
We have a few options here to get the performance we want:

1. Use MicroOpBufferSize=1 as I did here
2. Copy the `MachineScheduler.cpp` code into something like `PPCMachineScheduler.cpp` and change that switch statement so that the default case does the same thing as case 1.
3. Create a proper custom PPC Machine Scheduler.

The reason I picked option 1 was mostly due to time constraints. I feel that options 1 and 2 are both non-ideal but that doing option 3 correctly will probably take months to complete. In the long term I think we should be moving toward option 3.

So, to answer your question: if we were to do option 3, we would be able to set MicroOpBufferSize to a value greater than 1, which makes more sense for the out-of-order PowerPC architecture. Due to time constraints we are thinking of using a value of 1 for now, and then in the future implementing a custom scheduler and setting MicroOpBufferSize to a different value.


https://reviews.llvm.org/D39805




