[LLVMdev] scoreboard hazard det. and instruction groupings

Hal Finkel hfinkel at anl.gov
Mon Jun 11 12:07:48 PDT 2012


On Mon, 11 Jun 2012 10:48:18 -0700
Andrew Trick <atrick at apple.com> wrote:

> On Jun 11, 2012, at 9:30 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> 
> > I'm considering writing more-detailed itineraries for some PowerPC
> > CPUs that use the 'traditional' instruction grouping scheme. In
> > essence, this means that multiple instructions will stall in some
> > pipeline stage until a complete group is formed, then all will
> > continue.
> > 
> > I expect to provide CPU-specific code to help determine when the
> > currently-waiting instructions would form a group. Is there a
> > straightforward way that I could override the scoreboard hazard
> > detector (or layer on top of it) to insert this kind of logic?
> > 
> > Should I instead be looking to do something like what Hexagon does
> > for VLIW cores? I think the main difference between the PowerPC
> > scheme and a VLIW scheme is that the CPU itself can form groups
> > internally; it is just more efficient if the groups are provided
> > pre-made. Maybe this difference, if it is one, is insignificant.
> 
> Hal, I think you're asking whether to use the
> ScheduleHazardRecognizer or DFAPacketizer. I suggest sticking with
> the hazard recognizer unless you have an important problem that can't
> be solved that way. It's the API used by most targets and doesn't
> require a custom scheduler. Although I don't want to stop you from
> generalizing the DFA work either if you feel compelled to do that.

I don't yet feel compelled, and I don't know much about the
DFAPacketizer. I just want something that will work cleanly ;)

Looking at VLIWPacketizerList::PacketizeMIs, it seems like the
instructions are first scheduled (via some external scheme?), and then
packetized 'in order'. Is that correct?

> 
> Ignoring compile time for a moment, I think an advantage of a DFA is
> modeling a situation where the hardware can assign resources to best
> fit the entire group rather than one instruction at a time. For
> example, if InstA requires either Unit0 or Unit1, and InstB requires
> Unit0, is {InstA, InstB} a valid group? Depending on your cpu, a DFA
> could either recognize that it's valid, or give you a chance to
> reorder the instructions within a group once they've been selected.

In the PowerPC grouping scheme, resources are assigned on a group
basis (by the instruction dispatching stages). However, once the group
is dispatched to the appropriate functional units, 'bypass' is still
available on an instruction-by-instruction basis to instructions in
later groups. Final writeback waits until all members of the group
complete.
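
As a rough illustration of Andy's {InstA, InstB} question, here is a
standalone sketch that contrasts assigning units one instruction at a
time with assigning them for the group as a whole. The bitmask encoding
and the function names are hypothetical; none of this is the
DFAPacketizer or any other LLVM API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical encoding: bit i set means "may issue on unit i".
using UnitMask = std::uint32_t;

// Isolate the lowest set bit of a mask (0 if the mask is empty).
static UnitMask lowestBit(UnitMask M) { return M & (~M + 1); }

// One-instruction-at-a-time assignment: each instruction, in order, claims
// the lowest-numbered free unit it is allowed to use.
bool fitsGreedy(const std::vector<UnitMask> &Group) {
  UnitMask Busy = 0;
  for (UnitMask Allowed : Group) {
    UnitMask Free = Allowed & ~Busy;
    if (!Free)
      return false; // no unit left for this instruction
    Busy |= lowestBit(Free);
  }
  return true;
}

// Group-level assignment: backtracking search over all unit choices, so an
// earlier instruction can be moved to another unit to make room.
static bool assignFrom(const std::vector<UnitMask> &Group, std::size_t Idx,
                       UnitMask Busy) {
  if (Idx == Group.size())
    return true; // every instruction has a unit
  UnitMask Free = Group[Idx] & ~Busy;
  while (Free) {
    UnitMask Unit = lowestBit(Free);
    if (assignFrom(Group, Idx + 1, Busy | Unit))
      return true;
    Free &= ~Unit; // try the next candidate unit
  }
  return false;
}

bool fitsAsGroup(const std::vector<UnitMask> &Group) {
  return assignFrom(Group, 0, 0);
}
```

With InstA = 0x3 (Unit0 or Unit1) and InstB = 0x1 (Unit0 only), the
greedy check rejects {InstA, InstB} because InstA takes Unit0 first,
while the group-level check accepts it by placing InstA on Unit1.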

> 
> Ideally, you can express your constraints using InstrStage itinerary
> entries. 

I don't see how, in the current scheme, to express that an instruction
must wait in FU0 until there are also waiting instructions in FU1, FU2
and FU3. Furthermore, there are certain constraints on what those
instructions can be, and which ones will move forward as the next
dispatched group, and I think we need to fall back to C++ to deal with
them.

> If not, then you need to do your own bookkeeping by saving
> extra state during EmitInstruction and checking for hazards in
> getHazardType. At this point, you need to decide whether your custom
> logic can be easily generalized to either top-down or bottom-up
> scheduling.

I think that it can be either. Within the current system, however, it
might need to be top-down. To do bottom-up, you'd need look-ahead
covering the depths of the pipelines in order to emit grouping hazards
when considering the ends of the pipelines (although this may matter
only for corner cases; I think that normal dependency analysis should
catch most of these).
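
To make the bookkeeping Andy describes concrete, here is a minimal
top-down sketch. It reuses the method names getHazardType,
EmitInstruction and AdvanceCycle from ScheduleHazardRecognizer, but it
is a standalone class that does not derive from the LLVM one. The group
width of four (FU0 through FU3) and the MustBeFirst/MustBeAlone flags
are assumptions made for illustration, not a description of any
particular PowerPC core.

```cpp
#include <cassert>

// Standalone stand-in for the hazard result; not the LLVM enum.
enum class HazardType { NoHazard, Hazard };

// Hypothetical per-instruction grouping properties.
struct InstrInfo {
  bool MustBeFirst = false; // must occupy the first slot of a group
  bool MustBeAlone = false; // must be the only instruction in its group
};

class GroupHazardSketch {
  static constexpr unsigned GroupWidth = 4; // assumed: FU0..FU3
  unsigned NumInGroup = 0;  // instructions emitted into the current group
  bool GroupClosed = false; // true once a MustBeAlone instruction is in it

public:
  // Would emitting I now violate a grouping constraint?
  HazardType getHazardType(const InstrInfo &I) const {
    if (NumInGroup == 0)
      return HazardType::NoHazard; // anything may start a group
    if (GroupClosed || NumInGroup == GroupWidth)
      return HazardType::Hazard;   // group is sealed or full
    if (I.MustBeFirst || I.MustBeAlone)
      return HazardType::Hazard;   // needs to start a fresh group
    return HazardType::NoHazard;
  }

  // Record that I was scheduled into the current group.
  void EmitInstruction(const InstrInfo &I) {
    assert(getHazardType(I) == HazardType::NoHazard &&
           "emitting into a hazard");
    ++NumInGroup;
    GroupClosed = GroupClosed || I.MustBeAlone;
  }

  // Model dispatch of the current group; the next one starts empty.
  void AdvanceCycle() {
    NumInGroup = 0;
    GroupClosed = false;
  }

  unsigned groupSize() const { return NumInGroup; }
};
```

In a real target the same state would live in a subclass of
ScheduleHazardRecognizer, with the per-instruction properties derived
from the instruction being scheduled rather than passed in directly.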

> If not, you can force MISched to schedule in one direction or the
> other. SD scheduling is stuck with bottom-up for the remainder of
> its days, and postRA scheduling is top-down.

I would rather do something that will be easy to maintain going
forward, and so if that can be accomplished within the normal
framework, then that would be great.

I think that I'll try ignoring the issue for now: just use a normal
itinerary with bottom-up scheduling, followed by the existing top-down
pass, which attempts to enforce some of the ordering constraints (these
are most severe on the G5s). If that gives unsatisfactory results, then
we can think about something more involved.

Thanks again,
Hal

> 
> -Andy

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory


