[llvm] r190717 - Adds support for Atom Silvermont (SLM) - -march=slm

Fri Sep 20 05:59:29 PDT 2013

----- Original Message -----
> Hi Preston,
> 
> On Sep 17, 2013, at 4:01 PM, Gurd, Preston <preston.gurd at intel.com>
> wrote:
> 
> > Hello Andy,
> > 
> > Why do you say below that there is "no reason to use itineraries
> > for Atom in the first place"?
> 
> Sorry for that hasty comment. I only meant that the old-style
> itineraries are a bad fit for the Atom machine model. I didn't mean
> to imply that you were wrong to use them at the time.
> 
> The itineraries attempt to model each stage in the processor
> pipeline. This is massive overkill for someone in your situation who
> simply wants to specify latency and functional units. You're
> currently faking latency by forcing instructions to occupy several
> pipeline stages in the reservation table. That's not how itineraries
> were meant to be used, but it is getting the job done for you
> (inefficiently in terms of compile time).
> 
> > In the PostRA scheduler, is there any way to represent the
> > "throughput" (the number of cycles which must elapse before an
> > instruction of the same type can start) of an instruction?
> 
> The new model that works with MachineScheduler (not PostRA) lets you
> specify throughput in two dimensions, horizontally as a functional
> unit list, and vertically as a ResourceCycles attribute.
> 
> Horizontal:
> 
> def : WriteRes<WriteTwoPorts, [Port1, Port2]>;
> 
> Vertical:
> 
> def : WriteRes<WriteTwoCycles, [Port1]> { let ResourceCycles = [2]; }
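
The following toy Python sketch makes the two dimensions concrete. It assumes a single-issue, in-order machine in which every unit listed in a WriteRes is held for its ResourceCycles count. It only illustrates the semantics being discussed. It is not how MachineScheduler accounts for resources internally. The `WriteRes` class and the `issue_cycles` function are hypothetical stand-ins for the TableGen records above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteRes:
    """Toy stand-in for a TableGen WriteRes: units used and how long each is held."""
    name: str
    resources: tuple[str, ...]
    resource_cycles: tuple[int, ...]  # one entry per resource, like ResourceCycles


def issue_cycles(stream: list[WriteRes]) -> list[int]:
    """Return the issue cycle of each independent instruction in the stream.

    Assumes a single-issue, in-order machine: an instruction starts no earlier
    than the cycle after its predecessor and only once all its units are free.
    """
    busy_until: dict[str, int] = {}  # unit -> first cycle it is free again
    next_slot = 0
    starts = []
    for w in stream:
        start = max([next_slot] + [busy_until.get(r, 0) for r in w.resources])
        for r, n in zip(w.resources, w.resource_cycles):
            busy_until[r] = start + n
        starts.append(start)
        next_slot = start + 1
    return starts


# Horizontal: Port1 and Port2 each held for one cycle.
two_ports = WriteRes("WriteTwoPorts", ("Port1", "Port2"), (1, 1))
# Vertical: Port1 held for two cycles (ResourceCycles = [2]).
two_cycles = WriteRes("WriteTwoCycles", ("Port1",), (2,))

print(issue_cycles([two_ports] * 3))   # [0, 1, 2]
print(issue_cycles([two_cycles] * 3))  # [0, 2, 4]
```

In this model the horizontal form never stalls a stream of its own kind. It only costs throughput when some other instruction competes for Port1 or Port2, as it would on a wider-issue machine. The vertical form halves the rate at which Port1 users can start.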
> 
> > Do you expect that the new machine model will produce a better
> > schedule than the current PostRA scheduler?
> 
> Yes. If not, then something is misconfigured or we need some minor
> adjustments to the model or scheduler itself (and remember there are
> always cases where the scheduler just gets lucky/unlucky). If you
> have a badly scheduled loop, file a bug including as much analysis
> as you can. Try to read the -debug-only=misched output.
> 
> For SLM, you may not need the PostRA scheduler at all. I expect the
> MachineScheduler to be a better fit. You can currently enable the
> MachineScheduler and it will use your existing (old-style)
> itineraries (X86Subtarget::enableMachineScheduler() { if (Atom)
> return true; }). But we really don't want to support that. The
> proper thing to do for SLM is define an out-of-order machine model.
> See the SandyBridge/Haswell model.
> 
> For original Atom, you might still want a PostRA scheduler. Running
> the new MachineScheduler a second time as a replacement postRA
> scheduler is something I intended to do, but haven't had a client.
> You could enable MachineScheduler and benchmark to determine if
> PostRA sched is still really needed (-disable-post-ra). If you do
> still need it, then I’d like to try replacing it with a second
> MachineScheduler run. We do want to kill off the current PostRA
> scheduler. It is a maintenance burden that doesn't serve a purpose
> for targets that have migrated to MachineScheduler.

Andy,

I'd certainly like to try this :) -- Post-RA scheduling still gives me a ~10-15% speedup on the PPC A2 (because of the relatively long pipeline, rescheduling after the spill code is inserted is quite beneficial).

Also, regarding the new machine model, can it handle instructions with more than one output where the different outputs have different latencies? My pre-increment loads have this property.

Thanks again,
Hal

> 
> > 
> > Is there any documentation about the new machine model?
> 
> As with the old itineraries, I don't have formal docs. Only the
> attempt at self-documentation in TargetSchedule.td. There are also
> some BOF slides from last year’s LLVM dev meeting.
> 
> The best way for me to improve the docs and for someone to migrate
> their target is to work together in an iterative process. Let me
> know when you have a chance to work on migration of SLM or Atom. I
> can provide a sample of what I think the machine model should look
> like. You can proceed until you hit a difficult or messy case. At
> that time, I can offer suggestions for handling it. If something is
> confusing, let me know. I'll try to explain, adding docs in the
> process.
> 
> -Andy
> 
> > 
> > -----Original Message-----
> > From: Andrew Trick [mailto:atrick at apple.com]
> > Sent: Monday, September 16, 2013 2:46 AM
> > To: Hal Finkel
> > Cc: Gurd, Preston; llvm-commits at cs.uiuc.edu
> > Subject: Re: [llvm] r190717 - Adds support for Atom Silvermont
> > (SLM) - -march=slm
> > 
> > 
> > On Sep 13, 2013, at 1:26 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> > 
> >> ----- Original Message -----
> >>> 
> >>> 
> >>>> Just out of curiosity, when you have this:
> >>>> +  InstrItinData<IIC_SHD16_REG_IM, [InstrStage<2, [IEC_RSV0]>] >,
> >>>>
> >>>> do you intend this to mean that the shift occupies the IEC_RSV0
> >>>> unit and nothing else can use it for 2 cycles? Or do you mean that
> >>>> the latency is 2 cycles, but you can still issue back-to-back
> >>>> independent shifts?
> >>>>
> >>>> -Hal
> >>>
> >>> For the above itinerary, I am trying to represent that this
> >>> instruction must use the IEC reservation station 0 and that it
> >>> will take two cycles to execute. I would like to also be able to
> >>> represent that the throughput of the instruction is 2 cycles, but
> >>> I do not know how to do this.
> >> 
> >> Okay, that's what I thought. I think you want to say this:
> >> InstrItinData<IIC_SHD16_REG_IM, [InstrStage<1, [IEC_RSV0]>], [2, 1, 1]>,
> >> 
> >> The implementation is fully pipelined, right? Assuming that it is,
> >> you only need to track the first pipeline stage (because there
> >> are no hazards later). The numbers at the end say that the output
> >> operand will be ready in cycle 2 post-dispatch, and that the
> >> input operands are read in the first cycle.
> >> 
> >> I've cc'd Andy so that he can correct me if I'm wrong ;)
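
Hal's distinction between unit occupancy and latency can be checked with a small simulation. This sketch rests on four assumptions:
- each itinerary has one stage whose unit is reserved for `stage_cycles`;
- inputs are read at issue (the trailing 1s in [2, 1, 1]);
- the result is ready `def_cycle` cycles after issue;
- Preston's original entry, which has no operand-cycle list, gets a latency equal to its stage length of 2.

The `Itinerary` class and the `schedule` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Itinerary:
    """Toy model of an InstrItinData entry with a single InstrStage."""
    unit: str
    stage_cycles: int  # N in InstrStage<N, [unit]>: how long the unit stays reserved
    def_cycle: int     # cycle (after issue) at which the result is ready


def schedule(itin: Itinerary, n: int, dependent: bool) -> list[int]:
    """Issue cycles for n copies of one instruction on a single-issue, in-order machine.

    If dependent is True, each instruction consumes the previous one's result.
    Inputs are assumed to be read at issue.
    """
    unit_free = 0     # first cycle the unit can be reserved again
    result_ready = 0  # cycle the previous result becomes available
    next_slot = 0     # at most one issue per cycle
    starts = []
    for _ in range(n):
        start = max(next_slot, unit_free)
        if dependent:
            start = max(start, result_ready)
        unit_free = start + itin.stage_cycles
        result_ready = start + itin.def_cycle
        starts.append(start)
        next_slot = start + 1
    return starts


# Preston's entry: InstrStage<2, [IEC_RSV0]>, latency assumed to equal the stage length.
preston = Itinerary("IEC_RSV0", stage_cycles=2, def_cycle=2)
# Hal's entry: InstrStage<1, [IEC_RSV0]> with operand cycles [2, 1, 1].
hal = Itinerary("IEC_RSV0", stage_cycles=1, def_cycle=2)

for label, itin in (("preston", preston), ("hal", hal)):
    print(label, schedule(itin, 3, dependent=False), schedule(itin, 3, dependent=True))
# preston [0, 2, 4] [0, 2, 4]
# hal [0, 1, 2] [0, 2, 4]
```

Both entries give a dependent chain the same 2-cycle spacing. Only the single-cycle stage lets independent shifts issue back to back, which is the fully pipelined behaviour Hal describes.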
> > 
> > Right, although there is no reason to use itineraries for Atom to
> > begin with. I would be very willing to help anyone who wants to
> > rewrite these using the new machine model, but it should be done by
> > someone who can adequately test it. We want to drop support for
> > the old itinerary format and postRA scheduler as soon as possible.
> > 
> > To give you an example, the new model would look something like
> > this:
> > 
> > def : WriteRes<WriteShift, [IEC_RSV0]> { let Latency = 2; }
> > 
> > Maybe a new WriteShiftCL type should be added to X86Schedule.td and
> > referenced in X86InstrShiftRotate.td. Then SLM can define it with
> > Latency = 4, and X86SchedSandyBridge.td can have:
> > 
> > def : SchedAlias<WriteShiftCL, WriteShift>;
> > 
> > It's also possible for a subtarget to override specific operations
> > by pattern matching opcodes without complicating the architecture
> > definition files.
> > 
> > 
> > -Andy
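
The per-subtarget lookup Andy describes can be sketched as follows. The sketch assumes this ordering:
1. A per-opcode override (what InstRW provides in TableGen) wins.
2. Otherwise a SchedAlias is followed.
3. Otherwise the subtarget's own WriteRes is used.

The SLM latencies of 2 and 4 come from the message above. Everything else is hypothetical:
- the SandyBridge port name and latency;
- the `SHLD16rri8` override and its latency;
- the `ProcModel` and `Res` classes.

Opcode strings here are just labels.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Res:
    """Units an operation uses and its result latency (stand-in for a WriteRes body)."""
    units: tuple[str, ...]
    latency: int


@dataclass
class ProcModel:
    """Hypothetical per-subtarget table mirroring the TableGen constructs in the thread."""
    writes: dict[str, Res]                                   # def : WriteRes<...>
    aliases: dict[str, str] = field(default_factory=dict)    # def : SchedAlias<...>
    overrides: dict[str, Res] = field(default_factory=dict)  # per-opcode overrides (InstRW)

    def resolve(self, opcode: str, sched_write: str) -> Res:
        # 1. An opcode-specific override takes precedence (assumed ordering).
        if opcode in self.overrides:
            return self.overrides[opcode]
        # 2. Otherwise follow a SchedAlias, if this subtarget declares one.
        name = self.aliases.get(sched_write, sched_write)
        # 3. Finally use the subtarget's own WriteRes for that SchedWrite.
        return self.writes[name]


# SLM defines both writes itself (latencies 2 and 4 as suggested in the thread);
# the SHLD16rri8 override and its latency of 3 are purely hypothetical.
slm = ProcModel(
    writes={
        "WriteShift": Res(("IEC_RSV0",), 2),
        "WriteShiftCL": Res(("IEC_RSV0",), 4),
    },
    overrides={"SHLD16rri8": Res(("IEC_RSV0",), 3)},
)

# SandyBridge reuses its WriteShift for WriteShiftCL; port name and latency are assumptions.
snb = ProcModel(
    writes={"WriteShift": Res(("SBPort05",), 1)},
    aliases={"WriteShiftCL": "WriteShift"},
)

print(slm.resolve("SHL16rCL", "WriteShiftCL").latency)  # 4
print(snb.resolve("SHL16rCL", "WriteShiftCL").latency)  # 1
print(slm.resolve("SHLD16rri8", "WriteShift").latency)  # 3
```

The point of the alias is that X86InstrShiftRotate.td only needs to reference WriteShiftCL once. Subtargets that don't care about the distinction map it back to WriteShift instead of duplicating definitions.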
> 
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory