[LLVMdev] [llvm] r190717 - Adds support for Atom Silvermont (SLM) - -march=slm

Thu Sep 26 14:39:21 PDT 2013

Hello Andy,

Thank you for your offer to work together on implementing your new scheduler on X86. I can start working on this right away.

In case you were unaware, the new Silvermont micro-architecture is out of order only on the integer side. The SSE instructions are still in order, so the current PostRA scheduler is very beneficial for code with lots of SSE instructions, such as the ISPC (http://ispc.github.io) example programs. Hence I would be looking at re-implementing the schedulers for both Atom and Silvermont.

In the meantime, I would appreciate it if the current PostRA scheduler could be kept in place until such time as we can prove that the new scheduler is at least as good as what we have now.

We also consider it desirable that the new scheduler can be run after register allocation, as has already been discussed on LLVMdev.

Please let me know what the next step should be.

Preston

-----Original Message-----
From: Andrew Trick [mailto:atrick at apple.com] 
Sent: Friday, September 20, 2013 2:02 AM
To: Gurd, Preston
Cc: Hal Finkel; llvm commits
Subject: Re: [llvm] r190717 - Adds support for Atom Silvermont (SLM) - -march=slm

Hi Preston,

On Sep 17, 2013, at 4:01 PM, Gurd, Preston <preston.gurd at intel.com> wrote:

> Hello Andy,
> 
> Why do you say below that there is "no reason to use itineraries for Atom in the first place"?

Sorry for that hasty comment. I only meant that the old-style itineraries are a bad fit for the Atom machine model. I didn't mean to imply that you were wrong for using it at the time.

The itineraries attempt to model each stage in the processor pipeline. This is massive overkill for someone in your situation who simply wants to specify latency and functional units. You're currently faking latency by forcing instructions to occupy several pipeline stages in the reservation table. That's not how itineraries were meant to be used, but it is getting the job done for you (inefficiently in terms of compile time).

> In the PostRA scheduler, is there any way to represent the "throughput" (the number of cycles which must elapse before an instruction of the same type can start) of an instruction?

The new model that works with MachineScheduler (not PostRA) lets you specify throughput in two dimensions, horizontally as a functional unit list, and vertically as a ResourceCycles attribute.

Horizontal:

def : WriteRes<WriteTwoPorts, [Port1, Port2]>;

Vertical:

def : WriteRes<WriteTwoCycles, [Port1]> { let ResourceCycles = [2]; }
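
Combining both dimensions with a latency, the double-shift case raised earlier in this thread might be sketched as follows. This is only an illustration, not taken from the X86 sources: WriteShiftDouble is a hypothetical SchedWrite name, IEC_RSV0 is assumed to be declared as a ProcResource in the subtarget's model, and the fragment is assumed to sit inside that model's "let SchedModel = ... in" block.

```
// Hypothetical write type for the double-shift instructions
// (the name is an assumption made for this sketch).
def WriteShiftDouble : SchedWrite;

// Latency = 2: the result is available 2 cycles after issue.
// ResourceCycles = [2]: IEC_RSV0 stays reserved for 2 cycles, so another
// instruction needing IEC_RSV0 cannot start until 2 cycles later.
def : WriteRes<WriteShiftDouble, [IEC_RSV0]> {
  let Latency = 2;
  let ResourceCycles = [2];
}
```

Leaving ResourceCycles out would keep the 2-cycle latency but allow back-to-back independent instructions on the same unit, which is the fully pipelined case.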

> Do you expect that the new machine model will produce a better schedule than the current PostRA scheduler?

Yes. If not, then something is misconfigured or we need some minor adjustments to the model or scheduler itself (and remember there are always cases where the scheduler just gets lucky/unlucky). If you have a badly scheduled loop, file a bug including as much analysis as you can. Try to read the -debug-only=misched output.

For SLM, you may not need the PostRA scheduler at all. I expect the MachineScheduler to be a better fit. You can currently enable the MachineScheduler and it will use your existing (old-style) itineraries (X86SubTarget::enableMachineScheduler() { if (Atom) return true; }). But we really don't want to support that. The proper thing to do for SLM is define an out-of-order machine model. See the SandyBridge/Haswell model.
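
As a rough sketch of what the skeleton of such an out-of-order model could look like, assuming the SchedMachineModel and ProcResource classes from TargetSchedule.td: the names SketchSLMModel, SketchIntPort and SketchFPPort are hypothetical, and every number is a placeholder rather than a measured Silvermont value. The BufferSize setting on the FP/SSE unit is an assumption about how one might express that the SSE side stays in order.

```
// Hypothetical model; all values below are placeholders, not real SLM data.
def SketchSLMModel : SchedMachineModel {
  let IssueWidth = 2;          // micro-ops issued per cycle (placeholder)
  let MicroOpBufferSize = 32;  // a value > 1 marks the core as out of order
  let LoadLatency = 3;         // placeholder
  let MispredictPenalty = 10;  // placeholder
}

let SchedModel = SketchSLMModel in {
  // Integer unit: left at the default, so it follows the processor-wide
  // MicroOpBufferSize above.
  def SketchIntPort : ProcResource<1>;

  // FP/SSE unit: BufferSize = 1 is intended here to request in-order
  // treatment for this unit (assumption; check TargetSchedule.td).
  def SketchFPPort : ProcResource<1> { let BufferSize = 1; }

  // Map an existing X86 write type onto a unit of this model.
  def : WriteRes<WriteShift, [SketchIntPort]> { let Latency = 2; }
}
```

The remaining write types from X86Schedule.td would each need a similar WriteRes entry, and the model would still have to be attached to the processor definition before it takes effect.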

For original Atom, you might still want a PostRA scheduler. Running the new MachineScheduler a second time as a replacement postRA scheduler is something I intended to do, but haven't had a client. You could enable MachineScheduler and benchmark to determine if PostRA sched is still really needed (-disable-post-ra). If you do still need it, then I'd like to try replacing it with a second MachineScheduler run. We do want to kill off the current PostRA scheduler. It is a maintenance burden that doesn't serve a purpose for targets that have migrated to MachineScheduler.

> 
> Is there any documentation about the new machine model?

As with the old itineraries, I don't have formal docs. Only the attempt at self-documentation in TargetSchedule.td. There are also some BOF slides from last year's LLVM dev meeting.

The best way for me to improve the docs and for someone to migrate their target is to work together in an iterative process. Let me know when you have a chance to work on migration of SLM or Atom. I can provide a sample of what I think the machine model should look like. You can proceed until you hit a difficult or messy case. At that time, I can offer suggestions for handling it. If something is confusing, let me know. I'll try to explain, adding docs in the process.

-Andy

> 
> -----Original Message-----
> From: Andrew Trick [mailto:atrick at apple.com]
> Sent: Monday, September 16, 2013 2:46 AM
> To: Hal Finkel
> Cc: Gurd, Preston; llvm-commits at cs.uiuc.edu
> Subject: Re: [llvm] r190717 - Adds support for Atom Silvermont (SLM) - 
> -march=slm
> 
> 
> On Sep 13, 2013, at 1:26 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> 
>> ----- Original Message -----
>>> 
>>> 
>>> Just out of curiosity, when you have this:
>>> +  InstrItinData<IIC_SHD16_REG_IM, [InstrStage<2, [IEC_RSV0]>] >,
>>> 
>>> do you intend this to mean that the shift occupied the IEC_RSV0 
>>> unit, and nothing else can use it for 2 cycles? Or you do mean that 
>>> the latency is 2 cycles, but you can still issue back-to-back 
>>> independent shifts?
>>> 
>>> -Hal
>>> 
>>> For the above itinerary, I am trying to represent that this 
>>> instruction must use the IEC reservation station 0 and that it will 
>>> take two cycles to execute. I would like to also be able to 
>>> represent that the throughput of the instruction is 2 cycles, but I 
>>> do not know how to do this.
>> 
>> Okay, that's what I thought. I think you want to say this:
>> InstrItinData<IIC_SHD16_REG_IM, [InstrStage<1, [IEC_RSV0]>], [2, 1, 1] >,
>> 
>> The implementation is fully pipelined, right? Assuming that it is, you only need to track the first pipeline stage (because there are no hazards later). The numbers at the end say that the output operand will be ready in cycle 2 post-dispatch, and that the input operands are read in the first cycle.
>> 
>> I've cc'd Andy so that he can correct me if I'm wrong ;)
> 
> Right. Although there is no reason to use itineraries for Atom to begin with. I would be very willing to help anyone who wants to rewrite these using the new machine model, but it should be done by someone who can adequately test it. We want to drop support for the old itinerary format and postRA scheduler as soon as possible.
> 
> To give you an example, the new model would look something like this:
> 
> def : WriteRes<WriteShift, [IEC_RSV0]> { let Latency = 2; }
> 
> Maybe a new WriteShiftCL type should be added to X86Schedule.td and referenced in X86InstrShiftRotate.td. Then SLM can define it with Latency = 4, and X86SchedSandyBridge.td can have:
> 
> def : SchedAlias<WriteShiftCL, WriteShift>;
> 
> It's also possible for a subtarget to override specific operations by pattern matching opcodes without complicating the architecture definitions files.
> 
> 
> -Andy