[PATCH] Adds Cortex-A53 and Cortex-A57 subtargets.

Wed Feb 26 15:28:22 PST 2014

On Feb 26, 2014, at 2:58 PM, Dave Estes <cestes at codeaurora.org> wrote:

> On 02/25/2014 05:29 PM, Andrew Trick wrote:
>> First off, in case it isn't clear, the machine model will only be used by the MachineScheduler. To enable the MachineScheduler pass, your target must implement TargetSubtargetInfo::enableMachineScheduler.
>> 
>> I can't comment on these processors at all, so I'll just give some general advice.
>> 
> 
> A few more details on AArch64 for the reviewers:
> 
> The MachineScheduler is turned on by default for AArch64. AArch64Subtarget::enableMachineScheduler() is implemented to return true.
> 
> I noticed the other day that the MachineScheduler can get latency information from legacy itineraries as well as from the machine model. The AArch64 backend currently has no subtargets that implement legacy instruction itineraries, so TargetSchedModel::hasInstrItineraries() will return false even though the scheditins command line option defaults to true. On the other hand, TargetSchedModel::hasInstrSchedModel() returns true only for the Cortex-A53 subtarget, since that's the only machine model defined for the AArch64 backend so far.
> 
> This all means that the MachineScheduler pass runs for all AArch64 compilations, but that it works valiantly without latency information most of the time until someone specifies -mcpu=cortex-a53.

Right, the way to change that, if you wanted to, is to check for the subtarget type inside enableMachineScheduler.

>> The default is MicroOpBufferSize=0, which is suitable for in-order execution in that it prioritizes latency hiding. Setting it to 1 makes it less strict by allowing latency to be balanced with other heuristics:
>> 
>> def CortexA53Model : SchedMachineModel {
>> ...
>>   let MicroOpBufferSize = 1;
>> }
> 
> Sweet, I didn't realize that. I've left it at the default, but I'll definitely run some experiments with MicroOpBufferSize=1 once the model is more fully implemented. BTW, is the default 0 or -1? I saw -1 in TargetSchedule.td.

The options are confusingly documented in two places. The actual defaults come from MCSchedule.h. Using “-1” in the .td file just says that the target does not override the default. I tried to comment this stuff. More comments are welcome.
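
To illustrate the override rule, here is a rough sketch. The model name and the IssueWidth value are made up for the example and are not taken from the patch:

```
// Hypothetical model, for illustration only.
def ExampleModel : SchedMachineModel {
  // MicroOpBufferSize is not set here, so it keeps the "-1" placeholder
  // from the .td file and the default from MCSchedule.h (0) applies.
  let IssueWidth = 2;  // Explicit override of a default (assumed value).
}
```

Only the fields a target sets with "let" replace the defaults.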

For anyone porting the machine model I would say, do not assume things are working as you expect when you gather performance data. Please verify with a scheduler trace on some test cases first.

>> To model in-order resource conflicts, if you care about that, you actually need to explicitly set BufferSize=0 in the processor resource def:
>> 
>> def A53UnitXX  : ProcResource<1> { let BufferSize = 0; }
>> 
>> That prevents multiple instructions sharing the same resource from being scheduled in the same cycle.
> 
> Thanks for this. It prompted me to re-evaluate my understanding of BufferSize...for half of the day. :)
> 
> I mistakenly thought that I could define ProcResources that are "pipelined", which is why for this simple processor, I'm modeling each pipeline as a resource. I had a disconnect between Latency and ResourceCycles. My plan now is to make each of these ProcResources unbuffered (BufferSize=0) and to set ResourceCycles to 1 (if needed). For instructions that cause hazards in their pipelines, I'll set a ResourceCycles value that matches the length of the hazard.

ResourceCycles=1 should be the default.
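
A rough sketch of what that plan might look like in the model. The unit names, the SchedWrite names, and the latency and cycle counts are all hypothetical and are not taken from the patch:

```
// Hypothetical SchedWrite types; a real target defines its own.
def WriteExampleALU : SchedWrite;
def WriteExampleDiv : SchedWrite;

let SchedModel = CortexA53Model in {
  // Unbuffered resources: two instructions that need the same unit
  // cannot be scheduled in the same cycle.
  def A53UnitExampleALU : ProcResource<1> { let BufferSize = 0; }
  def A53UnitExampleDiv : ProcResource<1> { let BufferSize = 0; }

  // Fully pipelined op: the result is ready after 3 cycles, but the
  // unit is occupied only for the default single cycle.
  def : WriteRes<WriteExampleALU, [A53UnitExampleALU]> { let Latency = 3; }

  // Op with a hazard: the unit stays reserved for 4 cycles.
  def : WriteRes<WriteExampleDiv, [A53UnitExampleDiv]> {
    let Latency = 4;
    let ResourceCycles = [4];
  }
}
```

Latency says when the result becomes available to dependent instructions; ResourceCycles says how long the unit is tied up, so only the second WriteRes needs to set it.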

I only recently added support for counting stalls based on resource conflicts and it is not well tested. Hal Finkel was going to use this for PowerPC. Until then, in-order targets were all using itineraries.

-Andy

> 
> Thanks, Andy.
> -Dave
> 
> -- 
> Employee of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation