[llvm-dev] Incorrect Cortex-R4/R4F/R5 ProcessorModel in ARM.td

Thu Oct 14 09:23:35 PDT 2021

Hello,

I know just about enough to find the right file to describe the scheduling model, but I don't know much about the details myself. I'm hoping that one of my colleagues or someone knowing about scheduling in general can help/correct what I'm writing below.

From what I glean from:
https://llvm.org/devmtg/2016-09/slides/Absar-SchedulingInOrder.pdf
https://llvm.org/devmtg/2014-10/Slides/Estes-MISchedulerTutorial.pdf

The basics of superscalar modelling are set by the IssueWidth in the machine model:

def CortexR52Model : SchedMachineModel {
  let MicroOpBufferSize = 0;  // R52 is in-order processor
  let IssueWidth = 2;         // 2 micro-ops dispatched per cycle
  let LoadLatency = 1;        // Optimistic, assuming no misses
  let MispredictPenalty = 8;  // A branch direction mispredict, including PFU
  let CompleteModel = 0;      // Covers instructions applicable to cortex-r52.
}

I would expect the forwarding information to be useful too: to dual issue certain pairs of instructions, their dependencies would need to be available in time.

// Forwarding information - based on when an operand is read
def : ReadAdvance<R52Read_ISS, 0>;
def : ReadAdvance<R52Read_EX1, 1>;
def : ReadAdvance<R52Read_EX2, 2>;
def : ReadAdvance<R52Read_F0, 0>;
def : ReadAdvance<R52Read_F1, 1>;
def : ReadAdvance<R52Read_F2, 2>;
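
As a rough sketch of how I understand ReadAdvance to interact with latency (the WriteRes line and its latency of 3 are assumptions for illustration, not values I have checked against the model or the TRM):

```
// Assumed: the producer's result is available 3 cycles after issue.
def : WriteRes<WriteALU, [R52UnitALU]> { let Latency = 3; }

// Repeated from above. A consumer operand tagged R52Read_EX2 is read
// 2 cycles later in the pipeline, so the scheduler sees 3 - 2 = 1 cycle
// of latency between the producer and this consumer.
def : ReadAdvance<R52Read_EX2, 2>;
```

An operand tagged R52Read_ISS gets no advance and would see the full 3 cycles.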

From https://llvm.org/devmtg/2016-09/slides/Absar-SchedulingInOrder.pdf, assuming it still holds (it is 5 years old):
LLVM Scheduler - What's missing?
* Instructions with slot constraints
** Cannot issue in second slot - specification and pickNode changes
** Cannot issue with any other - micro-ops
** Cannot issue with specific another - reliance on resource constraint (not adequate)
* Inter-lock constraint modelling
** Cannot slow down previous instruction
* First-half, second-half and in-stage forwarding
** Further divide pipeline stages
* Variadic instructions
** SchedPredicate, SchedVariant - an alternate compact representation necessary

It may be that more complex superscalar constraints cannot be modelled.
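
For the "reliance on resource constraint" workaround mentioned in the slides, a minimal sketch might look like the following. CortexR5Model and R5UnitMAC are hypothetical names, the model is assumed to be defined along the lines of CortexR52Model above, and the latencies are placeholders rather than TRM values:

```
let SchedModel = CortexR5Model in {
  // A single in-order unit. Two writes that both consume it should not be
  // able to issue in the same cycle, even though IssueWidth is 2.
  def R5UnitMAC : ProcResource<1> { let BufferSize = 0; }

  def : WriteRes<WriteMUL32, [R5UnitMAC]> { let Latency = 2; }
  def : WriteRes<WriteMAC32, [R5UnitMAC]> { let Latency = 2; }
}
```

This only expresses "these two kinds of instruction compete for one unit"; as the slides say, it is not adequate for rules such as "cannot issue in the second slot".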

Hope that helps

Peter

> -----Original Message-----
> From: Chu, Benson <b-chu1 at ti.com>
> Sent: 14 October 2021 16:17
> To: Peter Smith <Peter.Smith at arm.com>; Phipps, Alan <a-phipps at ti.com>;
> llvm-dev at lists.llvm.org
> Subject: RE: Incorrect Cortex-R4/R4F/R5 ProcessorModel in ARM.td
> 
> Hey Peter,
> 
> I've begun looking into adapting the model for the R52 into a model for the
> R5.
> 
> Tweaking the instruction timings and removing V8-r specific stuff has been
> mostly straightforward, and I'm seeing about a 3% improvement in
> benchmarks like coremark.
> 
> However, the R5 rules on which instructions can be dual issued are different
> from the R52, and I don't see how the superscalar behavior is modeled in the
> existing R52 schedule.
> 
> Would you happen to know what part of the R52 tablegen file is for modeling
> the superscalar behavior?
> 
> Thanks,
> Benson
> 
> -----Original Message-----
> From: Peter Smith <Peter.Smith at arm.com>
> Sent: Wednesday, September 23, 2020 11:55 AM
> To: Phipps, Alan <a-phipps at ti.com>; llvm-dev at lists.llvm.org
> Subject: [EXTERNAL] Re: Incorrect Cortex-R4/R4F/R5 ProcessorModel in
> ARM.td
> 
> Hello Alan,
> 
> Looking at the public information for Cortex-R5
> (https://developer.arm.com/ip-products/processors/cortex-r/cortex-r5) and
> Cortex-R52  (https://developer.arm.com/ip-products/processors/cortex-
> r/cortex-r52) shows that both are in-order with similar length pipelines. It is
> possible that the Cortex-R52 scheduling model may match the Cortex-R5
> more closely than the choices available at the time that Cortex-R5 was
> upstreamed.
> 
> I haven't written a schedule model myself. My understanding of the process
> is that the technical reference manual or any other publicly available
> information about the micro-architecture is used to provide initial values for
> the model. Then it is a matter of refinement against as many benchmarks as
> you can run.
> 
> I think if empirically the Cortex-R52 model is producing better results than
> the Cortex-A8 then it could be possible to adapt the model for the Cortex-R5
> by removing the parts specific to V8-R and tweaking parameters based on
> cycle times from the technical reference manual (TRM). I'm sure we could
> find someone to review a patch if there is a good enough set of benchmarks
> showing that a model is better than the Cortex-A8.
> 
> The technical reference manual for the Cortex-R5:
> https://developer.arm.com/documentation/ddi0460/c/
> 
> Peter
> 
> ________________________________________
> From: Phipps, Alan <a-phipps at ti.com>
> Sent: 23 September 2020 17:24
> To: Peter Smith; llvm-dev at lists.llvm.org
> Subject: RE: Incorrect Cortex-R4/R4F/R5 ProcessorModel in ARM.td
> 
> Thanks, Peter, for your response.  Right -- certainly not incorrect in the sense
> of generating an incorrect schedule, but definitely seems suboptimal.
> 
> I've also noticed that if I experimentally base the v7-r model on the
> Cortex-R52 ProcessorModel (or even build for Cortex-R52), I achieve a better
> schedule than if it were based on cortex-a8, and I see 2%-3% performance
> improvement on benchmarks like Coremark running on cortex-r5 hardware.
> Do you know why that might be the case?  Can you suggest other, more
> straightforward ways one might improve performance scheduling for cortex-
> r5 if there aren't any plans to develop a custom model for v7-r?
> 
> Thanks for your help,
> 
> -Alan
> 
> -----Original Message-----
> From: Peter Smith [mailto:Peter.Smith at arm.com]
> Sent: Wednesday, September 23, 2020 11:06 AM
> To: llvm-dev at lists.llvm.org; Phipps, Alan
> Subject: [EXTERNAL] Re: Incorrect Cortex-R4/R4F/R5 ProcessorModel in
> ARM.td
> 
> Hello Alan,
> 
> Using a cortex-a8 scheduling model for v7-r CPUs may not be optimal but I
> wouldn't go as far as to call it incorrect. The cortex-r4, cortex-r4f and
> cortex-r5 are in-order cores, and cortex-a8 (another in-order core) is the
> closest match. We don't have any current plans to develop a custom scheduling
> model for r4, r4f or r5.
> 
> Peter
> 
> ________________________________________
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of Phipps, Alan
> via llvm-dev <llvm-dev at lists.llvm.org>
> Sent: 23 September 2020 15:27
> To: llvm-dev at lists.llvm.org
> Subject: [llvm-dev] Incorrect Cortex-R4/R4F/R5 ProcessorModel in ARM.td
> 
> In ARM.td, I see that the ProcessorModel for cortex-r4, cortex-r4f, and
> cortex-r5 (as well as r7 and r8) is based on "CortexA8Model", which seems
> incorrect.  When this was added in 2015, there were also comments
> associated with this configuration, such as "// FIXME: R5 has currently the
> same ProcessorModel as A8" (later removed).  The processor model for
> Cortex-r52 appears to be correct and corresponds to an associated
> "CortexR52Model".
> 
> Does anyone know why r4/r4f/r5 were set up based on "CortexA8Model"?
> 
> Is there a plan to upstream a fix to correct this?
> 
> Thanks!
> 
> Alan Phipps