[llvm-dev] Incorrect Cortex-R4/R4F/R5 ProcessorModel in ARM.td

Fri Oct 22 09:17:51 PDT 2021

Hey Peter, 

Thanks for the reply, I was able to flesh out most of the R5 model with the information you had provided. 

However, I had a question about the R5 TRM regarding the meaning of "Issue Cycles". The description of Issue Cycles says "the minimum number of cycles required to issue an instruction". Do issue cycles indicate that no other instructions will be issued for that amount of time?  

For example, here's an entry from the timings chapter: 

| Instruction               | Cycles | Early Regs | Result Latency |
| VDIV.F64 <Dd>, <Dn>, <Dm> | 3      | <Dn>, <Dm> | 63             |

And let's say I have the following sequence:

VDIV.F64 d1, d2, d3
ADD r4, r5, r6

Since there's no data dependence, these instructions should be able to issue right after one another. However, since VDIV has 3 "issue cycles", if VDIV is issued on cycle 0, does that mean ADD is issued on cycle 3? Or are they issued on consecutive cycles (0 and 1), with "issue cycles" indicating something else?

(I am assuming that some aspects of the superscalar behavior come into play here, but I'm not sure how)

Thanks again!
Benson

-----Original Message-----
From: Peter Smith <Peter.Smith at arm.com> 
Sent: Thursday, October 14, 2021 11:24 AM
To: Chu, Benson <b-chu1 at ti.com>; Phipps, Alan <a-phipps at ti.com>; llvm-dev at lists.llvm.org
Subject: [EXTERNAL] RE: Incorrect Cortex-R4/R4F/R5 ProcessorModel in ARM.td

Hello,

I know just about enough to find the right file to describe the scheduling model, but I don't know much about the details myself. I'm hoping that one of my colleagues or someone knowing about scheduling in general can help/correct what I'm writing below.

From what I glean from:
https://llvm.org/devmtg/2016-09/slides/Absar-SchedulingInOrder.pdf
https://llvm.org/devmtg/2014-10/Slides/Estes-MISchedulerTutorial.pdf

The basics of superscalar modelling are set by the IssueWidth in the machine model:

def CortexR52Model : SchedMachineModel {
  let MicroOpBufferSize = 0;  // R52 is in-order processor
  let IssueWidth = 2;         // 2 micro-ops dispatched per cycle
  let LoadLatency = 1;        // Optimistic, assuming no misses
  let MispredictPenalty = 8;  // A branch direction mispredict, including PFU
  let CompleteModel = 0;      // Covers instructions applicable to cortex-r52.
}

I would expect the forwarding information to be useful too: to dual-issue certain pairs, the operands they depend on would need to be available in time.

// Forwarding information - based on when an operand is read
def : ReadAdvance<R52Read_ISS, 0>;
def : ReadAdvance<R52Read_EX1, 1>;
def : ReadAdvance<R52Read_EX2, 2>;
def : ReadAdvance<R52Read_F0, 0>;
def : ReadAdvance<R52Read_F1, 1>;
def : ReadAdvance<R52Read_F2, 2>;
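
To make the effect of these entries concrete: a ReadAdvance of N means the operand is read N cycles after issue, so a dependent instruction sees the producer's latency reduced by N. A minimal sketch, with hypothetical names (CortexR5FwdSketchModel, R5UnitALU, R5WriteALU, R5Read_ISS, R5Read_EX1) and assumed latencies that are not taken from any TRM:

```tablegen
// Sketch only: every R5* name and every number here is an assumption.
def CortexR5FwdSketchModel : SchedMachineModel {
  let MicroOpBufferSize = 0;  // in-order
  let IssueWidth = 2;         // assumed dual issue
  let CompleteModel = 0;
}

let SchedModel = CortexR5FwdSketchModel in {
  def R5UnitALU : ProcResource<2> { let BufferSize = 0; }

  // Result available 2 cycles after issue (assumed value).
  def R5WriteALU : SchedWriteRes<[R5UnitALU]> { let Latency = 2; }

  def R5Read_ISS : SchedRead;
  def R5Read_EX1 : SchedRead;
  def : ReadAdvance<R5Read_ISS, 0>;  // operand needed at issue
  def : ReadAdvance<R5Read_EX1, 1>;  // operand needed one cycle later

  // One write followed by one read per source operand, in operand order.
  def : InstRW<[R5WriteALU, R5Read_ISS, R5Read_EX1], (instrs ADDrr)>;
}
```

With these assumed numbers, a consumer reading an R5WriteALU result through R5Read_ISS waits the full 2 cycles, while one reading it through R5Read_EX1 waits 2 - 1 = 1 cycle.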

From https://llvm.org/devmtg/2016-09/slides/Absar-SchedulingInOrder.pdf (assuming it still holds 5 years on), under "LLVM Scheduler - What's missing?":
* Instructions with slot constraints
** Cannot issue in second slot - specification and pickNode changes
** Cannot issue with any other - micro-ops
** Cannot issue with specific another - reliance on resource constraint (not adequate)
* Inter-lock constraint modelling
** Cannot slow down previous instruction
* First-half, second-half and in-stage forwarding
** Further divide pipeline stages
* Variadic instructions
** SchedPredicate, SchedVariant - an alternate compact representation necessary

It may be that more complex superscalar constraints cannot be modelled.
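
As a rough illustration of the "cannot issue with any other - micro-ops" item above, an instruction can be charged as many micro-ops as the issue width, or flagged with SingleIssue, so that it takes a whole issue cycle to itself. The fragment below is only a sketch: CortexR5SketchModel, R5UnitVFPDiv and R5WriteVDIV64 are hypothetical names, and the latency and cycle counts are placeholders.

```tablegen
// Sketch only. CortexR5SketchModel, R5UnitVFPDiv and R5WriteVDIV64 are
// hypothetical names; nothing like this exists upstream.
def CortexR5SketchModel : SchedMachineModel {
  let MicroOpBufferSize = 0;  // in-order
  let IssueWidth = 2;         // assumed dual issue
  let CompleteModel = 0;
}

let SchedModel = CortexR5SketchModel in {
  // Unbuffered divide unit.
  def R5UnitVFPDiv : ProcResource<1> { let BufferSize = 0; }

  def R5WriteVDIV64 : SchedWriteRes<[R5UnitVFPDiv]> {
    let Latency = 63;          // placeholder result latency
    let ResourceCycles = [3];  // divide unit busy for 3 cycles (placeholder);
                               // newer LLVM spells this field ReleaseAtCycles
    let NumMicroOps = 2;       // as many micro-ops as IssueWidth
    let SingleIssue = 1;       // ask for it to issue alone in its cycle
  }

  def : InstRW<[R5WriteVDIV64], (instrs VDIVD)>;
}
```

Whether the in-order scheduler honours these fields closely enough to capture the R5 dual-issue rules would still need checking against benchmarks.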

Hope that helps

Peter

> -----Original Message-----
> From: Chu, Benson <b-chu1 at ti.com>
> Sent: 14 October 2021 16:17
> To: Peter Smith <Peter.Smith at arm.com>; Phipps, Alan <a-phipps at ti.com>; 
> llvm-dev at lists.llvm.org
> Subject: RE: Incorrect Cortex-R4/R4F/R5 ProcessorModel in ARM.td
> 
> Hey Peter,
> 
> I've begun looking into adapting the model for the R52 into a model 
> for the R5.
> 
> Tweaking the instruction timings and removing V8-r specific stuff has 
> been mostly straightforward, and I'm seeing about a 3% improvement in 
> benchmarks like coremark.
> 
> However, the R5 rules on which instructions can be dual issued are 
> different from the R52, and I don't see how the superscalar behavior 
> is modeled in the existing R52 schedule.
> 
> Would you happen to know what part of the R52 tablegen file is for 
> modeling the superscalar behavior?
> 
> Thanks,
> Benson
> 
> -----Original Message-----
> From: Peter Smith <Peter.Smith at arm.com>
> Sent: Wednesday, September 23, 2020 11:55 AM
> To: Phipps, Alan <a-phipps at ti.com>; llvm-dev at lists.llvm.org
> Subject: [EXTERNAL] Re: Incorrect Cortex-R4/R4F/R5 ProcessorModel in 
> ARM.td
> 
> Hello Alan,
> 
> Looking at the public information for Cortex-R5
> (https://developer.arm.com/ip-products/processors/cortex-r/cortex-r5) 
> and
> Cortex-R52 (https://developer.arm.com/ip-products/processors/cortex-r/cortex-r52)
> shows that both are in-order with similar length 
> pipelines. It is possible that the Cortex-R52 scheduling model may 
> match the Cortex-R5 more closely than the choices available at the 
> time that Cortex-R5 was upstreamed.
> 
> I haven't written a schedule model myself. My understanding of the 
> process is that the technical reference manual or any other publicly 
> available information about the micro-architecture is used to provide 
> initial values for the model. Then it is a matter of refinement 
> against as many benchmarks as you can run.
> 
> I think if empirically the Cortex-R52 model is producing better 
> results than the Cortex-A8 then it could be possible to adapt the 
> model for the Cortex-R5 by removing the parts specific to V8-R and 
> tweaking parameters based on cycle times from the technical reference 
> manual (TRM). I'm sure we could find someone to review a patch if
> there is a good enough set of benchmarks showing that a model is better
> than the Cortex-A8.
> 
> The technical reference manual for the Cortex-R5:
> https://developer.arm.com/documentation/ddi0460/c/
> 
> Peter
> 
> ________________________________________
> From: Phipps, Alan <a-phipps at ti.com>
> Sent: 23 September 2020 17:24
> To: Peter Smith; llvm-dev at lists.llvm.org
> Subject: RE: Incorrect Cortex-R4/R4F/R5 ProcessorModel in ARM.td
> 
> Thanks, Peter, for your response.  Right -- certainly not incorrect in 
> the sense of generating an incorrect schedule, but definitely seems suboptimal.
> 
> I've also noticed that if I experimentally base the v7-r model on the
> Cortex-R52 ProcessorModel (or even build for Cortex-R52), I achieve a
> better schedule than if it were based on cortex-a8, and I see a 2%-3%
> performance improvement on benchmarks like Coremark running on cortex-r5
> hardware. Do you know why that might be the case?  Can you suggest other,
> more straightforward ways one might improve performance scheduling for
> cortex-r5 if there aren't any plans to develop a custom model for v7-r?
> 
> Thanks for your help,
> 
> -Alan
> 
> -----Original Message-----
> From: Peter Smith [mailto:Peter.Smith at arm.com]
> Sent: Wednesday, September 23, 2020 11:06 AM
> To: llvm-dev at lists.llvm.org; Phipps, Alan
> Subject: [EXTERNAL] Re: Incorrect Cortex-R4/R4F/R5 ProcessorModel in 
> ARM.td
> 
> Hello Alan,
> 
> Using a cortex-a8 scheduling model for v7-r CPUs may not be optimal
> but I wouldn't go as far as to call it incorrect. The cortex-r4,
> cortex-r4f and cortex-r5 are in-order cores, and cortex-a8 (another
> in-order core) is the closest match. We don't have any current plans to
> develop a custom scheduling model for r4, r4f or r5.
> 
> Peter
> 
> ________________________________________
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of Phipps, 
> Alan via llvm-dev <llvm-dev at lists.llvm.org>
> Sent: 23 September 2020 15:27
> To: llvm-dev at lists.llvm.org
> Subject: [llvm-dev] Incorrect Cortex-R4/R4F/R5 ProcessorModel in 
> ARM.td
> 
> In ARM.td, I see that the ProcessorModel for cortex-r4, cortex-r4f, and
> cortex-r5 (as well as r7 and r8) is based on "CortexA8Model", which
> seems incorrect.  When this was added in 2015, there were also
> comments associated with this configuration, such as "// FIXME: R5 has
> currently the same ProcessorModel as A8" (later removed).  The
> processor model for Cortex-r52 appears to be correct and corresponds to
> an associated "CortexR52Model".
> 
> Does anyone know why r4/r4f/r5 were set up based on "CortexA8Model"?
> 
> Is there a plan to upstream a fix to correct this?
> 
> Thanks!
> 
> Alan Phipps