[llvm-dev] Incorrect Cortex-R4/R4F/R5 ProcessorModel in ARM.td

Thu Oct 14 08:16:46 PDT 2021

Hey Peter, 

I've begun looking into adapting the model for the R52 into a model for the R5. 

Tweaking the instruction timings and removing the v8-R-specific parts has been mostly straightforward, and I'm seeing about a 3% improvement in benchmarks like CoreMark.

However, the R5's rules for which instructions can be dual-issued differ from the R52's, and I don't see how the superscalar behavior is modeled in the existing R52 schedule.

Would you happen to know what part of the R52 tablegen file is for modeling the superscalar behavior? 

Thanks,
Benson

-----Original Message-----
From: Peter Smith <Peter.Smith at arm.com> 
Sent: Wednesday, September 23, 2020 11:55 AM
To: Phipps, Alan <a-phipps at ti.com>; llvm-dev at lists.llvm.org
Subject: [EXTERNAL] Re: Incorrect Cortex-R4/R4F/R5 ProcessorModel in ARM.td

Hello Alan,

The public information for Cortex-R5 (https://developer.arm.com/ip-products/processors/cortex-r/cortex-r5) and Cortex-R52 (https://developer.arm.com/ip-products/processors/cortex-r/cortex-r52) shows that both are in-order with pipelines of similar length. It is possible that the Cortex-R52 scheduling model matches the Cortex-R5 more closely than the choices available at the time that Cortex-R5 was upstreamed.

I haven't written a schedule model myself. My understanding of the process is that the technical reference manual, or any other publicly available information about the micro-architecture, is used to provide initial values for the model. Then it is a matter of refinement against as many benchmarks as you can run.

I think that if the Cortex-R52 model empirically produces better results than the Cortex-A8 model, it could be possible to adapt it for the Cortex-R5 by removing the parts specific to v8-R and tweaking parameters based on cycle times from the technical reference manual (TRM). I'm sure we could find someone to review a patch if there is a good enough set of benchmarks showing that the adapted model is better than the Cortex-A8 one.
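
As a rough illustration of the kind of adaptation described above, here is a minimal TableGen sketch patterned on the general shape of a SchedMachineModel. The names CortexR5Model and R5UnitALU are hypothetical (they do not exist upstream), and every number is a placeholder that would need to come from the TRM and benchmarking. It is a fragment meant to sit inside the ARM backend, where SchedMachineModel, ProcResource, WriteRes and WriteALU are already defined.

```tablegen
// Hypothetical sketch only: CortexR5Model and R5UnitALU are made-up names,
// and every number below is a placeholder to be replaced from the TRM.
def CortexR5Model : SchedMachineModel {
  let MicroOpBufferSize = 0;  // In-order core: no out-of-order buffering.
  let IssueWidth = 2;         // Upper bound on micro-ops issued per cycle.
  let LoadLatency = 1;        // Placeholder.
  let MispredictPenalty = 8;  // Placeholder.
  let CompleteModel = 0;      // Model does not describe every instruction.
}

let SchedModel = CortexR5Model in {
  // Placeholder unit count; BufferSize = 0 marks an in-order resource.
  def R5UnitALU : ProcResource<2> { let BufferSize = 0; }

  // Map the generic ARM ALU write type onto the unit (placeholder latency).
  def : WriteRes<WriteALU, [R5UnitALU]> { let Latency = 1; }
}
```

In a sketch like this, the issue width and the number of units per resource are the main knobs that bound how many instructions the scheduler assumes can issue together, so those are likely the first values to check against the TRM.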

The technical reference manual for the Cortex-R5:  https://developer.arm.com/documentation/ddi0460/c/

Peter

________________________________________
From: Phipps, Alan <a-phipps at ti.com>
Sent: 23 September 2020 17:24
To: Peter Smith; llvm-dev at lists.llvm.org
Subject: RE: Incorrect Cortex-R4/R4F/R5 ProcessorModel in ARM.td

Thanks, Peter, for your response.  Right -- certainly not incorrect in the sense of generating an incorrect schedule, but definitely seems suboptimal.

I've also noticed that if I experimentally base the v7-r model on the Cortex-R52 ProcessorModel (or even build for Cortex-R52), I get a better schedule than with the cortex-a8 model, and I see a 2%-3% performance improvement on benchmarks like CoreMark running on cortex-r5 hardware.  Do you know why that might be the case?  Can you suggest other, more straightforward ways to improve scheduling for cortex-r5 if there aren't any plans to develop a custom model for v7-r?

Thanks for your help,

-Alan

-----Original Message-----
From: Peter Smith [mailto:Peter.Smith at arm.com]
Sent: Wednesday, September 23, 2020 11:06 AM
To: llvm-dev at lists.llvm.org; Phipps, Alan
Subject: [EXTERNAL] Re: Incorrect Cortex-R4/R4F/R5 ProcessorModel in ARM.td

Hello Alan,

Using a cortex-a8 scheduling model for v7-r CPUs may not be optimal, but I wouldn't go as far as to call it incorrect. The cortex-r4, cortex-r4f and cortex-r5 are in-order cores, and cortex-a8 (another in-order core) is the closest match. We don't currently have any plans to develop a custom scheduling model for r4, r4f or r5.

Peter

________________________________________
From: llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of Phipps, Alan via llvm-dev <llvm-dev at lists.llvm.org>
Sent: 23 September 2020 15:27
To: llvm-dev at lists.llvm.org
Subject: [llvm-dev] Incorrect Cortex-R4/R4F/R5 ProcessorModel in ARM.td

In ARM.td, I see that the ProcessorModel for cortex-r4, cortex-r4f, and cortex-r5 (as well as r7 and r8) is based on "CortexA8Model", which seems incorrect.  When this was added in 2015, there were also comments associated with this configuration, such as "// FIXME: R5 has currently the same ProcessorModel as A8" (later removed).  The processor model for cortex-r52 appears to be correct and uses its own "CortexR52Model".
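
For illustration, the entries in question have roughly the following shape. This is an abbreviated sketch rather than a verbatim copy of ARM.td: the feature lists are deliberately cut short and should be checked against the actual file.

```tablegen
// Abbreviated sketch of the ARM.td entries under discussion; the real
// feature lists contain further entries that are omitted here.
def : ProcessorModel<"cortex-r5",  CortexA8Model,
                     [ARMv7r, ProcR5 /* further features omitted */]>;

def : ProcessorModel<"cortex-r52", CortexR52Model,
                     [ARMv8r, ProcR52 /* further features omitted */]>;
```

The second template argument is the scheduling model, which is the part that differs in kind between the two entries: one borrows a model from another core, the other has a dedicated one.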

Does anyone know why r4/r4f/r5 were set up based on "CortexA8Model"?

Is there a plan to upstream a fix to correct this?

Thanks!

Alan Phipps