[clang] [llvm] [AArch64] Add support for Qualcomm Oryon processor (PR #91022)

Fri May 17 10:57:55 PDT 2024

================
@@ -0,0 +1,1664 @@
+//=- AArch64SchedOryon.td - Nuvia Inc Oryon CPU 001 ---*- tablegen -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file defines the scheduling model for Nuvia Inc Oryon
+// family of processors.
+//
+//===----------------------------------------------------------------------===//
+
+//===----------------------------------------------------------------------===//
+// Pipeline Description.
+
+def OryonModel : SchedMachineModel {
+  let IssueWidth            =  14; // 14 micro-ops dispatched at a time. IXU=6, LSU=4, VXU=4
+  let MicroOpBufferSize     = 376; // 192 (48x4) entries in micro-op re-order buffer in VXU.
+                                   // 120 ((20+20)x3) entries in micro-op re-order buffer in IXU
+                                   // 64  (16+16)x2 re-order buffer in LSU
+                                   // total 376
+  let LoadLatency           =   4; // 4 cycle Load-to-use from L1D$
+                                   // LSU=5 NEON load
+  let MispredictPenalty     =  13; // 13 cycles for mispredicted branch.
+  // Determined via a mix of micro-arch details and experimentation.
+  let LoopMicroOpBufferSize =   0; // Do not have a LoopMicroOpBuffer
----------------
joelkevinjones wrote:

Thanks for the attention to detail and for bringing up this issue. We discussed this setting internally. At one extreme, we could try to specify every detail relevant to instruction scheduling, but describing the micro-op buffers with full architectural accuracy would need more than a single scalar value, and we certainly wouldn't sign up to extend LLVM in that way. We decided to take the other extreme here. Although we did not vary LoopMicroOpBufferSize explicitly, our benchmarking of varied scheduling settings showed little effect. Our team hasn't recently run any experiments on non-AArch64 targets that vary this specific setting. I will let the submitter decide what to do; we may run some experiments varying this value. If the processor were in-order, this setting would matter more.
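As a quick check on the `MicroOpBufferSize` comment in the quoted diff, the per-unit re-order buffer entries it lists (VXU, IXU and LSU) add up to the configured value:

$$
\underbrace{48 \times 4}_{\text{VXU}} + \underbrace{(20+20) \times 3}_{\text{IXU}} + \underbrace{(16+16) \times 2}_{\text{LSU}} = 192 + 120 + 64 = 376
$$

This sum assumes the three per-unit buffers are simply added into one scalar, which is how the comment presents them.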

https://github.com/llvm/llvm-project/pull/91022