[llvm-dev] A question about AArch64 Cortex-A57 subtarget definition

Xing Su via llvm-dev llvm-dev at lists.llvm.org
Thu May 12 17:13:04 PDT 2016

Hello everybody,

I'm reading the .td files that define the Cortex-A57 processor,
a subtarget of the AArch64 target, and something in the
`AArch64SchedA57.td` file confuses me.

At the top of `AArch64SchedA57.td`, various processor resources are
defined as follows:


def A57UnitB : ProcResource<1>;  // Type B micro-ops
def A57UnitI : ProcResource<2>;  // Type I micro-ops
def A57UnitM : ProcResource<1>;  // Type M micro-ops
def A57UnitL : ProcResource<1>;  // Type L micro-ops
def A57UnitS : ProcResource<1>;  // Type S micro-ops
def A57UnitX : ProcResource<1>;  // Type X micro-ops
def A57UnitW : ProcResource<1>;  // Type W micro-ops
let SchedModel = CortexA57Model in {
  def A57UnitV : ProcResGroup<[A57UnitX, A57UnitW]>;    // Type V micro-ops
}



According to the Cortex-A57 software optimization manual, Cortex-A57 has 8
functional units in the backend:

- Branch (B)
- Integer 0 (I0)
- Integer 1 (I1)
- Integer Multi-Cycle (M)
- Load (L)
- Store (S)
- FP/ASIMD 0 (F0)
- FP/ASIMD 1 (F1)
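
As a cross-check, the unit counts in the `ProcResource` definitions can be
tallied against this list. The following is only a small Python sketch of
the arithmetic, not anything from LLVM: the dictionary is transcribed by
hand from the definitions quoted above, and treating `A57UnitI` as I0 plus
I1 and `A57UnitX`/`A57UnitW` as the two FP/ASIMD pipelines is my assumption.

```python
# Units per resource, copied by hand from the ProcResource<N> definitions
# quoted from AArch64SchedA57.td.
proc_resources = {
    "A57UnitB": 1,  # Branch
    "A57UnitI": 2,  # assumed to cover Integer 0 and Integer 1
    "A57UnitM": 1,  # Integer Multi-Cycle
    "A57UnitL": 1,  # Load
    "A57UnitS": 1,  # Store
    "A57UnitX": 1,  # assumed to be one FP/ASIMD pipeline
    "A57UnitW": 1,  # assumed to be the other FP/ASIMD pipeline
}

# A57UnitV is a ProcResGroup over X and W, so it contributes no new units.
total_units = sum(proc_resources.values())
print(total_units)  # 8
```

The total agrees with the 8 pipelines listed in the manual, which is why I
assume the group `A57UnitV` simply names the pair of FP/ASIMD pipelines.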

So I think `A57UnitW` and `A57UnitX` are the TableGen records
defining pipelines F0 and F1, respectively. Together, `A57UnitW` and
`A57UnitX` compose a `ProcResGroup`, `A57UnitV`, which can execute a
128-bit ASIMD floating-point operation, such as FMLA (Q-form), in a
single clock cycle.


But lines 479-483 of `AArch64SchedA57.td` read as follows:


def A57WriteFPVMAD : SchedWriteRes<[A57UnitV]> { let Latency = 9;  }
def A57WriteFPVMAQ : SchedWriteRes<[A57UnitV, A57UnitV]> { let Latency = 10;  }
def A57ReadFPVMA5  : SchedReadAdvance<5, [A57WriteFPVMAD, A57WriteFPVMAQ]>;
def : InstRW<[A57WriteFPVMAD, A57ReadFPVMA5], (instregex "^FML[AS](v2f32|v1i32|v2i32|v1i64)")>;
def : InstRW<[A57WriteFPVMAQ, A57ReadFPVMA5], (instregex "^FML[AS](v4f32|v2f64|v4i32|v2i64)")>;
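
To see which opcodes each `InstRW` entry picks up, the two `instregex`
patterns can be tried on a few names. This is a rough Python sketch, not
TableGen: the patterns are copied from the definitions above, the opcode
names in the list are ones I assume for illustration, and Python's
`re.match` only stands in for however TableGen applies `instregex`.

```python
import re

# Patterns copied from the two InstRW definitions.
D_FORM = re.compile(r"^FML[AS](v2f32|v1i32|v2i32|v1i64)")
Q_FORM = re.compile(r"^FML[AS](v4f32|v2f64|v4i32|v2i64)")


def sched_write(opcode: str) -> str:
    """Return the SchedWriteRes name whose pattern matches the opcode."""
    if D_FORM.match(opcode):
        return "A57WriteFPVMAD"
    if Q_FORM.match(opcode):
        return "A57WriteFPVMAQ"
    return "(not covered by these two entries)"


# Opcode names assumed for illustration only.
for opcode in ["FMLAv2f32", "FMLAv4f32", "FMLSv2f64", "FMLAv4i32_indexed"]:
    print(opcode, "->", sched_write(opcode))
```

Under these assumptions the 64-bit (D-form) names map to `A57WriteFPVMAD`
and the 128-bit (Q-form) names map to `A57WriteFPVMAQ`.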


In these definitions, a 128-bit ASIMD FP multiply-accumulate (FMLA/FMLS
Q-form) requires two `A57UnitV`s, meaning that two clock cycles are needed.
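
My confusion comes down to two competing readings of
`[A57UnitV, A57UnitV]`. The Python sketch below is only arithmetic under
my own assumptions. It is not how LLVM's scheduler is implemented, and I
do not know which reading (if either) matches it.

```python
import math

UNITS_IN_V = 2  # A57UnitV groups A57UnitX and A57UnitW

# Resource list copied from A57WriteFPVMAQ.
q_form_resources = ["A57UnitV", "A57UnitV"]
requests = len(q_form_resources)  # two uses of the group

# Reading 1 (the one described above): the two uses run back to back,
# one per cycle.
cycles_serial = requests

# Reading 2 (hypothetical alternative): each use goes to a different
# unit of the group within the same cycle.
cycles_parallel = math.ceil(requests / UNITS_IN_V)

print(cycles_serial, cycles_parallel)  # 2 1
```

Reading 1 gives the two cycles I inferred from the .td file, while
reading 2 gives the single cycle I expected from the manual.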


There must be something wrong with my understanding. Could anyone help me
figure out the problem? Thanks a lot!

