[LLVMdev] extra one cycle of getOperandLatency
Andrew Trick
atrick at apple.com
Tue Dec 31 12:22:39 PST 2013
On Dec 19, 2013, at 10:35 PM, Wei-cheng Wang <cole945 at gmail.com> wrote:
> Hi llvm-dev,
>
> I wonder why there is an extra cycle for getOperandLatency.
> It doesn't seem intuitive.
>
> UseCycle = DefCycle - UseCycle + 1;
>
> When I read the comments in TargetItinerary.td, they say:
>
> OperandCycles are optional "cycle counts". They specify the cycle after
> instruction issue the values which correspond to specific operand indices
> are defined or read.
>
> I thought that if an instruction reads its operands in the first cycle
> and produces the result in the second cycle, InstrItinData should be written
> something like this:
>
> InstrItinData<IIC_iALUr ,[InstrStage<1, [FU_x]>], [2, 1, 1]>
>
> Therefore, the operand latency from an iALUr output to an iALUr input should
> be "1". However, with the implementation of getOperandLatency, the latency
> of such a definition is "2". That's not what I want.
>
> After some digging around, I found that the expression "DefCycle - UseCycle + 1"
> first appeared in r79425, committed by David Goodwin, and it seems OperandCycles
> was initially designed for the ARM Cortex-A8 (see also r79247 and r79436).
>
> Then I checked the "Cortex-A8 Technical Reference Manual - Instruction Cycle Timing".
> There are tables for instructions, for example:
>
> Data-processing instructions
>   Source1   Source2   Result1
>   Rn:E2     Rm:E2     Rd:E2
>
> That means Rn and Rm are read at the beginning of the E2 stage,
> Rd is produced at the end of E2, and so there is a 1-cycle latency.
>
> And that was implemented in LLVM as follows:
>
> InstrItinData<IIC_iALUr ,[InstrStage<1, [A8_Pipe0, A8_Pipe1]>], [2, 2, 2]>,
>
>
>
> Does that mean OperandCycles and getOperandLatency were simply designed
> this way so that it is easier to use the tables from the Cortex-A8 TRM?
> So OperandCycles do not actually refer to a "cycle";
> for an input operand the value means the beginning of a stage,
> and for an output operand it means the end of a stage?
>
> If so, are there any other reasons it should be designed this way?
> Why not remove the +1 cycle and define the instruction like this?
>
> InstrItinData<IIC_iALUr ,[InstrStage<1, [A8_Pipe0, A8_Pipe1]>], [3, 2, 2]>,
I think it’s done this way so that if both def and use cycles are unspecified we get a default of one cycle latency.
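To make the arithmetic in this thread concrete, here is a minimal sketch of the formula being discussed. The helper names (operandLatency, operandLatencyNoBias, checkThreadExamples) are hypothetical and are not LLVM's API. The sketch also assumes that "unspecified" cycles are represented as zero, and it leaves out everything else the real getOperandLatency may do (pipeline forwarding, for instance).

```cpp
#include <cassert>

// Hypothetical helper, not LLVM's API: latency between a def operand and a
// use operand, computed from their itinerary operand cycles with the "+1".
constexpr int operandLatency(int defCycle, int useCycle) {
  return defCycle - useCycle + 1;
}

// The alternative raised in the question: the same formula without the "+1".
constexpr int operandLatencyNoBias(int defCycle, int useCycle) {
  return defCycle - useCycle;
}

// Checks the operand-cycle lists quoted in this thread. Each list is
// [def, use, use]; only the def and one use cycle matter here.
inline void checkThreadExamples() {
  // [2, 1, 1]: the entry the poster expected to give latency 1 yields 2.
  assert(operandLatency(2, 1) == 2);
  // [2, 2, 2]: the Cortex-A8 iALUr entry yields latency 1.
  assert(operandLatency(2, 2) == 1);
  // Both cycles zero (assumed stand-in for "unspecified"): still latency 1.
  assert(operandLatency(0, 0) == 1);
  // Without the "+1", [3, 2, 2] would be needed to express latency 1 ...
  assert(operandLatencyNoBias(3, 2) == 1);
  // ... and the all-zero case would fall to latency 0.
  assert(operandLatencyNoBias(0, 0) == 0);
}
```

Calling checkThreadExamples() from a test or from main exercises all five cases. The last two assertions show the trade-off: dropping the "+1" makes the [3, 2, 2] style read more literally as cycles, but the all-zero case no longer defaults to one cycle.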
At any rate, the itineraries have been around a long time and are used by many out-of-tree targets. I don’t think it’s a good idea to change that old API. New ports should try to use the new machine model (SchedMachineModel, defined in TargetSchedule.td) instead.
-Andy
>
> Thanks
>
> Wei-cheng Wang
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev