[LLVMdev] extra one cycle of getOperandLatency

Tue Dec 31 12:22:39 PST 2013

On Dec 19, 2013, at 10:35 PM, Wei-cheng Wang <cole945 at gmail.com> wrote:

> Hi llvm-dev,
> 
> I wonder why there is an extra cycle for getOperandLatency.
> It doesn't seem intuitive.
> 
>  UseCycle = DefCycle - UseCycle + 1;
> 
> When I read the comments in TargetItinerary.td, it said
> 
>  OperandCycles are optional "cycle counts". They specify the cycle after
>  instruction issue the values which correspond to specific operand indices
>  are defined or read.
> 
> I thought that if an instruction reads its operands in the first cycle
> and produces the result in the second cycle, InstrItinData should be written
> something like this:
> 
>   InstrItinData<IIC_iALUr ,[InstrStage<1, [FU_x]>], [2, 1, 1]>
> 
> Therefore, the operand latency from an iALUr output to an iALUr input
> should be "1".  However, with the implementation of getOperandLatency, the
> latency of such a definition is "2".  That's not what I want.
> 
> After some digging around, I found that the expression "DefCycle - UseCycle + 1"
> first appeared in r79425, committed by David Goodwin, and it seems OperandCycles
> was initially designed for the ARM Cortex-A8 (see also r79247 and r79436).
> 
> Then I checked "Cortex-A8 Technical Reference Manual - Instruction Cycle Timing".
> There are tables for instructions, for example
> 
>   Data-processing instructions
>   Source1    Source2    Result1
>   Rn:E2      Rm:E2      Rd:E2
> 
> That means Rn and Rm are read at the beginning of the E2 stage,
> Rd is produced at the end of E2, and there is 1 cycle of latency.
> 
> And that was implemented in llvm as such
> 
>  InstrItinData<IIC_iALUr ,[InstrStage<1, [A8_Pipe0, A8_Pipe1]>], [2, 2, 2]>,
> 
> 
> 
> Does that mean OperandCycles and getOperandLatency were simply designed
> this way so that it is easier to use the tables from the Cortex-A8 TRM?
> So OperandCycles do not actually refer to a "cycle":
> for an input operand the value means the beginning of a stage,
> and for an output operand it means the end of a stage?
> 
> If so, are there any other reasons it should be designed this way?
> Why not remove the +1 cycle and define the instruction as such?
> 
>  InstrItinData<IIC_iALUr ,[InstrStage<1, [A8_Pipe0, A8_Pipe1]>], [3, 2, 2]>,

I think it’s done this way so that if both def and use cycles are unspecified we get a default of one cycle latency.
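
To make the arithmetic in this thread concrete, here is a minimal sketch of the quoted expression applied to the three itineraries discussed above. The function names are hypothetical and this is not the LLVM implementation: it ignores pipeline forwarding and any handling of missing itinerary data, and only reproduces "DefCycle - UseCycle + 1" alongside the proposed variant without the "+1".

```cpp
// Sketch only: mirrors the expression quoted in this thread, not the full
// getOperandLatency logic in LLVM. Names below are hypothetical.
constexpr int operandLatency(int DefCycle, int UseCycle) {
  return DefCycle - UseCycle + 1;
}

// Hypothetical variant without the "+1", as proposed in the question.
constexpr int operandLatencyNoPlusOne(int DefCycle, int UseCycle) {
  return DefCycle - UseCycle;
}

// [2, 1, 1]: result listed at cycle 2, sources at cycle 1 -> latency 2.
static_assert(operandLatency(2, 1) == 2, "[2,1,1] gives 2 with the +1");

// [2, 2, 2] (the Cortex-A8 IIC_iALUr entry): equal cycles -> latency 1.
static_assert(operandLatency(2, 2) == 1, "[2,2,2] gives 1 with the +1");

// [3, 2, 2] without the "+1" would also give latency 1.
static_assert(operandLatencyNoPlusOne(3, 2) == 1, "[3,2,2] gives 1 without +1");
```

Whenever the def and use cycles are equal, the existing expression yields a latency of one, which matches the Cortex-A8 table convention of "read at the beginning of E2, produced at the end of E2".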

At any rate, the itineraries have been around a long time with many out-of-tree targets. I don’t think it’s a good idea to change that old API. New ports should try to use the new machine model instead. 

-Andy

> 
> Thanks
> 
> Wei-cheng Wang