[llvm-dev] [llvm-mca] What's the difference between Rthroughput and "total cycles" in llvm-mca
Andrea Di Biagio via llvm-dev
llvm-dev at lists.llvm.org
Fri Jun 7 06:33:19 PDT 2019
On Fri, Jun 7, 2019 at 2:30 PM Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:
> In the absence of data dependencies, the throughput of a block of code is
> bounded above by the dispatch rate (i.e. our DispatchWidth) and by the
> availability of hardware resources.
> DispatchWidth is the maximum number of micro-opcodes that can be
> dispatched to the out-of-order backend every cycle. That value inevitably
> affects the block throughput. Example: if an input block decodes to 4
> micro-opcodes in total, and the processor can only dispatch up to 2
> micro-opcodes per cycle, then the block throughput cannot exceed 0.5
> (i.e. one block every two cycles).
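A minimal Python sketch of the dispatch bound just described. The function name is hypothetical (it is not an llvm-mca API): each block needs `num_uops` dispatch slots, and the processor offers `dispatch_width` slots per cycle, so throughput in blocks per cycle cannot exceed their ratio.

```python
def dispatch_bound_throughput(num_uops: int, dispatch_width: int) -> float:
    """Upper bound on block throughput (blocks per cycle) set by dispatch.

    Hypothetical helper: each block needs `num_uops` dispatch slots and the
    processor offers `dispatch_width` slots per cycle.
    """
    if num_uops <= 0 or dispatch_width <= 0:
        raise ValueError("num_uops and dispatch_width must be positive")
    return dispatch_width / num_uops


# The example above: 4 micro-opcodes, dispatch width of 2.
print(dispatch_bound_throughput(4, 2))  # 0.5 blocks per cycle
```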
> Block throughput is also constrained by the availability of hardware
> resources. Example: if we have 4 ADD micro-opcodes, and each one consumes
> 1 cycle of an ALU pipeline, then the block throughput is bounded above by
> N/4, where N is the number of ALU pipelines available on the target, and 4
> is the number of ALU cycles consumed. So, if there is only 1 ALU pipeline,
> then the block throughput cannot exceed 1/4 = 0.25 (blocks per cycle).
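The resource bound can be sketched the same way. This is a simplified illustration under the assumption that all N units of a resource kind are interchangeable; the helper name is hypothetical and not part of llvm-mca.

```python
def resource_bound_throughput(consumed_cycles: int, num_units: int) -> float:
    """Upper bound on block throughput (blocks per cycle) set by one resource kind.

    `consumed_cycles` is the total number of cycles the block occupies that
    resource kind (e.g. 4 ADDs x 1 ALU cycle each); `num_units` is N, the
    number of identical units (e.g. ALU pipelines) on the target.
    """
    if consumed_cycles <= 0 or num_units <= 0:
        raise ValueError("arguments must be positive")
    return num_units / consumed_cycles


# 4 ADD micro-opcodes, each using 1 cycle of an ALU pipeline.
print(resource_bound_throughput(4, 1))  # 0.25 blocks per cycle with 1 ALU
print(resource_bound_throughput(4, 2))  # 0.5 blocks per cycle with 2 ALUs
```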
> Back to the computation of the "Block Throughput".
Sorry, I should have written "Block RThroughput" here.
> It is statically computed as the reciprocal of the block throughput. As
> with the normal instruction throughput, the computation doesn't take into
> account operand dependencies. Therefore, we could say that it is computed
> as the MAX of:
> - #MicroOpcodes of a block / DispatchWidth
> - #Consumed resource cycles / #Resources [ for every resource kind ].
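Combining the two bounds, here is a rough Python sketch of the MAX formula above. It is a simplified model, not llvm-mca's actual implementation: it treats each resource kind independently and ignores how llvm-mca models resource groups. The function name and the `resource_usage` layout are assumptions made for illustration.

```python
from typing import Mapping, Tuple


def block_rthroughput(
    num_uops: int,
    dispatch_width: int,
    resource_usage: Mapping[str, Tuple[int, int]],
) -> float:
    """Simplified static Block RThroughput (cycles per block).

    `resource_usage` maps a resource kind to a pair
    (consumed resource cycles, number of units of that kind).
    Operand dependencies are deliberately ignored.
    """
    bounds = [num_uops / dispatch_width]  # dispatch bound
    for consumed, units in resource_usage.values():
        bounds.append(consumed / units)  # one bound per resource kind
    return max(bounds)


# Hypothetical block: 4 ADD micro-opcodes, DispatchWidth = 2.
print(block_rthroughput(4, 2, {"ALU": (4, 1)}))  # 4.0 (1 ALU pipeline is the bottleneck)
print(block_rthroughput(4, 2, {"ALU": (4, 2)}))  # 2.0 (dispatch and ALUs tie)
```

Taking the reciprocal of each result gives back the block throughput bounds from the earlier examples (0.25 and 0.5 blocks per cycle).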
> In the absence of loop-carried dependencies between different iterations,
> the observed ‘uOps Per Cycle’ tends to a theoretical maximum throughput
> which can be computed by dividing the total number of uOps of a block by
> the Block RThroughput.
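A tiny sketch of that division, assuming no loop-carried dependencies (the helper name is hypothetical). When a block is purely dispatch-bound, the ceiling works out to DispatchWidth itself.

```python
def max_uops_per_cycle(num_uops: int, block_rthroughput: float) -> float:
    """Theoretical ceiling for the observed 'uOps Per Cycle' metric,
    assuming no loop-carried dependencies between iterations."""
    return num_uops / block_rthroughput


print(max_uops_per_cycle(4, 4.0))  # 1.0 (one ALU pipeline limits the block)
print(max_uops_per_cycle(4, 2.0))  # 2.0 (equal to DispatchWidth = 2)
```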
> You can find more information about it in the llvm-mca docs, under the
> section "How LLVM-MCA works".
> I hope it helps!
> On Fri, Jun 7, 2019 at 12:43 PM Tom Chen <cyt046 at gmail.com> wrote:
>> Hi Andrea,
>> So does this definition make sense for basic blocks with more than one
>> instruction? E.g. how should one interpret a basic block with an
>> RThroughput of 2.3?
>> On Fri, Jun 7, 2019 at 7:39 AM Andrea Di Biagio <
>> andrea.dibiagio at gmail.com> wrote:
>>> Hi Tom,
>>> Field 'Total Cycles' from the summary view simply reports the elapsed
>>> number of cycles for the entire simulation.
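One way to relate the two numbers: since Block RThroughput is cycles per block, a value of 2.3 means that at steady state one iteration completes on average every 2.3 cycles. The Python sketch below turns that into a rough lower-bound estimate for a whole run. It assumes no loop-carried dependencies and ignores pipeline start-up and drain, so the 'Total Cycles' reported by the simulation would typically be somewhat larger. The function name is hypothetical.

```python
def steady_state_cycles(iterations: int, block_rthroughput: float) -> float:
    """Rough lower-bound estimate of the cycles needed to run `iterations`
    copies of a block whose Block RThroughput is `block_rthroughput`,
    assuming no loop-carried dependencies and ignoring pipeline fill/drain."""
    return iterations * block_rthroughput


# A block with RThroughput 2.3, simulated for 100 iterations.
print(round(steady_state_cycles(100, 2.3)))  # 230
```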
>>> Rthroughput (from the "Instruction Info" view) is the reciprocal of the
>>> instruction throughput.
>>> Throughput is computed as the maximum number of instructions of the same
>>> type that can be executed per clock cycle in the absence of operand
>>> dependencies.
>>> Example (x86 - AMD Jaguar):
>>> ADD EAX, ESI
>>> The integer unit in Jaguar has two ALU pipelines. An ADD instruction can
>>> issue to either of those pipelines. That means two independent ADDs can be
>>> issued during the same cycle. Therefore, throughput is 2 (instructions per
>>> cycle), and RThroughput (1/throughput) is 0.5.
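The same reciprocal arithmetic for a single instruction, as a minimal Python sketch. The `issue_cycles` parameter (how long an instruction occupies a pipeline) is an assumption added for illustration and defaults to 1 cycle, matching the ADD example. Neither name is an llvm-mca API.

```python
def instruction_rthroughput(num_pipelines: int, issue_cycles: int = 1) -> float:
    """Reciprocal throughput of one instruction kind, ignoring dependencies.

    `num_pipelines` is how many pipelines the instruction can issue to;
    `issue_cycles` is how long it occupies a pipeline (assumed 1 here).
    """
    throughput = num_pipelines / issue_cycles  # instructions per cycle
    return 1 / throughput


# ADD on AMD Jaguar: two ALU pipelines, each accepting one ADD per cycle.
print(instruction_rthroughput(2))  # 0.5
```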
>>> I hope it helps,
>>> On Thu, Jun 6, 2019 at 10:11 PM Tom Chen via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>> What is the difference between the two? I thought "Rthroughput" is
>>>> basically the number of cycles required to execute a single iteration at
>>>> steady state, but this does not seem to match with the schedule/timeline
>>>> generated by llvm-mca.
>>>> Thanks in advance,