[llvm-dev] [llvm-mca] What's the difference between Rthroughput and "total cycles" in llvm-mca

Andrea Di Biagio via llvm-dev llvm-dev at lists.llvm.org
Fri Jun 7 06:30:50 PDT 2019


In the absence of data dependencies, the throughput of a block of code is
bounded above by the dispatch rate (i.e. our DispatchWidth) and by the
availability of hardware resources.

DispatchWidth is the maximum number of micro-opcodes that can be dispatched
to the out-of-order backend every cycle. That value inevitably affects the
block throughput. Example: if an input block decodes to 4 micro-opcodes in
total, and the processor can only dispatch up to 2 micro-opcodes per cycle,
then the block throughput cannot exceed 0.5 (i.e. one block every two
cycles).

Block throughput is also constrained by the availability of hardware
resources.
Example: if we have 4 ADD micro-opcodes, and each micro-opcode consumes 1cy
of ALU pipeline, then the block throughput is bounded above by N/4, where
N is the number of ALU pipelines available on the target, and 4 is the
number of ALU cycles consumed. So, if there is only 1 ALU pipeline, then
the block throughput is at most 1/4 = 0.25 (blocks per cycle).
Back to the computation of the "Block RThroughput".
It is statically computed as the reciprocal of the block throughput. As for
the normal instruction throughput, the computation doesn't take into
account operand dependencies. Therefore, we could say that it is computed
as the MAX of:
 - #MicroOpcodes of a block / DispatchWidth
 - #Consumed resource cycles / #Resources   [ for every resource kind ].
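
As a rough illustration of the MAX above, here is a small Python sketch.
It assumes the per-resource numbers are already known; the function name
block_rthroughput and the dictionary layout are hypothetical and do not
mirror llvm-mca's internals.

```python
from fractions import Fraction

def block_rthroughput(num_uops, dispatch_width, resources):
    """Return the MAX of the dispatch term and every resource term.

    resources maps a resource kind to a pair
    (consumed resource cycles, number of units of that kind).
    """
    bounds = [Fraction(num_uops, dispatch_width)]
    bounds += [Fraction(cycles, units) for cycles, units in resources.values()]
    return max(bounds)

# 4 ADD micro-opcodes, DispatchWidth 2, one ALU pipeline: the ALU dominates.
print(block_rthroughput(4, 2, {"ALU": (4, 1)}))  # 4
# Same block with two ALU pipelines: both terms evaluate to 2.
print(block_rthroughput(4, 2, {"ALU": (4, 2)}))  # 2
```

Under this formula a non-integer result is unremarkable: 7 micro-opcodes
with a DispatchWidth of 3 give 7/3, i.e. about 2.33 cycles per iteration
on average.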

In the absence of loop-carried dependencies between different iterations,
the observed ‘uOps Per Cycle’ tends to a theoretical maximum throughput
which can be computed by dividing the total number of uOps of a block by
the Block RThroughput.
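
The same division, written out as a minimal Python sketch. The function
name max_uops_per_cycle is made up for illustration, and the inputs reuse
the 4-uOp ADD block from the examples above.

```python
from fractions import Fraction

def max_uops_per_cycle(num_uops, block_rthroughput):
    """Theoretical maximum uOps per cycle for a block."""
    return Fraction(num_uops) / Fraction(block_rthroughput)

# 4 uOps with a Block RThroughput of 4 (one ALU pipeline).
print(max_uops_per_cycle(4, 4))  # 1
# 4 uOps with a Block RThroughput of 2 (two ALU pipelines).
print(max_uops_per_cycle(4, 2))  # 2
```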

You can find more information about it in the llvm-mca docs under section
"How LLVM-MCA works".

I hope it helps!
-Andrea

On Fri, Jun 7, 2019 at 12:43 PM Tom Chen <cyt046 at gmail.com> wrote:

> Hi Andrea,
> So does this definition make sense for basic blocks with more than one
> instruction? E.g. how should one interpret a basic block with RThroughput
> of 2.3?
>
> On Fri, Jun 7, 2019 at 7:39 AM Andrea Di Biagio <andrea.dibiagio at gmail.com>
> wrote:
>
>> Hi Tom,
>>
>> Field 'Total Cycles' from the summary view simply reports the elapsed
>> number of cycles for the entire simulation.
>>
>> Rthroughput (from the "Instruction Info" view) is the reciprocal of the
>> instruction throughput.
>> Throughput is computed as the maximum number of instructions of the same
>> type that can be executed per clock cycle in the absence of operand
>> dependencies.
>>
>> Example (x86 - AMD Jaguar):
>>    ADD EAX, ESI
>>
>> The integer unit in Jaguar has two ALU pipelines. An ADD instruction can
>> issue to any of those pipelines. That means two independent ADDs can be
>> issued during the same cycle. Therefore, throughput is 2 (instructions per
>> cycle), and RThroughput (1/throughput) is 0.5.
>>
>> I hope it helps,
>> -Andrea
>>
>> On Thu, Jun 6, 2019 at 10:11 PM Tom Chen via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> What is the difference between the two? I thought "Rthroughput" is
>>> basically the number of cycles required to execute a single iteration at
>>> steady state, but this does not seem to match with the schedule/timeline
>>> generated by llvm-mca.
>>> Thanks in advance,
>>> Tom
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>