[PATCH] D58728: [MCA] Highlight kernel bottlenecks in the summary view.

Andrea Di Biagio via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Feb 27 10:59:51 PST 2019


andreadb created this revision.
andreadb added reviewers: mattd, RKSimon, courbet.
Herald added subscribers: jdoerfert, gbedwell, javed.absar.

This patch adds a new flag named `-bottleneck-analysis` to print out information about bottlenecks that affect throughput.
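
For example, the analysis can be requested from the llvm-mca command line as follows (the CPU and input file name below are only illustrative):

  $ llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -bottleneck-analysis kernel.s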

MCA already knows how to identify and classify dynamic dispatch stalls. However, MCA doesn't know how to identify sources of bottlenecks that eventually lead to dispatch stalls.
The goal of this patch is to teach MCA how to correlate increases in backend pressure to backend stalls (and therefore, the loss of throughput).

Backend pressure increases because of contention on processor resources.
From a Scheduler point of view, backend pressure is a function of the number of uOps in the scheduler buffers.
We say that pressure in the Scheduler increases when the number of opcodes issued to the underlying pipelines is less than the number of opcodes dispatched to the Scheduler during the same cycle.
Since buffer resources are limited, a monotonic increase in pressure eventually leads to a dispatch stall.
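
In other words (a minimal, self-contained sketch; these counter names are made up and do not appear in MCA):

  // Per-cycle bookkeeping (illustrative names only; not the actual MCA code).
  struct CycleCounters {
    unsigned NumDispatched; // opcodes dispatched to the Scheduler this cycle.
    unsigned NumIssued;     // opcodes issued to the pipelines this cycle.
  };

  // Backend pressure increases when fewer opcodes leave the scheduler buffers
  // than enter them during the same cycle.
  bool backendPressureIncreased(const CycleCounters &CC) {
    return CC.NumIssued < CC.NumDispatched;
  }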

This patch teaches MCA how to identify backend pressure increases caused by:

- unavailability of pipeline resources.
- data dependencies.

Data dependencies can delay opcodes, and therefore increase the time they spend in the scheduler buffers.
The temporary unavailability of pipeline resources can also delay the execution of opcodes, and therefore increase backend pressure.

Internally, the Scheduler classifies instructions into four sets:

- Dispatched
- Pending
- Ready
- Executing

Instructions are moved to the Ready set when they no longer have to wait on data dependencies. Every cycle, the Scheduler attempts to issue instructions from the Ready set to the underlying pipelines. If some of those instructions cannot be issued, the Scheduler reports to the ExecuteStage a "potential" increase in backend pressure caused by unavailable pipeline resources.
The ExecuteStage notifies a "backend pressure event" only if it sees that pressure in the buffers actually increased (i.e., the number of opcodes issued was less than the number of opcodes dispatched during that cycle).

The Scheduler reports a "potential" increase in pressure caused by data dependencies if the Ready set is empty and some instructions in the Pending set could have been issued in the absence of data dependencies.
Instructions in the Pending set have the nice property of only depending on instructions that have already started execution.
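
Putting the two cases together (and combined with the pressure-increase check sketched earlier), the classification is roughly the following. This is a simplified, self-contained sketch with invented names; it does not mirror the actual Scheduler/ExecuteStage code:

  #include <vector>

  // Illustrative only.
  struct Instr {
    bool OperandsReady;     // no data dependencies left to wait on.
    bool PipelineAvailable; // required pipeline resources are free this cycle.
  };

  struct PressureHints {
    bool ResourcesUnavailable = false; // potential pressure from pipeline resources.
    bool DataDependencies = false;     // potential pressure from data dependencies.
  };

  // The Scheduler reports *potential* causes of a pressure increase; the
  // ExecuteStage only turns them into a "backend pressure event" if pressure
  // in the buffers actually increased during that cycle.
  PressureHints classifyCycle(const std::vector<Instr> &Ready,
                              const std::vector<Instr> &Pending) {
    PressureHints Hints;
    if (!Ready.empty()) {
      // Ready instructions that could not be issued point at unavailable
      // pipeline resources.
      for (const Instr &I : Ready)
        if (!I.PipelineAvailable)
          Hints.ResourcesUnavailable = true;
    } else {
      // The Ready set is empty: blame data dependencies if some pending
      // instruction could have been issued in their absence.
      for (const Instr &I : Pending)
        if (I.PipelineAvailable && !I.OperandsReady)
          Hints.DataDependencies = true;
    }
    return Hints;
  }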

Pressure events are collected over time and notified to the listeners by the ExecuteStage.
The SummaryView observes those events, and generates a "bottleneck report" only if increases in backend pressure eventually caused backend stalls.

Example of bottleneck report:

  Cycles with backend pressure increase [ 99.89% ]
  Throughput Bottlenecks:
    Resource Pressure       [ 0.00% ]
    Data Dependencies:      [ 99.89% ]
     - Register Dependencies [ 0.00% ]
     - Memory Dependencies   [ 99.89% ]

About the time complexity:
The average slowdown tends to be in the range of ~5-6%.

Time complexity is a (linear) function of the number of instructions in the Scheduler::PendingSet. For memory-intensive kernels, the slowdown can be significant if flag `-noalias=false` is specified. In the worst-case scenario, I have observed a slowdown of ~30% with `-noalias=false`.
We can definitely recover part of that slowdown by optimizing class LSUnit (doing extra bookkeeping to speed up queries).

For now, this new analysis is opt-in when using llvm-mca.
Users of MCA as a library can enable it by passing a flag to the constructor of ExecuteStage. For simplicity, users of the default pipeline can instead specify a new pipeline option.
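
For example (a rough sketch; the parameter and option names below are assumptions made for illustration, not necessarily the final interface; see the headers touched by this patch):

  // Custom pipelines: request the analysis when constructing the ExecuteStage.
  // The name of the extra boolean parameter is assumed here.
  auto Execute = llvm::make_unique<mca::ExecuteStage>(
      HWS, /*EnableBottleneckAnalysis=*/true);

  // Default pipeline: request it through the new pipeline option instead.
  // The position and name of the new PipelineOptions field are assumed here.
  mca::PipelineOptions PO(DispatchWidth, RegisterFileSize, LoadQueueSize,
                          StoreQueueSize, AssumeNoAlias,
                          /*EnableBottleneckAnalysis=*/true);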

This patch partially addresses https://bugs.llvm.org/show_bug.cgi?id=37494
A follow up patch will extend the "scheduler-stats" view to also print out:

- the most problematic register dependencies (top 3)
- the most problematic memory dependencies (top 3)
- instructions most affected by bottlenecks caused by pipeline pressure (top 3).

That change plus this patch should fully address PR37494.

Let me know if okay to commit.

-Andrea


https://reviews.llvm.org/D58728

Files:
  docs/CommandGuide/llvm-mca.rst
  include/llvm/MCA/Context.h
  include/llvm/MCA/HWEventListener.h
  include/llvm/MCA/HardwareUnits/Scheduler.h
  include/llvm/MCA/Instruction.h
  include/llvm/MCA/Stages/ExecuteStage.h
  lib/MCA/Context.cpp
  lib/MCA/HardwareUnits/Scheduler.cpp
  lib/MCA/Stages/ExecuteStage.cpp
  test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-1.s
  test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-2.s
  test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-3.s
  tools/llvm-mca/Views/SummaryView.cpp
  tools/llvm-mca/Views/SummaryView.h
  tools/llvm-mca/llvm-mca.cpp
