[llvm] r355308 - [MCA] Highlight kernel bottlenecks in the summary view.

Andrea Di Biagio via llvm-commits llvm-commits at lists.llvm.org
Mon Mar 4 03:52:35 PST 2019


Author: adibiagio
Date: Mon Mar  4 03:52:34 2019
New Revision: 355308

URL: http://llvm.org/viewvc/llvm-project?rev=355308&view=rev
Log:
[MCA] Highlight kernel bottlenecks in the summary view.

This patch adds a new flag named -bottleneck-analysis to print out information
about throughput bottlenecks.

MCA knows how to identify and classify dynamic dispatch stalls. However, it
doesn't know how to analyze and highlight kernel bottlenecks.  The goal of this
patch is to teach MCA how to correlate increases in backend pressure to backend
stalls (and therefore, the loss of throughput).

From a Scheduler point of view, backend pressure is a function of the scheduler
buffer usage (i.e. how the number of uOps in the scheduler buffers changes over
time). Backend pressure increases (or decreases) when there is a mismatch
between the number of opcodes dispatched, and the number of opcodes issued in
the same cycle.  Since buffer resources are limited, continuous increases in
backend pressure would eventually lead to dispatch stalls. So, there is a
strong correlation between dispatch stalls, and how backpressure changed over
time.
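
The relationship between dispatched/issued uOps and dispatch stalls can be
illustrated with a small standalone sketch. This is not MCA code: the
CycleCounters struct, both helper functions, and the single fixed-size buffer
are assumptions made purely for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-cycle counters (illustrative names, not part of MCA).
struct CycleCounters {
  unsigned DispatchedUOps; // uOps that entered the scheduler buffers
  unsigned IssuedUOps;     // uOps that left the buffers for execution
};

// Backend pressure increases when more uOps enter the buffers than leave.
inline bool pressureIncreased(const CycleCounters &C) {
  return C.DispatchedUOps > C.IssuedUOps;
}

// Returns the index of the first cycle in which a dispatch group no longer
// fits in a buffer of BufferSize entries, or -1 if that never happens.
inline int firstStallCycle(const std::vector<CycleCounters> &Cycles,
                           unsigned BufferSize) {
  unsigned Occupancy = 0;
  for (std::size_t I = 0; I < Cycles.size(); ++I) {
    if (Occupancy + Cycles[I].DispatchedUOps > BufferSize)
      return static_cast<int>(I);
    Occupancy += Cycles[I].DispatchedUOps;
    // Never issue more uOps than are currently buffered.
    Occupancy -= std::min(Occupancy, Cycles[I].IssuedUOps);
  }
  return -1;
}
```

With two uOps dispatched and one issued per cycle, a four-entry buffer in this
model can no longer accept a full dispatch group in cycle 3 (counting from
zero). A balanced stream (two dispatched, two issued) never stalls.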

This patch teaches MCA how to identify situations where backend pressure increases
due to:
 - unavailable pipeline resources.
 - data dependencies.

Data dependencies may delay execution of instructions and therefore increase the
time that uOps have to spend in the scheduler buffers. That often translates to
an increase in backend pressure which may eventually lead to a bottleneck.
Contention on pipeline resources may also delay execution of instructions, and
lead to a temporary increase in backend pressure.

Internally, the Scheduler classifies instructions based on whether register /
memory operands are available or not.

An instruction is marked as "ready to execute" only if data dependencies are
fully resolved.
Every cycle, the Scheduler attempts to execute all instructions that are ready
to execute. If an instruction cannot execute because of unavailable pipeline
resources, then the Scheduler internally updates a BusyResourceUnits mask with
the ID of each unavailable resource.
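
A simplified sketch of how such a mask could be accumulated is shown below. It
is an assumption-laden model rather than the actual Scheduler logic: the
ReadyInst type and the issueReady helper are hypothetical, and every resource
unit is modelled as a single bit that is either free or busy for the cycle.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical ready instruction: one bit per required resource unit.
struct ReadyInst {
  std::uint64_t RequiredUnits;
};

// Tries to issue every ready instruction in order. Instructions that need a
// unit which is not available are recorded in Delayed, and the unavailable
// units are OR-ed into the returned mask. A return value of zero means no
// instruction was delayed by resource contention.
inline std::uint64_t issueReady(const std::vector<ReadyInst> &Ready,
                                std::uint64_t AvailableUnits,
                                std::vector<std::size_t> &Delayed) {
  std::uint64_t BusyResourceUnits = 0;
  for (std::size_t I = 0; I < Ready.size(); ++I) {
    std::uint64_t Busy = Ready[I].RequiredUnits & ~AvailableUnits;
    if (Busy) {
      BusyResourceUnits |= Busy;
      Delayed.push_back(I);
      continue;
    }
    // The instruction issues and consumes its units for this cycle.
    AvailableUnits &= ~Ready[I].RequiredUnits;
  }
  return BusyResourceUnits;
}
```

For example, with units 0 and 1 available and three ready instructions that
need unit 0, unit 0, and unit 2 respectively, only the first one issues; the
resulting mask has bits 0 and 2 set.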

ExecuteStage is responsible for tracking changes in backend pressure. If backend
pressure increases during a cycle because of contention on pipeline resources,
then ExecuteStage sends a "backend pressure" event to the listeners.
That event contains information about instructions delayed by resource
pressure, as well as the BusyResourceUnits mask.

Note that ExecuteStage also knows how to identify situations where backpressure
increased because of delays introduced by data dependencies.
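
The order in which a cycle is classified can be sketched as a standalone
function that mirrors the checks in ExecuteStage::cycleEnd() from this patch.
The CycleState struct and the classifyCycle name are hypothetical; in the real
code the inputs come from the Scheduler, and events are sent to listeners
rather than returned.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class Reason { Resources, RegisterDeps, MemoryDeps };

// Hypothetical snapshot of what the execute stage knows at the end of a cycle.
struct CycleState {
  bool HadTokenStall;             // dispatch stalled on scheduler resources
  unsigned DispatchedUOps;        // uOps dispatched during this cycle
  unsigned IssuedUOps;            // uOps issued during this cycle
  std::uint64_t BusyResourceMask; // non-zero if pipeline units were contended
  unsigned NumRegDeps;            // instructions delayed by register deps
  unsigned NumMemDeps;            // instructions delayed by memory deps
};

// Returns the pressure events that would be reported for this cycle.
inline std::vector<Reason> classifyCycle(const CycleState &S) {
  std::vector<Reason> Events;
  // No event unless dispatch stalled or pressure increased.
  if (!S.HadTokenStall && S.DispatchedUOps <= S.IssuedUOps)
    return Events;
  // Resource contention takes precedence over data dependencies.
  if (S.BusyResourceMask) {
    Events.push_back(Reason::Resources);
    return Events;
  }
  if (S.NumRegDeps)
    Events.push_back(Reason::RegisterDeps);
  if (S.NumMemDeps)
    Events.push_back(Reason::MemoryDeps);
  return Events;
}
```

Note that a single cycle may yield both a register and a memory dependency
event, whereas a resource pressure event is reported on its own.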

The SummaryView observes "backend pressure" events and prints out a "bottleneck
report".

Example of bottleneck report:

```
Cycles with backend pressure increase [ 99.89% ]
Throughput Bottlenecks:
  Resource Pressure       [ 0.00% ]
  Data Dependencies:      [ 99.89% ]
   - Register Dependencies [ 0.00% ]
   - Memory Dependencies   [ 99.89% ]
```

A bottleneck report is printed out only if increases in backend pressure
eventually caused backend stalls.
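
One way a listener could turn pressure events into the percentages of such a
report is sketched below. This is only a guess at the bookkeeping, under the
assumption that every percentage is a count of cycles divided by the total
number of cycles; the PressureCounters type and its members are hypothetical
and do not reflect how SummaryView is actually implemented.

```cpp
#include <cassert>

enum class Reason { Resources, RegisterDeps, MemoryDeps };

// Hypothetical counters updated by a listener of pressure events.
struct PressureCounters {
  unsigned TotalCycles = 0;
  unsigned CyclesWithPressure = 0;
  unsigned ResourceCycles = 0;
  unsigned RegDepCycles = 0;
  unsigned MemDepCycles = 0;
  bool SeenThisCycle = false;

  // Called once per pressure event; several events may arrive in one cycle.
  void onPressureEvent(Reason R) {
    if (!SeenThisCycle) {
      ++CyclesWithPressure;
      SeenThisCycle = true;
    }
    switch (R) {
    case Reason::Resources:
      ++ResourceCycles;
      break;
    case Reason::RegisterDeps:
      ++RegDepCycles;
      break;
    case Reason::MemoryDeps:
      ++MemDepCycles;
      break;
    }
  }

  // Called once at the end of every simulated cycle.
  void onCycleEnd() {
    ++TotalCycles;
    SeenThisCycle = false;
  }

  // Percentage of simulated cycles represented by count N.
  double percent(unsigned N) const {
    return TotalCycles ? 100.0 * N / TotalCycles : 0.0;
  }
};
```

The SeenThisCycle flag keeps a cycle that reports both a register and a memory
dependency event from being counted twice in the top-level figure.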

About the time complexity:

Time complexity is linear in the number of instructions in the
Scheduler::PendingSet.

The average slowdown tends to be in the range of ~5-6%.
For memory intensive kernels, the slowdown can be significant if flag
-noalias=false is specified. In the worst case scenario I have observed a
slowdown of ~30% when flag -noalias=false was specified.

We can definitely recover part of that slowdown if we optimize class LSUnit (by
doing extra bookkeeping to speed up queries). For now, this new analysis is
disabled by default, and it can be enabled via flag -bottleneck-analysis. Users
of MCA as a library can enable the generation of pressure events through the
constructor of ExecuteStage.

This patch partially addresses https://bugs.llvm.org/show_bug.cgi?id=37494

Differential Revision: https://reviews.llvm.org/D58728

Added:
    llvm/trunk/test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-1.s
    llvm/trunk/test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-2.s
    llvm/trunk/test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-3.s
Modified:
    llvm/trunk/docs/CommandGuide/llvm-mca.rst
    llvm/trunk/include/llvm/MCA/Context.h
    llvm/trunk/include/llvm/MCA/HWEventListener.h
    llvm/trunk/include/llvm/MCA/HardwareUnits/Scheduler.h
    llvm/trunk/include/llvm/MCA/Instruction.h
    llvm/trunk/include/llvm/MCA/Stages/ExecuteStage.h
    llvm/trunk/lib/MCA/Context.cpp
    llvm/trunk/lib/MCA/HardwareUnits/Scheduler.cpp
    llvm/trunk/lib/MCA/Stages/ExecuteStage.cpp
    llvm/trunk/tools/llvm-mca/Views/SummaryView.cpp
    llvm/trunk/tools/llvm-mca/Views/SummaryView.h
    llvm/trunk/tools/llvm-mca/llvm-mca.cpp

Modified: llvm/trunk/docs/CommandGuide/llvm-mca.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/CommandGuide/llvm-mca.rst?rev=355308&r1=355307&r2=355308&view=diff
==============================================================================
--- llvm/trunk/docs/CommandGuide/llvm-mca.rst (original)
+++ llvm/trunk/docs/CommandGuide/llvm-mca.rst Mon Mar  4 03:52:34 2019
@@ -169,6 +169,12 @@ option specifies "``-``", then the outpu
   the theoretical uniform distribution of resource pressure for every
   instruction in sequence.
 
+.. option:: -bottleneck-analysis
+
+  Print information about bottlenecks that affect the throughput. This analysis
+  can be expensive, and it is disabled by default.  Bottlenecks are highlighted
+  in the summary view.
+
 
 EXIT STATUS
 -----------

Modified: llvm/trunk/include/llvm/MCA/Context.h
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/MCA/Context.h?rev=355308&r1=355307&r2=355308&view=diff
==============================================================================
--- llvm/trunk/include/llvm/MCA/Context.h (original)
+++ llvm/trunk/include/llvm/MCA/Context.h Mon Mar  4 03:52:34 2019
@@ -32,14 +32,16 @@ namespace mca {
 /// the pre-built "default" out-of-order pipeline.
 struct PipelineOptions {
   PipelineOptions(unsigned DW, unsigned RFS, unsigned LQS, unsigned SQS,
-                  bool NoAlias)
+                  bool NoAlias, bool ShouldEnableBottleneckAnalysis = false)
       : DispatchWidth(DW), RegisterFileSize(RFS), LoadQueueSize(LQS),
-        StoreQueueSize(SQS), AssumeNoAlias(NoAlias) {}
+        StoreQueueSize(SQS), AssumeNoAlias(NoAlias),
+        EnableBottleneckAnalysis(ShouldEnableBottleneckAnalysis) {}
   unsigned DispatchWidth;
   unsigned RegisterFileSize;
   unsigned LoadQueueSize;
   unsigned StoreQueueSize;
   bool AssumeNoAlias;
+  bool EnableBottleneckAnalysis;
 };
 
 class Context {

Modified: llvm/trunk/include/llvm/MCA/HWEventListener.h
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/MCA/HWEventListener.h?rev=355308&r1=355307&r2=355308&view=diff
==============================================================================
--- llvm/trunk/include/llvm/MCA/HWEventListener.h (original)
+++ llvm/trunk/include/llvm/MCA/HWEventListener.h Mon Mar  4 03:52:34 2019
@@ -125,6 +125,35 @@ public:
   const InstRef &IR;
 };
 
+// A HWPressureEvent describes an increase in backend pressure caused by
+// the presence of data dependencies or unavailability of pipeline resources.
+class HWPressureEvent {
+public:
+  enum GenericReason {
+    INVALID = 0,
+    // Scheduler was unable to issue all the ready instructions because some
+    // pipeline resources were unavailable.
+    RESOURCES,
+    // Instructions could not be issued because of register data dependencies.
+    REGISTER_DEPS,
+    // Instructions could not be issued because of memory dependencies.
+    MEMORY_DEPS
+  };
+
+  HWPressureEvent(GenericReason reason, ArrayRef<InstRef> Insts,
+                  uint64_t Mask = 0)
+      : Reason(reason), AffectedInstructions(Insts), ResourceMask(Mask) {}
+
+  // Reason for this increase in backend pressure.
+  GenericReason Reason;
+
+  // Instructions affected (i.e. delayed) by this increase in backend pressure.
+  ArrayRef<InstRef> AffectedInstructions;
+
+  // A mask of unavailable processor resources.
+  const uint64_t ResourceMask;
+};
+
 class HWEventListener {
 public:
   // Generic events generated by the pipeline.
@@ -133,6 +162,7 @@ public:
 
   virtual void onEvent(const HWInstructionEvent &Event) {}
   virtual void onEvent(const HWStallEvent &Event) {}
+  virtual void onEvent(const HWPressureEvent &Event) {}
 
   using ResourceRef = std::pair<uint64_t, uint64_t>;
   virtual void onResourceAvailable(const ResourceRef &RRef) {}

Modified: llvm/trunk/include/llvm/MCA/HardwareUnits/Scheduler.h
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/MCA/HardwareUnits/Scheduler.h?rev=355308&r1=355307&r2=355308&view=diff
==============================================================================
--- llvm/trunk/include/llvm/MCA/HardwareUnits/Scheduler.h (original)
+++ llvm/trunk/include/llvm/MCA/HardwareUnits/Scheduler.h Mon Mar  4 03:52:34 2019
@@ -180,8 +180,8 @@ public:
   /// Check if the instruction in 'IR' can be dispatched during this cycle.
   /// Return SC_AVAILABLE if both scheduler and LS resources are available.
   ///
-  /// This method internally sets field HadTokenStall based on the Scheduler
-  /// Status value.
+  /// This method is also responsible for setting field HadTokenStall if
+  /// IR cannot be dispatched to the Scheduler due to unavailable resources.
   Status isAvailable(const InstRef &IR);
 
   /// Reserves buffer and LSUnit queue resources that are necessary to issue
@@ -225,16 +225,25 @@ public:
   /// resources are not available.
   InstRef select();
 
-  /// Returns a mask of busy resources. Each bit of the mask identifies a unique
-  /// processor resource unit. In the absence of bottlenecks caused by resource
-  /// pressure, the mask value returned by this method is always zero.
-  uint64_t getBusyResourceUnits() const { return BusyResourceUnits; }
   bool arePipelinesFullyUsed() const {
     return !Resources->getAvailableProcResUnits();
   }
   bool isReadySetEmpty() const { return ReadySet.empty(); }
   bool isWaitSetEmpty() const { return WaitSet.empty(); }
 
+  /// This method is called by the ExecuteStage at the end of each cycle to
+  /// identify bottlenecks caused by data dependencies. Vector RegDeps is
+  /// populated by instructions that were not issued because of unsolved
+  /// register dependencies.  Vector MemDeps is populated by instructions that
+  /// were not issued because of unsolved memory dependencies.
+  void analyzeDataDependencies(SmallVectorImpl<InstRef> &RegDeps,
+                               SmallVectorImpl<InstRef> &MemDeps);
+
+  /// Returns a mask of busy resources, and populates vector Insts with
+  /// instructions that could not be issued to the underlying pipelines because
+  /// not all pipeline resources were available.
+  uint64_t analyzeResourcePressure(SmallVectorImpl<InstRef> &Insts);
+
   // Returns true if the dispatch logic couldn't dispatch a full group due to
   // unavailable scheduler and/or LS resources.
   bool hadTokenStall() const { return HadTokenStall; }

Modified: llvm/trunk/include/llvm/MCA/Instruction.h
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/MCA/Instruction.h?rev=355308&r1=355307&r2=355308&view=diff
==============================================================================
--- llvm/trunk/include/llvm/MCA/Instruction.h (original)
+++ llvm/trunk/include/llvm/MCA/Instruction.h Mon Mar  4 03:52:34 2019
@@ -448,7 +448,13 @@ class Instruction : public InstructionBa
   // Retire Unit token ID for this instruction.
   unsigned RCUTokenID;
 
+  // A bitmask of busy processor resource units.
+  // This field is set to zero only if execution is not delayed during this
+  // cycle because of unavailable pipeline resources.
   uint64_t CriticalResourceMask;
+
+  // An instruction identifier. This field is only set if execution is delayed
+  // by a memory dependency.
   unsigned CriticalMemDep;
 
 public:
@@ -499,12 +505,12 @@ public:
     Stage = IS_RETIRED;
   }
 
-  void updateCriticalResourceMask(uint64_t BusyResourceUnits) {
-    CriticalResourceMask |= BusyResourceUnits;
-  }
   uint64_t getCriticalResourceMask() const { return CriticalResourceMask; }
-  void setCriticalMemDep(unsigned IID) { CriticalMemDep = IID; }
   unsigned getCriticalMemDep() const { return CriticalMemDep; }
+  void setCriticalResourceMask(uint64_t ResourceMask) {
+    CriticalResourceMask = ResourceMask;
+  }
+  void setCriticalMemDep(unsigned IID) { CriticalMemDep = IID; }
 
   void cycleEvent();
 };

Modified: llvm/trunk/include/llvm/MCA/Stages/ExecuteStage.h
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/MCA/Stages/ExecuteStage.h?rev=355308&r1=355307&r2=355308&view=diff
==============================================================================
--- llvm/trunk/include/llvm/MCA/Stages/ExecuteStage.h (original)
+++ llvm/trunk/include/llvm/MCA/Stages/ExecuteStage.h Mon Mar  4 03:52:34 2019
@@ -28,6 +28,12 @@ namespace mca {
 class ExecuteStage final : public Stage {
   Scheduler &HWS;
 
+  unsigned NumDispatchedOpcodes;
+  unsigned NumIssuedOpcodes;
+
+  // True if this stage should notify listeners of HWPressureEvents.
+  bool EnablePressureEvents;
+
   Error issueInstruction(InstRef &IR);
 
   // Called at the beginning of each cycle to issue already dispatched
@@ -41,7 +47,10 @@ class ExecuteStage final : public Stage
   ExecuteStage &operator=(const ExecuteStage &Other) = delete;
 
 public:
-  ExecuteStage(Scheduler &S) : Stage(), HWS(S) {}
+  ExecuteStage(Scheduler &S) : ExecuteStage(S, false) {}
+  ExecuteStage(Scheduler &S, bool ShouldPerformBottleneckAnalysis)
+      : Stage(), HWS(S), NumDispatchedOpcodes(0), NumIssuedOpcodes(0),
+        EnablePressureEvents(ShouldPerformBottleneckAnalysis) {}
 
   // This stage works under the assumption that the Pipeline will eventually
   // execute a retire stage. We don't need to check if pipelines and/or
@@ -60,6 +69,7 @@ public:
   // Instructions that transitioned to the 'Executed' state are automatically
   // moved to the next stage (i.e. RetireStage).
   Error cycleStart() override;
+  Error cycleEnd() override;
   Error execute(InstRef &IR) override;
 
   void notifyInstructionIssued(

Modified: llvm/trunk/lib/MCA/Context.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/MCA/Context.cpp?rev=355308&r1=355307&r2=355308&view=diff
==============================================================================
--- llvm/trunk/lib/MCA/Context.cpp (original)
+++ llvm/trunk/lib/MCA/Context.cpp Mon Mar  4 03:52:34 2019
@@ -42,7 +42,8 @@ Context::createDefaultPipeline(const Pip
   auto Fetch = llvm::make_unique<EntryStage>(SrcMgr);
   auto Dispatch = llvm::make_unique<DispatchStage>(STI, MRI, Opts.DispatchWidth,
                                                    *RCU, *PRF);
-  auto Execute = llvm::make_unique<ExecuteStage>(*HWS);
+  auto Execute =
+      llvm::make_unique<ExecuteStage>(*HWS, Opts.EnableBottleneckAnalysis);
   auto Retire = llvm::make_unique<RetireStage>(*RCU, *PRF);
 
   // Pass the ownership of all the hardware units to this Context.

Modified: llvm/trunk/lib/MCA/HardwareUnits/Scheduler.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/MCA/HardwareUnits/Scheduler.cpp?rev=355308&r1=355307&r2=355308&view=diff
==============================================================================
--- llvm/trunk/lib/MCA/HardwareUnits/Scheduler.cpp (original)
+++ llvm/trunk/lib/MCA/HardwareUnits/Scheduler.cpp Mon Mar  4 03:52:34 2019
@@ -183,9 +183,9 @@ InstRef Scheduler::select() {
     InstRef &IR = ReadySet[I];
     if (QueueIndex == ReadySet.size() ||
         Strategy->compare(IR, ReadySet[QueueIndex])) {
-      const InstrDesc &D = IR.getInstruction()->getDesc();
-      uint64_t BusyResourceMask = Resources->checkAvailability(D);
-      IR.getInstruction()->updateCriticalResourceMask(BusyResourceMask);
+      Instruction &IS = *IR.getInstruction();
+      uint64_t BusyResourceMask = Resources->checkAvailability(IS.getDesc());
+      IS.setCriticalResourceMask(BusyResourceMask);
       BusyResourceUnits |= BusyResourceMask;
       if (!BusyResourceMask)
         QueueIndex = I;
@@ -227,6 +227,28 @@ void Scheduler::updateIssuedSet(SmallVec
   IssuedSet.resize(IssuedSet.size() - RemovedElements);
 }
 
+uint64_t Scheduler::analyzeResourcePressure(SmallVectorImpl<InstRef> &Insts) {
+  Insts.insert(Insts.end(), ReadySet.begin(), ReadySet.end());
+  return BusyResourceUnits;
+}
+
+void Scheduler::analyzeDataDependencies(SmallVectorImpl<InstRef> &RegDeps,
+                                        SmallVectorImpl<InstRef> &MemDeps) {
+  const auto EndIt = PendingSet.end() - NumDispatchedToThePendingSet;
+  for (InstRef &IR : make_range(PendingSet.begin(), EndIt)) {
+    Instruction &IS = *IR.getInstruction();
+    if (Resources->checkAvailability(IS.getDesc()))
+      continue;
+
+    if (IS.isReady() ||
+        (IS.isMemOp() && LSU.isReady(IR) != IR.getSourceIndex())) {
+      MemDeps.emplace_back(IR);
+    } else {
+      RegDeps.emplace_back(IR);
+    }
+  }
+}
+
 void Scheduler::cycleEvent(SmallVectorImpl<ResourceRef> &Freed,
                            SmallVectorImpl<InstRef> &Executed,
                            SmallVectorImpl<InstRef> &Ready) {

Modified: llvm/trunk/lib/MCA/Stages/ExecuteStage.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/MCA/Stages/ExecuteStage.cpp?rev=355308&r1=355307&r2=355308&view=diff
==============================================================================
--- llvm/trunk/lib/MCA/Stages/ExecuteStage.cpp (original)
+++ llvm/trunk/lib/MCA/Stages/ExecuteStage.cpp Mon Mar  4 03:52:34 2019
@@ -54,6 +54,7 @@ Error ExecuteStage::issueInstruction(Ins
   SmallVector<std::pair<ResourceRef, ResourceCycles>, 4> Used;
   SmallVector<InstRef, 4> Ready;
   HWS.issueInstruction(IR, Used, Ready);
+  NumIssuedOpcodes += IR.getInstruction()->getDesc().NumMicroOps;
 
   notifyReservedOrReleasedBuffers(IR, /* Reserved */ false);
 
@@ -89,6 +90,8 @@ Error ExecuteStage::cycleStart() {
   SmallVector<InstRef, 4> Ready;
 
   HWS.cycleEvent(Freed, Executed, Ready);
+  NumDispatchedOpcodes = 0;
+  NumIssuedOpcodes = 0;
 
   for (const ResourceRef &RR : Freed)
     notifyResourceAvailable(RR);
@@ -106,6 +109,45 @@ Error ExecuteStage::cycleStart() {
   return issueReadyInstructions();
 }
 
+Error ExecuteStage::cycleEnd() {
+  if (!EnablePressureEvents)
+    return ErrorSuccess();
+
+  // Always conservatively report any backpressure events if the dispatch logic
+  // was stalled due to unavailable scheduler resources.
+  if (!HWS.hadTokenStall() && NumDispatchedOpcodes <= NumIssuedOpcodes)
+    return ErrorSuccess();
+
+  SmallVector<InstRef, 8> Insts;
+  uint64_t Mask = HWS.analyzeResourcePressure(Insts);
+  if (Mask) {
+    LLVM_DEBUG(dbgs() << "[E] Backpressure increased because of unavailable "
+                         "pipeline resources: "
+                      << format_hex(Mask, 16) << '\n');
+    HWPressureEvent Ev(HWPressureEvent::RESOURCES, Insts, Mask);
+    notifyEvent(Ev);
+    return ErrorSuccess();
+  }
+
+  SmallVector<InstRef, 8> RegDeps;
+  SmallVector<InstRef, 8> MemDeps;
+  HWS.analyzeDataDependencies(RegDeps, MemDeps);
+  if (RegDeps.size()) {
+    LLVM_DEBUG(
+        dbgs() << "[E] Backpressure increased by register dependencies\n");
+    HWPressureEvent Ev(HWPressureEvent::REGISTER_DEPS, RegDeps);
+    notifyEvent(Ev);
+  }
+
+  if (MemDeps.size()) {
+    LLVM_DEBUG(dbgs() << "[E] Backpressure increased by memory dependencies\n");
+    HWPressureEvent Ev(HWPressureEvent::MEMORY_DEPS, MemDeps);
+    notifyEvent(Ev);
+  }
+
+  return ErrorSuccess();
+}
+
 #ifndef NDEBUG
 static void verifyInstructionEliminated(const InstRef &IR) {
   const Instruction &Inst = *IR.getInstruction();
@@ -147,6 +189,7 @@ Error ExecuteStage::execute(InstRef &IR)
   // be released after MCIS is issued, and all the ResourceCycles for those
   // units have been consumed.
   bool IsReadyInstruction = HWS.dispatch(IR);
+  NumDispatchedOpcodes += IR.getInstruction()->getDesc().NumMicroOps;
   notifyReservedOrReleasedBuffers(IR, /* Reserved */ true);
   if (!IsReadyInstruction)
     return ErrorSuccess();

Added: llvm/trunk/test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-1.s
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-1.s?rev=355308&view=auto
==============================================================================
--- llvm/trunk/test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-1.s (added)
+++ llvm/trunk/test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-1.s Mon Mar  4 03:52:34 2019
@@ -0,0 +1,85 @@
+# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
+# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -timeline -timeline-max-iterations=1 -bottleneck-analysis < %s | FileCheck %s
+
+add %eax, %ebx
+add %ebx, %ecx
+add %ecx, %edx
+add %edx, %eax
+
+# CHECK:      Iterations:        100
+# CHECK-NEXT: Instructions:      400
+# CHECK-NEXT: Total Cycles:      403
+# CHECK-NEXT: Total uOps:        400
+
+# CHECK:      Dispatch Width:    2
+# CHECK-NEXT: uOps Per Cycle:    0.99
+# CHECK-NEXT: IPC:               0.99
+# CHECK-NEXT: Block RThroughput: 2.0
+
+# CHECK:      Cycles with backend pressure increase [ 94.04% ]
+# CHECK-NEXT: Throughput Bottlenecks:
+# CHECK-NEXT:   Resource Pressure       [ 0.00% ]
+# CHECK-NEXT:   Data Dependencies:      [ 94.04% ]
+# CHECK-NEXT:   - Register Dependencies [ 94.04% ]
+# CHECK-NEXT:   - Memory Dependencies   [ 0.00% ]
+
+# CHECK:      Instruction Info:
+# CHECK-NEXT: [1]: #uOps
+# CHECK-NEXT: [2]: Latency
+# CHECK-NEXT: [3]: RThroughput
+# CHECK-NEXT: [4]: MayLoad
+# CHECK-NEXT: [5]: MayStore
+# CHECK-NEXT: [6]: HasSideEffects (U)
+
+# CHECK:      [1]    [2]    [3]    [4]    [5]    [6]    Instructions:
+# CHECK-NEXT:  1      1     0.50                        addl	%eax, %ebx
+# CHECK-NEXT:  1      1     0.50                        addl	%ebx, %ecx
+# CHECK-NEXT:  1      1     0.50                        addl	%ecx, %edx
+# CHECK-NEXT:  1      1     0.50                        addl	%edx, %eax
+
+# CHECK:      Resources:
+# CHECK-NEXT: [0]   - JALU0
+# CHECK-NEXT: [1]   - JALU1
+# CHECK-NEXT: [2]   - JDiv
+# CHECK-NEXT: [3]   - JFPA
+# CHECK-NEXT: [4]   - JFPM
+# CHECK-NEXT: [5]   - JFPU0
+# CHECK-NEXT: [6]   - JFPU1
+# CHECK-NEXT: [7]   - JLAGU
+# CHECK-NEXT: [8]   - JMul
+# CHECK-NEXT: [9]   - JSAGU
+# CHECK-NEXT: [10]  - JSTC
+# CHECK-NEXT: [11]  - JVALU0
+# CHECK-NEXT: [12]  - JVALU1
+# CHECK-NEXT: [13]  - JVIMUL
+
+# CHECK:      Resource pressure per iteration:
+# CHECK-NEXT: [0]    [1]    [2]    [3]    [4]    [5]    [6]    [7]    [8]    [9]    [10]   [11]   [12]   [13]
+# CHECK-NEXT: 2.00   2.00    -      -      -      -      -      -      -      -      -      -      -      -
+
+# CHECK:      Resource pressure by instruction:
+# CHECK-NEXT: [0]    [1]    [2]    [3]    [4]    [5]    [6]    [7]    [8]    [9]    [10]   [11]   [12]   [13]   Instructions:
+# CHECK-NEXT:  -     1.00    -      -      -      -      -      -      -      -      -      -      -      -     addl	%eax, %ebx
+# CHECK-NEXT: 1.00    -      -      -      -      -      -      -      -      -      -      -      -      -     addl	%ebx, %ecx
+# CHECK-NEXT:  -     1.00    -      -      -      -      -      -      -      -      -      -      -      -     addl	%ecx, %edx
+# CHECK-NEXT: 1.00    -      -      -      -      -      -      -      -      -      -      -      -      -     addl	%edx, %eax
+
+# CHECK:      Timeline view:
+# CHECK-NEXT: Index     0123456
+
+# CHECK:      [0,0]     DeER ..   addl	%eax, %ebx
+# CHECK-NEXT: [0,1]     D=eER..   addl	%ebx, %ecx
+# CHECK-NEXT: [0,2]     .D=eER.   addl	%ecx, %edx
+# CHECK-NEXT: [0,3]     .D==eER   addl	%edx, %eax
+
+# CHECK:      Average Wait times (based on the timeline view):
+# CHECK-NEXT: [0]: Executions
+# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
+# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
+# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage
+
+# CHECK:            [0]    [1]    [2]    [3]
+# CHECK-NEXT: 0.     1     1.0    1.0    0.0       addl	%eax, %ebx
+# CHECK-NEXT: 1.     1     2.0    0.0    0.0       addl	%ebx, %ecx
+# CHECK-NEXT: 2.     1     2.0    0.0    0.0       addl	%ecx, %edx
+# CHECK-NEXT: 3.     1     3.0    0.0    0.0       addl	%edx, %eax

Added: llvm/trunk/test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-2.s
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-2.s?rev=355308&view=auto
==============================================================================
--- llvm/trunk/test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-2.s (added)
+++ llvm/trunk/test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-2.s Mon Mar  4 03:52:34 2019
@@ -0,0 +1,72 @@
+# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
+# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=100 -timeline -timeline-max-iterations=1 -bottleneck-analysis < %s | FileCheck %s
+
+vhaddps %xmm0, %xmm0, %xmm1
+
+# CHECK:      Iterations:        100
+# CHECK-NEXT: Instructions:      100
+# CHECK-NEXT: Total Cycles:      106
+# CHECK-NEXT: Total uOps:        100
+
+# CHECK:      Dispatch Width:    2
+# CHECK-NEXT: uOps Per Cycle:    0.94
+# CHECK-NEXT: IPC:               0.94
+# CHECK-NEXT: Block RThroughput: 1.0
+
+# CHECK:      Cycles with backend pressure increase [ 76.42% ]
+# CHECK-NEXT: Throughput Bottlenecks:
+# CHECK-NEXT:   Resource Pressure       [ 76.42% ]
+# CHECK-NEXT:   - JFPA  [ 76.42% ]
+# CHECK-NEXT:   - JFPU0  [ 76.42% ]
+# CHECK-NEXT:   Data Dependencies:      [ 0.00% ]
+# CHECK-NEXT:   - Register Dependencies [ 0.00% ]
+# CHECK-NEXT:   - Memory Dependencies   [ 0.00% ]
+
+# CHECK:      Instruction Info:
+# CHECK-NEXT: [1]: #uOps
+# CHECK-NEXT: [2]: Latency
+# CHECK-NEXT: [3]: RThroughput
+# CHECK-NEXT: [4]: MayLoad
+# CHECK-NEXT: [5]: MayStore
+# CHECK-NEXT: [6]: HasSideEffects (U)
+
+# CHECK:      [1]    [2]    [3]    [4]    [5]    [6]    Instructions:
+# CHECK-NEXT:  1      4     1.00                        vhaddps	%xmm0, %xmm0, %xmm1
+
+# CHECK:      Resources:
+# CHECK-NEXT: [0]   - JALU0
+# CHECK-NEXT: [1]   - JALU1
+# CHECK-NEXT: [2]   - JDiv
+# CHECK-NEXT: [3]   - JFPA
+# CHECK-NEXT: [4]   - JFPM
+# CHECK-NEXT: [5]   - JFPU0
+# CHECK-NEXT: [6]   - JFPU1
+# CHECK-NEXT: [7]   - JLAGU
+# CHECK-NEXT: [8]   - JMul
+# CHECK-NEXT: [9]   - JSAGU
+# CHECK-NEXT: [10]  - JSTC
+# CHECK-NEXT: [11]  - JVALU0
+# CHECK-NEXT: [12]  - JVALU1
+# CHECK-NEXT: [13]  - JVIMUL
+
+# CHECK:      Resource pressure per iteration:
+# CHECK-NEXT: [0]    [1]    [2]    [3]    [4]    [5]    [6]    [7]    [8]    [9]    [10]   [11]   [12]   [13]
+# CHECK-NEXT:  -      -      -     1.00    -     1.00    -      -      -      -      -      -      -      -
+
+# CHECK:      Resource pressure by instruction:
+# CHECK-NEXT: [0]    [1]    [2]    [3]    [4]    [5]    [6]    [7]    [8]    [9]    [10]   [11]   [12]   [13]   Instructions:
+# CHECK-NEXT:  -      -      -     1.00    -     1.00    -      -      -      -      -      -      -      -     vhaddps	%xmm0, %xmm0, %xmm1
+
+# CHECK:      Timeline view:
+# CHECK-NEXT: Index     0123456
+
+# CHECK:      [0,0]     DeeeeER   vhaddps	%xmm0, %xmm0, %xmm1
+
+# CHECK:      Average Wait times (based on the timeline view):
+# CHECK-NEXT: [0]: Executions
+# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
+# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
+# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage
+
+# CHECK:            [0]    [1]    [2]    [3]
+# CHECK-NEXT: 0.     1     1.0    1.0    0.0       vhaddps	%xmm0, %xmm0, %xmm1

Added: llvm/trunk/test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-3.s
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-3.s?rev=355308&view=auto
==============================================================================
--- llvm/trunk/test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-3.s (added)
+++ llvm/trunk/test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-3.s Mon Mar  4 03:52:34 2019
@@ -0,0 +1,106 @@
+# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
+# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=1500 -noalias=false -timeline -timeline-max-iterations=1 -bottleneck-analysis < %s | FileCheck %s
+
+vmovaps (%rsi), %xmm0
+vmovaps %xmm0, (%rdi)
+vmovaps 16(%rsi), %xmm0
+vmovaps %xmm0, 16(%rdi)
+vmovaps 32(%rsi), %xmm0
+vmovaps %xmm0, 32(%rdi)
+vmovaps 48(%rsi), %xmm0
+vmovaps %xmm0, 48(%rdi)
+
+# CHECK:      Iterations:        1500
+# CHECK-NEXT: Instructions:      12000
+# CHECK-NEXT: Total Cycles:      36003
+# CHECK-NEXT: Total uOps:        12000
+
+# CHECK:      Dispatch Width:    2
+# CHECK-NEXT: uOps Per Cycle:    0.33
+# CHECK-NEXT: IPC:               0.33
+# CHECK-NEXT: Block RThroughput: 4.0
+
+# CHECK:      Cycles with backend pressure increase [ 99.89% ]
+# CHECK-NEXT: Throughput Bottlenecks:
+# CHECK-NEXT:   Resource Pressure       [ 0.00% ]
+# CHECK-NEXT:   Data Dependencies:      [ 99.89% ]
+# CHECK-NEXT:   - Register Dependencies [ 0.00% ]
+# CHECK-NEXT:   - Memory Dependencies   [ 99.89% ]
+
+# CHECK:      Instruction Info:
+# CHECK-NEXT: [1]: #uOps
+# CHECK-NEXT: [2]: Latency
+# CHECK-NEXT: [3]: RThroughput
+# CHECK-NEXT: [4]: MayLoad
+# CHECK-NEXT: [5]: MayStore
+# CHECK-NEXT: [6]: HasSideEffects (U)
+
+# CHECK:      [1]    [2]    [3]    [4]    [5]    [6]    Instructions:
+# CHECK-NEXT:  1      5     1.00    *                   vmovaps	(%rsi), %xmm0
+# CHECK-NEXT:  1      1     1.00           *            vmovaps	%xmm0, (%rdi)
+# CHECK-NEXT:  1      5     1.00    *                   vmovaps	16(%rsi), %xmm0
+# CHECK-NEXT:  1      1     1.00           *            vmovaps	%xmm0, 16(%rdi)
+# CHECK-NEXT:  1      5     1.00    *                   vmovaps	32(%rsi), %xmm0
+# CHECK-NEXT:  1      1     1.00           *            vmovaps	%xmm0, 32(%rdi)
+# CHECK-NEXT:  1      5     1.00    *                   vmovaps	48(%rsi), %xmm0
+# CHECK-NEXT:  1      1     1.00           *            vmovaps	%xmm0, 48(%rdi)
+
+# CHECK:      Resources:
+# CHECK-NEXT: [0]   - JALU0
+# CHECK-NEXT: [1]   - JALU1
+# CHECK-NEXT: [2]   - JDiv
+# CHECK-NEXT: [3]   - JFPA
+# CHECK-NEXT: [4]   - JFPM
+# CHECK-NEXT: [5]   - JFPU0
+# CHECK-NEXT: [6]   - JFPU1
+# CHECK-NEXT: [7]   - JLAGU
+# CHECK-NEXT: [8]   - JMul
+# CHECK-NEXT: [9]   - JSAGU
+# CHECK-NEXT: [10]  - JSTC
+# CHECK-NEXT: [11]  - JVALU0
+# CHECK-NEXT: [12]  - JVALU1
+# CHECK-NEXT: [13]  - JVIMUL
+
+# CHECK:      Resource pressure per iteration:
+# CHECK-NEXT: [0]    [1]    [2]    [3]    [4]    [5]    [6]    [7]    [8]    [9]    [10]   [11]   [12]   [13]
+# CHECK-NEXT:  -      -      -     2.00   2.00   4.00   4.00   4.00    -     4.00   4.00    -      -      -
+
+# CHECK:      Resource pressure by instruction:
+# CHECK-NEXT: [0]    [1]    [2]    [3]    [4]    [5]    [6]    [7]    [8]    [9]    [10]   [11]   [12]   [13]   Instructions:
+# CHECK-NEXT:  -      -      -      -     1.00   1.00    -     1.00    -      -      -      -      -      -     vmovaps	(%rsi), %xmm0
+# CHECK-NEXT:  -      -      -      -      -      -     1.00    -      -     1.00   1.00    -      -      -     vmovaps	%xmm0, (%rdi)
+# CHECK-NEXT:  -      -      -     1.00    -     1.00    -     1.00    -      -      -      -      -      -     vmovaps	16(%rsi), %xmm0
+# CHECK-NEXT:  -      -      -      -      -      -     1.00    -      -     1.00   1.00    -      -      -     vmovaps	%xmm0, 16(%rdi)
+# CHECK-NEXT:  -      -      -      -     1.00   1.00    -     1.00    -      -      -      -      -      -     vmovaps	32(%rsi), %xmm0
+# CHECK-NEXT:  -      -      -      -      -      -     1.00    -      -     1.00   1.00    -      -      -     vmovaps	%xmm0, 32(%rdi)
+# CHECK-NEXT:  -      -      -     1.00    -     1.00    -     1.00    -      -      -      -      -      -     vmovaps	48(%rsi), %xmm0
+# CHECK-NEXT:  -      -      -      -      -      -     1.00    -      -     1.00   1.00    -      -      -     vmovaps	%xmm0, 48(%rdi)
+
+# CHECK:      Timeline view:
+# CHECK-NEXT:                     0123456789
+# CHECK-NEXT: Index     0123456789          0123456
+
+# CHECK:      [0,0]     DeeeeeER  .    .    .    ..   vmovaps	(%rsi), %xmm0
+# CHECK-NEXT: [0,1]     D=====eER .    .    .    ..   vmovaps	%xmm0, (%rdi)
+# CHECK-NEXT: [0,2]     .D=====eeeeeER .    .    ..   vmovaps	16(%rsi), %xmm0
+# CHECK-NEXT: [0,3]     .D==========eER.    .    ..   vmovaps	%xmm0, 16(%rdi)
+# CHECK-NEXT: [0,4]     . D==========eeeeeER.    ..   vmovaps	32(%rsi), %xmm0
+# CHECK-NEXT: [0,5]     . D===============eER    ..   vmovaps	%xmm0, 32(%rdi)
+# CHECK-NEXT: [0,6]     .  D===============eeeeeER.   vmovaps	48(%rsi), %xmm0
+# CHECK-NEXT: [0,7]     .  D====================eER   vmovaps	%xmm0, 48(%rdi)
+
+# CHECK:      Average Wait times (based on the timeline view):
+# CHECK-NEXT: [0]: Executions
+# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
+# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
+# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage
+
+# CHECK:            [0]    [1]    [2]    [3]
+# CHECK-NEXT: 0.     1     1.0    1.0    0.0       vmovaps	(%rsi), %xmm0
+# CHECK-NEXT: 1.     1     6.0    0.0    0.0       vmovaps	%xmm0, (%rdi)
+# CHECK-NEXT: 2.     1     6.0    0.0    0.0       vmovaps	16(%rsi), %xmm0
+# CHECK-NEXT: 3.     1     11.0   0.0    0.0       vmovaps	%xmm0, 16(%rdi)
+# CHECK-NEXT: 4.     1     11.0   0.0    0.0       vmovaps	32(%rsi), %xmm0
+# CHECK-NEXT: 5.     1     16.0   0.0    0.0       vmovaps	%xmm0, 32(%rdi)
+# CHECK-NEXT: 6.     1     16.0   0.0    0.0       vmovaps	48(%rsi), %xmm0
+# CHECK-NEXT: 7.     1     21.0   0.0    0.0       vmovaps	%xmm0, 48(%rdi)

Modified: llvm/trunk/tools/llvm-mca/Views/SummaryView.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/tools/llvm-mca/Views/SummaryView.cpp?rev=355308&r1=355307&r2=355308&view=diff
==============================================================================
--- llvm/trunk/tools/llvm-mca/Views/SummaryView.cpp (original)
+++ llvm/trunk/tools/llvm-mca/Views/SummaryView.cpp Mon Mar  4 03:52:34 2019
@@ -25,10 +25,14 @@ namespace mca {
 SummaryView::SummaryView(const MCSchedModel &Model, ArrayRef<MCInst> S,
                          unsigned Width)
     : SM(Model), Source(S), DispatchWidth(Width), LastInstructionIdx(0),
-      TotalCycles(0), NumMicroOps(0),
+      TotalCycles(0), NumMicroOps(0), BPI({0, 0, 0, 0}),
+      ResourcePressureDistribution(Model.getNumProcResourceKinds(), 0),
       ProcResourceUsage(Model.getNumProcResourceKinds(), 0),
       ProcResourceMasks(Model.getNumProcResourceKinds()),
-      ResIdx2ProcResID(Model.getNumProcResourceKinds(), 0) {
+      ResIdx2ProcResID(Model.getNumProcResourceKinds(), 0),
+      PressureIncreasedBecauseOfResources(false),
+      PressureIncreasedBecauseOfDataDependencies(false),
+      SeenStallCycles(false) {
   computeProcResourceMasks(SM, ProcResourceMasks);
   for (unsigned I = 1, E = SM.getNumProcResourceKinds(); I < E; ++I) {
     unsigned Index = getResourceStateIndex(ProcResourceMasks[I]);
@@ -61,6 +65,98 @@ void SummaryView::onEvent(const HWInstru
   }
 }
 
+void SummaryView::onEvent(const HWPressureEvent &Event) {
+  assert(Event.Reason != HWPressureEvent::INVALID &&
+         "Unexpected invalid event!");
+
+  switch (Event.Reason) {
+  default:
+    break;
+
+  case HWPressureEvent::RESOURCES: {
+    PressureIncreasedBecauseOfResources = true;
+    ++BPI.ResourcePressureCycles;
+    uint64_t ResourceMask = Event.ResourceMask;
+    while (ResourceMask) {
+      uint64_t Current = ResourceMask & (-ResourceMask);
+      unsigned Index = getResourceStateIndex(Current);
+      unsigned ProcResID = ResIdx2ProcResID[Index];
+      const MCProcResourceDesc &PRDesc = *SM.getProcResource(ProcResID);
+      if (!PRDesc.SubUnitsIdxBegin) {
+        ResourcePressureDistribution[Index]++;
+        ResourceMask ^= Current;
+        continue;
+      }
+
+      for (unsigned I = 0, E = PRDesc.NumUnits; I < E; ++I) {
+        unsigned OtherProcResID = PRDesc.SubUnitsIdxBegin[I];
+        unsigned OtherMask = ProcResourceMasks[OtherProcResID];
+        ResourcePressureDistribution[getResourceStateIndex(OtherMask)]++;
+      }
+
+      ResourceMask ^= Current;
+    }
+  }
+
+  break;
+  case HWPressureEvent::REGISTER_DEPS:
+    PressureIncreasedBecauseOfDataDependencies = true;
+    ++BPI.RegisterDependencyCycles;
+    break;
+  case HWPressureEvent::MEMORY_DEPS:
+    PressureIncreasedBecauseOfDataDependencies = true;
+    ++BPI.MemoryDependencyCycles;
+    break;
+  }
+}
+
+void SummaryView::printBottleneckHints(raw_ostream &OS) const {
+  if (!SeenStallCycles || !BPI.PressureIncreaseCycles)
+    return;
+
+  double PressurePerCycle =
+      (double)BPI.PressureIncreaseCycles * 100 / TotalCycles;
+  double ResourcePressurePerCycle =
+      (double)BPI.ResourcePressureCycles * 100 / TotalCycles;
+  double DDPerCycle = (double)BPI.DataDependencyCycles * 100 / TotalCycles;
+  double RegDepPressurePerCycle =
+      (double)BPI.RegisterDependencyCycles * 100 / TotalCycles;
+  double MemDepPressurePerCycle =
+      (double)BPI.MemoryDependencyCycles * 100 / TotalCycles;
+
+  OS << "\nCycles with backend pressure increase [ "
+     << format("%.2f", floor((PressurePerCycle * 100) + 0.5) / 100) << "% ]";
+
+  OS << "\nThroughput Bottlenecks: "
+     << "\n  Resource Pressure       [ "
+     << format("%.2f", floor((ResourcePressurePerCycle * 100) + 0.5) / 100)
+     << "% ]";
+
+  if (BPI.PressureIncreaseCycles) {
+    for (unsigned I = 0, E = ResourcePressureDistribution.size(); I < E; ++I) {
+      if (ResourcePressureDistribution[I]) {
+        double Frequency =
+            (double)ResourcePressureDistribution[I] * 100 / TotalCycles;
+        unsigned Index = ResIdx2ProcResID[getResourceStateIndex(1ULL << I)];
+        const MCProcResourceDesc &PRDesc = *SM.getProcResource(Index);
+        OS << "\n  - " << PRDesc.Name << "  [ "
+           << format("%.2f", floor((Frequency * 100) + 0.5) / 100) << "% ]";
+      }
+    }
+  }
+
+  OS << "\n  Data Dependencies:      [ "
+     << format("%.2f", floor((DDPerCycle * 100) + 0.5) / 100) << "% ]";
+
+  OS << "\n  - Register Dependencies [ "
+     << format("%.2f", floor((RegDepPressurePerCycle * 100) + 0.5) / 100)
+     << "% ]";
+
+  OS << "\n  - Memory Dependencies   [ "
+     << format("%.2f", floor((MemDepPressurePerCycle * 100) + 0.5) / 100)
+     << "% ]\n\n";
+}
+
 void SummaryView::printView(raw_ostream &OS) const {
   unsigned Instructions = Source.size();
   unsigned Iterations = (LastInstructionIdx / Instructions) + 1;
@@ -85,6 +181,8 @@ void SummaryView::printView(raw_ostream
   TempStream << "\nBlock RThroughput: "
              << format("%.1f", floor((BlockRThroughput * 10) + 0.5) / 10)
              << '\n';
+
+  printBottleneckHints(TempStream);
   TempStream.flush();
   OS << Buffer;
 }

Modified: llvm/trunk/tools/llvm-mca/Views/SummaryView.h
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/tools/llvm-mca/Views/SummaryView.h?rev=355308&r1=355307&r2=355308&view=diff
==============================================================================
--- llvm/trunk/tools/llvm-mca/Views/SummaryView.h (original)
+++ llvm/trunk/tools/llvm-mca/Views/SummaryView.h Mon Mar  4 03:52:34 2019
@@ -45,6 +45,25 @@ class SummaryView : public View {
   unsigned TotalCycles;
   // The total number of micro opcodes contributed by a block of instructions.
   unsigned NumMicroOps;
+
+  struct BackPressureInfo {
+    // Cycles where backpressure increased.
+    unsigned PressureIncreaseCycles;
+    // Cycles where backpressure increased because of pipeline pressure.
+    unsigned ResourcePressureCycles;
+    // Cycles where backpressure increased because of data dependencies.
+    unsigned DataDependencyCycles;
+    // Cycles where backpressure increased because of register dependencies.
+    unsigned RegisterDependencyCycles;
+    // Cycles where backpressure increased because of memory dependencies.
+    unsigned MemoryDependencyCycles;
+  };
+  BackPressureInfo BPI;
+
+  // Resource pressure distribution. There is an element for every processor
+  // resource declared by the scheduling model. Quantities are number of cycles.
+  llvm::SmallVector<unsigned, 8> ResourcePressureDistribution;
+
   // For each processor resource, this vector stores the cumulative number of
   // resource cycles consumed by the analyzed code block.
   llvm::SmallVector<unsigned, 8> ProcResourceUsage;
@@ -58,18 +77,43 @@ class SummaryView : public View {
   // Used to map resource indices to actual processor resource IDs.
   llvm::SmallVector<unsigned, 8> ResIdx2ProcResID;
 
+  // True if resource pressure events were notified during this cycle.
+  bool PressureIncreasedBecauseOfResources;
+  bool PressureIncreasedBecauseOfDataDependencies;
+
+  // True if throughput was affected by dispatch stalls.
+  bool SeenStallCycles;
+
   // Compute the reciprocal throughput for the analyzed code block.
   // The reciprocal block throughput is computed as the MAX between:
   //   - NumMicroOps / DispatchWidth
   //   - Total Resource Cycles / #Units   (for every resource consumed).
   double getBlockRThroughput() const;
 
+  // Prints a bottleneck message to OS.
+  void printBottleneckHints(llvm::raw_ostream &OS) const;
+
 public:
   SummaryView(const llvm::MCSchedModel &Model, llvm::ArrayRef<llvm::MCInst> S,
               unsigned Width);
 
-  void onCycleEnd() override { ++TotalCycles; }
+  void onCycleEnd() override {
+    ++TotalCycles;
+    if (PressureIncreasedBecauseOfResources ||
+        PressureIncreasedBecauseOfDataDependencies) {
+      ++BPI.PressureIncreaseCycles;
+      if (PressureIncreasedBecauseOfDataDependencies)
+        ++BPI.DataDependencyCycles;
+      PressureIncreasedBecauseOfResources = false;
+      PressureIncreasedBecauseOfDataDependencies = false;
+    }
+  }
   void onEvent(const HWInstructionEvent &Event) override;
+  void onEvent(const HWStallEvent &Event) override {
+    SeenStallCycles = true;
+  }
+
+  void onEvent(const HWPressureEvent &Event) override;
 
   void printView(llvm::raw_ostream &OS) const override;
 };

Modified: llvm/trunk/tools/llvm-mca/llvm-mca.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/tools/llvm-mca/llvm-mca.cpp?rev=355308&r1=355307&r2=355308&view=diff
==============================================================================
--- llvm/trunk/tools/llvm-mca/llvm-mca.cpp (original)
+++ llvm/trunk/tools/llvm-mca/llvm-mca.cpp Mon Mar  4 03:52:34 2019
@@ -175,6 +175,11 @@ static cl::opt<bool>
                    cl::desc("Print all views including hardware statistics"),
                    cl::cat(ViewOptions), cl::init(false));
 
+static cl::opt<bool> EnableBottleneckAnalysis(
+    "bottleneck-analysis",
+    cl::desc("Enable bottleneck analysis (disabled by default)"),
+    cl::cat(ViewOptions), cl::init(false));
+
 namespace {
 
 const Target *getTarget(const char *ProgName) {
@@ -387,7 +392,8 @@ int main(int argc, char **argv) {
   mca::Context MCA(*MRI, *STI);
 
   mca::PipelineOptions PO(Width, RegisterFileSize, LoadQueueSize,
-                          StoreQueueSize, AssumeNoAlias);
+                          StoreQueueSize, AssumeNoAlias,
+                          EnableBottleneckAnalysis);
 
   // Number each region in the sequence.
   unsigned RegionIdx = 0;
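
Note on the resource-mask walk in SummaryView::onEvent(const HWPressureEvent &):
the loop isolates the lowest set bit of the mask, records it, and clears it,
until no bits are left. The following is a minimal standalone sketch of that
idiom only, not the code from this patch. The names bitIndex and recordPressure
are hypothetical, bitIndex stands in for getResourceStateIndex, and the sketch
assumes one counter per mask bit. It also skips the expansion of resource groups
into their sub-units that the patch performs.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for getResourceStateIndex(): returns the position of
// the single bit set in Mask (0 for bit 0, 1 for bit 1, and so on).
static unsigned bitIndex(uint64_t Mask) {
  unsigned Index = 0;
  while (Mask >>= 1)
    ++Index;
  return Index;
}

// Hypothetical helper: bumps one counter for every bit set in ResourceMask.
// Distribution must have at least one element per possible bit position.
static void recordPressure(uint64_t ResourceMask,
                           std::vector<unsigned> &Distribution) {
  while (ResourceMask) {
    // Two's complement negation keeps only the lowest set bit.
    uint64_t Current = ResourceMask & (~ResourceMask + 1);
    ++Distribution[bitIndex(Current)];
    // Clear that bit so the next iteration sees the next resource.
    ResourceMask ^= Current;
  }
}
```

Each call does one iteration per set bit, so a mask with two resources under
pressure updates exactly two counters.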



