[llvm] 5578ec3 - [MCA] Fixed a bug where loads and stores were sometimes incorrectly marked as dependent. Fixes PR45793.
Andrea Di Biagio via llvm-commits
llvm-commits at lists.llvm.org
Tue May 5 02:26:50 PDT 2020
Author: Andrea Di Biagio
Date: 2020-05-05T10:25:36+01:00
New Revision: 5578ec32f9c4fef46adce52a2e3d22bf409b3d2c
URL: https://github.com/llvm/llvm-project/commit/5578ec32f9c4fef46adce52a2e3d22bf409b3d2c
DIFF: https://github.com/llvm/llvm-project/commit/5578ec32f9c4fef46adce52a2e3d22bf409b3d2c.diff
LOG: [MCA] Fixed a bug where loads and stores were sometimes incorrectly marked as dependent. Fixes PR45793.
This fixes a regression introduced by a very old commit, 280ac1fd1dc35 (formerly
llvm-svn 361950).
Commit 280ac1fd1dc35 redesigned the logic in the LSUnit with the goal of
speeding up isReady() queries, and stabilising the LSUnit API (while also making
the load store unit more customisable).
The concept of MemoryGroup (effectively an alias set) was added by that commit
to better describe and track dependencies between memory operations. However,
that concept was not just used for alias dependencies, but it was also used for
describing memory "order" dependencies (enforced by the memory consistency
model).
Instructions of the same memory group were considered "equivalent", i.e.
independent operations that can potentially execute in parallel. The problem
was that the cost of a dependency (in terms of number of cycles) should have
been different for "order" dependencies. Instructions in an order dependency
simply have to wait until their predecessors are "issued" to an
underlying pipeline (rather than having to wait until predecessors have been
fully executed). For simple "order" dependencies, the old logic therefore
introduced an artificial delay on the "issue" of independent loads and stores.
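The difference between the two kinds of dependency can be sketched with a toy model. This is a minimal C++ illustration, assuming a hypothetical `Group` type that only counts blocking predecessors; it is not the llvm-mca `MemoryGroup` API, which also tracks instruction counts and the critical predecessor. An order successor is released as soon as its predecessor issues, while a data successor stays blocked until the predecessor has fully executed.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, heavily simplified model of the order/data split described
// above. Not the real llvm-mca MemoryGroup class.
struct Group {
  int PendingPreds = 0;            // predecessors that still block this group
  bool Issued = false;             // have all our instructions been issued?
  std::vector<Group *> OrderSucc;  // released as soon as this group issues
  std::vector<Group *> DataSucc;   // released only when this group has executed

  bool isReady() const { return PendingPreds == 0; }

  void addSuccessor(Group *G, bool IsDataDependent) {
    // An order dependency on a group that has already issued is already
    // satisfied, so no edge needs to be recorded.
    if (!IsDataDependent && Issued)
      return;
    ++G->PendingPreds;
    if (IsDataDependent)
      DataSucc.push_back(G);
    else
      OrderSucc.push_back(G);
  }

  // Every instruction of this group has been issued to a pipeline.
  void onIssued() {
    Issued = true;
    for (Group *G : OrderSucc) {
      assert(G->PendingPreds > 0 && "unbalanced order edge");
      --G->PendingPreds;
    }
  }

  // Every instruction of this group has finished executing.
  void onExecuted() {
    for (Group *G : DataSucc) {
      assert(G->PendingPreds > 0 && "unbalanced data edge");
      --G->PendingPreds;
    }
  }
};
```

For example, if a store group has one order successor and one data successor, calling `onIssued()` on the store already makes the order successor ready, whereas the data successor only becomes ready after `onExecuted()`. Under the old logic both would have waited for full execution.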
This patch fixes the issue and adds a new test named 'independent-load-stores.s'
to several x86 targets (BtVer2, Haswell, SkylakeClient and SkylakeServer). That
test contains the reproducer posted by Fabian Ritter on PR45793.
I had to rerun the update_mca_test_checks.py script on several files. To avoid
regressions in the expected results of some Exynos tests, I have added a
-noalias=false flag (to match the old strict behavior on latencies).
Some tests for processor Barcelona are improved/fixed by this change and they
now show better results. In a few tests we were incorrectly counting the time
spent by instructions in a scheduler queue. In one case in particular we now
correctly see a store executed out of order. That test was affected by the same
underlying issue reported as PR45793.
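The headline numbers in the updated Barcelona `load-store-throughput.s` checks follow from simple arithmetic: IPC is the instruction count divided by the total cycle count, so going from 208 to 207 cycles for 400 instructions moves the printed value from 1.92 to 1.93. A minimal C++ sketch, assuming the summary view rounds to the number of decimals it prints (the helper names are hypothetical):

```cpp
#include <cassert>
#include <cmath>

// Instructions per cycle, as reported in the llvm-mca summary view.
double ipc(unsigned Instructions, unsigned TotalCycles) {
  assert(TotalCycles != 0 && "cycle count must be non-zero");
  return static_cast<double>(Instructions) / TotalCycles;
}

// Share of total cycles, in percent (e.g. cycles with zero dispatched uOps).
double percentOfCycles(unsigned Cycles, unsigned TotalCycles) {
  assert(TotalCycles != 0 && "cycle count must be non-zero");
  return 100.0 * Cycles / TotalCycles;
}

// Round to a given number of decimal places; assumed to match the precision
// used by the printed report.
double roundTo(double X, int Decimals) {
  double Scale = std::pow(10.0, Decimals);
  return std::round(X * Scale) / Scale;
}
```

The same arithmetic explains the other percentages that changed: 147 scheduler-full cycles are 70.7% of 208 cycles but 71.0% of 207, and 34/208 versus 33/207 zero-dispatch cycles give 16.3% and 15.9%.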
Reviewers: mattd
Differential Revision: https://reviews.llvm.org/D79351
Added:
llvm/test/tools/llvm-mca/X86/BtVer2/independent-load-stores.s
llvm/test/tools/llvm-mca/X86/Haswell/independent-load-stores.s
llvm/test/tools/llvm-mca/X86/SkylakeClient/independent-load-stores.s
llvm/test/tools/llvm-mca/X86/SkylakeServer/independent-load-stores.s
Modified:
llvm/include/llvm/MCA/HardwareUnits/LSUnit.h
llvm/lib/MCA/HardwareUnits/LSUnit.cpp
llvm/test/tools/llvm-mca/AArch64/Exynos/asimd-st1.s
llvm/test/tools/llvm-mca/AArch64/Exynos/asimd-st2.s
llvm/test/tools/llvm-mca/AArch64/Exynos/asimd-st3.s
llvm/test/tools/llvm-mca/AArch64/Exynos/asimd-st4.s
llvm/test/tools/llvm-mca/AArch64/Exynos/float-store.s
llvm/test/tools/llvm-mca/AArch64/Exynos/store.s
llvm/test/tools/llvm-mca/X86/Barcelona/load-store-throughput.s
llvm/test/tools/llvm-mca/X86/Barcelona/store-throughput.s
llvm/test/tools/llvm-mca/X86/BdVer2/load-store-throughput.s
llvm/test/tools/llvm-mca/X86/BdVer2/memcpy-like-test.s
llvm/test/tools/llvm-mca/X86/BdVer2/store-throughput.s
llvm/test/tools/llvm-mca/X86/BtVer2/xadd.s
Removed:
################################################################################
diff --git a/llvm/include/llvm/MCA/HardwareUnits/LSUnit.h b/llvm/include/llvm/MCA/HardwareUnits/LSUnit.h
index 9143adf0e97b..2f9b4ba8782d 100644
--- a/llvm/include/llvm/MCA/HardwareUnits/LSUnit.h
+++ b/llvm/include/llvm/MCA/HardwareUnits/LSUnit.h
@@ -40,7 +40,10 @@ class MemoryGroup {
unsigned NumInstructions;
unsigned NumExecuting;
unsigned NumExecuted;
- SmallVector<MemoryGroup *, 4> Succ;
+ // Successors that are in an order dependency with this group.
+ SmallVector<MemoryGroup *, 4> OrderSucc;
+ // Successors that are in a data dependency with this group.
+ SmallVector<MemoryGroup *, 4> DataSucc;
CriticalDependency CriticalPredecessor;
InstRef CriticalMemoryInstruction;
@@ -55,8 +58,9 @@ class MemoryGroup {
NumExecuted(0), CriticalPredecessor(), CriticalMemoryInstruction() {}
MemoryGroup(MemoryGroup &&) = default;
- ArrayRef<MemoryGroup *> getSuccessors() const { return Succ; }
- unsigned getNumSuccessors() const { return Succ.size(); }
+ size_t getNumSuccessors() const {
+ return OrderSucc.size() + DataSucc.size();
+ }
unsigned getNumPredecessors() const { return NumPredecessors; }
unsigned getNumExecutingPredecessors() const {
return NumExecutingPredecessors;
@@ -75,12 +79,22 @@ class MemoryGroup {
return CriticalPredecessor;
}
- void addSuccessor(MemoryGroup *Group) {
+ void addSuccessor(MemoryGroup *Group, bool IsDataDependent) {
+ // Do not need to add a dependency if there is no data
+ // dependency and all instructions from this group have been
+ // issued already.
+ if (!IsDataDependent && isExecuting())
+ return;
+
Group->NumPredecessors++;
assert(!isExecuted() && "Should have been removed!");
if (isExecuting())
- Group->onGroupIssued(CriticalMemoryInstruction);
- Succ.emplace_back(Group);
+ Group->onGroupIssued(CriticalMemoryInstruction, IsDataDependent);
+
+ if (IsDataDependent)
+ DataSucc.emplace_back(Group);
+ else
+ OrderSucc.emplace_back(Group);
}
bool isWaiting() const {
@@ -98,10 +112,13 @@ class MemoryGroup {
}
bool isExecuted() const { return NumInstructions == NumExecuted; }
- void onGroupIssued(const InstRef &IR) {
+ void onGroupIssued(const InstRef &IR, bool ShouldUpdateCriticalDep) {
assert(!isReady() && "Unexpected group-start event!");
NumExecutingPredecessors++;
+ if (!ShouldUpdateCriticalDep)
+ return;
+
unsigned Cycles = IR.getInstruction()->getCyclesLeft();
if (CriticalPredecessor.Cycles < Cycles) {
CriticalPredecessor.IID = IR.getSourceIndex();
@@ -133,8 +150,14 @@ class MemoryGroup {
return;
// Notify successors that this group started execution.
- for (MemoryGroup *MG : Succ)
- MG->onGroupIssued(CriticalMemoryInstruction);
+ for (MemoryGroup *MG : OrderSucc) {
+ MG->onGroupIssued(CriticalMemoryInstruction, false);
+ // Release the order dependency with this group.
+ MG->onGroupExecuted();
+ }
+
+ for (MemoryGroup *MG : DataSucc)
+ MG->onGroupIssued(CriticalMemoryInstruction, true);
}
void onInstructionExecuted() {
@@ -145,8 +168,8 @@ class MemoryGroup {
if (!isExecuted())
return;
- // Notify successors that this group has finished execution.
- for (MemoryGroup *MG : Succ)
+ // Notify data dependent successors that this group has finished execution.
+ for (MemoryGroup *MG : DataSucc)
MG->onGroupExecuted();
}
@@ -412,6 +435,7 @@ class LSUnit : public LSUnitBase {
unsigned CurrentLoadGroupID;
unsigned CurrentLoadBarrierGroupID;
unsigned CurrentStoreGroupID;
+ unsigned CurrentStoreBarrierGroupID;
public:
LSUnit(const MCSchedModel &SM)
@@ -420,7 +444,8 @@ class LSUnit : public LSUnitBase {
: LSUnit(SM, LQ, SQ, /* NoAlias */ false) {}
LSUnit(const MCSchedModel &SM, unsigned LQ, unsigned SQ, bool AssumeNoAlias)
: LSUnitBase(SM, LQ, SQ, AssumeNoAlias), CurrentLoadGroupID(0),
- CurrentLoadBarrierGroupID(0), CurrentStoreGroupID(0) {}
+ CurrentLoadBarrierGroupID(0), CurrentStoreGroupID(0),
+ CurrentStoreBarrierGroupID(0) {}
/// Returns LSU_AVAILABLE if there are enough load/store queue entries to
/// accomodate instruction IR.
diff --git a/llvm/lib/MCA/HardwareUnits/LSUnit.cpp b/llvm/lib/MCA/HardwareUnits/LSUnit.cpp
index 0ee084c7ce1a..e945e8cecce9 100644
--- a/llvm/lib/MCA/HardwareUnits/LSUnit.cpp
+++ b/llvm/lib/MCA/HardwareUnits/LSUnit.cpp
@@ -77,9 +77,6 @@ unsigned LSUnit::dispatch(const InstRef &IR) {
acquireSQSlot();
if (Desc.MayStore) {
- // Always create a new group for store operations.
-
- // A store may not pass a previous store or store barrier.
unsigned NewGID = createMemoryGroup();
MemoryGroup &NewGroup = getGroup(NewGID);
NewGroup.addInstruction();
@@ -91,16 +88,32 @@ unsigned LSUnit::dispatch(const InstRef &IR) {
MemoryGroup &IDom = getGroup(ImmediateLoadDominator);
LLVM_DEBUG(dbgs() << "[LSUnit]: GROUP DEP: (" << ImmediateLoadDominator
<< ") --> (" << NewGID << ")\n");
- IDom.addSuccessor(&NewGroup);
+ IDom.addSuccessor(&NewGroup, !assumeNoAlias());
+ }
+
+ // A store may not pass a previous store barrier.
+ if (CurrentStoreBarrierGroupID) {
+ MemoryGroup &StoreGroup = getGroup(CurrentStoreBarrierGroupID);
+ LLVM_DEBUG(dbgs() << "[LSUnit]: GROUP DEP: ("
+ << CurrentStoreBarrierGroupID
+ << ") --> (" << NewGID << ")\n");
+ StoreGroup.addSuccessor(&NewGroup, true);
}
- if (CurrentStoreGroupID) {
+
+ // A store may not pass a previous store.
+ if (CurrentStoreGroupID &&
+ (CurrentStoreGroupID != CurrentStoreBarrierGroupID)) {
MemoryGroup &StoreGroup = getGroup(CurrentStoreGroupID);
LLVM_DEBUG(dbgs() << "[LSUnit]: GROUP DEP: (" << CurrentStoreGroupID
<< ") --> (" << NewGID << ")\n");
- StoreGroup.addSuccessor(&NewGroup);
+ StoreGroup.addSuccessor(&NewGroup, !assumeNoAlias());
}
+
CurrentStoreGroupID = NewGID;
+ if (IsMemBarrier)
+ CurrentStoreBarrierGroupID = NewGID;
+
if (Desc.MayLoad) {
CurrentLoadGroupID = NewGID;
if (IsMemBarrier)
@@ -112,31 +125,59 @@ unsigned LSUnit::dispatch(const InstRef &IR) {
assert(Desc.MayLoad && "Expected a load!");
- // Always create a new memory group if this is the first load of the sequence.
+ unsigned ImmediateLoadDominator =
+ std::max(CurrentLoadGroupID, CurrentLoadBarrierGroupID);
+
+ // A new load group is created if we are in one of the following situations:
+ // 1) This is a load barrier (by construction, a load barrier is always
+ // assigned to a different memory group).
+ // 2) There is no load in flight (by construction we always keep loads and
+ // stores into separate memory groups).
+ // 3) There is a load barrier in flight. This load depends on it.
+ // 4) There is an intervening store between the last load dispatched to the
+ // LSU and this load. We always create a new group even if this load
+ // does not alias the last dispatched store.
+ // 5) There is no intervening store and there is an active load group.
+ // However that group has already started execution, so we cannot add
+ // this load to it.
+ bool ShouldCreateANewGroup =
+ IsMemBarrier || !ImmediateLoadDominator ||
+ CurrentLoadBarrierGroupID == ImmediateLoadDominator ||
+ ImmediateLoadDominator <= CurrentStoreGroupID ||
+ getGroup(ImmediateLoadDominator).isExecuting();
- // A load may not pass a previous store unless flag 'NoAlias' is set.
- // A load may pass a previous load.
- // A younger load cannot pass a older load barrier.
- // A load barrier cannot pass a older load.
- bool ShouldCreateANewGroup = !CurrentLoadGroupID || IsMemBarrier ||
- CurrentLoadGroupID <= CurrentStoreGroupID ||
- CurrentLoadGroupID <= CurrentLoadBarrierGroupID;
if (ShouldCreateANewGroup) {
unsigned NewGID = createMemoryGroup();
MemoryGroup &NewGroup = getGroup(NewGID);
NewGroup.addInstruction();
+ // A load may not pass a previous store or store barrier
+ // unless flag 'NoAlias' is set.
if (!assumeNoAlias() && CurrentStoreGroupID) {
- MemoryGroup &StGroup = getGroup(CurrentStoreGroupID);
+ MemoryGroup &StoreGroup = getGroup(CurrentStoreGroupID);
LLVM_DEBUG(dbgs() << "[LSUnit]: GROUP DEP: (" << CurrentStoreGroupID
<< ") --> (" << NewGID << ")\n");
- StGroup.addSuccessor(&NewGroup);
+ StoreGroup.addSuccessor(&NewGroup, true);
}
- if (CurrentLoadBarrierGroupID) {
- MemoryGroup &LdGroup = getGroup(CurrentLoadBarrierGroupID);
- LLVM_DEBUG(dbgs() << "[LSUnit]: GROUP DEP: (" << CurrentLoadBarrierGroupID
- << ") --> (" << NewGID << ")\n");
- LdGroup.addSuccessor(&NewGroup);
+
+ // A load barrier may not pass a previous load or load barrier.
+ if (IsMemBarrier) {
+ if (ImmediateLoadDominator) {
+ MemoryGroup &LoadGroup = getGroup(ImmediateLoadDominator);
+ LLVM_DEBUG(dbgs() << "[LSUnit]: GROUP DEP: ("
+ << ImmediateLoadDominator
+ << ") --> (" << NewGID << ")\n");
+ LoadGroup.addSuccessor(&NewGroup, true);
+ }
+ } else {
+ // A younger load cannot pass an older load barrier.
+ if (CurrentLoadBarrierGroupID) {
+ MemoryGroup &LoadGroup = getGroup(CurrentLoadBarrierGroupID);
+ LLVM_DEBUG(dbgs() << "[LSUnit]: GROUP DEP: ("
+ << CurrentLoadBarrierGroupID
+ << ") --> (" << NewGID << ")\n");
+ LoadGroup.addSuccessor(&NewGroup, true);
+ }
}
CurrentLoadGroupID = NewGID;
@@ -145,6 +186,7 @@ unsigned LSUnit::dispatch(const InstRef &IR) {
return NewGID;
}
+ // A load may pass a previous load.
MemoryGroup &Group = getGroup(CurrentLoadGroupID);
Group.addInstruction();
return CurrentLoadGroupID;
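The five cases listed in the patch comment for `ShouldCreateANewGroup` can be read as a single predicate. Below is a standalone C++ sketch under the assumption that the relevant LSUnit fields are copied into a plain struct; `LoadDispatchState`, `shouldCreateNewLoadGroup` and `LoadDominatorIsExecuting` are hypothetical names, the last one standing in for `getGroup(ImmediateLoadDominator).isExecuting()`.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical snapshot of the LSUnit fields consulted when a load is
// dispatched. Group IDs grow monotonically; 0 means "no such group".
struct LoadDispatchState {
  unsigned CurrentLoadGroupID = 0;
  unsigned CurrentLoadBarrierGroupID = 0;
  unsigned CurrentStoreGroupID = 0;
  // Stand-in for getGroup(ImmediateLoadDominator).isExecuting().
  bool LoadDominatorIsExecuting = false;
};

// Returns true if a newly dispatched load must start a new memory group
// instead of joining the current load group. The numbered comments refer to
// the cases in the patch comment, checked in the same order.
bool shouldCreateNewLoadGroup(const LoadDispatchState &S, bool IsMemBarrier) {
  const unsigned ImmediateLoadDominator =
      std::max(S.CurrentLoadGroupID, S.CurrentLoadBarrierGroupID);
  if (IsMemBarrier)
    return true; // 1) this load is a load barrier
  if (ImmediateLoadDominator == 0)
    return true; // 2) no load in flight
  if (S.CurrentLoadBarrierGroupID == ImmediateLoadDominator)
    return true; // 3) a load barrier is in flight
  if (ImmediateLoadDominator <= S.CurrentStoreGroupID)
    return true; // 4) a store was dispatched after the last load
  return S.LoadDominatorIsExecuting; // 5) last load group already started
}
```

A `false` result corresponds to the fall-through path at the end of the hunk, where the load simply joins `CurrentLoadGroupID` ("A load may pass a previous load").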
diff --git a/llvm/test/tools/llvm-mca/AArch64/Exynos/asimd-st1.s b/llvm/test/tools/llvm-mca/AArch64/Exynos/asimd-st1.s
index 0cd5b6ef9b7d..cf0e1bff34f7 100644
--- a/llvm/test/tools/llvm-mca/AArch64/Exynos/asimd-st1.s
+++ b/llvm/test/tools/llvm-mca/AArch64/Exynos/asimd-st1.s
@@ -1,7 +1,7 @@
# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
-# RUN: llvm-mca -mtriple=aarch64-linux-gnu -mcpu=exynos-m3 -resource-pressure=false < %s | FileCheck %s -check-prefixes=ALL,M3
-# RUN: llvm-mca -mtriple=aarch64-linux-gnu -mcpu=exynos-m4 -resource-pressure=false < %s | FileCheck %s -check-prefixes=ALL,M4
-# RUN: llvm-mca -mtriple=aarch64-linux-gnu -mcpu=exynos-m5 -resource-pressure=false < %s | FileCheck %s -check-prefixes=ALL,M5
+# RUN: llvm-mca -mtriple=aarch64-linux-gnu -mcpu=exynos-m3 -resource-pressure=false -noalias=false < %s | FileCheck %s -check-prefixes=ALL,M3
+# RUN: llvm-mca -mtriple=aarch64-linux-gnu -mcpu=exynos-m4 -resource-pressure=false -noalias=false < %s | FileCheck %s -check-prefixes=ALL,M4
+# RUN: llvm-mca -mtriple=aarch64-linux-gnu -mcpu=exynos-m5 -resource-pressure=false -noalias=false < %s | FileCheck %s -check-prefixes=ALL,M5
st1 {v0.s}[0], [sp]
st1 {v0.2s}, [sp]
diff --git a/llvm/test/tools/llvm-mca/AArch64/Exynos/asimd-st2.s b/llvm/test/tools/llvm-mca/AArch64/Exynos/asimd-st2.s
index 94ac16da6d84..b4d2b582e143 100644
--- a/llvm/test/tools/llvm-mca/AArch64/Exynos/asimd-st2.s
+++ b/llvm/test/tools/llvm-mca/AArch64/Exynos/asimd-st2.s
@@ -1,7 +1,7 @@
# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
-# RUN: llvm-mca -mtriple=aarch64-linux-gnu -mcpu=exynos-m3 -resource-pressure=false < %s | FileCheck %s -check-prefixes=ALL,M3
-# RUN: llvm-mca -mtriple=aarch64-linux-gnu -mcpu=exynos-m4 -resource-pressure=false < %s | FileCheck %s -check-prefixes=ALL,M4
-# RUN: llvm-mca -mtriple=aarch64-linux-gnu -mcpu=exynos-m5 -resource-pressure=false < %s | FileCheck %s -check-prefixes=ALL,M5
+# RUN: llvm-mca -mtriple=aarch64-linux-gnu -mcpu=exynos-m3 -resource-pressure=false -noalias=false < %s | FileCheck %s -check-prefixes=ALL,M3
+# RUN: llvm-mca -mtriple=aarch64-linux-gnu -mcpu=exynos-m4 -resource-pressure=false -noalias=false < %s | FileCheck %s -check-prefixes=ALL,M4
+# RUN: llvm-mca -mtriple=aarch64-linux-gnu -mcpu=exynos-m5 -resource-pressure=false -noalias=false < %s | FileCheck %s -check-prefixes=ALL,M5
st2 {v0.s, v1.s}[0], [sp]
st2 {v0.2s, v1.2s}, [sp]
diff --git a/llvm/test/tools/llvm-mca/AArch64/Exynos/asimd-st3.s b/llvm/test/tools/llvm-mca/AArch64/Exynos/asimd-st3.s
index 564e408c4d14..29f8079acd8c 100644
--- a/llvm/test/tools/llvm-mca/AArch64/Exynos/asimd-st3.s
+++ b/llvm/test/tools/llvm-mca/AArch64/Exynos/asimd-st3.s
@@ -1,7 +1,7 @@
# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
-# RUN: llvm-mca -mtriple=aarch64-linux-gnu -mcpu=exynos-m3 -resource-pressure=false < %s | FileCheck %s -check-prefixes=ALL,M3
-# RUN: llvm-mca -mtriple=aarch64-linux-gnu -mcpu=exynos-m4 -resource-pressure=false < %s | FileCheck %s -check-prefixes=ALL,M4
-# RUN: llvm-mca -mtriple=aarch64-linux-gnu -mcpu=exynos-m5 -resource-pressure=false < %s | FileCheck %s -check-prefixes=ALL,M5
+# RUN: llvm-mca -mtriple=aarch64-linux-gnu -mcpu=exynos-m3 -resource-pressure=false -noalias=false < %s | FileCheck %s -check-prefixes=ALL,M3
+# RUN: llvm-mca -mtriple=aarch64-linux-gnu -mcpu=exynos-m4 -resource-pressure=false -noalias=false < %s | FileCheck %s -check-prefixes=ALL,M4
+# RUN: llvm-mca -mtriple=aarch64-linux-gnu -mcpu=exynos-m5 -resource-pressure=false -noalias=false < %s | FileCheck %s -check-prefixes=ALL,M5
st3 {v0.s, v1.s, v2.s}[0], [sp]
st3 {v0.2s, v1.2s, v2.2s}, [sp]
diff --git a/llvm/test/tools/llvm-mca/AArch64/Exynos/asimd-st4.s b/llvm/test/tools/llvm-mca/AArch64/Exynos/asimd-st4.s
index 37283f973bc4..7aa69b0f34d2 100644
--- a/llvm/test/tools/llvm-mca/AArch64/Exynos/asimd-st4.s
+++ b/llvm/test/tools/llvm-mca/AArch64/Exynos/asimd-st4.s
@@ -1,7 +1,7 @@
# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
-# RUN: llvm-mca -mtriple=aarch64-linux-gnu -mcpu=exynos-m3 -resource-pressure=false < %s | FileCheck %s -check-prefixes=ALL,M3
-# RUN: llvm-mca -mtriple=aarch64-linux-gnu -mcpu=exynos-m4 -resource-pressure=false < %s | FileCheck %s -check-prefixes=ALL,M4
-# RUN: llvm-mca -mtriple=aarch64-linux-gnu -mcpu=exynos-m5 -resource-pressure=false < %s | FileCheck %s -check-prefixes=ALL,M5
+# RUN: llvm-mca -mtriple=aarch64-linux-gnu -mcpu=exynos-m3 -resource-pressure=false -noalias=false < %s | FileCheck %s -check-prefixes=ALL,M3
+# RUN: llvm-mca -mtriple=aarch64-linux-gnu -mcpu=exynos-m4 -resource-pressure=false -noalias=false < %s | FileCheck %s -check-prefixes=ALL,M4
+# RUN: llvm-mca -mtriple=aarch64-linux-gnu -mcpu=exynos-m5 -resource-pressure=false -noalias=false < %s | FileCheck %s -check-prefixes=ALL,M5
st4 {v0.s, v1.s, v2.s, v3.s}[0], [sp]
st4 {v0.2s, v1.2s, v2.2s, v3.2s}, [sp]
diff --git a/llvm/test/tools/llvm-mca/AArch64/Exynos/float-store.s b/llvm/test/tools/llvm-mca/AArch64/Exynos/float-store.s
index 55d1d60252b7..5b7004b817ff 100644
--- a/llvm/test/tools/llvm-mca/AArch64/Exynos/float-store.s
+++ b/llvm/test/tools/llvm-mca/AArch64/Exynos/float-store.s
@@ -1,7 +1,7 @@
# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
-# RUN: llvm-mca -march=aarch64 -mcpu=exynos-m3 -resource-pressure=false < %s | FileCheck %s -check-prefixes=ALL,M3
-# RUN: llvm-mca -march=aarch64 -mcpu=exynos-m4 -resource-pressure=false < %s | FileCheck %s -check-prefixes=ALL,M4
-# RUN: llvm-mca -march=aarch64 -mcpu=exynos-m5 -resource-pressure=false < %s | FileCheck %s -check-prefixes=ALL,M5
+# RUN: llvm-mca -march=aarch64 -mcpu=exynos-m3 -resource-pressure=false -noalias=false < %s | FileCheck %s -check-prefixes=ALL,M3
+# RUN: llvm-mca -march=aarch64 -mcpu=exynos-m4 -resource-pressure=false -noalias=false < %s | FileCheck %s -check-prefixes=ALL,M4
+# RUN: llvm-mca -march=aarch64 -mcpu=exynos-m5 -resource-pressure=false -noalias=false < %s | FileCheck %s -check-prefixes=ALL,M5
stur d0, [sp, #2]
stur q0, [sp, #16]
diff --git a/llvm/test/tools/llvm-mca/AArch64/Exynos/store.s b/llvm/test/tools/llvm-mca/AArch64/Exynos/store.s
index b86cdac50e6e..3c7d412995be 100644
--- a/llvm/test/tools/llvm-mca/AArch64/Exynos/store.s
+++ b/llvm/test/tools/llvm-mca/AArch64/Exynos/store.s
@@ -1,7 +1,7 @@
# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
-# RUN: llvm-mca -march=aarch64 -mcpu=exynos-m3 -resource-pressure=false < %s | FileCheck %s -check-prefixes=ALL,M3
-# RUN: llvm-mca -march=aarch64 -mcpu=exynos-m4 -resource-pressure=false < %s | FileCheck %s -check-prefixes=ALL,M4
-# RUN: llvm-mca -march=aarch64 -mcpu=exynos-m5 -resource-pressure=false < %s | FileCheck %s -check-prefixes=ALL,M5
+# RUN: llvm-mca -march=aarch64 -mcpu=exynos-m3 -resource-pressure=false -noalias=false < %s | FileCheck %s -check-prefixes=ALL,M3
+# RUN: llvm-mca -march=aarch64 -mcpu=exynos-m4 -resource-pressure=false -noalias=false < %s | FileCheck %s -check-prefixes=ALL,M4
+# RUN: llvm-mca -march=aarch64 -mcpu=exynos-m5 -resource-pressure=false -noalias=false < %s | FileCheck %s -check-prefixes=ALL,M5
stur x0, [sp, #8]
strb w0, [sp], #1
diff --git a/llvm/test/tools/llvm-mca/X86/Barcelona/load-store-throughput.s b/llvm/test/tools/llvm-mca/X86/Barcelona/load-store-throughput.s
index adf6c10d7493..b600e387459f 100644
--- a/llvm/test/tools/llvm-mca/X86/Barcelona/load-store-throughput.s
+++ b/llvm/test/tools/llvm-mca/X86/Barcelona/load-store-throughput.s
@@ -47,12 +47,12 @@ movaps %xmm3, (%rbx)
# CHECK: Iterations: 100
# CHECK-NEXT: Instructions: 400
-# CHECK-NEXT: Total Cycles: 208
+# CHECK-NEXT: Total Cycles: 207
# CHECK-NEXT: Total uOps: 400
# CHECK: Dispatch Width: 4
-# CHECK-NEXT: uOps Per Cycle: 1.92
-# CHECK-NEXT: IPC: 1.92
+# CHECK-NEXT: uOps Per Cycle: 1.93
+# CHECK-NEXT: IPC: 1.93
# CHECK-NEXT: Block RThroughput: 2.0
# CHECK: Instruction Info:
@@ -72,22 +72,21 @@ movaps %xmm3, (%rbx)
# CHECK: Dynamic Dispatch Stall Cycles:
# CHECK-NEXT: RAT - Register unavailable: 0
# CHECK-NEXT: RCU - Retire tokens unavailable: 0
-# CHECK-NEXT: SCHEDQ - Scheduler full: 147 (70.7%)
+# CHECK-NEXT: SCHEDQ - Scheduler full: 147 (71.0%)
# CHECK-NEXT: LQ - Load queue full: 0
# CHECK-NEXT: SQ - Store queue full: 0
# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0
# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
# CHECK-NEXT: [# dispatched], [# cycles]
-# CHECK-NEXT: 0, 34 (16.3%)
-# CHECK-NEXT: 2, 148 (71.2%)
-# CHECK-NEXT: 4, 26 (12.5%)
+# CHECK-NEXT: 0, 33 (15.9%)
+# CHECK-NEXT: 2, 148 (71.5%)
+# CHECK-NEXT: 4, 26 (12.6%)
# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:
# CHECK-NEXT: [# issued], [# cycles]
-# CHECK-NEXT: 0, 3 (1.4%)
-# CHECK-NEXT: 1, 10 (4.8%)
-# CHECK-NEXT: 2, 195 (93.8%)
+# CHECK-NEXT: 0, 7 (3.4%)
+# CHECK-NEXT: 2, 200 (96.6%)
# CHECK: Scheduler's queue usage:
# CHECK-NEXT: [1] Resource name.
@@ -116,16 +115,16 @@ movaps %xmm3, (%rbx)
# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6.0] [6.1] Instructions:
# CHECK-NEXT: - - - - 1.00 - - 1.00 movb %spl, (%rax)
# CHECK-NEXT: - - - - - - 1.00 - movb (%rcx), %bpl
-# CHECK-NEXT: - - - - - - 0.95 0.05 movb (%rdx), %sil
-# CHECK-NEXT: - - - - 1.00 - 0.05 0.95 movb %dil, (%rbx)
+# CHECK-NEXT: - - - - - - - 1.00 movb (%rdx), %sil
+# CHECK-NEXT: - - - - 1.00 - 1.00 - movb %dil, (%rbx)
# CHECK: Timeline view:
-# CHECK-NEXT: Index 0123456789
+# CHECK-NEXT: Index 012345678
-# CHECK: [0,0] DeER . . movb %spl, (%rax)
-# CHECK-NEXT: [0,1] DeeeeeER . movb (%rcx), %bpl
-# CHECK-NEXT: [0,2] D=eeeeeER. movb (%rdx), %sil
-# CHECK-NEXT: [0,3] D======eER movb %dil, (%rbx)
+# CHECK: [0,0] DeER . . movb %spl, (%rax)
+# CHECK-NEXT: [0,1] DeeeeeER. movb (%rcx), %bpl
+# CHECK-NEXT: [0,2] D=eeeeeER movb (%rdx), %sil
+# CHECK-NEXT: [0,3] D=eE----R movb %dil, (%rbx)
# CHECK: Average Wait times (based on the timeline view):
# CHECK-NEXT: [0]: Executions
@@ -137,19 +136,19 @@ movaps %xmm3, (%rbx)
# CHECK-NEXT: 0. 1 1.0 1.0 0.0 movb %spl, (%rax)
# CHECK-NEXT: 1. 1 1.0 1.0 0.0 movb (%rcx), %bpl
# CHECK-NEXT: 2. 1 2.0 2.0 0.0 movb (%rdx), %sil
-# CHECK-NEXT: 3. 1 7.0 0.0 0.0 movb %dil, (%rbx)
-# CHECK-NEXT: 1 2.8 1.0 0.0 <total>
+# CHECK-NEXT: 3. 1 2.0 0.0 4.0 movb %dil, (%rbx)
+# CHECK-NEXT: 1 1.5 1.0 1.0 <total>
# CHECK: [1] Code Region
# CHECK: Iterations: 100
# CHECK-NEXT: Instructions: 400
-# CHECK-NEXT: Total Cycles: 208
+# CHECK-NEXT: Total Cycles: 207
# CHECK-NEXT: Total uOps: 400
# CHECK: Dispatch Width: 4
-# CHECK-NEXT: uOps Per Cycle: 1.92
-# CHECK-NEXT: IPC: 1.92
+# CHECK-NEXT: uOps Per Cycle: 1.93
+# CHECK-NEXT: IPC: 1.93
# CHECK-NEXT: Block RThroughput: 2.0
# CHECK: Instruction Info:
@@ -169,22 +168,21 @@ movaps %xmm3, (%rbx)
# CHECK: Dynamic Dispatch Stall Cycles:
# CHECK-NEXT: RAT - Register unavailable: 0
# CHECK-NEXT: RCU - Retire tokens unavailable: 0
-# CHECK-NEXT: SCHEDQ - Scheduler full: 147 (70.7%)
+# CHECK-NEXT: SCHEDQ - Scheduler full: 147 (71.0%)
# CHECK-NEXT: LQ - Load queue full: 0
# CHECK-NEXT: SQ - Store queue full: 0
# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0
# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
# CHECK-NEXT: [# dispatched], [# cycles]
-# CHECK-NEXT: 0, 34 (16.3%)
-# CHECK-NEXT: 2, 148 (71.2%)
-# CHECK-NEXT: 4, 26 (12.5%)
+# CHECK-NEXT: 0, 33 (15.9%)
+# CHECK-NEXT: 2, 148 (71.5%)
+# CHECK-NEXT: 4, 26 (12.6%)
# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:
# CHECK-NEXT: [# issued], [# cycles]
-# CHECK-NEXT: 0, 3 (1.4%)
-# CHECK-NEXT: 1, 10 (4.8%)
-# CHECK-NEXT: 2, 195 (93.8%)
+# CHECK-NEXT: 0, 7 (3.4%)
+# CHECK-NEXT: 2, 200 (96.6%)
# CHECK: Scheduler's queue usage:
# CHECK-NEXT: [1] Resource name.
@@ -213,16 +211,16 @@ movaps %xmm3, (%rbx)
# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6.0] [6.1] Instructions:
# CHECK-NEXT: - - - - 1.00 - - 1.00 movw %sp, (%rax)
# CHECK-NEXT: - - - - - - 1.00 - movw (%rcx), %bp
-# CHECK-NEXT: - - - - - - 0.95 0.05 movw (%rdx), %si
-# CHECK-NEXT: - - - - 1.00 - 0.05 0.95 movw %di, (%rbx)
+# CHECK-NEXT: - - - - - - - 1.00 movw (%rdx), %si
+# CHECK-NEXT: - - - - 1.00 - 1.00 - movw %di, (%rbx)
# CHECK: Timeline view:
-# CHECK-NEXT: Index 0123456789
+# CHECK-NEXT: Index 012345678
-# CHECK: [0,0] DeER . . movw %sp, (%rax)
-# CHECK-NEXT: [0,1] DeeeeeER . movw (%rcx), %bp
-# CHECK-NEXT: [0,2] D=eeeeeER. movw (%rdx), %si
-# CHECK-NEXT: [0,3] D======eER movw %di, (%rbx)
+# CHECK: [0,0] DeER . . movw %sp, (%rax)
+# CHECK-NEXT: [0,1] DeeeeeER. movw (%rcx), %bp
+# CHECK-NEXT: [0,2] D=eeeeeER movw (%rdx), %si
+# CHECK-NEXT: [0,3] D=eE----R movw %di, (%rbx)
# CHECK: Average Wait times (based on the timeline view):
# CHECK-NEXT: [0]: Executions
@@ -234,19 +232,19 @@ movaps %xmm3, (%rbx)
# CHECK-NEXT: 0. 1 1.0 1.0 0.0 movw %sp, (%rax)
# CHECK-NEXT: 1. 1 1.0 1.0 0.0 movw (%rcx), %bp
# CHECK-NEXT: 2. 1 2.0 2.0 0.0 movw (%rdx), %si
-# CHECK-NEXT: 3. 1 7.0 0.0 0.0 movw %di, (%rbx)
-# CHECK-NEXT: 1 2.8 1.0 0.0 <total>
+# CHECK-NEXT: 3. 1 2.0 0.0 4.0 movw %di, (%rbx)
+# CHECK-NEXT: 1 1.5 1.0 1.0 <total>
# CHECK: [2] Code Region
# CHECK: Iterations: 100
# CHECK-NEXT: Instructions: 400
-# CHECK-NEXT: Total Cycles: 208
+# CHECK-NEXT: Total Cycles: 207
# CHECK-NEXT: Total uOps: 400
# CHECK: Dispatch Width: 4
-# CHECK-NEXT: uOps Per Cycle: 1.92
-# CHECK-NEXT: IPC: 1.92
+# CHECK-NEXT: uOps Per Cycle: 1.93
+# CHECK-NEXT: IPC: 1.93
# CHECK-NEXT: Block RThroughput: 2.0
# CHECK: Instruction Info:
@@ -266,22 +264,21 @@ movaps %xmm3, (%rbx)
# CHECK: Dynamic Dispatch Stall Cycles:
# CHECK-NEXT: RAT - Register unavailable: 0
# CHECK-NEXT: RCU - Retire tokens unavailable: 0
-# CHECK-NEXT: SCHEDQ - Scheduler full: 147 (70.7%)
+# CHECK-NEXT: SCHEDQ - Scheduler full: 147 (71.0%)
# CHECK-NEXT: LQ - Load queue full: 0
# CHECK-NEXT: SQ - Store queue full: 0
# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0
# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
# CHECK-NEXT: [# dispatched], [# cycles]
-# CHECK-NEXT: 0, 34 (16.3%)
-# CHECK-NEXT: 2, 148 (71.2%)
-# CHECK-NEXT: 4, 26 (12.5%)
+# CHECK-NEXT: 0, 33 (15.9%)
+# CHECK-NEXT: 2, 148 (71.5%)
+# CHECK-NEXT: 4, 26 (12.6%)
# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:
# CHECK-NEXT: [# issued], [# cycles]
-# CHECK-NEXT: 0, 3 (1.4%)
-# CHECK-NEXT: 1, 10 (4.8%)
-# CHECK-NEXT: 2, 195 (93.8%)
+# CHECK-NEXT: 0, 7 (3.4%)
+# CHECK-NEXT: 2, 200 (96.6%)
# CHECK: Scheduler's queue usage:
# CHECK-NEXT: [1] Resource name.
@@ -310,16 +307,16 @@ movaps %xmm3, (%rbx)
# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6.0] [6.1] Instructions:
# CHECK-NEXT: - - - - 1.00 - - 1.00 movl %esp, (%rax)
# CHECK-NEXT: - - - - - - 1.00 - movl (%rcx), %ebp
-# CHECK-NEXT: - - - - - - 0.95 0.05 movl (%rdx), %esi
-# CHECK-NEXT: - - - - 1.00 - 0.05 0.95 movl %edi, (%rbx)
+# CHECK-NEXT: - - - - - - - 1.00 movl (%rdx), %esi
+# CHECK-NEXT: - - - - 1.00 - 1.00 - movl %edi, (%rbx)
# CHECK: Timeline view:
-# CHECK-NEXT: Index 0123456789
+# CHECK-NEXT: Index 012345678
-# CHECK: [0,0] DeER . . movl %esp, (%rax)
-# CHECK-NEXT: [0,1] DeeeeeER . movl (%rcx), %ebp
-# CHECK-NEXT: [0,2] D=eeeeeER. movl (%rdx), %esi
-# CHECK-NEXT: [0,3] D======eER movl %edi, (%rbx)
+# CHECK: [0,0] DeER . . movl %esp, (%rax)
+# CHECK-NEXT: [0,1] DeeeeeER. movl (%rcx), %ebp
+# CHECK-NEXT: [0,2] D=eeeeeER movl (%rdx), %esi
+# CHECK-NEXT: [0,3] D=eE----R movl %edi, (%rbx)
# CHECK: Average Wait times (based on the timeline view):
# CHECK-NEXT: [0]: Executions
@@ -331,19 +328,19 @@ movaps %xmm3, (%rbx)
# CHECK-NEXT: 0. 1 1.0 1.0 0.0 movl %esp, (%rax)
# CHECK-NEXT: 1. 1 1.0 1.0 0.0 movl (%rcx), %ebp
# CHECK-NEXT: 2. 1 2.0 2.0 0.0 movl (%rdx), %esi
-# CHECK-NEXT: 3. 1 7.0 0.0 0.0 movl %edi, (%rbx)
-# CHECK-NEXT: 1 2.8 1.0 0.0 <total>
+# CHECK-NEXT: 3. 1 2.0 0.0 4.0 movl %edi, (%rbx)
+# CHECK-NEXT: 1 1.5 1.0 1.0 <total>
# CHECK: [3] Code Region
# CHECK: Iterations: 100
# CHECK-NEXT: Instructions: 400
-# CHECK-NEXT: Total Cycles: 208
+# CHECK-NEXT: Total Cycles: 207
# CHECK-NEXT: Total uOps: 400
# CHECK: Dispatch Width: 4
-# CHECK-NEXT: uOps Per Cycle: 1.92
-# CHECK-NEXT: IPC: 1.92
+# CHECK-NEXT: uOps Per Cycle: 1.93
+# CHECK-NEXT: IPC: 1.93
# CHECK-NEXT: Block RThroughput: 2.0
# CHECK: Instruction Info:
@@ -363,22 +360,21 @@ movaps %xmm3, (%rbx)
# CHECK: Dynamic Dispatch Stall Cycles:
# CHECK-NEXT: RAT - Register unavailable: 0
# CHECK-NEXT: RCU - Retire tokens unavailable: 0
-# CHECK-NEXT: SCHEDQ - Scheduler full: 147 (70.7%)
+# CHECK-NEXT: SCHEDQ - Scheduler full: 147 (71.0%)
# CHECK-NEXT: LQ - Load queue full: 0
# CHECK-NEXT: SQ - Store queue full: 0
# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0
# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
# CHECK-NEXT: [# dispatched], [# cycles]
-# CHECK-NEXT: 0, 34 (16.3%)
-# CHECK-NEXT: 2, 148 (71.2%)
-# CHECK-NEXT: 4, 26 (12.5%)
+# CHECK-NEXT: 0, 33 (15.9%)
+# CHECK-NEXT: 2, 148 (71.5%)
+# CHECK-NEXT: 4, 26 (12.6%)
# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:
# CHECK-NEXT: [# issued], [# cycles]
-# CHECK-NEXT: 0, 3 (1.4%)
-# CHECK-NEXT: 1, 10 (4.8%)
-# CHECK-NEXT: 2, 195 (93.8%)
+# CHECK-NEXT: 0, 7 (3.4%)
+# CHECK-NEXT: 2, 200 (96.6%)
# CHECK: Scheduler's queue usage:
# CHECK-NEXT: [1] Resource name.
@@ -407,16 +403,16 @@ movaps %xmm3, (%rbx)
# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6.0] [6.1] Instructions:
# CHECK-NEXT: - - - - 1.00 - - 1.00 movq %rsp, (%rax)
# CHECK-NEXT: - - - - - - 1.00 - movq (%rcx), %rbp
-# CHECK-NEXT: - - - - - - 0.95 0.05 movq (%rdx), %rsi
-# CHECK-NEXT: - - - - 1.00 - 0.05 0.95 movq %rdi, (%rbx)
+# CHECK-NEXT: - - - - - - - 1.00 movq (%rdx), %rsi
+# CHECK-NEXT: - - - - 1.00 - 1.00 - movq %rdi, (%rbx)
# CHECK: Timeline view:
-# CHECK-NEXT: Index 0123456789
+# CHECK-NEXT: Index 012345678
-# CHECK: [0,0] DeER . . movq %rsp, (%rax)
-# CHECK-NEXT: [0,1] DeeeeeER . movq (%rcx), %rbp
-# CHECK-NEXT: [0,2] D=eeeeeER. movq (%rdx), %rsi
-# CHECK-NEXT: [0,3] D======eER movq %rdi, (%rbx)
+# CHECK: [0,0] DeER . . movq %rsp, (%rax)
+# CHECK-NEXT: [0,1] DeeeeeER. movq (%rcx), %rbp
+# CHECK-NEXT: [0,2] D=eeeeeER movq (%rdx), %rsi
+# CHECK-NEXT: [0,3] D=eE----R movq %rdi, (%rbx)
# CHECK: Average Wait times (based on the timeline view):
# CHECK-NEXT: [0]: Executions
@@ -428,19 +424,19 @@ movaps %xmm3, (%rbx)
# CHECK-NEXT: 0. 1 1.0 1.0 0.0 movq %rsp, (%rax)
# CHECK-NEXT: 1. 1 1.0 1.0 0.0 movq (%rcx), %rbp
# CHECK-NEXT: 2. 1 2.0 2.0 0.0 movq (%rdx), %rsi
-# CHECK-NEXT: 3. 1 7.0 0.0 0.0 movq %rdi, (%rbx)
-# CHECK-NEXT: 1 2.8 1.0 0.0 <total>
+# CHECK-NEXT: 3. 1 2.0 0.0 4.0 movq %rdi, (%rbx)
+# CHECK-NEXT: 1 1.5 1.0 1.0 <total>
# CHECK: [4] Code Region
# CHECK: Iterations: 100
# CHECK-NEXT: Instructions: 400
-# CHECK-NEXT: Total Cycles: 208
+# CHECK-NEXT: Total Cycles: 207
# CHECK-NEXT: Total uOps: 400
# CHECK: Dispatch Width: 4
-# CHECK-NEXT: uOps Per Cycle: 1.92
-# CHECK-NEXT: IPC: 1.92
+# CHECK-NEXT: uOps Per Cycle: 1.93
+# CHECK-NEXT: IPC: 1.93
# CHECK-NEXT: Block RThroughput: 2.0
# CHECK: Instruction Info:
@@ -460,22 +456,21 @@ movaps %xmm3, (%rbx)
# CHECK: Dynamic Dispatch Stall Cycles:
# CHECK-NEXT: RAT - Register unavailable: 0
# CHECK-NEXT: RCU - Retire tokens unavailable: 0
-# CHECK-NEXT: SCHEDQ - Scheduler full: 147 (70.7%)
+# CHECK-NEXT: SCHEDQ - Scheduler full: 147 (71.0%)
# CHECK-NEXT: LQ - Load queue full: 0
# CHECK-NEXT: SQ - Store queue full: 0
# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0
# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
# CHECK-NEXT: [# dispatched], [# cycles]
-# CHECK-NEXT: 0, 34 (16.3%)
-# CHECK-NEXT: 2, 148 (71.2%)
-# CHECK-NEXT: 4, 26 (12.5%)
+# CHECK-NEXT: 0, 33 (15.9%)
+# CHECK-NEXT: 2, 148 (71.5%)
+# CHECK-NEXT: 4, 26 (12.6%)
# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:
# CHECK-NEXT: [# issued], [# cycles]
-# CHECK-NEXT: 0, 3 (1.4%)
-# CHECK-NEXT: 1, 10 (4.8%)
-# CHECK-NEXT: 2, 195 (93.8%)
+# CHECK-NEXT: 0, 7 (3.4%)
+# CHECK-NEXT: 2, 200 (96.6%)
# CHECK: Scheduler's queue usage:
# CHECK-NEXT: [1] Resource name.
@@ -504,16 +499,16 @@ movaps %xmm3, (%rbx)
# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6.0] [6.1] Instructions:
# CHECK-NEXT: - - - - 1.00 - - 1.00 movd %mm0, (%rax)
# CHECK-NEXT: - - - - - - 1.00 - movd (%rcx), %mm1
-# CHECK-NEXT: - - - - - - 0.95 0.05 movd (%rdx), %mm2
-# CHECK-NEXT: - - - - 1.00 - 0.05 0.95 movd %mm3, (%rbx)
+# CHECK-NEXT: - - - - - - - 1.00 movd (%rdx), %mm2
+# CHECK-NEXT: - - - - 1.00 - 1.00 - movd %mm3, (%rbx)
# CHECK: Timeline view:
-# CHECK-NEXT: Index 0123456789
+# CHECK-NEXT: Index 012345678
-# CHECK: [0,0] DeER . . movd %mm0, (%rax)
-# CHECK-NEXT: [0,1] DeeeeeER . movd (%rcx), %mm1
-# CHECK-NEXT: [0,2] D=eeeeeER. movd (%rdx), %mm2
-# CHECK-NEXT: [0,3] D======eER movd %mm3, (%rbx)
+# CHECK: [0,0] DeER . . movd %mm0, (%rax)
+# CHECK-NEXT: [0,1] DeeeeeER. movd (%rcx), %mm1
+# CHECK-NEXT: [0,2] D=eeeeeER movd (%rdx), %mm2
+# CHECK-NEXT: [0,3] D=eE----R movd %mm3, (%rbx)
# CHECK: Average Wait times (based on the timeline view):
# CHECK-NEXT: [0]: Executions
@@ -525,19 +520,19 @@ movaps %xmm3, (%rbx)
# CHECK-NEXT: 0. 1 1.0 1.0 0.0 movd %mm0, (%rax)
# CHECK-NEXT: 1. 1 1.0 1.0 0.0 movd (%rcx), %mm1
# CHECK-NEXT: 2. 1 2.0 2.0 0.0 movd (%rdx), %mm2
-# CHECK-NEXT: 3. 1 7.0 0.0 0.0 movd %mm3, (%rbx)
-# CHECK-NEXT: 1 2.8 1.0 0.0 <total>
+# CHECK-NEXT: 3. 1 2.0 0.0 4.0 movd %mm3, (%rbx)
+# CHECK-NEXT: 1 1.5 1.0 1.0 <total>
# CHECK: [5] Code Region
# CHECK: Iterations: 100
# CHECK-NEXT: Instructions: 400
-# CHECK-NEXT: Total Cycles: 209
+# CHECK-NEXT: Total Cycles: 208
# CHECK-NEXT: Total uOps: 400
# CHECK: Dispatch Width: 4
-# CHECK-NEXT: uOps Per Cycle: 1.91
-# CHECK-NEXT: IPC: 1.91
+# CHECK-NEXT: uOps Per Cycle: 1.92
+# CHECK-NEXT: IPC: 1.92
# CHECK-NEXT: Block RThroughput: 2.0
# CHECK: Instruction Info:
@@ -557,22 +552,21 @@ movaps %xmm3, (%rbx)
# CHECK: Dynamic Dispatch Stall Cycles:
# CHECK-NEXT: RAT - Register unavailable: 0
# CHECK-NEXT: RCU - Retire tokens unavailable: 0
-# CHECK-NEXT: SCHEDQ - Scheduler full: 147 (70.3%)
+# CHECK-NEXT: SCHEDQ - Scheduler full: 147 (70.7%)
# CHECK-NEXT: LQ - Load queue full: 0
# CHECK-NEXT: SQ - Store queue full: 0
# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0
# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
# CHECK-NEXT: [# dispatched], [# cycles]
-# CHECK-NEXT: 0, 35 (16.7%)
-# CHECK-NEXT: 2, 148 (70.8%)
-# CHECK-NEXT: 4, 26 (12.4%)
+# CHECK-NEXT: 0, 34 (16.3%)
+# CHECK-NEXT: 2, 148 (71.2%)
+# CHECK-NEXT: 4, 26 (12.5%)
# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:
# CHECK-NEXT: [# issued], [# cycles]
-# CHECK-NEXT: 0, 3 (1.4%)
-# CHECK-NEXT: 1, 12 (5.7%)
-# CHECK-NEXT: 2, 194 (92.8%)
+# CHECK-NEXT: 0, 8 (3.8%)
+# CHECK-NEXT: 2, 200 (96.2%)
# CHECK: Scheduler's queue usage:
# CHECK-NEXT: [1] Resource name.
@@ -601,17 +595,16 @@ movaps %xmm3, (%rbx)
# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6.0] [6.1] Instructions:
# CHECK-NEXT: - - - - 1.00 - - 1.00 movaps %xmm0, (%rax)
# CHECK-NEXT: - - - - - - 1.00 - movaps (%rcx), %xmm1
-# CHECK-NEXT: - - - - - - 0.94 0.06 movaps (%rdx), %xmm2
-# CHECK-NEXT: - - - - 1.00 - 0.06 0.94 movaps %xmm3, (%rbx)
+# CHECK-NEXT: - - - - - - - 1.00 movaps (%rdx), %xmm2
+# CHECK-NEXT: - - - - 1.00 - 1.00 - movaps %xmm3, (%rbx)
# CHECK: Timeline view:
-# CHECK-NEXT: 0
# CHECK-NEXT: Index 0123456789
-# CHECK: [0,0] DeER . . movaps %xmm0, (%rax)
-# CHECK-NEXT: [0,1] DeeeeeeER . movaps (%rcx), %xmm1
-# CHECK-NEXT: [0,2] D=eeeeeeER. movaps (%rdx), %xmm2
-# CHECK-NEXT: [0,3] D=======eER movaps %xmm3, (%rbx)
+# CHECK: [0,0] DeER . . movaps %xmm0, (%rax)
+# CHECK-NEXT: [0,1] DeeeeeeER. movaps (%rcx), %xmm1
+# CHECK-NEXT: [0,2] D=eeeeeeER movaps (%rdx), %xmm2
+# CHECK-NEXT: [0,3] D=eE-----R movaps %xmm3, (%rbx)
# CHECK: Average Wait times (based on the timeline view):
# CHECK-NEXT: [0]: Executions
@@ -623,5 +616,5 @@ movaps %xmm3, (%rbx)
# CHECK-NEXT: 0. 1 1.0 1.0 0.0 movaps %xmm0, (%rax)
# CHECK-NEXT: 1. 1 1.0 1.0 0.0 movaps (%rcx), %xmm1
# CHECK-NEXT: 2. 1 2.0 2.0 0.0 movaps (%rdx), %xmm2
-# CHECK-NEXT: 3. 1 8.0 0.0 0.0 movaps %xmm3, (%rbx)
-# CHECK-NEXT: 1 3.0 1.0 0.0 <total>
+# CHECK-NEXT: 3. 1 2.0 0.0 5.0 movaps %xmm3, (%rbx)
+# CHECK-NEXT: 1 1.5 1.0 1.3 <total>
diff --git a/llvm/test/tools/llvm-mca/X86/Barcelona/store-throughput.s b/llvm/test/tools/llvm-mca/X86/Barcelona/store-throughput.s
index 08a9c4730226..7d1fb6c24630 100644
--- a/llvm/test/tools/llvm-mca/X86/Barcelona/store-throughput.s
+++ b/llvm/test/tools/llvm-mca/X86/Barcelona/store-throughput.s
@@ -135,10 +135,10 @@ movaps %xmm3, (%rbx)
# CHECK: [0] [1] [2] [3]
# CHECK-NEXT: 0. 1 1.0 1.0 0.0 movb %spl, (%rax)
-# CHECK-NEXT: 1. 1 2.0 0.0 0.0 movb %bpl, (%rcx)
-# CHECK-NEXT: 2. 1 3.0 0.0 0.0 movb %sil, (%rdx)
-# CHECK-NEXT: 3. 1 4.0 0.0 0.0 movb %dil, (%rbx)
-# CHECK-NEXT: 1 2.5 0.3 0.0 <total>
+# CHECK-NEXT: 1. 1 2.0 1.0 0.0 movb %bpl, (%rcx)
+# CHECK-NEXT: 2. 1 3.0 1.0 0.0 movb %sil, (%rdx)
+# CHECK-NEXT: 3. 1 4.0 1.0 0.0 movb %dil, (%rbx)
+# CHECK-NEXT: 1 2.5 1.0 0.0 <total>
# CHECK: [1] Code Region
@@ -232,10 +232,10 @@ movaps %xmm3, (%rbx)
# CHECK: [0] [1] [2] [3]
# CHECK-NEXT: 0. 1 1.0 1.0 0.0 movw %sp, (%rax)
-# CHECK-NEXT: 1. 1 2.0 0.0 0.0 movw %bp, (%rcx)
-# CHECK-NEXT: 2. 1 3.0 0.0 0.0 movw %si, (%rdx)
-# CHECK-NEXT: 3. 1 4.0 0.0 0.0 movw %di, (%rbx)
-# CHECK-NEXT: 1 2.5 0.3 0.0 <total>
+# CHECK-NEXT: 1. 1 2.0 1.0 0.0 movw %bp, (%rcx)
+# CHECK-NEXT: 2. 1 3.0 1.0 0.0 movw %si, (%rdx)
+# CHECK-NEXT: 3. 1 4.0 1.0 0.0 movw %di, (%rbx)
+# CHECK-NEXT: 1 2.5 1.0 0.0 <total>
# CHECK: [2] Code Region
@@ -329,10 +329,10 @@ movaps %xmm3, (%rbx)
# CHECK: [0] [1] [2] [3]
# CHECK-NEXT: 0. 1 1.0 1.0 0.0 movl %esp, (%rax)
-# CHECK-NEXT: 1. 1 2.0 0.0 0.0 movl %ebp, (%rcx)
-# CHECK-NEXT: 2. 1 3.0 0.0 0.0 movl %esi, (%rdx)
-# CHECK-NEXT: 3. 1 4.0 0.0 0.0 movl %edi, (%rbx)
-# CHECK-NEXT: 1 2.5 0.3 0.0 <total>
+# CHECK-NEXT: 1. 1 2.0 1.0 0.0 movl %ebp, (%rcx)
+# CHECK-NEXT: 2. 1 3.0 1.0 0.0 movl %esi, (%rdx)
+# CHECK-NEXT: 3. 1 4.0 1.0 0.0 movl %edi, (%rbx)
+# CHECK-NEXT: 1 2.5 1.0 0.0 <total>
# CHECK: [3] Code Region
@@ -426,10 +426,10 @@ movaps %xmm3, (%rbx)
# CHECK: [0] [1] [2] [3]
# CHECK-NEXT: 0. 1 1.0 1.0 0.0 movq %rsp, (%rax)
-# CHECK-NEXT: 1. 1 2.0 0.0 0.0 movq %rbp, (%rcx)
-# CHECK-NEXT: 2. 1 3.0 0.0 0.0 movq %rsi, (%rdx)
-# CHECK-NEXT: 3. 1 4.0 0.0 0.0 movq %rdi, (%rbx)
-# CHECK-NEXT: 1 2.5 0.3 0.0 <total>
+# CHECK-NEXT: 1. 1 2.0 1.0 0.0 movq %rbp, (%rcx)
+# CHECK-NEXT: 2. 1 3.0 1.0 0.0 movq %rsi, (%rdx)
+# CHECK-NEXT: 3. 1 4.0 1.0 0.0 movq %rdi, (%rbx)
+# CHECK-NEXT: 1 2.5 1.0 0.0 <total>
# CHECK: [4] Code Region
@@ -620,7 +620,7 @@ movaps %xmm3, (%rbx)
# CHECK: [0] [1] [2] [3]
# CHECK-NEXT: 0. 1 1.0 1.0 0.0 movaps %xmm0, (%rax)
-# CHECK-NEXT: 1. 1 2.0 0.0 0.0 movaps %xmm1, (%rcx)
-# CHECK-NEXT: 2. 1 3.0 0.0 0.0 movaps %xmm2, (%rdx)
-# CHECK-NEXT: 3. 1 4.0 0.0 0.0 movaps %xmm3, (%rbx)
-# CHECK-NEXT: 1 2.5 0.3 0.0 <total>
+# CHECK-NEXT: 1. 1 2.0 1.0 0.0 movaps %xmm1, (%rcx)
+# CHECK-NEXT: 2. 1 3.0 1.0 0.0 movaps %xmm2, (%rdx)
+# CHECK-NEXT: 3. 1 4.0 1.0 0.0 movaps %xmm3, (%rbx)
+# CHECK-NEXT: 1 2.5 1.0 0.0 <total>
diff --git a/llvm/test/tools/llvm-mca/X86/BdVer2/load-store-throughput.s b/llvm/test/tools/llvm-mca/X86/BdVer2/load-store-throughput.s
index f326028e12ab..4b8f9e7e06cd 100644
--- a/llvm/test/tools/llvm-mca/X86/BdVer2/load-store-throughput.s
+++ b/llvm/test/tools/llvm-mca/X86/BdVer2/load-store-throughput.s
@@ -72,23 +72,24 @@ movaps %xmm3, (%rbx)
# CHECK: Dynamic Dispatch Stall Cycles:
# CHECK-NEXT: RAT - Register unavailable: 0
# CHECK-NEXT: RCU - Retire tokens unavailable: 0
-# CHECK-NEXT: SCHEDQ - Scheduler full: 257 (84.0%)
+# CHECK-NEXT: SCHEDQ - Scheduler full: 256 (83.7%)
# CHECK-NEXT: LQ - Load queue full: 0
# CHECK-NEXT: SQ - Store queue full: 0
# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0
# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
# CHECK-NEXT: [# dispatched], [# cycles]
-# CHECK-NEXT: 0, 34 (11.1%)
-# CHECK-NEXT: 1, 172 (56.2%)
-# CHECK-NEXT: 2, 86 (28.1%)
+# CHECK-NEXT: 0, 35 (11.4%)
+# CHECK-NEXT: 1, 171 (55.9%)
+# CHECK-NEXT: 2, 85 (27.8%)
+# CHECK-NEXT: 3, 1 (0.3%)
# CHECK-NEXT: 4, 14 (4.6%)
# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:
# CHECK-NEXT: [# issued], [# cycles]
-# CHECK-NEXT: 0, 5 (1.6%)
-# CHECK-NEXT: 1, 202 (66.0%)
-# CHECK-NEXT: 2, 99 (32.4%)
+# CHECK-NEXT: 0, 6 (2.0%)
+# CHECK-NEXT: 1, 200 (65.4%)
+# CHECK-NEXT: 2, 100 (32.7%)
# CHECK: Scheduler's queue usage:
# CHECK-NEXT: [1] Resource name.
@@ -99,8 +100,8 @@ movaps %xmm3, (%rbx)
# CHECK: [1] [2] [3] [4]
# CHECK-NEXT: PdEX 36 40 40
# CHECK-NEXT: PdFPU 0 0 64
-# CHECK-NEXT: PdLoad 19 22 40
-# CHECK-NEXT: PdStore 20 23 24
+# CHECK-NEXT: PdLoad 21 24 40
+# CHECK-NEXT: PdStore 18 21 24
# CHECK: Resources:
# CHECK-NEXT: [0.0] - PdAGLU01
@@ -133,18 +134,18 @@ movaps %xmm3, (%rbx)
# CHECK: Resource pressure by instruction:
# CHECK-NEXT: [0.0] [0.1] [1] [2] [3] [4] [5] [6] [7.0] [7.1] [8.0] [8.1] [9] [10] [11] [12] [13] [14] [15] [16.0] [16.1] [17] [18] Instructions:
-# CHECK-NEXT: 0.96 0.04 - - - - - - - - - - - - - - - - - - - - 1.00 movb %spl, (%rax)
+# CHECK-NEXT: - 1.00 - - - - - - - - - - - - - - - - - - - - 1.00 movb %spl, (%rax)
# CHECK-NEXT: 2.00 - - - - - - - - - - - - - - - - - - - 2.00 - - movb (%rcx), %bpl
# CHECK-NEXT: - 2.00 - - - - - - - - - - - - - - - - - 2.00 - - - movb (%rdx), %sil
-# CHECK-NEXT: 0.04 0.96 - - - - - - - - - - - - - - - - - - - - 1.00 movb %dil, (%rbx)
+# CHECK-NEXT: 1.00 - - - - - - - - - - - - - - - - - - - - - 1.00 movb %dil, (%rbx)
# CHECK: Timeline view:
-# CHECK-NEXT: Index 0123456789
+# CHECK-NEXT: Index 012345678
-# CHECK: [0,0] DeER . . movb %spl, (%rax)
-# CHECK-NEXT: [0,1] DeeeeeER . movb (%rcx), %bpl
-# CHECK-NEXT: [0,2] D=eeeeeER. movb (%rdx), %sil
-# CHECK-NEXT: [0,3] D======eER movb %dil, (%rbx)
+# CHECK: [0,0] DeER . . movb %spl, (%rax)
+# CHECK-NEXT: [0,1] DeeeeeER. movb (%rcx), %bpl
+# CHECK-NEXT: [0,2] D=eeeeeER movb (%rdx), %sil
+# CHECK-NEXT: [0,3] D==eE---R movb %dil, (%rbx)
# CHECK: Average Wait times (based on the timeline view):
# CHECK-NEXT: [0]: Executions
@@ -156,8 +157,8 @@ movaps %xmm3, (%rbx)
# CHECK-NEXT: 0. 1 1.0 1.0 0.0 movb %spl, (%rax)
# CHECK-NEXT: 1. 1 1.0 1.0 0.0 movb (%rcx), %bpl
# CHECK-NEXT: 2. 1 2.0 2.0 0.0 movb (%rdx), %sil
-# CHECK-NEXT: 3. 1 7.0 0.0 0.0 movb %dil, (%rbx)
-# CHECK-NEXT: 1 2.8 1.0 0.0 <total>
+# CHECK-NEXT: 3. 1 3.0 1.0 3.0 movb %dil, (%rbx)
+# CHECK-NEXT: 1 1.8 1.3 0.8 <total>
# CHECK: [1] Code Region
@@ -188,23 +189,24 @@ movaps %xmm3, (%rbx)
# CHECK: Dynamic Dispatch Stall Cycles:
# CHECK-NEXT: RAT - Register unavailable: 0
# CHECK-NEXT: RCU - Retire tokens unavailable: 0
-# CHECK-NEXT: SCHEDQ - Scheduler full: 257 (84.0%)
+# CHECK-NEXT: SCHEDQ - Scheduler full: 256 (83.7%)
# CHECK-NEXT: LQ - Load queue full: 0
# CHECK-NEXT: SQ - Store queue full: 0
# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0
# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
# CHECK-NEXT: [# dispatched], [# cycles]
-# CHECK-NEXT: 0, 34 (11.1%)
-# CHECK-NEXT: 1, 172 (56.2%)
-# CHECK-NEXT: 2, 86 (28.1%)
+# CHECK-NEXT: 0, 35 (11.4%)
+# CHECK-NEXT: 1, 171 (55.9%)
+# CHECK-NEXT: 2, 85 (27.8%)
+# CHECK-NEXT: 3, 1 (0.3%)
# CHECK-NEXT: 4, 14 (4.6%)
# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:
# CHECK-NEXT: [# issued], [# cycles]
-# CHECK-NEXT: 0, 5 (1.6%)
-# CHECK-NEXT: 1, 202 (66.0%)
-# CHECK-NEXT: 2, 99 (32.4%)
+# CHECK-NEXT: 0, 6 (2.0%)
+# CHECK-NEXT: 1, 200 (65.4%)
+# CHECK-NEXT: 2, 100 (32.7%)
# CHECK: Scheduler's queue usage:
# CHECK-NEXT: [1] Resource name.
@@ -215,8 +217,8 @@ movaps %xmm3, (%rbx)
# CHECK: [1] [2] [3] [4]
# CHECK-NEXT: PdEX 36 40 40
# CHECK-NEXT: PdFPU 0 0 64
-# CHECK-NEXT: PdLoad 19 22 40
-# CHECK-NEXT: PdStore 20 23 24
+# CHECK-NEXT: PdLoad 21 24 40
+# CHECK-NEXT: PdStore 18 21 24
# CHECK: Resources:
# CHECK-NEXT: [0.0] - PdAGLU01
@@ -249,18 +251,18 @@ movaps %xmm3, (%rbx)
# CHECK: Resource pressure by instruction:
# CHECK-NEXT: [0.0] [0.1] [1] [2] [3] [4] [5] [6] [7.0] [7.1] [8.0] [8.1] [9] [10] [11] [12] [13] [14] [15] [16.0] [16.1] [17] [18] Instructions:
-# CHECK-NEXT: 0.96 0.04 - - - - - - - - - - - - - - - - - - - - 1.00 movw %sp, (%rax)
+# CHECK-NEXT: - 1.00 - - - - - - - - - - - - - - - - - - - - 1.00 movw %sp, (%rax)
# CHECK-NEXT: 2.00 - - - - - - - - - - - - - - - - - - - 2.00 - - movw (%rcx), %bp
# CHECK-NEXT: - 2.00 - - - - - - - - - - - - - - - - - 2.00 - - - movw (%rdx), %si
-# CHECK-NEXT: 0.04 0.96 - - - - - - - - - - - - - - - - - - - - 1.00 movw %di, (%rbx)
+# CHECK-NEXT: 1.00 - - - - - - - - - - - - - - - - - - - - - 1.00 movw %di, (%rbx)
# CHECK: Timeline view:
-# CHECK-NEXT: Index 0123456789
+# CHECK-NEXT: Index 012345678
-# CHECK: [0,0] DeER . . movw %sp, (%rax)
-# CHECK-NEXT: [0,1] DeeeeeER . movw (%rcx), %bp
-# CHECK-NEXT: [0,2] D=eeeeeER. movw (%rdx), %si
-# CHECK-NEXT: [0,3] D======eER movw %di, (%rbx)
+# CHECK: [0,0] DeER . . movw %sp, (%rax)
+# CHECK-NEXT: [0,1] DeeeeeER. movw (%rcx), %bp
+# CHECK-NEXT: [0,2] D=eeeeeER movw (%rdx), %si
+# CHECK-NEXT: [0,3] D==eE---R movw %di, (%rbx)
# CHECK: Average Wait times (based on the timeline view):
# CHECK-NEXT: [0]: Executions
@@ -272,8 +274,8 @@ movaps %xmm3, (%rbx)
# CHECK-NEXT: 0. 1 1.0 1.0 0.0 movw %sp, (%rax)
# CHECK-NEXT: 1. 1 1.0 1.0 0.0 movw (%rcx), %bp
# CHECK-NEXT: 2. 1 2.0 2.0 0.0 movw (%rdx), %si
-# CHECK-NEXT: 3. 1 7.0 0.0 0.0 movw %di, (%rbx)
-# CHECK-NEXT: 1 2.8 1.0 0.0 <total>
+# CHECK-NEXT: 3. 1 3.0 1.0 3.0 movw %di, (%rbx)
+# CHECK-NEXT: 1 1.8 1.3 0.8 <total>
# CHECK: [2] Code Region
@@ -304,23 +306,24 @@ movaps %xmm3, (%rbx)
# CHECK: Dynamic Dispatch Stall Cycles:
# CHECK-NEXT: RAT - Register unavailable: 0
# CHECK-NEXT: RCU - Retire tokens unavailable: 0
-# CHECK-NEXT: SCHEDQ - Scheduler full: 257 (84.0%)
+# CHECK-NEXT: SCHEDQ - Scheduler full: 256 (83.7%)
# CHECK-NEXT: LQ - Load queue full: 0
# CHECK-NEXT: SQ - Store queue full: 0
# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0
# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
# CHECK-NEXT: [# dispatched], [# cycles]
-# CHECK-NEXT: 0, 34 (11.1%)
-# CHECK-NEXT: 1, 172 (56.2%)
-# CHECK-NEXT: 2, 86 (28.1%)
+# CHECK-NEXT: 0, 35 (11.4%)
+# CHECK-NEXT: 1, 171 (55.9%)
+# CHECK-NEXT: 2, 85 (27.8%)
+# CHECK-NEXT: 3, 1 (0.3%)
# CHECK-NEXT: 4, 14 (4.6%)
# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:
# CHECK-NEXT: [# issued], [# cycles]
-# CHECK-NEXT: 0, 5 (1.6%)
-# CHECK-NEXT: 1, 202 (66.0%)
-# CHECK-NEXT: 2, 99 (32.4%)
+# CHECK-NEXT: 0, 6 (2.0%)
+# CHECK-NEXT: 1, 200 (65.4%)
+# CHECK-NEXT: 2, 100 (32.7%)
# CHECK: Scheduler's queue usage:
# CHECK-NEXT: [1] Resource name.
@@ -331,8 +334,8 @@ movaps %xmm3, (%rbx)
# CHECK: [1] [2] [3] [4]
# CHECK-NEXT: PdEX 36 40 40
# CHECK-NEXT: PdFPU 0 0 64
-# CHECK-NEXT: PdLoad 19 22 40
-# CHECK-NEXT: PdStore 20 23 24
+# CHECK-NEXT: PdLoad 21 24 40
+# CHECK-NEXT: PdStore 18 21 24
# CHECK: Resources:
# CHECK-NEXT: [0.0] - PdAGLU01
@@ -365,18 +368,18 @@ movaps %xmm3, (%rbx)
# CHECK: Resource pressure by instruction:
# CHECK-NEXT: [0.0] [0.1] [1] [2] [3] [4] [5] [6] [7.0] [7.1] [8.0] [8.1] [9] [10] [11] [12] [13] [14] [15] [16.0] [16.1] [17] [18] Instructions:
-# CHECK-NEXT: 0.96 0.04 - - - - - - - - - - - - - - - - - - - - 1.00 movl %esp, (%rax)
+# CHECK-NEXT: - 1.00 - - - - - - - - - - - - - - - - - - - - 1.00 movl %esp, (%rax)
# CHECK-NEXT: 2.00 - - - - - - - - - - - - - - - - - - - 2.00 - - movl (%rcx), %ebp
# CHECK-NEXT: - 2.00 - - - - - - - - - - - - - - - - - 2.00 - - - movl (%rdx), %esi
-# CHECK-NEXT: 0.04 0.96 - - - - - - - - - - - - - - - - - - - - 1.00 movl %edi, (%rbx)
+# CHECK-NEXT: 1.00 - - - - - - - - - - - - - - - - - - - - - 1.00 movl %edi, (%rbx)
# CHECK: Timeline view:
-# CHECK-NEXT: Index 0123456789
+# CHECK-NEXT: Index 012345678
-# CHECK: [0,0] DeER . . movl %esp, (%rax)
-# CHECK-NEXT: [0,1] DeeeeeER . movl (%rcx), %ebp
-# CHECK-NEXT: [0,2] D=eeeeeER. movl (%rdx), %esi
-# CHECK-NEXT: [0,3] D======eER movl %edi, (%rbx)
+# CHECK: [0,0] DeER . . movl %esp, (%rax)
+# CHECK-NEXT: [0,1] DeeeeeER. movl (%rcx), %ebp
+# CHECK-NEXT: [0,2] D=eeeeeER movl (%rdx), %esi
+# CHECK-NEXT: [0,3] D==eE---R movl %edi, (%rbx)
# CHECK: Average Wait times (based on the timeline view):
# CHECK-NEXT: [0]: Executions
@@ -388,8 +391,8 @@ movaps %xmm3, (%rbx)
# CHECK-NEXT: 0. 1 1.0 1.0 0.0 movl %esp, (%rax)
# CHECK-NEXT: 1. 1 1.0 1.0 0.0 movl (%rcx), %ebp
# CHECK-NEXT: 2. 1 2.0 2.0 0.0 movl (%rdx), %esi
-# CHECK-NEXT: 3. 1 7.0 0.0 0.0 movl %edi, (%rbx)
-# CHECK-NEXT: 1 2.8 1.0 0.0 <total>
+# CHECK-NEXT: 3. 1 3.0 1.0 3.0 movl %edi, (%rbx)
+# CHECK-NEXT: 1 1.8 1.3 0.8 <total>
# CHECK: [3] Code Region
@@ -420,23 +423,24 @@ movaps %xmm3, (%rbx)
# CHECK: Dynamic Dispatch Stall Cycles:
# CHECK-NEXT: RAT - Register unavailable: 0
# CHECK-NEXT: RCU - Retire tokens unavailable: 0
-# CHECK-NEXT: SCHEDQ - Scheduler full: 257 (84.0%)
+# CHECK-NEXT: SCHEDQ - Scheduler full: 256 (83.7%)
# CHECK-NEXT: LQ - Load queue full: 0
# CHECK-NEXT: SQ - Store queue full: 0
# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0
# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
# CHECK-NEXT: [# dispatched], [# cycles]
-# CHECK-NEXT: 0, 34 (11.1%)
-# CHECK-NEXT: 1, 172 (56.2%)
-# CHECK-NEXT: 2, 86 (28.1%)
+# CHECK-NEXT: 0, 35 (11.4%)
+# CHECK-NEXT: 1, 171 (55.9%)
+# CHECK-NEXT: 2, 85 (27.8%)
+# CHECK-NEXT: 3, 1 (0.3%)
# CHECK-NEXT: 4, 14 (4.6%)
# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:
# CHECK-NEXT: [# issued], [# cycles]
-# CHECK-NEXT: 0, 5 (1.6%)
-# CHECK-NEXT: 1, 202 (66.0%)
-# CHECK-NEXT: 2, 99 (32.4%)
+# CHECK-NEXT: 0, 6 (2.0%)
+# CHECK-NEXT: 1, 200 (65.4%)
+# CHECK-NEXT: 2, 100 (32.7%)
# CHECK: Scheduler's queue usage:
# CHECK-NEXT: [1] Resource name.
@@ -447,8 +451,8 @@ movaps %xmm3, (%rbx)
# CHECK: [1] [2] [3] [4]
# CHECK-NEXT: PdEX 36 40 40
# CHECK-NEXT: PdFPU 0 0 64
-# CHECK-NEXT: PdLoad 19 22 40
-# CHECK-NEXT: PdStore 20 23 24
+# CHECK-NEXT: PdLoad 21 24 40
+# CHECK-NEXT: PdStore 18 21 24
# CHECK: Resources:
# CHECK-NEXT: [0.0] - PdAGLU01
@@ -481,18 +485,18 @@ movaps %xmm3, (%rbx)
# CHECK: Resource pressure by instruction:
# CHECK-NEXT: [0.0] [0.1] [1] [2] [3] [4] [5] [6] [7.0] [7.1] [8.0] [8.1] [9] [10] [11] [12] [13] [14] [15] [16.0] [16.1] [17] [18] Instructions:
-# CHECK-NEXT: 0.96 0.04 - - - - - - - - - - - - - - - - - - - - 1.00 movq %rsp, (%rax)
+# CHECK-NEXT: - 1.00 - - - - - - - - - - - - - - - - - - - - 1.00 movq %rsp, (%rax)
# CHECK-NEXT: 2.00 - - - - - - - - - - - - - - - - - - - 2.00 - - movq (%rcx), %rbp
# CHECK-NEXT: - 2.00 - - - - - - - - - - - - - - - - - 2.00 - - - movq (%rdx), %rsi
-# CHECK-NEXT: 0.04 0.96 - - - - - - - - - - - - - - - - - - - - 1.00 movq %rdi, (%rbx)
+# CHECK-NEXT: 1.00 - - - - - - - - - - - - - - - - - - - - - 1.00 movq %rdi, (%rbx)
# CHECK: Timeline view:
-# CHECK-NEXT: Index 0123456789
+# CHECK-NEXT: Index 012345678
-# CHECK: [0,0] DeER . . movq %rsp, (%rax)
-# CHECK-NEXT: [0,1] DeeeeeER . movq (%rcx), %rbp
-# CHECK-NEXT: [0,2] D=eeeeeER. movq (%rdx), %rsi
-# CHECK-NEXT: [0,3] D======eER movq %rdi, (%rbx)
+# CHECK: [0,0] DeER . . movq %rsp, (%rax)
+# CHECK-NEXT: [0,1] DeeeeeER. movq (%rcx), %rbp
+# CHECK-NEXT: [0,2] D=eeeeeER movq (%rdx), %rsi
+# CHECK-NEXT: [0,3] D==eE---R movq %rdi, (%rbx)
# CHECK: Average Wait times (based on the timeline view):
# CHECK-NEXT: [0]: Executions
@@ -504,14 +508,14 @@ movaps %xmm3, (%rbx)
# CHECK-NEXT: 0. 1 1.0 1.0 0.0 movq %rsp, (%rax)
# CHECK-NEXT: 1. 1 1.0 1.0 0.0 movq (%rcx), %rbp
# CHECK-NEXT: 2. 1 2.0 2.0 0.0 movq (%rdx), %rsi
-# CHECK-NEXT: 3. 1 7.0 0.0 0.0 movq %rdi, (%rbx)
-# CHECK-NEXT: 1 2.8 1.0 0.0 <total>
+# CHECK-NEXT: 3. 1 3.0 1.0 3.0 movq %rdi, (%rbx)
+# CHECK-NEXT: 1 1.8 1.3 0.8 <total>
# CHECK: [4] Code Region
# CHECK: Iterations: 100
# CHECK-NEXT: Instructions: 400
-# CHECK-NEXT: Total Cycles: 554
+# CHECK-NEXT: Total Cycles: 553
# CHECK-NEXT: Total uOps: 400
# CHECK: Dispatch Width: 4
@@ -536,24 +540,24 @@ movaps %xmm3, (%rbx)
# CHECK: Dynamic Dispatch Stall Cycles:
# CHECK-NEXT: RAT - Register unavailable: 0
# CHECK-NEXT: RCU - Retire tokens unavailable: 0
-# CHECK-NEXT: SCHEDQ - Scheduler full: 55 (9.9%)
+# CHECK-NEXT: SCHEDQ - Scheduler full: 57 (10.3%)
# CHECK-NEXT: LQ - Load queue full: 0
-# CHECK-NEXT: SQ - Store queue full: 437 (78.9%)
+# CHECK-NEXT: SQ - Store queue full: 432 (78.1%)
# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0
# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
# CHECK-NEXT: [# dispatched], [# cycles]
-# CHECK-NEXT: 0, 365 (65.9%)
+# CHECK-NEXT: 0, 364 (65.8%)
# CHECK-NEXT: 1, 88 (15.9%)
-# CHECK-NEXT: 2, 3 (0.5%)
-# CHECK-NEXT: 3, 86 (15.5%)
-# CHECK-NEXT: 4, 12 (2.2%)
+# CHECK-NEXT: 2, 4 (0.7%)
+# CHECK-NEXT: 3, 84 (15.2%)
+# CHECK-NEXT: 4, 13 (2.4%)
# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:
# CHECK-NEXT: [# issued], [# cycles]
-# CHECK-NEXT: 0, 253 (45.7%)
-# CHECK-NEXT: 1, 202 (36.5%)
-# CHECK-NEXT: 2, 99 (17.9%)
+# CHECK-NEXT: 0, 253 (45.8%)
+# CHECK-NEXT: 1, 200 (36.2%)
+# CHECK-NEXT: 2, 100 (18.1%)
# CHECK: Scheduler's queue usage:
# CHECK-NEXT: [1] Resource name.
@@ -599,18 +603,17 @@ movaps %xmm3, (%rbx)
# CHECK: Resource pressure by instruction:
# CHECK-NEXT: [0.0] [0.1] [1] [2] [3] [4] [5] [6] [7.0] [7.1] [8.0] [8.1] [9] [10] [11] [12] [13] [14] [15] [16.0] [16.1] [17] [18] Instructions:
# CHECK-NEXT: - 1.00 - - - - - - - - - - - 1.00 - - - 3.00 - - - - 1.00 movd %mm0, (%rax)
-# CHECK-NEXT: 1.53 1.47 - - - - - - - - - 3.00 - - - 1.00 - - - - 3.00 - - movd (%rcx), %mm1
-# CHECK-NEXT: 1.47 1.53 - - - - - - - - 3.00 - - - 1.00 - - - - 3.00 - - - movd (%rdx), %mm2
+# CHECK-NEXT: 1.50 1.50 - - - - - - - - - 3.00 - - - 1.00 - - - - 3.00 - - movd (%rcx), %mm1
+# CHECK-NEXT: 1.50 1.50 - - - - - - - - 3.00 - - - 1.00 - - - - 3.00 - - - movd (%rdx), %mm2
# CHECK-NEXT: 1.00 - - - - - - - - - - - - 1.00 - - 3.00 - - - - - 1.00 movd %mm3, (%rbx)
# CHECK: Timeline view:
-# CHECK-NEXT: 0
-# CHECK-NEXT: Index 0123456789
+# CHECK-NEXT: Index 012345678
-# CHECK: [0,0] DeeER. . movd %mm0, (%rax)
-# CHECK-NEXT: [0,1] DeeeeeER . movd (%rcx), %mm1
-# CHECK-NEXT: [0,2] D=eeeeeER . movd (%rdx), %mm2
-# CHECK-NEXT: [0,3] D======eeER movd %mm3, (%rbx)
+# CHECK: [0,0] DeeER. . movd %mm0, (%rax)
+# CHECK-NEXT: [0,1] DeeeeeER. movd (%rcx), %mm1
+# CHECK-NEXT: [0,2] D=eeeeeER movd (%rdx), %mm2
+# CHECK-NEXT: [0,3] D===eeE-R movd %mm3, (%rbx)
# CHECK: Average Wait times (based on the timeline view):
# CHECK-NEXT: [0]: Executions
@@ -622,8 +625,8 @@ movaps %xmm3, (%rbx)
# CHECK-NEXT: 0. 1 1.0 1.0 0.0 movd %mm0, (%rax)
# CHECK-NEXT: 1. 1 1.0 1.0 0.0 movd (%rcx), %mm1
# CHECK-NEXT: 2. 1 2.0 2.0 0.0 movd (%rdx), %mm2
-# CHECK-NEXT: 3. 1 7.0 0.0 0.0 movd %mm3, (%rbx)
-# CHECK-NEXT: 1 2.8 1.0 0.0 <total>
+# CHECK-NEXT: 3. 1 4.0 1.0 1.0 movd %mm3, (%rbx)
+# CHECK-NEXT: 1 2.0 1.3 0.3 <total>
# CHECK: [5] Code Region
@@ -668,9 +671,9 @@ movaps %xmm3, (%rbx)
# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:
# CHECK-NEXT: [# issued], [# cycles]
-# CHECK-NEXT: 0, 104 (25.7%)
-# CHECK-NEXT: 1, 202 (49.9%)
-# CHECK-NEXT: 2, 99 (24.4%)
+# CHECK-NEXT: 0, 105 (25.9%)
+# CHECK-NEXT: 1, 200 (49.4%)
+# CHECK-NEXT: 2, 100 (24.7%)
# CHECK: Scheduler's queue usage:
# CHECK-NEXT: [1] Resource name.
@@ -679,10 +682,10 @@ movaps %xmm3, (%rbx)
# CHECK-NEXT: [4] Total number of buffer entries.
# CHECK: [1] [2] [3] [4]
-# CHECK-NEXT: PdEX 37 40 40
-# CHECK-NEXT: PdFPU 37 40 64
-# CHECK-NEXT: PdLoad 19 22 40
-# CHECK-NEXT: PdStore 20 22 24
+# CHECK-NEXT: PdEX 36 40 40
+# CHECK-NEXT: PdFPU 36 40 64
+# CHECK-NEXT: PdLoad 20 23 40
+# CHECK-NEXT: PdStore 19 21 24
# CHECK: Resources:
# CHECK-NEXT: [0.0] - PdAGLU01
@@ -721,12 +724,12 @@ movaps %xmm3, (%rbx)
# CHECK-NEXT: 1.00 - - - - - - - - - - - - 1.00 - - 3.00 - - - - - 1.00 movaps %xmm3, (%rbx)
# CHECK: Timeline view:
-# CHECK-NEXT: Index 0123456789
+# CHECK-NEXT: Index 012345678
-# CHECK: [0,0] DeER . . movaps %xmm0, (%rax)
-# CHECK-NEXT: [0,1] DeeeeeER . movaps (%rcx), %xmm1
-# CHECK-NEXT: [0,2] D=eeeeeER. movaps (%rdx), %xmm2
-# CHECK-NEXT: [0,3] D======eER movaps %xmm3, (%rbx)
+# CHECK: [0,0] DeER . . movaps %xmm0, (%rax)
+# CHECK-NEXT: [0,1] DeeeeeER. movaps (%rcx), %xmm1
+# CHECK-NEXT: [0,2] D=eeeeeER movaps (%rdx), %xmm2
+# CHECK-NEXT: [0,3] D===eE--R movaps %xmm3, (%rbx)
# CHECK: Average Wait times (based on the timeline view):
# CHECK-NEXT: [0]: Executions
@@ -738,5 +741,5 @@ movaps %xmm3, (%rbx)
# CHECK-NEXT: 0. 1 1.0 1.0 0.0 movaps %xmm0, (%rax)
# CHECK-NEXT: 1. 1 1.0 1.0 0.0 movaps (%rcx), %xmm1
# CHECK-NEXT: 2. 1 2.0 2.0 0.0 movaps (%rdx), %xmm2
-# CHECK-NEXT: 3. 1 7.0 0.0 0.0 movaps %xmm3, (%rbx)
-# CHECK-NEXT: 1 2.8 1.0 0.0 <total>
+# CHECK-NEXT: 3. 1 4.0 2.0 2.0 movaps %xmm3, (%rbx)
+# CHECK-NEXT: 1 2.0 1.5 0.5 <total>
diff --git a/llvm/test/tools/llvm-mca/X86/BdVer2/memcpy-like-test.s b/llvm/test/tools/llvm-mca/X86/BdVer2/memcpy-like-test.s
index fb96ce5d7561..e1753784c7e6 100644
--- a/llvm/test/tools/llvm-mca/X86/BdVer2/memcpy-like-test.s
+++ b/llvm/test/tools/llvm-mca/X86/BdVer2/memcpy-like-test.s
@@ -101,9 +101,9 @@ vmovaps %xmm0, 48(%rdi)
# CHECK-NEXT: 0. 1 1.0 1.0 0.0 vmovaps (%rsi), %xmm0
# CHECK-NEXT: 1. 1 7.0 1.0 0.0 vmovaps %xmm0, (%rdi)
# CHECK-NEXT: 2. 1 1.0 1.0 2.0 vmovaps 16(%rsi), %xmm0
-# CHECK-NEXT: 3. 1 8.0 0.0 0.0 vmovaps %xmm0, 16(%rdi)
+# CHECK-NEXT: 3. 1 8.0 1.0 0.0 vmovaps %xmm0, 16(%rdi)
# CHECK-NEXT: 4. 1 3.0 3.0 0.0 vmovaps 32(%rsi), %xmm0
# CHECK-NEXT: 5. 1 9.0 1.0 0.0 vmovaps %xmm0, 32(%rdi)
# CHECK-NEXT: 6. 1 3.0 3.0 2.0 vmovaps 48(%rsi), %xmm0
-# CHECK-NEXT: 7. 1 10.0 0.0 0.0 vmovaps %xmm0, 48(%rdi)
-# CHECK-NEXT: 1 5.3 1.3 0.5 <total>
+# CHECK-NEXT: 7. 1 10.0 1.0 0.0 vmovaps %xmm0, 48(%rdi)
+# CHECK-NEXT: 1 5.3 1.5 0.5 <total>
diff --git a/llvm/test/tools/llvm-mca/X86/BdVer2/store-throughput.s b/llvm/test/tools/llvm-mca/X86/BdVer2/store-throughput.s
index 067301b06a51..c00b1c92a544 100644
--- a/llvm/test/tools/llvm-mca/X86/BdVer2/store-throughput.s
+++ b/llvm/test/tools/llvm-mca/X86/BdVer2/store-throughput.s
@@ -159,10 +159,10 @@ vmovaps %ymm3, (%rbx)
# CHECK: [0] [1] [2] [3]
# CHECK-NEXT: 0. 1 1.0 1.0 0.0 movb %spl, (%rax)
-# CHECK-NEXT: 1. 1 2.0 0.0 0.0 movb %bpl, (%rcx)
-# CHECK-NEXT: 2. 1 3.0 0.0 0.0 movb %sil, (%rdx)
-# CHECK-NEXT: 3. 1 4.0 0.0 0.0 movb %dil, (%rbx)
-# CHECK-NEXT: 1 2.5 0.3 0.0 <total>
+# CHECK-NEXT: 1. 1 2.0 1.0 0.0 movb %bpl, (%rcx)
+# CHECK-NEXT: 2. 1 3.0 1.0 0.0 movb %sil, (%rdx)
+# CHECK-NEXT: 3. 1 4.0 1.0 0.0 movb %dil, (%rbx)
+# CHECK-NEXT: 1 2.5 1.0 0.0 <total>
# CHECK: [1] Code Region
@@ -273,10 +273,10 @@ vmovaps %ymm3, (%rbx)
# CHECK: [0] [1] [2] [3]
# CHECK-NEXT: 0. 1 1.0 1.0 0.0 movw %sp, (%rax)
-# CHECK-NEXT: 1. 1 2.0 0.0 0.0 movw %bp, (%rcx)
-# CHECK-NEXT: 2. 1 3.0 0.0 0.0 movw %si, (%rdx)
-# CHECK-NEXT: 3. 1 4.0 0.0 0.0 movw %di, (%rbx)
-# CHECK-NEXT: 1 2.5 0.3 0.0 <total>
+# CHECK-NEXT: 1. 1 2.0 1.0 0.0 movw %bp, (%rcx)
+# CHECK-NEXT: 2. 1 3.0 1.0 0.0 movw %si, (%rdx)
+# CHECK-NEXT: 3. 1 4.0 1.0 0.0 movw %di, (%rbx)
+# CHECK-NEXT: 1 2.5 1.0 0.0 <total>
# CHECK: [2] Code Region
@@ -387,10 +387,10 @@ vmovaps %ymm3, (%rbx)
# CHECK: [0] [1] [2] [3]
# CHECK-NEXT: 0. 1 1.0 1.0 0.0 movl %esp, (%rax)
-# CHECK-NEXT: 1. 1 2.0 0.0 0.0 movl %ebp, (%rcx)
-# CHECK-NEXT: 2. 1 3.0 0.0 0.0 movl %esi, (%rdx)
-# CHECK-NEXT: 3. 1 4.0 0.0 0.0 movl %edi, (%rbx)
-# CHECK-NEXT: 1 2.5 0.3 0.0 <total>
+# CHECK-NEXT: 1. 1 2.0 1.0 0.0 movl %ebp, (%rcx)
+# CHECK-NEXT: 2. 1 3.0 1.0 0.0 movl %esi, (%rdx)
+# CHECK-NEXT: 3. 1 4.0 1.0 0.0 movl %edi, (%rbx)
+# CHECK-NEXT: 1 2.5 1.0 0.0 <total>
# CHECK: [3] Code Region
@@ -501,10 +501,10 @@ vmovaps %ymm3, (%rbx)
# CHECK: [0] [1] [2] [3]
# CHECK-NEXT: 0. 1 1.0 1.0 0.0 movq %rsp, (%rax)
-# CHECK-NEXT: 1. 1 2.0 0.0 0.0 movq %rbp, (%rcx)
-# CHECK-NEXT: 2. 1 3.0 0.0 0.0 movq %rsi, (%rdx)
-# CHECK-NEXT: 3. 1 4.0 0.0 0.0 movq %rdi, (%rbx)
-# CHECK-NEXT: 1 2.5 0.3 0.0 <total>
+# CHECK-NEXT: 1. 1 2.0 1.0 0.0 movq %rbp, (%rcx)
+# CHECK-NEXT: 2. 1 3.0 1.0 0.0 movq %rsi, (%rdx)
+# CHECK-NEXT: 3. 1 4.0 1.0 0.0 movq %rdi, (%rbx)
+# CHECK-NEXT: 1 2.5 1.0 0.0 <total>
# CHECK: [4] Code Region
@@ -732,10 +732,10 @@ vmovaps %ymm3, (%rbx)
# CHECK: [0] [1] [2] [3]
# CHECK-NEXT: 0. 1 1.0 1.0 0.0 movaps %xmm0, (%rax)
-# CHECK-NEXT: 1. 1 2.0 0.0 0.0 movaps %xmm1, (%rcx)
-# CHECK-NEXT: 2. 1 4.0 1.0 0.0 movaps %xmm2, (%rdx)
-# CHECK-NEXT: 3. 1 5.0 0.0 0.0 movaps %xmm3, (%rbx)
-# CHECK-NEXT: 1 3.0 0.5 0.0 <total>
+# CHECK-NEXT: 1. 1 2.0 1.0 0.0 movaps %xmm1, (%rcx)
+# CHECK-NEXT: 2. 1 4.0 2.0 0.0 movaps %xmm2, (%rdx)
+# CHECK-NEXT: 3. 1 5.0 1.0 0.0 movaps %xmm3, (%rbx)
+# CHECK-NEXT: 1 3.0 1.3 0.0 <total>
# CHECK: [6] Code Region
@@ -846,7 +846,7 @@ vmovaps %ymm3, (%rbx)
# CHECK: [0] [1] [2] [3]
# CHECK-NEXT: 0. 1 1.0 1.0 0.0 vmovaps %ymm0, (%rax)
-# CHECK-NEXT: 1. 1 2.0 1.0 0.0 vmovaps %ymm1, (%rcx)
-# CHECK-NEXT: 2. 1 35.0 33.0 0.0 vmovaps %ymm2, (%rdx)
-# CHECK-NEXT: 3. 1 36.0 1.0 0.0 vmovaps %ymm3, (%rbx)
-# CHECK-NEXT: 1 18.5 9.0 0.0 <total>
+# CHECK-NEXT: 1. 1 2.0 2.0 0.0 vmovaps %ymm1, (%rcx)
+# CHECK-NEXT: 2. 1 35.0 34.0 0.0 vmovaps %ymm2, (%rdx)
+# CHECK-NEXT: 3. 1 36.0 2.0 0.0 vmovaps %ymm3, (%rbx)
+# CHECK-NEXT: 1 18.5 9.8 0.0 <total>
diff --git a/llvm/test/tools/llvm-mca/X86/BtVer2/independent-load-stores.s b/llvm/test/tools/llvm-mca/X86/BtVer2/independent-load-stores.s
new file mode 100644
index 000000000000..bd202b604458
--- /dev/null
+++ b/llvm/test/tools/llvm-mca/X86/BtVer2/independent-load-stores.s
@@ -0,0 +1,146 @@
+# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
+# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -timeline -timeline-max-iterations=1 < %s | FileCheck %s -check-prefixes=ALL,NOALIAS
+# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -timeline -timeline-max-iterations=1 -noalias=false < %s | FileCheck %s -check-prefixes=ALL,YESALIAS
+
+ addq $44, 64(%r14)
+ addq $44, 128(%r14)
+ addq $44, 192(%r14)
+ addq $44, 256(%r14)
+ addq $44, 320(%r14)
+ addq $44, 384(%r14)
+ addq $44, 448(%r14)
+ addq $44, 512(%r14)
+ addq $44, 576(%r14)
+ addq $44, 640(%r14)
+
+# ALL: Iterations: 100
+# ALL-NEXT: Instructions: 1000
+
+# NOALIAS-NEXT: Total Cycles: 1008
+# YESALIAS-NEXT: Total Cycles: 6003
+
+# ALL-NEXT: Total uOps: 1000
+
+# ALL: Dispatch Width: 2
+
+# NOALIAS-NEXT: uOps Per Cycle: 0.99
+# NOALIAS-NEXT: IPC: 0.99
+
+# YESALIAS-NEXT: uOps Per Cycle: 0.17
+# YESALIAS-NEXT: IPC: 0.17
+
+# ALL-NEXT: Block RThroughput: 10.0
+
+# ALL: Instruction Info:
+# ALL-NEXT: [1]: #uOps
+# ALL-NEXT: [2]: Latency
+# ALL-NEXT: [3]: RThroughput
+# ALL-NEXT: [4]: MayLoad
+# ALL-NEXT: [5]: MayStore
+# ALL-NEXT: [6]: HasSideEffects (U)
+
+# ALL: [1] [2] [3] [4] [5] [6] Instructions:
+# ALL-NEXT: 1 6 1.00 * * addq $44, 64(%r14)
+# ALL-NEXT: 1 6 1.00 * * addq $44, 128(%r14)
+# ALL-NEXT: 1 6 1.00 * * addq $44, 192(%r14)
+# ALL-NEXT: 1 6 1.00 * * addq $44, 256(%r14)
+# ALL-NEXT: 1 6 1.00 * * addq $44, 320(%r14)
+# ALL-NEXT: 1 6 1.00 * * addq $44, 384(%r14)
+# ALL-NEXT: 1 6 1.00 * * addq $44, 448(%r14)
+# ALL-NEXT: 1 6 1.00 * * addq $44, 512(%r14)
+# ALL-NEXT: 1 6 1.00 * * addq $44, 576(%r14)
+# ALL-NEXT: 1 6 1.00 * * addq $44, 640(%r14)
+
+# ALL: Resources:
+# ALL-NEXT: [0] - JALU0
+# ALL-NEXT: [1] - JALU1
+# ALL-NEXT: [2] - JDiv
+# ALL-NEXT: [3] - JFPA
+# ALL-NEXT: [4] - JFPM
+# ALL-NEXT: [5] - JFPU0
+# ALL-NEXT: [6] - JFPU1
+# ALL-NEXT: [7] - JLAGU
+# ALL-NEXT: [8] - JMul
+# ALL-NEXT: [9] - JSAGU
+# ALL-NEXT: [10] - JSTC
+# ALL-NEXT: [11] - JVALU0
+# ALL-NEXT: [12] - JVALU1
+# ALL-NEXT: [13] - JVIMUL
+
+# ALL: Resource pressure per iteration:
+# ALL-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]
+# ALL-NEXT: 5.00 5.00 - - - - - 10.00 - 10.00 - - - -
+
+# ALL: Resource pressure by instruction:
+# ALL-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:
+# ALL-NEXT: - 1.00 - - - - - 1.00 - 1.00 - - - - addq $44, 64(%r14)
+# ALL-NEXT: 1.00 - - - - - - 1.00 - 1.00 - - - - addq $44, 128(%r14)
+# ALL-NEXT: - 1.00 - - - - - 1.00 - 1.00 - - - - addq $44, 192(%r14)
+# ALL-NEXT: 1.00 - - - - - - 1.00 - 1.00 - - - - addq $44, 256(%r14)
+# ALL-NEXT: - 1.00 - - - - - 1.00 - 1.00 - - - - addq $44, 320(%r14)
+# ALL-NEXT: 1.00 - - - - - - 1.00 - 1.00 - - - - addq $44, 384(%r14)
+# ALL-NEXT: - 1.00 - - - - - 1.00 - 1.00 - - - - addq $44, 448(%r14)
+# ALL-NEXT: 1.00 - - - - - - 1.00 - 1.00 - - - - addq $44, 512(%r14)
+# ALL-NEXT: - 1.00 - - - - - 1.00 - 1.00 - - - - addq $44, 576(%r14)
+# ALL-NEXT: 1.00 - - - - - - 1.00 - 1.00 - - - - addq $44, 640(%r14)
+
+# ALL: Timeline view:
+
+# NOALIAS-NEXT: 01234567
+# NOALIAS-NEXT: Index 0123456789
+
+# YESALIAS-NEXT: 0123456789 0123456789 0123456789
+# YESALIAS-NEXT: Index 0123456789 0123456789 0123456789 012
+
+# NOALIAS: [0,0] DeeeeeeER . . . addq $44, 64(%r14)
+# NOALIAS-NEXT: [0,1] D=eeeeeeER. . . addq $44, 128(%r14)
+# NOALIAS-NEXT: [0,2] .D=eeeeeeER . . addq $44, 192(%r14)
+# NOALIAS-NEXT: [0,3] .D==eeeeeeER . . addq $44, 256(%r14)
+# NOALIAS-NEXT: [0,4] . D==eeeeeeER . . addq $44, 320(%r14)
+# NOALIAS-NEXT: [0,5] . D===eeeeeeER . . addq $44, 384(%r14)
+# NOALIAS-NEXT: [0,6] . D===eeeeeeER. . addq $44, 448(%r14)
+# NOALIAS-NEXT: [0,7] . D====eeeeeeER . addq $44, 512(%r14)
+# NOALIAS-NEXT: [0,8] . D====eeeeeeER. addq $44, 576(%r14)
+# NOALIAS-NEXT: [0,9] . D=====eeeeeeER addq $44, 640(%r14)
+
+# YESALIAS: [0,0] DeeeeeeER . . . . . . . . . . . . addq $44, 64(%r14)
+# YESALIAS-NEXT: [0,1] D======eeeeeeER. . . . . . . . . . . addq $44, 128(%r14)
+# YESALIAS-NEXT: [0,2] .D===========eeeeeeER . . . . . . . . . addq $44, 192(%r14)
+# YESALIAS-NEXT: [0,3] .D=================eeeeeeER . . . . . . . . addq $44, 256(%r14)
+# YESALIAS-NEXT: [0,4] . D======================eeeeeeER . . . . . . . addq $44, 320(%r14)
+# YESALIAS-NEXT: [0,5] . D============================eeeeeeER . . . . . . addq $44, 384(%r14)
+# YESALIAS-NEXT: [0,6] . D=================================eeeeeeER. . . . . addq $44, 448(%r14)
+# YESALIAS-NEXT: [0,7] . D=======================================eeeeeeER . . . addq $44, 512(%r14)
+# YESALIAS-NEXT: [0,8] . D============================================eeeeeeER . . addq $44, 576(%r14)
+# YESALIAS-NEXT: [0,9] . D==================================================eeeeeeER addq $44, 640(%r14)
+
+# ALL: Average Wait times (based on the timeline view):
+# ALL-NEXT: [0]: Executions
+# ALL-NEXT: [1]: Average time spent waiting in a scheduler's queue
+# ALL-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
+# ALL-NEXT: [3]: Average time elapsed from WB until retire stage
+
+# ALL: [0] [1] [2] [3]
+# ALL-NEXT: 0. 1 1.0 1.0 0.0 addq $44, 64(%r14)
+
+# NOALIAS-NEXT: 1. 1 2.0 1.0 0.0 addq $44, 128(%r14)
+# NOALIAS-NEXT: 2. 1 2.0 1.0 0.0 addq $44, 192(%r14)
+# NOALIAS-NEXT: 3. 1 3.0 1.0 0.0 addq $44, 256(%r14)
+# NOALIAS-NEXT: 4. 1 3.0 1.0 0.0 addq $44, 320(%r14)
+# NOALIAS-NEXT: 5. 1 4.0 1.0 0.0 addq $44, 384(%r14)
+# NOALIAS-NEXT: 6. 1 4.0 1.0 0.0 addq $44, 448(%r14)
+# NOALIAS-NEXT: 7. 1 5.0 1.0 0.0 addq $44, 512(%r14)
+# NOALIAS-NEXT: 8. 1 5.0 1.0 0.0 addq $44, 576(%r14)
+# NOALIAS-NEXT: 9. 1 6.0 1.0 0.0 addq $44, 640(%r14)
+# NOALIAS-NEXT: 1 3.5 1.0 0.0 <total>
+
+# YESALIAS-NEXT: 1. 1 7.0 0.0 0.0 addq $44, 128(%r14)
+# YESALIAS-NEXT: 2. 1 12.0 0.0 0.0 addq $44, 192(%r14)
+# YESALIAS-NEXT: 3. 1 18.0 0.0 0.0 addq $44, 256(%r14)
+# YESALIAS-NEXT: 4. 1 23.0 0.0 0.0 addq $44, 320(%r14)
+# YESALIAS-NEXT: 5. 1 29.0 0.0 0.0 addq $44, 384(%r14)
+# YESALIAS-NEXT: 6. 1 34.0 0.0 0.0 addq $44, 448(%r14)
+# YESALIAS-NEXT: 7. 1 40.0 0.0 0.0 addq $44, 512(%r14)
+# YESALIAS-NEXT: 8. 1 45.0 0.0 0.0 addq $44, 576(%r14)
+# YESALIAS-NEXT: 9. 1 51.0 0.0 0.0 addq $44, 640(%r14)
+# YESALIAS-NEXT: 1 26.0 0.1 0.0 <total>
diff --git a/llvm/test/tools/llvm-mca/X86/BtVer2/xadd.s b/llvm/test/tools/llvm-mca/X86/BtVer2/xadd.s
index 64b6490861c2..691f530be7b0 100644
--- a/llvm/test/tools/llvm-mca/X86/BtVer2/xadd.s
+++ b/llvm/test/tools/llvm-mca/X86/BtVer2/xadd.s
@@ -21,12 +21,12 @@ imul %ecx, %ecx
# CHECK: Iterations: 2
# CHECK-NEXT: Instructions: 10
-# CHECK-NEXT: Total Cycles: 27
+# CHECK-NEXT: Total Cycles: 24
# CHECK-NEXT: Total uOps: 16
# CHECK: Dispatch Width: 2
-# CHECK-NEXT: uOps Per Cycle: 0.59
-# CHECK-NEXT: IPC: 0.37
+# CHECK-NEXT: uOps Per Cycle: 0.67
+# CHECK-NEXT: IPC: 0.42
# CHECK-NEXT: Block RThroughput: 4.0
# CHECK: Instruction Info:
@@ -74,18 +74,18 @@ imul %ecx, %ecx
# CHECK: Timeline view:
# CHECK-NEXT: 0123456789
-# CHECK-NEXT: Index 0123456789 0123456
-
-# CHECK: [0,0] DeeeeeeeeeeeER . . .. xaddl %ecx, (%rsp)
-# CHECK-NEXT: [0,1] . D=eE-------R . . .. addl %ecx, %ecx
-# CHECK-NEXT: [0,2] . D==eE-------R. . .. addl %ecx, %ecx
-# CHECK-NEXT: [0,3] . D==eeeE----R. . .. imull %ecx, %ecx
-# CHECK-NEXT: [0,4] . D=====eeeE--R . .. imull %ecx, %ecx
-# CHECK-NEXT: [1,0] . D=======eeeeeeeeeeeER.. xaddl %ecx, (%rsp)
-# CHECK-NEXT: [1,1] . .D========eE-------R.. addl %ecx, %ecx
-# CHECK-NEXT: [1,2] . .D=========eE-------R. addl %ecx, %ecx
-# CHECK-NEXT: [1,3] . . D=========eeeE----R. imull %ecx, %ecx
-# CHECK-NEXT: [1,4] . . D============eeeE--R imull %ecx, %ecx
+# CHECK-NEXT: Index 0123456789 0123
+
+# CHECK: [0,0] DeeeeeeeeeeeER . . . xaddl %ecx, (%rsp)
+# CHECK-NEXT: [0,1] . D=eE-------R . . . addl %ecx, %ecx
+# CHECK-NEXT: [0,2] . D==eE-------R. . . addl %ecx, %ecx
+# CHECK-NEXT: [0,3] . D==eeeE----R. . . imull %ecx, %ecx
+# CHECK-NEXT: [0,4] . D=====eeeE--R . . imull %ecx, %ecx
+# CHECK-NEXT: [1,0] . D====eeeeeeeeeeeER . xaddl %ecx, (%rsp)
+# CHECK-NEXT: [1,1] . .D=====eE-------R . addl %ecx, %ecx
+# CHECK-NEXT: [1,2] . .D======eE-------R. addl %ecx, %ecx
+# CHECK-NEXT: [1,3] . . D======eeeE----R. imull %ecx, %ecx
+# CHECK-NEXT: [1,4] . . D=========eeeE--R imull %ecx, %ecx
# CHECK: Average Wait times (based on the timeline view):
# CHECK-NEXT: [0]: Executions
@@ -94,12 +94,12 @@ imul %ecx, %ecx
# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage
# CHECK: [0] [1] [2] [3]
-# CHECK-NEXT: 0. 2 4.5 0.5 0.0 xaddl %ecx, (%rsp)
-# CHECK-NEXT: 1. 2 5.5 0.0 7.0 addl %ecx, %ecx
-# CHECK-NEXT: 2. 2 6.5 0.0 7.0 addl %ecx, %ecx
-# CHECK-NEXT: 3. 2 6.5 0.0 4.0 imull %ecx, %ecx
-# CHECK-NEXT: 4. 2 9.5 0.0 2.0 imull %ecx, %ecx
-# CHECK-NEXT: 2 6.5 0.1 4.0 <total>
+# CHECK-NEXT: 0. 2 3.0 0.5 0.0 xaddl %ecx, (%rsp)
+# CHECK-NEXT: 1. 2 4.0 0.0 7.0 addl %ecx, %ecx
+# CHECK-NEXT: 2. 2 5.0 0.0 7.0 addl %ecx, %ecx
+# CHECK-NEXT: 3. 2 5.0 0.0 4.0 imull %ecx, %ecx
+# CHECK-NEXT: 4. 2 8.0 0.0 2.0 imull %ecx, %ecx
+# CHECK-NEXT: 2 5.0 0.1 4.0 <total>
# CHECK: [1] Code Region
diff --git a/llvm/test/tools/llvm-mca/X86/Haswell/independent-load-stores.s b/llvm/test/tools/llvm-mca/X86/Haswell/independent-load-stores.s
new file mode 100644
index 000000000000..3988ce880d0b
--- /dev/null
+++ b/llvm/test/tools/llvm-mca/X86/Haswell/independent-load-stores.s
@@ -0,0 +1,142 @@
+# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
+# RUN: llvm-mca -mcpu=haswell -timeline -timeline-max-iterations=1 < %s | FileCheck %s -check-prefixes=ALL,NOALIAS
+# RUN: llvm-mca -mcpu=haswell -timeline -timeline-max-iterations=1 -noalias=false < %s | FileCheck %s -check-prefixes=ALL,YESALIAS
+
+ addq $44, 64(%r14)
+ addq $44, 128(%r14)
+ addq $44, 192(%r14)
+ addq $44, 256(%r14)
+ addq $44, 320(%r14)
+ addq $44, 384(%r14)
+ addq $44, 448(%r14)
+ addq $44, 512(%r14)
+ addq $44, 576(%r14)
+ addq $44, 640(%r14)
+
+# ALL: Iterations: 100
+# ALL-NEXT: Instructions: 1000
+
+# NOALIAS-NEXT: Total Cycles: 1009
+# YESALIAS-NEXT: Total Cycles: 7003
+
+# ALL-NEXT: Total uOps: 3000
+
+# ALL: Dispatch Width: 4
+
+# NOALIAS-NEXT: uOps Per Cycle: 2.97
+# NOALIAS-NEXT: IPC: 0.99
+
+# YESALIAS-NEXT: uOps Per Cycle: 0.43
+# YESALIAS-NEXT: IPC: 0.14
+
+# ALL-NEXT: Block RThroughput: 10.0
+
+# ALL: Instruction Info:
+# ALL-NEXT: [1]: #uOps
+# ALL-NEXT: [2]: Latency
+# ALL-NEXT: [3]: RThroughput
+# ALL-NEXT: [4]: MayLoad
+# ALL-NEXT: [5]: MayStore
+# ALL-NEXT: [6]: HasSideEffects (U)
+
+# ALL: [1] [2] [3] [4] [5] [6] Instructions:
+# ALL-NEXT: 3 7 1.00 * * addq $44, 64(%r14)
+# ALL-NEXT: 3 7 1.00 * * addq $44, 128(%r14)
+# ALL-NEXT: 3 7 1.00 * * addq $44, 192(%r14)
+# ALL-NEXT: 3 7 1.00 * * addq $44, 256(%r14)
+# ALL-NEXT: 3 7 1.00 * * addq $44, 320(%r14)
+# ALL-NEXT: 3 7 1.00 * * addq $44, 384(%r14)
+# ALL-NEXT: 3 7 1.00 * * addq $44, 448(%r14)
+# ALL-NEXT: 3 7 1.00 * * addq $44, 512(%r14)
+# ALL-NEXT: 3 7 1.00 * * addq $44, 576(%r14)
+# ALL-NEXT: 3 7 1.00 * * addq $44, 640(%r14)
+
+# ALL: Resources:
+# ALL-NEXT: [0] - HWDivider
+# ALL-NEXT: [1] - HWFPDivider
+# ALL-NEXT: [2] - HWPort0
+# ALL-NEXT: [3] - HWPort1
+# ALL-NEXT: [4] - HWPort2
+# ALL-NEXT: [5] - HWPort3
+# ALL-NEXT: [6] - HWPort4
+# ALL-NEXT: [7] - HWPort5
+# ALL-NEXT: [8] - HWPort6
+# ALL-NEXT: [9] - HWPort7
+
+# ALL: Resource pressure per iteration:
+# ALL-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9]
+# ALL-NEXT: - - 2.50 2.50 6.66 6.67 10.00 2.50 2.50 6.67
+
+# ALL: Resource pressure by instruction:
+# ALL-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] Instructions:
+# ALL-NEXT: - - - 0.50 0.66 0.67 1.00 - 0.50 0.67 addq $44, 64(%r14)
+# ALL-NEXT: - - 0.50 - 0.67 0.66 1.00 0.50 - 0.67 addq $44, 128(%r14)
+# ALL-NEXT: - - - 0.50 0.67 0.67 1.00 - 0.50 0.66 addq $44, 192(%r14)
+# ALL-NEXT: - - 0.50 - 0.66 0.67 1.00 0.50 - 0.67 addq $44, 256(%r14)
+# ALL-NEXT: - - - 0.50 0.67 0.66 1.00 - 0.50 0.67 addq $44, 320(%r14)
+# ALL-NEXT: - - 0.50 - 0.67 0.67 1.00 0.50 - 0.66 addq $44, 384(%r14)
+# ALL-NEXT: - - - 0.50 0.66 0.67 1.00 - 0.50 0.67 addq $44, 448(%r14)
+# ALL-NEXT: - - 0.50 - 0.67 0.66 1.00 0.50 - 0.67 addq $44, 512(%r14)
+# ALL-NEXT: - - - 0.50 0.67 0.67 1.00 - 0.50 0.66 addq $44, 576(%r14)
+# ALL-NEXT: - - 0.50 - 0.66 0.67 1.00 0.50 - 0.67 addq $44, 640(%r14)
+
+# ALL: Timeline view:
+
+# NOALIAS-NEXT: 012345678
+# NOALIAS-NEXT: Index 0123456789
+
+# YESALIAS-NEXT: 0123456789 0123456789 0123456789 012
+# YESALIAS-NEXT: Index 0123456789 0123456789 0123456789 0123456789
+
+# NOALIAS: [0,0] DeeeeeeeER. . . addq $44, 64(%r14)
+# NOALIAS-NEXT: [0,1] .DeeeeeeeER . . addq $44, 128(%r14)
+# NOALIAS-NEXT: [0,2] . DeeeeeeeER . . addq $44, 192(%r14)
+# NOALIAS-NEXT: [0,3] . DeeeeeeeER . . addq $44, 256(%r14)
+# NOALIAS-NEXT: [0,4] . DeeeeeeeER . . addq $44, 320(%r14)
+# NOALIAS-NEXT: [0,5] . DeeeeeeeER. . addq $44, 384(%r14)
+# NOALIAS-NEXT: [0,6] . .DeeeeeeeER . addq $44, 448(%r14)
+# NOALIAS-NEXT: [0,7] . . DeeeeeeeER . addq $44, 512(%r14)
+# NOALIAS-NEXT: [0,8] . . DeeeeeeeER. addq $44, 576(%r14)
+# NOALIAS-NEXT: [0,9] . . DeeeeeeeER addq $44, 640(%r14)
+
+# YESALIAS: [0,0] DeeeeeeeER. . . . . . . . . . . . . . addq $44, 64(%r14)
+# YESALIAS-NEXT: [0,1] .D======eeeeeeeER . . . . . . . . . . . . addq $44, 128(%r14)
+# YESALIAS-NEXT: [0,2] . D============eeeeeeeER . . . . . . . . . . . addq $44, 192(%r14)
+# YESALIAS-NEXT: [0,3] . D==================eeeeeeeER . . . . . . . . . addq $44, 256(%r14)
+# YESALIAS-NEXT: [0,4] . D========================eeeeeeeER . . . . . . . . addq $44, 320(%r14)
+# YESALIAS-NEXT: [0,5] . D==============================eeeeeeeER. . . . . . . addq $44, 384(%r14)
+# YESALIAS-NEXT: [0,6] . .D====================================eeeeeeeER . . . . . addq $44, 448(%r14)
+# YESALIAS-NEXT: [0,7] . . D==========================================eeeeeeeER . . . . addq $44, 512(%r14)
+# YESALIAS-NEXT: [0,8] . . D================================================eeeeeeeER . . addq $44, 576(%r14)
+# YESALIAS-NEXT: [0,9] . . D======================================================eeeeeeeER addq $44, 640(%r14)
+
+# ALL: Average Wait times (based on the timeline view):
+# ALL-NEXT: [0]: Executions
+# ALL-NEXT: [1]: Average time spent waiting in a scheduler's queue
+# ALL-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
+# ALL-NEXT: [3]: Average time elapsed from WB until retire stage
+
+# ALL: [0] [1] [2] [3]
+# ALL-NEXT: 0. 1 1.0 1.0 0.0 addq $44, 64(%r14)
+
+# NOALIAS-NEXT: 1. 1 1.0 1.0 0.0 addq $44, 128(%r14)
+# NOALIAS-NEXT: 2. 1 1.0 1.0 0.0 addq $44, 192(%r14)
+# NOALIAS-NEXT: 3. 1 1.0 1.0 0.0 addq $44, 256(%r14)
+# NOALIAS-NEXT: 4. 1 1.0 1.0 0.0 addq $44, 320(%r14)
+# NOALIAS-NEXT: 5. 1 1.0 1.0 0.0 addq $44, 384(%r14)
+# NOALIAS-NEXT: 6. 1 1.0 1.0 0.0 addq $44, 448(%r14)
+# NOALIAS-NEXT: 7. 1 1.0 1.0 0.0 addq $44, 512(%r14)
+# NOALIAS-NEXT: 8. 1 1.0 1.0 0.0 addq $44, 576(%r14)
+# NOALIAS-NEXT: 9. 1 1.0 1.0 0.0 addq $44, 640(%r14)
+# NOALIAS-NEXT: 1 1.0 1.0 0.0 <total>
+
+# YESALIAS-NEXT: 1. 1 7.0 0.0 0.0 addq $44, 128(%r14)
+# YESALIAS-NEXT: 2. 1 13.0 0.0 0.0 addq $44, 192(%r14)
+# YESALIAS-NEXT: 3. 1 19.0 0.0 0.0 addq $44, 256(%r14)
+# YESALIAS-NEXT: 4. 1 25.0 0.0 0.0 addq $44, 320(%r14)
+# YESALIAS-NEXT: 5. 1 31.0 0.0 0.0 addq $44, 384(%r14)
+# YESALIAS-NEXT: 6. 1 37.0 0.0 0.0 addq $44, 448(%r14)
+# YESALIAS-NEXT: 7. 1 43.0 0.0 0.0 addq $44, 512(%r14)
+# YESALIAS-NEXT: 8. 1 49.0 0.0 0.0 addq $44, 576(%r14)
+# YESALIAS-NEXT: 9. 1 55.0 0.0 0.0 addq $44, 640(%r14)
+# YESALIAS-NEXT: 1 28.0 0.1 0.0 <total>
diff --git a/llvm/test/tools/llvm-mca/X86/SkylakeClient/independent-load-stores.s b/llvm/test/tools/llvm-mca/X86/SkylakeClient/independent-load-stores.s
new file mode 100644
index 000000000000..03d7bcd079a3
--- /dev/null
+++ b/llvm/test/tools/llvm-mca/X86/SkylakeClient/independent-load-stores.s
@@ -0,0 +1,142 @@
+# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
+# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=skylake -timeline -timeline-max-iterations=1 < %s | FileCheck %s -check-prefixes=ALL,NOALIAS
+# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=skylake -timeline -timeline-max-iterations=1 -noalias=false < %s | FileCheck %s -check-prefixes=ALL,YESALIAS
+
+ addq $44, 64(%r14)
+ addq $44, 128(%r14)
+ addq $44, 192(%r14)
+ addq $44, 256(%r14)
+ addq $44, 320(%r14)
+ addq $44, 384(%r14)
+ addq $44, 448(%r14)
+ addq $44, 512(%r14)
+ addq $44, 576(%r14)
+ addq $44, 640(%r14)
+
+# ALL: Iterations: 100
+# ALL-NEXT: Instructions: 1000
+
+# NOALIAS-NEXT: Total Cycles: 1009
+# YESALIAS-NEXT: Total Cycles: 7003
+
+# ALL-NEXT: Total uOps: 3000
+
+# ALL: Dispatch Width: 6
+
+# NOALIAS-NEXT: uOps Per Cycle: 2.97
+# NOALIAS-NEXT: IPC: 0.99
+
+# YESALIAS-NEXT: uOps Per Cycle: 0.43
+# YESALIAS-NEXT: IPC: 0.14
+
+# ALL-NEXT: Block RThroughput: 10.0
+
+# ALL: Instruction Info:
+# ALL-NEXT: [1]: #uOps
+# ALL-NEXT: [2]: Latency
+# ALL-NEXT: [3]: RThroughput
+# ALL-NEXT: [4]: MayLoad
+# ALL-NEXT: [5]: MayStore
+# ALL-NEXT: [6]: HasSideEffects (U)
+
+# ALL: [1] [2] [3] [4] [5] [6] Instructions:
+# ALL-NEXT: 3 7 1.00 * * addq $44, 64(%r14)
+# ALL-NEXT: 3 7 1.00 * * addq $44, 128(%r14)
+# ALL-NEXT: 3 7 1.00 * * addq $44, 192(%r14)
+# ALL-NEXT: 3 7 1.00 * * addq $44, 256(%r14)
+# ALL-NEXT: 3 7 1.00 * * addq $44, 320(%r14)
+# ALL-NEXT: 3 7 1.00 * * addq $44, 384(%r14)
+# ALL-NEXT: 3 7 1.00 * * addq $44, 448(%r14)
+# ALL-NEXT: 3 7 1.00 * * addq $44, 512(%r14)
+# ALL-NEXT: 3 7 1.00 * * addq $44, 576(%r14)
+# ALL-NEXT: 3 7 1.00 * * addq $44, 640(%r14)
+
+# ALL: Resources:
+# ALL-NEXT: [0] - SKLDivider
+# ALL-NEXT: [1] - SKLFPDivider
+# ALL-NEXT: [2] - SKLPort0
+# ALL-NEXT: [3] - SKLPort1
+# ALL-NEXT: [4] - SKLPort2
+# ALL-NEXT: [5] - SKLPort3
+# ALL-NEXT: [6] - SKLPort4
+# ALL-NEXT: [7] - SKLPort5
+# ALL-NEXT: [8] - SKLPort6
+# ALL-NEXT: [9] - SKLPort7
+
+# ALL: Resource pressure per iteration:
+# ALL-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9]
+# ALL-NEXT: - - 2.50 2.50 6.66 6.67 10.00 2.50 2.50 6.67
+
+# ALL: Resource pressure by instruction:
+# ALL-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] Instructions:
+# ALL-NEXT: - - - 0.50 0.66 0.67 1.00 - 0.50 0.67 addq $44, 64(%r14)
+# ALL-NEXT: - - 0.50 - 0.67 0.66 1.00 0.50 - 0.67 addq $44, 128(%r14)
+# ALL-NEXT: - - - 0.50 0.67 0.67 1.00 - 0.50 0.66 addq $44, 192(%r14)
+# ALL-NEXT: - - 0.50 - 0.66 0.67 1.00 0.50 - 0.67 addq $44, 256(%r14)
+# ALL-NEXT: - - - 0.50 0.67 0.66 1.00 - 0.50 0.67 addq $44, 320(%r14)
+# ALL-NEXT: - - 0.50 - 0.67 0.67 1.00 0.50 - 0.66 addq $44, 384(%r14)
+# ALL-NEXT: - - - 0.50 0.66 0.67 1.00 - 0.50 0.67 addq $44, 448(%r14)
+# ALL-NEXT: - - 0.50 - 0.67 0.66 1.00 0.50 - 0.67 addq $44, 512(%r14)
+# ALL-NEXT: - - - 0.50 0.67 0.67 1.00 - 0.50 0.66 addq $44, 576(%r14)
+# ALL-NEXT: - - 0.50 - 0.66 0.67 1.00 0.50 - 0.67 addq $44, 640(%r14)
+
+# ALL: Timeline view:
+
+# NOALIAS-NEXT: 012345678
+# NOALIAS-NEXT: Index 0123456789
+
+# YESALIAS-NEXT: 0123456789 0123456789 0123456789 012
+# YESALIAS-NEXT: Index 0123456789 0123456789 0123456789 0123456789
+
+# NOALIAS: [0,0] DeeeeeeeER. . . addq $44, 64(%r14)
+# NOALIAS-NEXT: [0,1] D=eeeeeeeER . . addq $44, 128(%r14)
+# NOALIAS-NEXT: [0,2] .D=eeeeeeeER . . addq $44, 192(%r14)
+# NOALIAS-NEXT: [0,3] .D==eeeeeeeER . . addq $44, 256(%r14)
+# NOALIAS-NEXT: [0,4] . D==eeeeeeeER . . addq $44, 320(%r14)
+# NOALIAS-NEXT: [0,5] . D===eeeeeeeER. . addq $44, 384(%r14)
+# NOALIAS-NEXT: [0,6] . D===eeeeeeeER . addq $44, 448(%r14)
+# NOALIAS-NEXT: [0,7] . D====eeeeeeeER . addq $44, 512(%r14)
+# NOALIAS-NEXT: [0,8] . D====eeeeeeeER. addq $44, 576(%r14)
+# NOALIAS-NEXT: [0,9] . D=====eeeeeeeER addq $44, 640(%r14)
+
+# YESALIAS: [0,0] DeeeeeeeER. . . . . . . . . . . . . . addq $44, 64(%r14)
+# YESALIAS-NEXT: [0,1] D=======eeeeeeeER . . . . . . . . . . . . addq $44, 128(%r14)
+# YESALIAS-NEXT: [0,2] .D=============eeeeeeeER . . . . . . . . . . . addq $44, 192(%r14)
+# YESALIAS-NEXT: [0,3] .D====================eeeeeeeER . . . . . . . . . addq $44, 256(%r14)
+# YESALIAS-NEXT: [0,4] . D==========================eeeeeeeER . . . . . . . . addq $44, 320(%r14)
+# YESALIAS-NEXT: [0,5] . D=================================eeeeeeeER. . . . . . . addq $44, 384(%r14)
+# YESALIAS-NEXT: [0,6] . D=======================================eeeeeeeER . . . . . addq $44, 448(%r14)
+# YESALIAS-NEXT: [0,7] . D==============================================eeeeeeeER . . . . addq $44, 512(%r14)
+# YESALIAS-NEXT: [0,8] . D====================================================eeeeeeeER . . addq $44, 576(%r14)
+# YESALIAS-NEXT: [0,9] . D===========================================================eeeeeeeER addq $44, 640(%r14)
+
+# ALL: Average Wait times (based on the timeline view):
+# ALL-NEXT: [0]: Executions
+# ALL-NEXT: [1]: Average time spent waiting in a scheduler's queue
+# ALL-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
+# ALL-NEXT: [3]: Average time elapsed from WB until retire stage
+
+# ALL: [0] [1] [2] [3]
+# ALL-NEXT: 0. 1 1.0 1.0 0.0 addq $44, 64(%r14)
+
+# NOALIAS-NEXT: 1. 1 2.0 1.0 0.0 addq $44, 128(%r14)
+# NOALIAS-NEXT: 2. 1 2.0 1.0 0.0 addq $44, 192(%r14)
+# NOALIAS-NEXT: 3. 1 3.0 1.0 0.0 addq $44, 256(%r14)
+# NOALIAS-NEXT: 4. 1 3.0 1.0 0.0 addq $44, 320(%r14)
+# NOALIAS-NEXT: 5. 1 4.0 1.0 0.0 addq $44, 384(%r14)
+# NOALIAS-NEXT: 6. 1 4.0 1.0 0.0 addq $44, 448(%r14)
+# NOALIAS-NEXT: 7. 1 5.0 1.0 0.0 addq $44, 512(%r14)
+# NOALIAS-NEXT: 8. 1 5.0 1.0 0.0 addq $44, 576(%r14)
+# NOALIAS-NEXT: 9. 1 6.0 1.0 0.0 addq $44, 640(%r14)
+# NOALIAS-NEXT: 1 3.5 1.0 0.0 <total>
+
+# YESALIAS-NEXT: 1. 1 8.0 0.0 0.0 addq $44, 128(%r14)
+# YESALIAS-NEXT: 2. 1 14.0 0.0 0.0 addq $44, 192(%r14)
+# YESALIAS-NEXT: 3. 1 21.0 0.0 0.0 addq $44, 256(%r14)
+# YESALIAS-NEXT: 4. 1 27.0 0.0 0.0 addq $44, 320(%r14)
+# YESALIAS-NEXT: 5. 1 34.0 0.0 0.0 addq $44, 384(%r14)
+# YESALIAS-NEXT: 6. 1 40.0 0.0 0.0 addq $44, 448(%r14)
+# YESALIAS-NEXT: 7. 1 47.0 0.0 0.0 addq $44, 512(%r14)
+# YESALIAS-NEXT: 8. 1 53.0 0.0 0.0 addq $44, 576(%r14)
+# YESALIAS-NEXT: 9. 1 60.0 0.0 0.0 addq $44, 640(%r14)
+# YESALIAS-NEXT: 1 30.5 0.1 0.0 <total>
diff --git a/llvm/test/tools/llvm-mca/X86/SkylakeServer/independent-load-stores.s b/llvm/test/tools/llvm-mca/X86/SkylakeServer/independent-load-stores.s
new file mode 100644
index 000000000000..4ebdee99ad6b
--- /dev/null
+++ b/llvm/test/tools/llvm-mca/X86/SkylakeServer/independent-load-stores.s
@@ -0,0 +1,142 @@
+# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
+# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=skylake-avx512 -timeline -timeline-max-iterations=1 < %s | FileCheck %s -check-prefixes=ALL,NOALIAS
+# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=skylake-avx512 -timeline -timeline-max-iterations=1 -noalias=false < %s | FileCheck %s -check-prefixes=ALL,YESALIAS
+
+ addq $44, 64(%r14)
+ addq $44, 128(%r14)
+ addq $44, 192(%r14)
+ addq $44, 256(%r14)
+ addq $44, 320(%r14)
+ addq $44, 384(%r14)
+ addq $44, 448(%r14)
+ addq $44, 512(%r14)
+ addq $44, 576(%r14)
+ addq $44, 640(%r14)
+
+# ALL: Iterations: 100
+# ALL-NEXT: Instructions: 1000
+
+# NOALIAS-NEXT: Total Cycles: 1009
+# YESALIAS-NEXT: Total Cycles: 7003
+
+# ALL-NEXT: Total uOps: 3000
+
+# ALL: Dispatch Width: 6
+
+# NOALIAS-NEXT: uOps Per Cycle: 2.97
+# NOALIAS-NEXT: IPC: 0.99
+
+# YESALIAS-NEXT: uOps Per Cycle: 0.43
+# YESALIAS-NEXT: IPC: 0.14
+
+# ALL-NEXT: Block RThroughput: 10.0
+
+# ALL: Instruction Info:
+# ALL-NEXT: [1]: #uOps
+# ALL-NEXT: [2]: Latency
+# ALL-NEXT: [3]: RThroughput
+# ALL-NEXT: [4]: MayLoad
+# ALL-NEXT: [5]: MayStore
+# ALL-NEXT: [6]: HasSideEffects (U)
+
+# ALL: [1] [2] [3] [4] [5] [6] Instructions:
+# ALL-NEXT: 3 7 1.00 * * addq $44, 64(%r14)
+# ALL-NEXT: 3 7 1.00 * * addq $44, 128(%r14)
+# ALL-NEXT: 3 7 1.00 * * addq $44, 192(%r14)
+# ALL-NEXT: 3 7 1.00 * * addq $44, 256(%r14)
+# ALL-NEXT: 3 7 1.00 * * addq $44, 320(%r14)
+# ALL-NEXT: 3 7 1.00 * * addq $44, 384(%r14)
+# ALL-NEXT: 3 7 1.00 * * addq $44, 448(%r14)
+# ALL-NEXT: 3 7 1.00 * * addq $44, 512(%r14)
+# ALL-NEXT: 3 7 1.00 * * addq $44, 576(%r14)
+# ALL-NEXT: 3 7 1.00 * * addq $44, 640(%r14)
+
+# ALL: Resources:
+# ALL-NEXT: [0] - SKXDivider
+# ALL-NEXT: [1] - SKXFPDivider
+# ALL-NEXT: [2] - SKXPort0
+# ALL-NEXT: [3] - SKXPort1
+# ALL-NEXT: [4] - SKXPort2
+# ALL-NEXT: [5] - SKXPort3
+# ALL-NEXT: [6] - SKXPort4
+# ALL-NEXT: [7] - SKXPort5
+# ALL-NEXT: [8] - SKXPort6
+# ALL-NEXT: [9] - SKXPort7
+
+# ALL: Resource pressure per iteration:
+# ALL-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9]
+# ALL-NEXT: - - 2.50 2.50 6.66 6.67 10.00 2.50 2.50 6.67
+
+# ALL: Resource pressure by instruction:
+# ALL-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] Instructions:
+# ALL-NEXT: - - - 0.50 0.66 0.67 1.00 - 0.50 0.67 addq $44, 64(%r14)
+# ALL-NEXT: - - 0.50 - 0.67 0.66 1.00 0.50 - 0.67 addq $44, 128(%r14)
+# ALL-NEXT: - - - 0.50 0.67 0.67 1.00 - 0.50 0.66 addq $44, 192(%r14)
+# ALL-NEXT: - - 0.50 - 0.66 0.67 1.00 0.50 - 0.67 addq $44, 256(%r14)
+# ALL-NEXT: - - - 0.50 0.67 0.66 1.00 - 0.50 0.67 addq $44, 320(%r14)
+# ALL-NEXT: - - 0.50 - 0.67 0.67 1.00 0.50 - 0.66 addq $44, 384(%r14)
+# ALL-NEXT: - - - 0.50 0.66 0.67 1.00 - 0.50 0.67 addq $44, 448(%r14)
+# ALL-NEXT: - - 0.50 - 0.67 0.66 1.00 0.50 - 0.67 addq $44, 512(%r14)
+# ALL-NEXT: - - - 0.50 0.67 0.67 1.00 - 0.50 0.66 addq $44, 576(%r14)
+# ALL-NEXT: - - 0.50 - 0.66 0.67 1.00 0.50 - 0.67 addq $44, 640(%r14)
+
+# ALL: Timeline view:
+
+# NOALIAS-NEXT: 012345678
+# NOALIAS-NEXT: Index 0123456789
+
+# YESALIAS-NEXT: 0123456789 0123456789 0123456789 012
+# YESALIAS-NEXT: Index 0123456789 0123456789 0123456789 0123456789
+
+# NOALIAS: [0,0] DeeeeeeeER. . . addq $44, 64(%r14)
+# NOALIAS-NEXT: [0,1] D=eeeeeeeER . . addq $44, 128(%r14)
+# NOALIAS-NEXT: [0,2] .D=eeeeeeeER . . addq $44, 192(%r14)
+# NOALIAS-NEXT: [0,3] .D==eeeeeeeER . . addq $44, 256(%r14)
+# NOALIAS-NEXT: [0,4] . D==eeeeeeeER . . addq $44, 320(%r14)
+# NOALIAS-NEXT: [0,5] . D===eeeeeeeER. . addq $44, 384(%r14)
+# NOALIAS-NEXT: [0,6] . D===eeeeeeeER . addq $44, 448(%r14)
+# NOALIAS-NEXT: [0,7] . D====eeeeeeeER . addq $44, 512(%r14)
+# NOALIAS-NEXT: [0,8] . D====eeeeeeeER. addq $44, 576(%r14)
+# NOALIAS-NEXT: [0,9] . D=====eeeeeeeER addq $44, 640(%r14)
+
+# YESALIAS: [0,0] DeeeeeeeER. . . . . . . . . . . . . . addq $44, 64(%r14)
+# YESALIAS-NEXT: [0,1] D=======eeeeeeeER . . . . . . . . . . . . addq $44, 128(%r14)
+# YESALIAS-NEXT: [0,2] .D=============eeeeeeeER . . . . . . . . . . . addq $44, 192(%r14)
+# YESALIAS-NEXT: [0,3] .D====================eeeeeeeER . . . . . . . . . addq $44, 256(%r14)
+# YESALIAS-NEXT: [0,4] . D==========================eeeeeeeER . . . . . . . . addq $44, 320(%r14)
+# YESALIAS-NEXT: [0,5] . D=================================eeeeeeeER. . . . . . . addq $44, 384(%r14)
+# YESALIAS-NEXT: [0,6] . D=======================================eeeeeeeER . . . . . addq $44, 448(%r14)
+# YESALIAS-NEXT: [0,7] . D==============================================eeeeeeeER . . . . addq $44, 512(%r14)
+# YESALIAS-NEXT: [0,8] . D====================================================eeeeeeeER . . addq $44, 576(%r14)
+# YESALIAS-NEXT: [0,9] . D===========================================================eeeeeeeER addq $44, 640(%r14)
+
+# ALL: Average Wait times (based on the timeline view):
+# ALL-NEXT: [0]: Executions
+# ALL-NEXT: [1]: Average time spent waiting in a scheduler's queue
+# ALL-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
+# ALL-NEXT: [3]: Average time elapsed from WB until retire stage
+
+# ALL: [0] [1] [2] [3]
+# ALL-NEXT: 0. 1 1.0 1.0 0.0 addq $44, 64(%r14)
+
+# NOALIAS-NEXT: 1. 1 2.0 1.0 0.0 addq $44, 128(%r14)
+# NOALIAS-NEXT: 2. 1 2.0 1.0 0.0 addq $44, 192(%r14)
+# NOALIAS-NEXT: 3. 1 3.0 1.0 0.0 addq $44, 256(%r14)
+# NOALIAS-NEXT: 4. 1 3.0 1.0 0.0 addq $44, 320(%r14)
+# NOALIAS-NEXT: 5. 1 4.0 1.0 0.0 addq $44, 384(%r14)
+# NOALIAS-NEXT: 6. 1 4.0 1.0 0.0 addq $44, 448(%r14)
+# NOALIAS-NEXT: 7. 1 5.0 1.0 0.0 addq $44, 512(%r14)
+# NOALIAS-NEXT: 8. 1 5.0 1.0 0.0 addq $44, 576(%r14)
+# NOALIAS-NEXT: 9. 1 6.0 1.0 0.0 addq $44, 640(%r14)
+# NOALIAS-NEXT: 1 3.5 1.0 0.0 <total>
+
+# YESALIAS-NEXT: 1. 1 8.0 0.0 0.0 addq $44, 128(%r14)
+# YESALIAS-NEXT: 2. 1 14.0 0.0 0.0 addq $44, 192(%r14)
+# YESALIAS-NEXT: 3. 1 21.0 0.0 0.0 addq $44, 256(%r14)
+# YESALIAS-NEXT: 4. 1 27.0 0.0 0.0 addq $44, 320(%r14)
+# YESALIAS-NEXT: 5. 1 34.0 0.0 0.0 addq $44, 384(%r14)
+# YESALIAS-NEXT: 6. 1 40.0 0.0 0.0 addq $44, 448(%r14)
+# YESALIAS-NEXT: 7. 1 47.0 0.0 0.0 addq $44, 512(%r14)
+# YESALIAS-NEXT: 8. 1 53.0 0.0 0.0 addq $44, 576(%r14)
+# YESALIAS-NEXT: 9. 1 60.0 0.0 0.0 addq $44, 640(%r14)
+# YESALIAS-NEXT: 1 30.5 0.1 0.0 <total>
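The totals in the new independent-load-stores.s tests can be sanity-checked with a rough model. The sketch below is a simplified assumption, not llvm-mca's scheduling algorithm. It assumes that with -noalias=false each `addq` read-modify-write waits for its predecessor to finish executing, so the run becomes one serial chain of latencies. It also assumes that the default run (first RUN line, no -noalias flag) is limited only by "Block RThroughput". The IPC and uOps Per Cycle ratios are assumed to be Instructions and Total uOps divided by Total Cycles. All names in the code (`RESULTS`, `serial_chain_cycles`, `throughput_bound_cycles`, `per_cycle`) are hypothetical helpers; the numbers are copied from the CHECK lines above.

```python
# Rough model (an assumption, not llvm-mca's algorithm) of the totals
# reported by the independent-load-stores.s tests.

ITERATIONS = 100          # "Iterations: 100"
INSTRS_PER_ITER = 10      # ten `addq $44, <off>(%r14)` per iteration
BLOCK_RTHROUGHPUT = 10.0  # "Block RThroughput: 10.0" on every CPU tested

# cpu -> (addq latency, uOps per addq, reported NOALIAS / YESALIAS cycles)
RESULTS = {
    "btver2":         (6, 1, 1008, 6003),
    "haswell":        (7, 3, 1009, 7003),
    "skylake":        (7, 3, 1009, 7003),
    "skylake-avx512": (7, 3, 1009, 7003),
}


def serial_chain_cycles(latency: int) -> int:
    """-noalias=false: each addq may alias the previous one and waits for it
    to finish executing, giving one chain of ITERATIONS * INSTRS_PER_ITER
    latencies."""
    return ITERATIONS * INSTRS_PER_ITER * latency


def throughput_bound_cycles() -> int:
    """Default run: with no memory chain, the loop is limited only by the
    block reciprocal throughput (cycles per iteration)."""
    return int(ITERATIONS * BLOCK_RTHROUGHPUT)


def per_cycle(count: int, cycles: int) -> str:
    """Summary ratio (IPC or uOps Per Cycle) rounded to two decimals."""
    return f"{count / cycles:.2f}"


for cpu, (lat, uops, noalias, yesalias) in RESULTS.items():
    instrs = ITERATIONS * INSTRS_PER_ITER
    total_uops = instrs * uops
    # The small remainders versus the model are pipeline fill/drain cycles.
    print(f"{cpu}: NOALIAS {noalias} cycles (model {throughput_bound_cycles()}), "
          f"IPC {per_cycle(instrs, noalias)}, "
          f"uOps/cycle {per_cycle(total_uops, noalias)}; "
          f"YESALIAS {yesalias} cycles (model {serial_chain_cycles(lat)}), "
          f"IPC {per_cycle(instrs, yesalias)}, "
          f"uOps/cycle {per_cycle(total_uops, yesalias)}")
```

For btver2 this gives 1000 vs. 1008 and 6000 vs. 6003 cycles. For the Intel models it gives 1000 vs. 1009 and 7000 vs. 7003 cycles. The printed IPC and uOps/cycle values match the CHECK lines, which suggests the 6x to 7x gap is almost entirely the per-instruction latency serialisation introduced by treating the stores as aliasing.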