[llvm] r335113 - [llvm-mca][X86] Teach how to identify register writes that implicitly clear the upper portion of a super-register.
Andrea Di Biagio via llvm-commits
llvm-commits at lists.llvm.org
Wed Jun 20 03:08:11 PDT 2018
Author: adibiagio
Date: Wed Jun 20 03:08:11 2018
New Revision: 335113
URL: http://llvm.org/viewvc/llvm-project?rev=335113&view=rev
Log:
[llvm-mca][X86] Teach how to identify register writes that implicitly clear the upper portion of a super-register.
This patch teaches llvm-mca how to identify register writes that implicitly zero
the upper portion of a super-register.
On X86-64, a general purpose register is implemented in hardware as a 64-bit
register. Quoting the Intel 64 Software Developer's Manual: "an update to the
lower 32 bits of a 64 bit integer register is architecturally defined to zero
extend the upper 32 bits". Also, a write to an XMM register performed by an AVX
instruction implicitly zeroes the upper 128 bits of the aliasing YMM register.
This patch adds a new method named clearsSuperRegisters to the MCInstrAnalysis
interface to help identify instructions that implicitly clear the upper portion
of a super-register. The rest of the patch teaches llvm-mca how to query that
new method and how to update the register dependencies accordingly.
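
The rule described above can be sketched without any LLVM dependency. The snippet below is only an illustration of the idea, not the code from this patch: `RegKind`, `Encoding`, `clearsSuperReg` and `buildWriteMask` are hypothetical names, and a plain 64-bit integer stands in for the APInt mask. It assumes, as the patch does, that explicit writes occupy the low bits of the mask and implicit writes follow.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for LLVM register classes and instruction encodings.
// None of these names come from the patch.
enum class RegKind { GR8, GR16, GR32, GR64, XMM, YMM, ZMM };
enum class Encoding { Legacy, VEX, EVEX, XOP };

// Returns true if a write to a register of kind Reg, performed by an
// instruction with encoding Enc, zeroes the upper part of its super-registers.
bool clearsSuperReg(RegKind Reg, Encoding Enc) {
  // A 32-bit GPR write zero-extends into the full 64-bit register.
  if (Reg == RegKind::GR32)
    return true;
  // Without a VEX/EVEX/XOP prefix, the upper vector bits are preserved.
  if (Enc == Encoding::Legacy)
    return false;
  // VEX/EVEX (and, by assumption, XOP) writes to XMM/YMM zero the upper bits.
  return Reg == RegKind::XMM || Reg == RegKind::YMM;
}

// Builds a write mask with one bit per register write. Explicit definitions
// come first, implicit definitions follow. Bit I is set if write I clears
// its super-registers.
uint64_t buildWriteMask(const std::vector<RegKind> &ExplicitDefs,
                        const std::vector<RegKind> &ImplicitDefs,
                        Encoding Enc) {
  assert(ExplicitDefs.size() + ImplicitDefs.size() <= 64 &&
         "Too many writes for a 64-bit mask!");
  uint64_t Mask = 0;
  unsigned Bit = 0;
  for (RegKind Reg : ExplicitDefs) {
    if (clearsSuperReg(Reg, Enc))
      Mask |= uint64_t{1} << Bit;
    ++Bit;
  }
  for (RegKind Reg : ImplicitDefs) {
    if (clearsSuperReg(Reg, Enc))
      Mask |= uint64_t{1} << Bit;
    ++Bit;
  }
  return Mask;
}
```

For instance, `buildWriteMask({RegKind::GR64, RegKind::XMM}, {RegKind::GR32}, Encoding::VEX)` sets bits 1 and 2 only, since the 64-bit GPR write has no upper portion to clear.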
I compared the kernels from tests clear-super-register-1.s and
clear-super-register-2.s against the output from perf on btver2. Previously
there was a large discrepancy between the estimated IPC and the measured IPC.
Now the differences are mostly in the noise.
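
The IPC figures in the updated tests follow directly from the instruction and cycle counts in the summary view. As a quick sanity check, the sketch below recomputes them. The helper names `ipc` and `roundTo2` are made up for this illustration, and rounding to two decimals is an assumption that happens to match the values printed in the tests.

```cpp
#include <cmath>

// IPC is the number of retired instructions divided by the total cycles.
double ipc(unsigned Instructions, unsigned Cycles) {
  return static_cast<double>(Instructions) / Cycles;
}

// Rounds to two decimal places (assumed to match the summary view precision).
double roundTo2(double Value) { return std::round(Value * 100.0) / 100.0; }
```

With the numbers from clear-super-register-1.s, `roundTo2(ipc(400, 1203))` gives 0.33 before the patch and `roundTo2(ipc(400, 704))` gives 0.57 after it. For clear-super-register-2.s the same computation gives 0.26 (1800 / 7003) and 0.47 (1800 / 3811).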
Differential Revision: https://reviews.llvm.org/D48225
Modified:
llvm/trunk/include/llvm/MC/MCInstrAnalysis.h
llvm/trunk/lib/MC/MCInstrAnalysis.cpp
llvm/trunk/lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp
llvm/trunk/test/tools/llvm-mca/X86/BtVer2/clear-super-register-1.s
llvm/trunk/test/tools/llvm-mca/X86/BtVer2/clear-super-register-2.s
llvm/trunk/test/tools/llvm-mca/X86/Generic/avx512-super-registers-1.s
llvm/trunk/test/tools/llvm-mca/X86/Generic/avx512-super-registers-2.s
llvm/trunk/test/tools/llvm-mca/X86/Generic/avx512-super-registers-3.s
llvm/trunk/test/tools/llvm-mca/X86/Generic/xop-super-registers-1.s
llvm/trunk/test/tools/llvm-mca/X86/Generic/xop-super-registers-2.s
llvm/trunk/tools/llvm-mca/InstrBuilder.cpp
llvm/trunk/tools/llvm-mca/InstrBuilder.h
llvm/trunk/tools/llvm-mca/Instruction.h
llvm/trunk/tools/llvm-mca/RegisterFile.cpp
llvm/trunk/tools/llvm-mca/llvm-mca.cpp
Modified: llvm/trunk/include/llvm/MC/MCInstrAnalysis.h
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/MC/MCInstrAnalysis.h?rev=335113&r1=335112&r2=335113&view=diff
==============================================================================
--- llvm/trunk/include/llvm/MC/MCInstrAnalysis.h (original)
+++ llvm/trunk/include/llvm/MC/MCInstrAnalysis.h Wed Jun 20 03:08:11 2018
@@ -22,6 +22,8 @@
namespace llvm {
+class MCRegisterInfo;
+
class MCInstrAnalysis {
protected:
friend class Target;
@@ -60,6 +62,31 @@ public:
return Info->get(Inst.getOpcode()).isTerminator();
}
+ /// Returns true if at least one of the register writes performed by
+ /// \param Inst implicitly clears the upper portion of all super-registers.
+ ///
+ /// Example: on X86-64, a write to EAX implicitly clears the upper half of
+ /// RAX. Also (still on x86) an XMM write performed by an AVX 128-bit
+ /// instruction implicitly clears the upper portion of the corresponding
+ /// YMM register.
+ ///
+ /// This method also updates an APInt which is used as a mask of register
+ /// writes. There is one bit for every explicit/implicit write performed by
+ /// the instruction. If a write implicitly clears its super-registers, then
+ /// the corresponding bit is set (otherwise, the bit is cleared).
+ ///
+ /// The first bits in the APInt are related to explicit writes. The remaining
+ /// bits are related to implicit writes. The sequence of writes follows the
+ /// machine operand sequence. For implicit writes, the sequence is defined by
+ /// the MCInstrDesc.
+ ///
+ /// The assumption is that the bit-width of the APInt is correctly set by
+ /// the caller. The default implementation conservatively assumes that none of
+ /// the writes clears the upper portion of a super-register.
+ virtual bool clearsSuperRegisters(const MCRegisterInfo &MRI,
+ const MCInst &Inst,
+ APInt &Writes) const;
+
/// Given a branch instruction try to get the address the branch
/// targets. Return true on success, and the address in Target.
virtual bool
Modified: llvm/trunk/lib/MC/MCInstrAnalysis.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/MC/MCInstrAnalysis.cpp?rev=335113&r1=335112&r2=335113&view=diff
==============================================================================
--- llvm/trunk/lib/MC/MCInstrAnalysis.cpp (original)
+++ llvm/trunk/lib/MC/MCInstrAnalysis.cpp Wed Jun 20 03:08:11 2018
@@ -8,6 +8,8 @@
//===----------------------------------------------------------------------===//
#include "llvm/MC/MCInstrAnalysis.h"
+
+#include "llvm/ADT/APInt.h"
#include "llvm/MC/MCInst.h"
#include "llvm/MC/MCInstrDesc.h"
#include "llvm/MC/MCInstrInfo.h"
@@ -15,6 +17,13 @@
using namespace llvm;
+bool MCInstrAnalysis::clearsSuperRegisters(const MCRegisterInfo &MRI,
+ const MCInst &Inst,
+ APInt &Writes) const {
+ Writes.clearAllBits();
+ return false;
+}
+
bool MCInstrAnalysis::evaluateBranch(const MCInst &Inst, uint64_t Addr,
uint64_t Size, uint64_t &Target) const {
if (Inst.getNumOperands() == 0 ||
Modified: llvm/trunk/lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp?rev=335113&r1=335112&r2=335113&view=diff
==============================================================================
--- llvm/trunk/lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp (original)
+++ llvm/trunk/lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp Wed Jun 20 03:08:11 2018
@@ -14,7 +14,9 @@
#include "X86MCTargetDesc.h"
#include "InstPrinter/X86ATTInstPrinter.h"
#include "InstPrinter/X86IntelInstPrinter.h"
+#include "X86BaseInfo.h"
#include "X86MCAsmInfo.h"
+#include "llvm/ADT/APInt.h"
#include "llvm/ADT/Triple.h"
#include "llvm/DebugInfo/CodeView/CodeView.h"
#include "llvm/MC/MCInstrAnalysis.h"
@@ -293,8 +295,79 @@ static MCRelocationInfo *createX86MCRelo
return llvm::createMCRelocationInfo(TheTriple, Ctx);
}
+namespace llvm {
+namespace X86_MC {
+
+class X86MCInstrAnalysis : public MCInstrAnalysis {
+ X86MCInstrAnalysis(const X86MCInstrAnalysis &) = delete;
+ X86MCInstrAnalysis &operator=(const X86MCInstrAnalysis &) = delete;
+ virtual ~X86MCInstrAnalysis() = default;
+
+public:
+ X86MCInstrAnalysis(const MCInstrInfo *MCII) : MCInstrAnalysis(MCII) {}
+
+ bool clearsSuperRegisters(const MCRegisterInfo &MRI, const MCInst &Inst,
+ APInt &Mask) const override;
+};
+
+bool X86MCInstrAnalysis::clearsSuperRegisters(const MCRegisterInfo &MRI,
+ const MCInst &Inst,
+ APInt &Mask) const {
+ const MCInstrDesc &Desc = Info->get(Inst.getOpcode());
+ unsigned NumDefs = Desc.getNumDefs();
+ unsigned NumImplicitDefs = Desc.getNumImplicitDefs();
+ assert(Mask.getBitWidth() == NumDefs + NumImplicitDefs &&
+ "Unexpected number of bits in the mask!");
+
+ bool HasVEX = (Desc.TSFlags & X86II::EncodingMask) == X86II::VEX;
+ bool HasEVEX = (Desc.TSFlags & X86II::EncodingMask) == X86II::EVEX;
+ bool HasXOP = (Desc.TSFlags & X86II::EncodingMask) == X86II::XOP;
+
+ const MCRegisterClass &GR32RC = MRI.getRegClass(X86::GR32RegClassID);
+ const MCRegisterClass &VR128XRC = MRI.getRegClass(X86::VR128XRegClassID);
+ const MCRegisterClass &VR256XRC = MRI.getRegClass(X86::VR256XRegClassID);
+
+ auto ClearsSuperReg = [=](unsigned RegID) {
+ // On X86-64, a general purpose integer register is viewed as a 64-bit
+ // register internal to the processor.
+ // An update to the lower 32 bits of a 64 bit integer register is
+ // architecturally defined to zero extend the upper 32 bits.
+ if (GR32RC.contains(RegID))
+ return true;
+
+ // Early exit if this instruction has no vex/evex/xop prefix.
+ if (!HasEVEX && !HasVEX && !HasXOP)
+ return false;
+
+ // All VEX and EVEX encoded instructions are defined to zero the high bits
+ // of the destination register up to VLMAX (i.e. the maximum vector register
+ // width pertaining to the instruction).
+ // We assume the same behavior for XOP instructions too.
+ return VR128XRC.contains(RegID) || VR256XRC.contains(RegID);
+ };
+
+ Mask.clearAllBits();
+ for (unsigned I = 0, E = NumDefs; I < E; ++I) {
+ const MCOperand &Op = Inst.getOperand(I);
+ if (ClearsSuperReg(Op.getReg()))
+ Mask.setBit(I);
+ }
+
+ for (unsigned I = 0, E = NumImplicitDefs; I < E; ++I) {
+ const MCPhysReg Reg = Desc.getImplicitDefs()[I];
+ if (ClearsSuperReg(Reg))
+ Mask.setBit(NumDefs + I);
+ }
+
+ return Mask.getBoolValue();
+}
+
+} // end of namespace X86_MC
+
+} // end of namespace llvm
+
static MCInstrAnalysis *createX86MCInstrAnalysis(const MCInstrInfo *Info) {
- return new MCInstrAnalysis(Info);
+ return new X86_MC::X86MCInstrAnalysis(Info);
}
// Force static initialization.
Modified: llvm/trunk/test/tools/llvm-mca/X86/BtVer2/clear-super-register-1.s
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/tools/llvm-mca/X86/BtVer2/clear-super-register-1.s?rev=335113&r1=335112&r2=335113&view=diff
==============================================================================
--- llvm/trunk/test/tools/llvm-mca/X86/BtVer2/clear-super-register-1.s (original)
+++ llvm/trunk/test/tools/llvm-mca/X86/BtVer2/clear-super-register-1.s Wed Jun 20 03:08:11 2018
@@ -3,7 +3,7 @@
## Sets register RAX.
imulq $5, %rcx, %rax
-
+
## Kills the previous definition of RAX.
## The upper portion of RAX is cleared.
lzcnt %ecx, %eax
@@ -15,9 +15,9 @@ bsf %rax, %rcx
# CHECK: Iterations: 100
# CHECK-NEXT: Instructions: 400
-# CHECK-NEXT: Total Cycles: 1203
+# CHECK-NEXT: Total Cycles: 704
# CHECK-NEXT: Dispatch Width: 2
-# CHECK-NEXT: IPC: 0.33
+# CHECK-NEXT: IPC: 0.57
# CHECK-NEXT: Block RThroughput: 6.0
# CHECK: Instruction Info:
@@ -35,17 +35,17 @@ bsf %rax, %rcx
# CHECK-NEXT: 8 5 2.00 bsfq %rax, %rcx
# CHECK: Timeline view:
-# CHECK-NEXT: 0123456789
-# CHECK-NEXT: Index 0123456789 0123456
+# CHECK-NEXT: 01234567
+# CHECK-NEXT: Index 0123456789
-# CHECK: [0,0] DeeeeeeER . . . .. imulq $5, %rcx, %rax
-# CHECK-NEXT: [0,1] .DeE----R . . . .. lzcntl %ecx, %eax
-# CHECK-NEXT: [0,2] .D=====eER. . . .. andq %rcx, %rax
-# CHECK-NEXT: [0,3] . D=====eeeeeER. . .. bsfq %rax, %rcx
-# CHECK-NEXT: [1,0] . .D======eeeeeeER .. imulq $5, %rcx, %rax
-# CHECK-NEXT: [1,1] . . D=====eE-----R .. lzcntl %ecx, %eax
-# CHECK-NEXT: [1,2] . . D===========eER .. andq %rcx, %rax
-# CHECK-NEXT: [1,3] . . D===========eeeeeER bsfq %rax, %rcx
+# CHECK: [0,0] DeeeeeeER . . . imulq $5, %rcx, %rax
+# CHECK-NEXT: [0,1] .DeE----R . . . lzcntl %ecx, %eax
+# CHECK-NEXT: [0,2] .D=eE----R. . . andq %rcx, %rax
+# CHECK-NEXT: [0,3] . D=eeeeeER . . bsfq %rax, %rcx
+# CHECK-NEXT: [1,0] . .D==eeeeeeER. imulq $5, %rcx, %rax
+# CHECK-NEXT: [1,1] . . D=eE-----R. lzcntl %ecx, %eax
+# CHECK-NEXT: [1,2] . . D==eE-----R andq %rcx, %rax
+# CHECK-NEXT: [1,3] . . D==eeeeeER bsfq %rax, %rcx
# CHECK: Average Wait times (based on the timeline view):
# CHECK-NEXT: [0]: Executions
@@ -54,7 +54,7 @@ bsf %rax, %rcx
# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage
# CHECK: [0] [1] [2] [3]
-# CHECK-NEXT: 0. 2 4.0 0.5 0.0 imulq $5, %rcx, %rax
-# CHECK-NEXT: 1. 2 3.5 0.5 4.5 lzcntl %ecx, %eax
-# CHECK-NEXT: 2. 2 9.0 0.0 0.0 andq %rcx, %rax
-# CHECK-NEXT: 3. 2 9.0 0.0 0.0 bsfq %rax, %rcx
+# CHECK-NEXT: 0. 2 2.0 0.5 0.0 imulq $5, %rcx, %rax
+# CHECK-NEXT: 1. 2 1.5 0.5 4.5 lzcntl %ecx, %eax
+# CHECK-NEXT: 2. 2 2.5 0.0 4.5 andq %rcx, %rax
+# CHECK-NEXT: 3. 2 2.5 0.0 0.0 bsfq %rax, %rcx
Modified: llvm/trunk/test/tools/llvm-mca/X86/BtVer2/clear-super-register-2.s
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/tools/llvm-mca/X86/BtVer2/clear-super-register-2.s?rev=335113&r1=335112&r2=335113&view=diff
==============================================================================
--- llvm/trunk/test/tools/llvm-mca/X86/BtVer2/clear-super-register-2.s (original)
+++ llvm/trunk/test/tools/llvm-mca/X86/BtVer2/clear-super-register-2.s Wed Jun 20 03:08:11 2018
@@ -33,9 +33,9 @@ vandps %xmm4, %xmm1, %xmm0
# CHECK: Iterations: 100
# CHECK-NEXT: Instructions: 1800
-# CHECK-NEXT: Total Cycles: 7003
+# CHECK-NEXT: Total Cycles: 3811
# CHECK-NEXT: Dispatch Width: 2
-# CHECK-NEXT: IPC: 0.26
+# CHECK-NEXT: IPC: 0.47
# CHECK-NEXT: Block RThroughput: 38.0
# CHECK: Instruction Info:
@@ -67,27 +67,31 @@ vandps %xmm4, %xmm1, %xmm0
# CHECK-NEXT: 1 1 0.50 vandps %xmm4, %xmm1, %xmm0
# CHECK: Timeline view:
-# CHECK-NEXT: 0123456789 0123456789 0123456789 01234
+# CHECK-NEXT: 0123456789 0123456789 0123456789 0123456789
# CHECK-NEXT: Index 0123456789 0123456789 0123456789 0123456789
-# CHECK: [0,0] DeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeER . . . . . . . vdivps %ymm0, %ymm1, %ymm3
-# CHECK-NEXT: [0,1] .DeeeE----------------------------------R . . . . . . . vaddps %xmm0, %xmm1, %xmm3
-# CHECK-NEXT: [0,2] . D====================================eeeER . . . . . . . vaddps %ymm3, %ymm1, %ymm4
-# CHECK-NEXT: [0,3] . D=====================================eeeER . . . . . . vaddps %ymm3, %ymm1, %ymm4
-# CHECK-NEXT: [0,4] . D======================================eeeER . . . . . . vaddps %ymm3, %ymm1, %ymm4
-# CHECK-NEXT: [0,5] . D=======================================eeeER. . . . . . vaddps %ymm3, %ymm1, %ymm4
-# CHECK-NEXT: [0,6] . .D========================================eeeER . . . . . vaddps %ymm3, %ymm1, %ymm4
-# CHECK-NEXT: [0,7] . . D=========================================eeeER . . . . . vaddps %ymm3, %ymm1, %ymm4
-# CHECK-NEXT: [0,8] . . D==========================================eeeER . . . . vaddps %ymm3, %ymm1, %ymm4
-# CHECK-NEXT: [0,9] . . D===========================================eeeER . . . . vaddps %ymm3, %ymm1, %ymm4
-# CHECK-NEXT: [0,10] . . D============================================eeeER. . . . vaddps %ymm3, %ymm1, %ymm4
-# CHECK-NEXT: [0,11] . . .D=============================================eeeER . . . vaddps %ymm3, %ymm1, %ymm4
-# CHECK-NEXT: [0,12] . . . D==============================================eeeER . . . vaddps %ymm3, %ymm1, %ymm4
-# CHECK-NEXT: [0,13] . . . D===============================================eeeER . . vaddps %ymm3, %ymm1, %ymm4
-# CHECK-NEXT: [0,14] . . . D================================================eeeER . . vaddps %ymm3, %ymm1, %ymm4
-# CHECK-NEXT: [0,15] . . . D=================================================eeeER. . vaddps %ymm3, %ymm1, %ymm4
-# CHECK-NEXT: [0,16] . . . .D==================================================eeeER . vaddps %ymm3, %ymm1, %ymm4
-# CHECK-NEXT: [0,17] . . . . D====================================================eER . vandps %xmm4, %xmm1, %xmm0
+# CHECK: [0,0] DeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeER . . . . . . . . vdivps %ymm0, %ymm1, %ymm3
+# CHECK-NEXT: [0,1] .DeeeE----------------------------------R . . . . . . . . vaddps %xmm0, %xmm1, %xmm3
+# CHECK-NEXT: [0,2] . D==eeeE--------------------------------R . . . . . . . . vaddps %ymm3, %ymm1, %ymm4
+# CHECK-NEXT: [0,3] . D===eeeE------------------------------R . . . . . . . . vaddps %ymm3, %ymm1, %ymm4
+# CHECK-NEXT: [0,4] . D====eeeE-----------------------------R . . . . . . . . vaddps %ymm3, %ymm1, %ymm4
+# CHECK-NEXT: [0,5] . D=====eeeE---------------------------R . . . . . . . . vaddps %ymm3, %ymm1, %ymm4
+# CHECK-NEXT: [0,6] . .D======eeeE--------------------------R . . . . . . . . vaddps %ymm3, %ymm1, %ymm4
+# CHECK-NEXT: [0,7] . . D=======eeeE------------------------R . . . . . . . . vaddps %ymm3, %ymm1, %ymm4
+# CHECK-NEXT: [0,8] . . D========eeeE-----------------------R. . . . . . . . vaddps %ymm3, %ymm1, %ymm4
+# CHECK-NEXT: [0,9] . . D=========eeeE---------------------R. . . . . . . . vaddps %ymm3, %ymm1, %ymm4
+# CHECK-NEXT: [0,10] . . D==========eeeE--------------------R . . . . . . . vaddps %ymm3, %ymm1, %ymm4
+# CHECK-NEXT: [0,11] . . .D===========eeeE------------------R . . . . . . . vaddps %ymm3, %ymm1, %ymm4
+# CHECK-NEXT: [0,12] . . . D============eeeE-----------------R . . . . . . . vaddps %ymm3, %ymm1, %ymm4
+# CHECK-NEXT: [0,13] . . . D=============eeeE---------------R . . . . . . . vaddps %ymm3, %ymm1, %ymm4
+# CHECK-NEXT: [0,14] . . . D==============eeeE--------------R . . . . . . . vaddps %ymm3, %ymm1, %ymm4
+# CHECK-NEXT: [0,15] . . . D===============eeeE------------R . . . . . . . vaddps %ymm3, %ymm1, %ymm4
+# CHECK-NEXT: [0,16] . . . .D================eeeE-----------R . . . . . . . vaddps %ymm3, %ymm1, %ymm4
+# CHECK-NEXT: [0,17] . . . . D==================eE----------R . . . . . . . vandps %xmm4, %xmm1, %xmm0
+# CHECK-NEXT: [1,0] . . . . D====================eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeER. vdivps %ymm0, %ymm1, %ymm3
+# CHECK-NEXT: [1,1] . . . . D=================eeeE-------------------------------------R. vaddps %xmm0, %xmm1, %xmm3
+# CHECK-NEXT: [1,2] . . . . D===================eeeE-----------------------------------R vaddps %ymm3, %ymm1, %ymm4
+# CHECK-NEXT: [1,3] . . . . .D====================eeeE---------------------------------R vaddps %ymm3, %ymm1, %ymm4
# CHECK: Average Wait times (based on the timeline view):
# CHECK-NEXT: [0]: Executions
@@ -96,21 +100,21 @@ vandps %xmm4, %xmm1, %xmm0
# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage
# CHECK: [0] [1] [2] [3]
-# CHECK-NEXT: 0. 1 1.0 1.0 0.0 vdivps %ymm0, %ymm1, %ymm3
-# CHECK-NEXT: 1. 1 1.0 1.0 34.0 vaddps %xmm0, %xmm1, %xmm3
-# CHECK-NEXT: 2. 1 37.0 0.0 0.0 vaddps %ymm3, %ymm1, %ymm4
-# CHECK-NEXT: 3. 1 38.0 2.0 0.0 vaddps %ymm3, %ymm1, %ymm4
-# CHECK-NEXT: 4. 1 39.0 4.0 0.0 vaddps %ymm3, %ymm1, %ymm4
-# CHECK-NEXT: 5. 1 40.0 6.0 0.0 vaddps %ymm3, %ymm1, %ymm4
-# CHECK-NEXT: 6. 1 41.0 8.0 0.0 vaddps %ymm3, %ymm1, %ymm4
-# CHECK-NEXT: 7. 1 42.0 10.0 0.0 vaddps %ymm3, %ymm1, %ymm4
-# CHECK-NEXT: 8. 1 43.0 12.0 0.0 vaddps %ymm3, %ymm1, %ymm4
-# CHECK-NEXT: 9. 1 44.0 14.0 0.0 vaddps %ymm3, %ymm1, %ymm4
-# CHECK-NEXT: 10. 1 45.0 16.0 0.0 vaddps %ymm3, %ymm1, %ymm4
-# CHECK-NEXT: 11. 1 46.0 18.0 0.0 vaddps %ymm3, %ymm1, %ymm4
-# CHECK-NEXT: 12. 1 47.0 20.0 0.0 vaddps %ymm3, %ymm1, %ymm4
-# CHECK-NEXT: 13. 1 48.0 22.0 0.0 vaddps %ymm3, %ymm1, %ymm4
-# CHECK-NEXT: 14. 1 49.0 24.0 0.0 vaddps %ymm3, %ymm1, %ymm4
-# CHECK-NEXT: 15. 1 50.0 26.0 0.0 vaddps %ymm3, %ymm1, %ymm4
-# CHECK-NEXT: 16. 1 51.0 28.0 0.0 vaddps %ymm3, %ymm1, %ymm4
-# CHECK-NEXT: 17. 1 53.0 0.0 0.0 vandps %xmm4, %xmm1, %xmm0
+# CHECK-NEXT: 0. 2 11.0 1.5 0.0 vdivps %ymm0, %ymm1, %ymm3
+# CHECK-NEXT: 1. 2 9.5 0.5 35.5 vaddps %xmm0, %xmm1, %xmm3
+# CHECK-NEXT: 2. 2 11.5 0.0 33.5 vaddps %ymm3, %ymm1, %ymm4
+# CHECK-NEXT: 3. 2 12.5 2.0 31.5 vaddps %ymm3, %ymm1, %ymm4
+# CHECK-NEXT: 4. 1 5.0 4.0 29.0 vaddps %ymm3, %ymm1, %ymm4
+# CHECK-NEXT: 5. 1 6.0 6.0 27.0 vaddps %ymm3, %ymm1, %ymm4
+# CHECK-NEXT: 6. 1 7.0 7.0 26.0 vaddps %ymm3, %ymm1, %ymm4
+# CHECK-NEXT: 7. 1 8.0 8.0 24.0 vaddps %ymm3, %ymm1, %ymm4
+# CHECK-NEXT: 8. 1 9.0 9.0 23.0 vaddps %ymm3, %ymm1, %ymm4
+# CHECK-NEXT: 9. 1 10.0 10.0 21.0 vaddps %ymm3, %ymm1, %ymm4
+# CHECK-NEXT: 10. 1 11.0 11.0 20.0 vaddps %ymm3, %ymm1, %ymm4
+# CHECK-NEXT: 11. 1 12.0 12.0 18.0 vaddps %ymm3, %ymm1, %ymm4
+# CHECK-NEXT: 12. 1 13.0 13.0 17.0 vaddps %ymm3, %ymm1, %ymm4
+# CHECK-NEXT: 13. 1 14.0 14.0 15.0 vaddps %ymm3, %ymm1, %ymm4
+# CHECK-NEXT: 14. 1 15.0 15.0 14.0 vaddps %ymm3, %ymm1, %ymm4
+# CHECK-NEXT: 15. 1 16.0 16.0 12.0 vaddps %ymm3, %ymm1, %ymm4
+# CHECK-NEXT: 16. 1 17.0 17.0 11.0 vaddps %ymm3, %ymm1, %ymm4
+# CHECK-NEXT: 17. 1 19.0 0.0 10.0 vandps %xmm4, %xmm1, %xmm0
Modified: llvm/trunk/test/tools/llvm-mca/X86/Generic/avx512-super-registers-1.s
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/tools/llvm-mca/X86/Generic/avx512-super-registers-1.s?rev=335113&r1=335112&r2=335113&view=diff
==============================================================================
--- llvm/trunk/test/tools/llvm-mca/X86/Generic/avx512-super-registers-1.s (original)
+++ llvm/trunk/test/tools/llvm-mca/X86/Generic/avx512-super-registers-1.s Wed Jun 20 03:08:11 2018
@@ -10,9 +10,9 @@
# CHECK: Iterations: 100
# CHECK-NEXT: Instructions: 600
-# CHECK-NEXT: Total Cycles: 2103
+# CHECK-NEXT: Total Cycles: 318
# CHECK-NEXT: Dispatch Width: 4
-# CHECK-NEXT: IPC: 0.29
+# CHECK-NEXT: IPC: 1.89
# CHECK-NEXT: Block RThroughput: 3.0
# CHECK: Instruction Info:
@@ -55,21 +55,21 @@
# CHECK-NEXT: - - - 1.00 - - - - vaddps %xmm4, %xmm5, %xmm0
# CHECK: Timeline view:
-# CHECK-NEXT: 0123456789 0123456789
-# CHECK-NEXT: Index 0123456789 0123456789 01234
+# CHECK-NEXT: 0123456789
+# CHECK-NEXT: Index 0123456789 0123456
-# CHECK: [0,0] DeeeeeER . . . . . . . . vmulps %zmm0, %zmm1, %zmm2
-# CHECK-NEXT: [0,1] DeeeE--R . . . . . . . . vaddps %xmm1, %xmm1, %xmm2
-# CHECK-NEXT: [0,2] D=====eeeeeER . . . . . . . vmulps %ymm2, %ymm3, %ymm4
-# CHECK-NEXT: [0,3] D==========eeeER . . . . . . vaddps %xmm4, %xmm5, %xmm6
-# CHECK-NEXT: [0,4] .D============eeeeeER . . . . . vmulps %xmm6, %xmm3, %xmm4
-# CHECK-NEXT: [0,5] .D=================eeeER . . . . . vaddps %xmm4, %xmm5, %xmm0
-# CHECK-NEXT: [1,0] .D====================eeeeeER . . . . vmulps %zmm0, %zmm1, %zmm2
-# CHECK-NEXT: [1,1] .DeeeE----------------------R . . . . vaddps %xmm1, %xmm1, %xmm2
-# CHECK-NEXT: [1,2] . D========================eeeeeER . . . vmulps %ymm2, %ymm3, %ymm4
-# CHECK-NEXT: [1,3] . D=============================eeeER . . vaddps %xmm4, %xmm5, %xmm6
-# CHECK-NEXT: [1,4] . D================================eeeeeER . vmulps %xmm6, %xmm3, %xmm4
-# CHECK-NEXT: [1,5] . D=====================================eeeER vaddps %xmm4, %xmm5, %xmm0
+# CHECK: [0,0] DeeeeeER . . . .. vmulps %zmm0, %zmm1, %zmm2
+# CHECK-NEXT: [0,1] DeeeE--R . . . .. vaddps %xmm1, %xmm1, %xmm2
+# CHECK-NEXT: [0,2] D===eeeeeER . . .. vmulps %ymm2, %ymm3, %ymm4
+# CHECK-NEXT: [0,3] D========eeeER . . .. vaddps %xmm4, %xmm5, %xmm6
+# CHECK-NEXT: [0,4] .D==========eeeeeER . .. vmulps %xmm6, %xmm3, %xmm4
+# CHECK-NEXT: [0,5] .D===============eeeER .. vaddps %xmm4, %xmm5, %xmm0
+# CHECK-NEXT: [1,0] .D==================eeeeeER vmulps %zmm0, %zmm1, %zmm2
+# CHECK-NEXT: [1,1] .DeeeE--------------------R vaddps %xmm1, %xmm1, %xmm2
+# CHECK-NEXT: [1,2] . D==eeeeeE---------------R vmulps %ymm2, %ymm3, %ymm4
+# CHECK-NEXT: [1,3] . D=======eeeE------------R vaddps %xmm4, %xmm5, %xmm6
+# CHECK-NEXT: [1,4] . D==========eeeeeE-------R vmulps %xmm6, %xmm3, %xmm4
+# CHECK-NEXT: [1,5] . D===============eeeE----R vaddps %xmm4, %xmm5, %xmm0
# CHECK: Average Wait times (based on the timeline view):
# CHECK-NEXT: [0]: Executions
@@ -78,9 +78,9 @@
# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage
# CHECK: [0] [1] [2] [3]
-# CHECK-NEXT: 0. 2 11.0 0.5 0.0 vmulps %zmm0, %zmm1, %zmm2
-# CHECK-NEXT: 1. 2 1.0 1.0 12.0 vaddps %xmm1, %xmm1, %xmm2
-# CHECK-NEXT: 2. 2 15.5 0.0 0.0 vmulps %ymm2, %ymm3, %ymm4
-# CHECK-NEXT: 3. 2 20.5 0.0 0.0 vaddps %xmm4, %xmm5, %xmm6
-# CHECK-NEXT: 4. 2 23.0 0.0 0.0 vmulps %xmm6, %xmm3, %xmm4
-# CHECK-NEXT: 5. 2 28.0 0.0 0.0 vaddps %xmm4, %xmm5, %xmm0
+# CHECK-NEXT: 0. 2 10.0 0.5 0.0 vmulps %zmm0, %zmm1, %zmm2
+# CHECK-NEXT: 1. 2 1.0 1.0 11.0 vaddps %xmm1, %xmm1, %xmm2
+# CHECK-NEXT: 2. 2 3.5 0.0 7.5 vmulps %ymm2, %ymm3, %ymm4
+# CHECK-NEXT: 3. 2 8.5 0.0 6.0 vaddps %xmm4, %xmm5, %xmm6
+# CHECK-NEXT: 4. 2 11.0 0.0 3.5 vmulps %xmm6, %xmm3, %xmm4
+# CHECK-NEXT: 5. 2 16.0 0.0 2.0 vaddps %xmm4, %xmm5, %xmm0
Modified: llvm/trunk/test/tools/llvm-mca/X86/Generic/avx512-super-registers-2.s
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/tools/llvm-mca/X86/Generic/avx512-super-registers-2.s?rev=335113&r1=335112&r2=335113&view=diff
==============================================================================
--- llvm/trunk/test/tools/llvm-mca/X86/Generic/avx512-super-registers-2.s (original)
+++ llvm/trunk/test/tools/llvm-mca/X86/Generic/avx512-super-registers-2.s Wed Jun 20 03:08:11 2018
@@ -10,9 +10,9 @@
# CHECK: Iterations: 100
# CHECK-NEXT: Instructions: 600
-# CHECK-NEXT: Total Cycles: 2103
+# CHECK-NEXT: Total Cycles: 318
# CHECK-NEXT: Dispatch Width: 4
-# CHECK-NEXT: IPC: 0.29
+# CHECK-NEXT: IPC: 1.89
# CHECK-NEXT: Block RThroughput: 3.0
# CHECK: Instruction Info:
@@ -55,21 +55,21 @@
# CHECK-NEXT: - - - 1.00 - - - - vaddps %xmm4, %xmm5, %xmm0
# CHECK: Timeline view:
-# CHECK-NEXT: 0123456789 0123456789
-# CHECK-NEXT: Index 0123456789 0123456789 01234
+# CHECK-NEXT: 0123456789
+# CHECK-NEXT: Index 0123456789 0123456
-# CHECK: [0,0] DeeeeeER . . . . . . . . vmulps %zmm0, %zmm1, %zmm2
-# CHECK-NEXT: [0,1] DeeeE--R . . . . . . . . vaddps %ymm1, %ymm1, %ymm2
-# CHECK-NEXT: [0,2] D=====eeeeeER . . . . . . . vmulps %zmm2, %zmm3, %zmm4
-# CHECK-NEXT: [0,3] D==========eeeER . . . . . . vaddps %xmm4, %xmm5, %xmm6
-# CHECK-NEXT: [0,4] .D============eeeeeER . . . . . vmulps %xmm6, %xmm3, %xmm4
-# CHECK-NEXT: [0,5] .D=================eeeER . . . . . vaddps %xmm4, %xmm5, %xmm0
-# CHECK-NEXT: [1,0] .D====================eeeeeER . . . . vmulps %zmm0, %zmm1, %zmm2
-# CHECK-NEXT: [1,1] .DeeeE----------------------R . . . . vaddps %ymm1, %ymm1, %ymm2
-# CHECK-NEXT: [1,2] . D========================eeeeeER . . . vmulps %zmm2, %zmm3, %zmm4
-# CHECK-NEXT: [1,3] . D=============================eeeER . . vaddps %xmm4, %xmm5, %xmm6
-# CHECK-NEXT: [1,4] . D================================eeeeeER . vmulps %xmm6, %xmm3, %xmm4
-# CHECK-NEXT: [1,5] . D=====================================eeeER vaddps %xmm4, %xmm5, %xmm0
+# CHECK: [0,0] DeeeeeER . . . .. vmulps %zmm0, %zmm1, %zmm2
+# CHECK-NEXT: [0,1] DeeeE--R . . . .. vaddps %ymm1, %ymm1, %ymm2
+# CHECK-NEXT: [0,2] D===eeeeeER . . .. vmulps %zmm2, %zmm3, %zmm4
+# CHECK-NEXT: [0,3] D========eeeER . . .. vaddps %xmm4, %xmm5, %xmm6
+# CHECK-NEXT: [0,4] .D==========eeeeeER . .. vmulps %xmm6, %xmm3, %xmm4
+# CHECK-NEXT: [0,5] .D===============eeeER .. vaddps %xmm4, %xmm5, %xmm0
+# CHECK-NEXT: [1,0] .D==================eeeeeER vmulps %zmm0, %zmm1, %zmm2
+# CHECK-NEXT: [1,1] .DeeeE--------------------R vaddps %ymm1, %ymm1, %ymm2
+# CHECK-NEXT: [1,2] . D==eeeeeE---------------R vmulps %zmm2, %zmm3, %zmm4
+# CHECK-NEXT: [1,3] . D=======eeeE------------R vaddps %xmm4, %xmm5, %xmm6
+# CHECK-NEXT: [1,4] . D==========eeeeeE-------R vmulps %xmm6, %xmm3, %xmm4
+# CHECK-NEXT: [1,5] . D===============eeeE----R vaddps %xmm4, %xmm5, %xmm0
# CHECK: Average Wait times (based on the timeline view):
# CHECK-NEXT: [0]: Executions
@@ -78,9 +78,9 @@
# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage
# CHECK: [0] [1] [2] [3]
-# CHECK-NEXT: 0. 2 11.0 0.5 0.0 vmulps %zmm0, %zmm1, %zmm2
-# CHECK-NEXT: 1. 2 1.0 1.0 12.0 vaddps %ymm1, %ymm1, %ymm2
-# CHECK-NEXT: 2. 2 15.5 0.0 0.0 vmulps %zmm2, %zmm3, %zmm4
-# CHECK-NEXT: 3. 2 20.5 0.0 0.0 vaddps %xmm4, %xmm5, %xmm6
-# CHECK-NEXT: 4. 2 23.0 0.0 0.0 vmulps %xmm6, %xmm3, %xmm4
-# CHECK-NEXT: 5. 2 28.0 0.0 0.0 vaddps %xmm4, %xmm5, %xmm0
+# CHECK-NEXT: 0. 2 10.0 0.5 0.0 vmulps %zmm0, %zmm1, %zmm2
+# CHECK-NEXT: 1. 2 1.0 1.0 11.0 vaddps %ymm1, %ymm1, %ymm2
+# CHECK-NEXT: 2. 2 3.5 0.0 7.5 vmulps %zmm2, %zmm3, %zmm4
+# CHECK-NEXT: 3. 2 8.5 0.0 6.0 vaddps %xmm4, %xmm5, %xmm6
+# CHECK-NEXT: 4. 2 11.0 0.0 3.5 vmulps %xmm6, %xmm3, %xmm4
+# CHECK-NEXT: 5. 2 16.0 0.0 2.0 vaddps %xmm4, %xmm5, %xmm0
Modified: llvm/trunk/test/tools/llvm-mca/X86/Generic/avx512-super-registers-3.s
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/tools/llvm-mca/X86/Generic/avx512-super-registers-3.s?rev=335113&r1=335112&r2=335113&view=diff
==============================================================================
--- llvm/trunk/test/tools/llvm-mca/X86/Generic/avx512-super-registers-3.s (original)
+++ llvm/trunk/test/tools/llvm-mca/X86/Generic/avx512-super-registers-3.s Wed Jun 20 03:08:11 2018
@@ -10,9 +10,9 @@
# CHECK: Iterations: 100
# CHECK-NEXT: Instructions: 600
-# CHECK-NEXT: Total Cycles: 2103
+# CHECK-NEXT: Total Cycles: 318
# CHECK-NEXT: Dispatch Width: 4
-# CHECK-NEXT: IPC: 0.29
+# CHECK-NEXT: IPC: 1.89
# CHECK-NEXT: Block RThroughput: 3.0
# CHECK: Instruction Info:
@@ -55,21 +55,21 @@
# CHECK-NEXT: - - - 1.00 - - - - vaddps %xmm4, %xmm20, %xmm0
# CHECK: Timeline view:
-# CHECK-NEXT: 0123456789 0123456789
-# CHECK-NEXT: Index 0123456789 0123456789 01234
+# CHECK-NEXT: 0123456789
+# CHECK-NEXT: Index 0123456789 0123456
-# CHECK: [0,0] DeeeeeER . . . . . . . . vmulps %zmm0, %zmm1, %zmm2
-# CHECK-NEXT: [0,1] DeeeE--R . . . . . . . . vaddps %xmm16, %xmm17, %xmm2
-# CHECK-NEXT: [0,2] D=====eeeeeER . . . . . . . vmulps %ymm2, %ymm3, %ymm4
-# CHECK-NEXT: [0,3] D==========eeeER . . . . . . vaddps %xmm4, %xmm18, %xmm6
-# CHECK-NEXT: [0,4] .D============eeeeeER . . . . . vmulps %xmm6, %xmm19, %xmm4
-# CHECK-NEXT: [0,5] .D=================eeeER . . . . . vaddps %xmm4, %xmm20, %xmm0
-# CHECK-NEXT: [1,0] .D====================eeeeeER . . . . vmulps %zmm0, %zmm1, %zmm2
-# CHECK-NEXT: [1,1] .DeeeE----------------------R . . . . vaddps %xmm16, %xmm17, %xmm2
-# CHECK-NEXT: [1,2] . D========================eeeeeER . . . vmulps %ymm2, %ymm3, %ymm4
-# CHECK-NEXT: [1,3] . D=============================eeeER . . vaddps %xmm4, %xmm18, %xmm6
-# CHECK-NEXT: [1,4] . D================================eeeeeER . vmulps %xmm6, %xmm19, %xmm4
-# CHECK-NEXT: [1,5] . D=====================================eeeER vaddps %xmm4, %xmm20, %xmm0
+# CHECK: [0,0] DeeeeeER . . . .. vmulps %zmm0, %zmm1, %zmm2
+# CHECK-NEXT: [0,1] DeeeE--R . . . .. vaddps %xmm16, %xmm17, %xmm2
+# CHECK-NEXT: [0,2] D===eeeeeER . . .. vmulps %ymm2, %ymm3, %ymm4
+# CHECK-NEXT: [0,3] D========eeeER . . .. vaddps %xmm4, %xmm18, %xmm6
+# CHECK-NEXT: [0,4] .D==========eeeeeER . .. vmulps %xmm6, %xmm19, %xmm4
+# CHECK-NEXT: [0,5] .D===============eeeER .. vaddps %xmm4, %xmm20, %xmm0
+# CHECK-NEXT: [1,0] .D==================eeeeeER vmulps %zmm0, %zmm1, %zmm2
+# CHECK-NEXT: [1,1] .DeeeE--------------------R vaddps %xmm16, %xmm17, %xmm2
+# CHECK-NEXT: [1,2] . D==eeeeeE---------------R vmulps %ymm2, %ymm3, %ymm4
+# CHECK-NEXT: [1,3] . D=======eeeE------------R vaddps %xmm4, %xmm18, %xmm6
+# CHECK-NEXT: [1,4] . D==========eeeeeE-------R vmulps %xmm6, %xmm19, %xmm4
+# CHECK-NEXT: [1,5] . D===============eeeE----R vaddps %xmm4, %xmm20, %xmm0
# CHECK: Average Wait times (based on the timeline view):
# CHECK-NEXT: [0]: Executions
@@ -78,9 +78,9 @@
# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage
# CHECK: [0] [1] [2] [3]
-# CHECK-NEXT: 0. 2 11.0 0.5 0.0 vmulps %zmm0, %zmm1, %zmm2
-# CHECK-NEXT: 1. 2 1.0 1.0 12.0 vaddps %xmm16, %xmm17, %xmm2
-# CHECK-NEXT: 2. 2 15.5 0.0 0.0 vmulps %ymm2, %ymm3, %ymm4
-# CHECK-NEXT: 3. 2 20.5 0.0 0.0 vaddps %xmm4, %xmm18, %xmm6
-# CHECK-NEXT: 4. 2 23.0 0.0 0.0 vmulps %xmm6, %xmm19, %xmm4
-# CHECK-NEXT: 5. 2 28.0 0.0 0.0 vaddps %xmm4, %xmm20, %xmm0
+# CHECK-NEXT: 0. 2 10.0 0.5 0.0 vmulps %zmm0, %zmm1, %zmm2
+# CHECK-NEXT: 1. 2 1.0 1.0 11.0 vaddps %xmm16, %xmm17, %xmm2
+# CHECK-NEXT: 2. 2 3.5 0.0 7.5 vmulps %ymm2, %ymm3, %ymm4
+# CHECK-NEXT: 3. 2 8.5 0.0 6.0 vaddps %xmm4, %xmm18, %xmm6
+# CHECK-NEXT: 4. 2 11.0 0.0 3.5 vmulps %xmm6, %xmm19, %xmm4
+# CHECK-NEXT: 5. 2 16.0 0.0 2.0 vaddps %xmm4, %xmm20, %xmm0
Modified: llvm/trunk/test/tools/llvm-mca/X86/Generic/xop-super-registers-1.s
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/tools/llvm-mca/X86/Generic/xop-super-registers-1.s?rev=335113&r1=335112&r2=335113&view=diff
==============================================================================
--- llvm/trunk/test/tools/llvm-mca/X86/Generic/xop-super-registers-1.s (original)
+++ llvm/trunk/test/tools/llvm-mca/X86/Generic/xop-super-registers-1.s Wed Jun 20 03:08:11 2018
@@ -10,9 +10,9 @@
# CHECK: Iterations: 100
# CHECK-NEXT: Instructions: 600
-# CHECK-NEXT: Total Cycles: 2103
+# CHECK-NEXT: Total Cycles: 318
# CHECK-NEXT: Dispatch Width: 4
-# CHECK-NEXT: IPC: 0.29
+# CHECK-NEXT: IPC: 1.89
# CHECK-NEXT: Block RThroughput: 3.0
# CHECK: Instruction Info:
@@ -55,21 +55,21 @@
# CHECK-NEXT: - - - 1.00 - - - - vaddps %ymm4, %ymm5, %ymm0
# CHECK: Timeline view:
-# CHECK-NEXT: 0123456789 0123456789
-# CHECK-NEXT: Index 0123456789 0123456789 01234
+# CHECK-NEXT: 0123456789
+# CHECK-NEXT: Index 0123456789 0123456
-# CHECK: [0,0] DeeeeeER . . . . . . . . vmulps %ymm0, %ymm1, %ymm2
-# CHECK-NEXT: [0,1] DeeeE--R . . . . . . . . vfrczpd %xmm1, %xmm2
-# CHECK-NEXT: [0,2] D=====eeeeeER . . . . . . . vmulps %ymm2, %ymm3, %ymm4
-# CHECK-NEXT: [0,3] D==========eeeER . . . . . . vaddps %ymm4, %ymm5, %ymm6
-# CHECK-NEXT: [0,4] .D============eeeeeER . . . . . vmulps %ymm6, %ymm3, %ymm4
-# CHECK-NEXT: [0,5] .D=================eeeER . . . . . vaddps %ymm4, %ymm5, %ymm0
-# CHECK-NEXT: [1,0] .D====================eeeeeER . . . . vmulps %ymm0, %ymm1, %ymm2
-# CHECK-NEXT: [1,1] .DeeeE----------------------R . . . . vfrczpd %xmm1, %xmm2
-# CHECK-NEXT: [1,2] . D========================eeeeeER . . . vmulps %ymm2, %ymm3, %ymm4
-# CHECK-NEXT: [1,3] . D=============================eeeER . . vaddps %ymm4, %ymm5, %ymm6
-# CHECK-NEXT: [1,4] . D================================eeeeeER . vmulps %ymm6, %ymm3, %ymm4
-# CHECK-NEXT: [1,5] . D=====================================eeeER vaddps %ymm4, %ymm5, %ymm0
+# CHECK: [0,0] DeeeeeER . . . .. vmulps %ymm0, %ymm1, %ymm2
+# CHECK-NEXT: [0,1] DeeeE--R . . . .. vfrczpd %xmm1, %xmm2
+# CHECK-NEXT: [0,2] D===eeeeeER . . .. vmulps %ymm2, %ymm3, %ymm4
+# CHECK-NEXT: [0,3] D========eeeER . . .. vaddps %ymm4, %ymm5, %ymm6
+# CHECK-NEXT: [0,4] .D==========eeeeeER . .. vmulps %ymm6, %ymm3, %ymm4
+# CHECK-NEXT: [0,5] .D===============eeeER .. vaddps %ymm4, %ymm5, %ymm0
+# CHECK-NEXT: [1,0] .D==================eeeeeER vmulps %ymm0, %ymm1, %ymm2
+# CHECK-NEXT: [1,1] .DeeeE--------------------R vfrczpd %xmm1, %xmm2
+# CHECK-NEXT: [1,2] . D==eeeeeE---------------R vmulps %ymm2, %ymm3, %ymm4
+# CHECK-NEXT: [1,3] . D=======eeeE------------R vaddps %ymm4, %ymm5, %ymm6
+# CHECK-NEXT: [1,4] . D==========eeeeeE-------R vmulps %ymm6, %ymm3, %ymm4
+# CHECK-NEXT: [1,5] . D===============eeeE----R vaddps %ymm4, %ymm5, %ymm0
# CHECK: Average Wait times (based on the timeline view):
# CHECK-NEXT: [0]: Executions
@@ -78,9 +78,9 @@
# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage
# CHECK: [0] [1] [2] [3]
-# CHECK-NEXT: 0. 2 11.0 0.5 0.0 vmulps %ymm0, %ymm1, %ymm2
-# CHECK-NEXT: 1. 2 1.0 1.0 12.0 vfrczpd %xmm1, %xmm2
-# CHECK-NEXT: 2. 2 15.5 0.0 0.0 vmulps %ymm2, %ymm3, %ymm4
-# CHECK-NEXT: 3. 2 20.5 0.0 0.0 vaddps %ymm4, %ymm5, %ymm6
-# CHECK-NEXT: 4. 2 23.0 0.0 0.0 vmulps %ymm6, %ymm3, %ymm4
-# CHECK-NEXT: 5. 2 28.0 0.0 0.0 vaddps %ymm4, %ymm5, %ymm0
+# CHECK-NEXT: 0. 2 10.0 0.5 0.0 vmulps %ymm0, %ymm1, %ymm2
+# CHECK-NEXT: 1. 2 1.0 1.0 11.0 vfrczpd %xmm1, %xmm2
+# CHECK-NEXT: 2. 2 3.5 0.0 7.5 vmulps %ymm2, %ymm3, %ymm4
+# CHECK-NEXT: 3. 2 8.5 0.0 6.0 vaddps %ymm4, %ymm5, %ymm6
+# CHECK-NEXT: 4. 2 11.0 0.0 3.5 vmulps %ymm6, %ymm3, %ymm4
+# CHECK-NEXT: 5. 2 16.0 0.0 2.0 vaddps %ymm4, %ymm5, %ymm0
Modified: llvm/trunk/test/tools/llvm-mca/X86/Generic/xop-super-registers-2.s
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/tools/llvm-mca/X86/Generic/xop-super-registers-2.s?rev=335113&r1=335112&r2=335113&view=diff
==============================================================================
--- llvm/trunk/test/tools/llvm-mca/X86/Generic/xop-super-registers-2.s (original)
+++ llvm/trunk/test/tools/llvm-mca/X86/Generic/xop-super-registers-2.s Wed Jun 20 03:08:11 2018
@@ -10,9 +10,9 @@
# CHECK: Iterations: 100
# CHECK-NEXT: Instructions: 600
-# CHECK-NEXT: Total Cycles: 2103
+# CHECK-NEXT: Total Cycles: 316
# CHECK-NEXT: Dispatch Width: 4
-# CHECK-NEXT: IPC: 0.29
+# CHECK-NEXT: IPC: 1.90
# CHECK-NEXT: Block RThroughput: 3.0
# CHECK: Instruction Info:
@@ -55,21 +55,21 @@
# CHECK-NEXT: - - - 1.00 - - - - vaddps %ymm4, %ymm5, %ymm0
# CHECK: Timeline view:
-# CHECK-NEXT: 0123456789 0123456789
-# CHECK-NEXT: Index 0123456789 0123456789 01234
+# CHECK-NEXT: 0123456789
+# CHECK-NEXT: Index 0123456789 01234
-# CHECK: [0,0] DeeeeeER . . . . . . . . vmulps %ymm0, %ymm1, %ymm2
-# CHECK-NEXT: [0,1] DeE----R . . . . . . . . vpermil2pd $16, %xmm3, %xmm5, %xmm1, %xmm2
-# CHECK-NEXT: [0,2] D=====eeeeeER . . . . . . . vmulps %ymm2, %ymm3, %ymm4
-# CHECK-NEXT: [0,3] D==========eeeER . . . . . . vaddps %ymm4, %ymm5, %ymm6
-# CHECK-NEXT: [0,4] .D============eeeeeER . . . . . vmulps %ymm6, %ymm3, %ymm4
-# CHECK-NEXT: [0,5] .D=================eeeER . . . . . vaddps %ymm4, %ymm5, %ymm0
-# CHECK-NEXT: [1,0] .D====================eeeeeER . . . . vmulps %ymm0, %ymm1, %ymm2
-# CHECK-NEXT: [1,1] .DeE------------------------R . . . . vpermil2pd $16, %xmm3, %xmm5, %xmm1, %xmm2
-# CHECK-NEXT: [1,2] . D========================eeeeeER . . . vmulps %ymm2, %ymm3, %ymm4
-# CHECK-NEXT: [1,3] . D=============================eeeER . . vaddps %ymm4, %ymm5, %ymm6
-# CHECK-NEXT: [1,4] . D================================eeeeeER . vmulps %ymm6, %ymm3, %ymm4
-# CHECK-NEXT: [1,5] . D=====================================eeeER vaddps %ymm4, %ymm5, %ymm0
+# CHECK: [0,0] DeeeeeER . . . . vmulps %ymm0, %ymm1, %ymm2
+# CHECK-NEXT: [0,1] DeE----R . . . . vpermil2pd $16, %xmm3, %xmm5, %xmm1, %xmm2
+# CHECK-NEXT: [0,2] D=eeeeeER . . . . vmulps %ymm2, %ymm3, %ymm4
+# CHECK-NEXT: [0,3] D======eeeER . . . vaddps %ymm4, %ymm5, %ymm6
+# CHECK-NEXT: [0,4] .D========eeeeeER . . vmulps %ymm6, %ymm3, %ymm4
+# CHECK-NEXT: [0,5] .D=============eeeER. . vaddps %ymm4, %ymm5, %ymm0
+# CHECK-NEXT: [1,0] .D================eeeeeER vmulps %ymm0, %ymm1, %ymm2
+# CHECK-NEXT: [1,1] .DeE--------------------R vpermil2pd $16, %xmm3, %xmm5, %xmm1, %xmm2
+# CHECK-NEXT: [1,2] . DeeeeeE---------------R vmulps %ymm2, %ymm3, %ymm4
+# CHECK-NEXT: [1,3] . D=====eeeE------------R vaddps %ymm4, %ymm5, %ymm6
+# CHECK-NEXT: [1,4] . D========eeeeeE-------R vmulps %ymm6, %ymm3, %ymm4
+# CHECK-NEXT: [1,5] . D=============eeeE----R vaddps %ymm4, %ymm5, %ymm0
# CHECK: Average Wait times (based on the timeline view):
# CHECK-NEXT: [0]: Executions
@@ -78,9 +78,9 @@
# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage
# CHECK: [0] [1] [2] [3]
-# CHECK-NEXT: 0. 2 11.0 0.5 0.0 vmulps %ymm0, %ymm1, %ymm2
-# CHECK-NEXT: 1. 2 1.0 1.0 14.0 vpermil2pd $16, %xmm3, %xmm5, %xmm1, %xmm2
-# CHECK-NEXT: 2. 2 15.5 0.0 0.0 vmulps %ymm2, %ymm3, %ymm4
-# CHECK-NEXT: 3. 2 20.5 0.0 0.0 vaddps %ymm4, %ymm5, %ymm6
-# CHECK-NEXT: 4. 2 23.0 0.0 0.0 vmulps %ymm6, %ymm3, %ymm4
-# CHECK-NEXT: 5. 2 28.0 0.0 0.0 vaddps %ymm4, %ymm5, %ymm0
+# CHECK-NEXT: 0. 2 9.0 0.5 0.0 vmulps %ymm0, %ymm1, %ymm2
+# CHECK-NEXT: 1. 2 1.0 1.0 12.0 vpermil2pd $16, %xmm3, %xmm5, %xmm1, %xmm2
+# CHECK-NEXT: 2. 2 1.5 0.0 7.5 vmulps %ymm2, %ymm3, %ymm4
+# CHECK-NEXT: 3. 2 6.5 0.0 6.0 vaddps %ymm4, %ymm5, %ymm6
+# CHECK-NEXT: 4. 2 9.0 0.0 3.5 vmulps %ymm6, %ymm3, %ymm4
+# CHECK-NEXT: 5. 2 14.0 0.0 2.0 vaddps %ymm4, %ymm5, %ymm0
Modified: llvm/trunk/tools/llvm-mca/InstrBuilder.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/tools/llvm-mca/InstrBuilder.cpp?rev=335113&r1=335112&r2=335113&view=diff
==============================================================================
--- llvm/trunk/tools/llvm-mca/InstrBuilder.cpp (original)
+++ llvm/trunk/tools/llvm-mca/InstrBuilder.cpp Wed Jun 20 03:08:11 2018
@@ -13,6 +13,7 @@
//===----------------------------------------------------------------------===//
#include "InstrBuilder.h"
+#include "llvm/ADT/APInt.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/MC/MCInst.h"
#include "llvm/Support/Debug.h"
@@ -158,23 +159,6 @@ static void populateWrites(InstrDesc &ID
const MCInstrDesc &MCDesc,
const MCSchedClassDesc &SCDesc,
const MCSubtargetInfo &STI) {
- // Set if writes through this opcode may update super registers.
- // TODO: on x86-64, a 4 byte write of a general purpose register always
- // fully updates the super-register.
- // More in general, (at least on x86) not all register writes perform
- // a partial (super-)register update.
- // For example, an AVX instruction that writes on a XMM register implicitly
- // zeroes the upper half of every aliasing super-register.
- //
- // For now, we pessimistically assume that writes are all potentially
- // partial register updates. This is a good default for most targets, execept
- // for those like x86 which implement a special semantic for certain opcodes.
- // At least on x86, this may lead to an inaccurate prediction of the
- // instruction level parallelism.
- bool FullyUpdatesSuperRegisters = false;
-
- // Now Populate Writes.
-
// This algorithm currently works under the strong (and potentially incorrect)
// assumption that information related to register def/uses can be obtained
// from MCInstrDesc.
@@ -275,7 +259,6 @@ static void populateWrites(InstrDesc &ID
Write.Latency = ID.MaxLatency;
Write.SClassOrWriteResourceID = 0;
}
- Write.FullyUpdatesSuperRegs = FullyUpdatesSuperRegisters;
Write.IsOptionalDef = false;
LLVM_DEBUG({
dbgs() << "\t\tOpIdx=" << Write.OpIndex << ", Latency=" << Write.Latency
@@ -488,16 +471,35 @@ InstrBuilder::createInstruction(const MC
NewIS->getUses().emplace_back(llvm::make_unique<ReadState>(RD, RegID));
}
+ // Early exit if there are no writes.
+ if (D.Writes.empty())
+ return NewIS;
+
+ // Track register writes that implicitly clear the upper portion of the
+ // underlying super-registers using an APInt.
+ APInt WriteMask(D.Writes.size(), 0);
+
+ // Now query the MCInstrAnalysis object to obtain information about which
+ // register writes implicitly clear the upper portion of a super-register.
+ MCIA.clearsSuperRegisters(MRI, MCI, WriteMask);
+
// Initialize writes.
+ unsigned WriteIndex = 0;
for (const WriteDescriptor &WD : D.Writes) {
unsigned RegID =
WD.OpIndex == -1 ? WD.RegisterID : MCI.getOperand(WD.OpIndex).getReg();
// Check if this is a optional definition that references NoReg.
- if (WD.IsOptionalDef && !RegID)
+ if (WD.IsOptionalDef && !RegID) {
+ ++WriteIndex;
continue;
+ }
assert(RegID && "Expected a valid register ID!");
- NewIS->getDefs().emplace_back(llvm::make_unique<WriteState>(WD, RegID));
+ APInt CurrWriteMask = WriteMask & (1 << WriteIndex);
+ bool UpdatesSuperRegisters = CurrWriteMask.getBoolValue();
+ NewIS->getDefs().emplace_back(
+ llvm::make_unique<WriteState>(WD, RegID, UpdatesSuperRegisters));
+ ++WriteIndex;
}
return NewIS;
Modified: llvm/trunk/tools/llvm-mca/InstrBuilder.h
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/tools/llvm-mca/InstrBuilder.h?rev=335113&r1=335112&r2=335113&view=diff
==============================================================================
--- llvm/trunk/tools/llvm-mca/InstrBuilder.h (original)
+++ llvm/trunk/tools/llvm-mca/InstrBuilder.h Wed Jun 20 03:08:11 2018
@@ -17,7 +17,9 @@
#include "Instruction.h"
#include "Support.h"
+#include "llvm/MC/MCInstrAnalysis.h"
#include "llvm/MC/MCInstrInfo.h"
+#include "llvm/MC/MCRegisterInfo.h"
#include "llvm/MC/MCSubtargetInfo.h"
namespace mca {
@@ -37,6 +39,8 @@ class DispatchUnit;
class InstrBuilder {
const llvm::MCSubtargetInfo &STI;
const llvm::MCInstrInfo &MCII;
+ const llvm::MCRegisterInfo &MRI;
+ const llvm::MCInstrAnalysis &MCIA;
llvm::SmallVector<uint64_t, 8> ProcResourceMasks;
llvm::DenseMap<unsigned short, std::unique_ptr<const InstrDesc>> Descriptors;
@@ -48,8 +52,10 @@ class InstrBuilder {
InstrBuilder &operator=(const InstrBuilder &) = delete;
public:
- InstrBuilder(const llvm::MCSubtargetInfo &sti, const llvm::MCInstrInfo &mcii)
- : STI(sti), MCII(mcii),
+ InstrBuilder(const llvm::MCSubtargetInfo &sti, const llvm::MCInstrInfo &mcii,
+ const llvm::MCRegisterInfo &mri,
+ const llvm::MCInstrAnalysis &mcia)
+ : STI(sti), MCII(mcii), MRI(mri), MCIA(mcia),
ProcResourceMasks(STI.getSchedModel().getNumProcResourceKinds()) {
computeProcResourceMasks(STI.getSchedModel(), ProcResourceMasks);
}
Modified: llvm/trunk/tools/llvm-mca/Instruction.h
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/tools/llvm-mca/Instruction.h?rev=335113&r1=335112&r2=335113&view=diff
==============================================================================
--- llvm/trunk/tools/llvm-mca/Instruction.h (original)
+++ llvm/trunk/tools/llvm-mca/Instruction.h Wed Jun 20 03:08:11 2018
@@ -70,11 +70,6 @@ struct WriteDescriptor {
// This field is set to a value different than zero only if this
// is an implicit definition.
unsigned RegisterID;
- // True if this write generates a partial update of a super-registers.
- // On X86, this flag is set by byte/word writes on GPR registers. Also,
- // a write of an XMM register only partially updates the corresponding
- // YMM super-register if the write is associated to a legacy SSE instruction.
- bool FullyUpdatesSuperRegs;
// Instruction itineraries would set this field to the SchedClass ID.
// Otherwise, it defaults to the WriteResourceID from the MCWriteLatencyEntry
// element associated to this write.
@@ -129,6 +124,10 @@ class WriteState {
// field RegisterID from WD.
unsigned RegisterID;
+ // True if this write implicitly clears the upper portion of RegisterID's
+ // super-registers.
+ bool ClearsSuperRegs;
+
// A list of dependent reads. Users is a set of dependent
// reads. A dependent read is added to the set only if CyclesLeft
// is "unknown". As soon as CyclesLeft is 'known', each user in the set
@@ -138,8 +137,10 @@ class WriteState {
std::set<std::pair<ReadState *, int>> Users;
public:
- WriteState(const WriteDescriptor &Desc, unsigned RegID)
- : WD(Desc), CyclesLeft(UNKNOWN_CYCLES), RegisterID(RegID) {}
+ WriteState(const WriteDescriptor &Desc, unsigned RegID,
+ bool clearsSuperRegs = false)
+ : WD(Desc), CyclesLeft(UNKNOWN_CYCLES), RegisterID(RegID),
+ ClearsSuperRegs(clearsSuperRegs) {}
WriteState(const WriteState &Other) = delete;
WriteState &operator=(const WriteState &Other) = delete;
@@ -148,7 +149,7 @@ public:
unsigned getRegisterID() const { return RegisterID; }
void addUser(ReadState *Use, int ReadAdvance);
- bool fullyUpdatesSuperRegs() const { return WD.FullyUpdatesSuperRegs; }
+ bool clearsSuperRegisters() const { return ClearsSuperRegs; }
// On every cycle, update CyclesLeft and notify dependent users.
void cycleEvent();
Modified: llvm/trunk/tools/llvm-mca/RegisterFile.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/tools/llvm-mca/RegisterFile.cpp?rev=335113&r1=335112&r2=335113&view=diff
==============================================================================
--- llvm/trunk/tools/llvm-mca/RegisterFile.cpp (original)
+++ llvm/trunk/tools/llvm-mca/RegisterFile.cpp Wed Jun 20 03:08:11 2018
@@ -138,7 +138,7 @@ void RegisterFile::addRegisterWrite(Writ
allocatePhysRegs(Mapping.second, UsedPhysRegs);
// If this is a partial update, then we are done.
- if (!WS.fullyUpdatesSuperRegs())
+ if (!WS.clearsSuperRegisters())
return;
for (MCSuperRegIterator I(RegID, &MRI); I.isValid(); ++I)
@@ -149,7 +149,7 @@ void RegisterFile::removeRegisterWrite(c
MutableArrayRef<unsigned> FreedPhysRegs,
bool ShouldFreePhysRegs) {
unsigned RegID = WS.getRegisterID();
- bool ShouldInvalidateSuperRegs = WS.fullyUpdatesSuperRegs();
+ bool ShouldInvalidateSuperRegs = WS.clearsSuperRegisters();
assert(RegID != 0 && "Invalidating an already invalid register?");
assert(WS.getCyclesLeft() != -512 &&
Modified: llvm/trunk/tools/llvm-mca/llvm-mca.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/tools/llvm-mca/llvm-mca.cpp?rev=335113&r1=335112&r2=335113&view=diff
==============================================================================
--- llvm/trunk/tools/llvm-mca/llvm-mca.cpp (original)
+++ llvm/trunk/tools/llvm-mca/llvm-mca.cpp Wed Jun 20 03:08:11 2018
@@ -388,6 +388,9 @@ int main(int argc, char **argv) {
std::unique_ptr<MCInstrInfo> MCII(TheTarget->createMCInstrInfo());
+ std::unique_ptr<MCInstrAnalysis> MCIA(
+ TheTarget->createMCInstrAnalysis(MCII.get()));
+
if (!MCPU.compare("native"))
MCPU = llvm::sys::getHostCPUName();
@@ -457,7 +460,7 @@ int main(int argc, char **argv) {
Width = DispatchWidth;
// Create an instruction builder.
- mca::InstrBuilder IB(*STI, *MCII);
+ mca::InstrBuilder IB(*STI, *MCII, *MRI, *MCIA);
// Number each region in the sequence.
unsigned RegionIdx = 0;
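
The write-initialization loop added to InstrBuilder::createInstruction keeps a per-write index in step with the descriptor list, so bit N of the mask always refers to the N-th write descriptor, even when an optional definition is skipped. Below is a minimal standalone sketch of that indexing scheme. It is not the code from this patch: it uses a plain 64-bit integer instead of llvm::APInt, and the names WriteDesc, WriteInfo, and initializeWrites are hypothetical stand-ins for the llvm-mca types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for mca::WriteDescriptor (only the fields used here).
struct WriteDesc {
  unsigned RegID;     // 0 plays the role of "NoReg".
  bool IsOptionalDef; // Optional definitions may reference NoReg.
};

// Hypothetical stand-in for the per-write state built for an instruction.
struct WriteInfo {
  unsigned RegID;
  bool ClearsSuperRegs;
};

// Bit I of WriteMask is assumed to be set when the I-th write descriptor
// implicitly clears the upper portion of its super-registers.
std::vector<WriteInfo> initializeWrites(const std::vector<WriteDesc> &Writes,
                                        std::uint64_t WriteMask) {
  assert(Writes.size() <= 64 && "this sketch only handles a 64-bit mask");
  std::vector<WriteInfo> Defs;
  for (std::size_t I = 0; I < Writes.size(); ++I) {
    const WriteDesc &WD = Writes[I];
    // A skipped optional definition still consumes its mask bit, so the
    // index advances whether or not a write state is created.
    if (WD.IsOptionalDef && WD.RegID == 0)
      continue;
    bool Clears = ((WriteMask >> I) & 1u) != 0;
    Defs.push_back({WD.RegID, Clears});
  }
  return Defs;
}
```

For example, with three descriptors where the second is an optional definition of NoReg and the mask has only bit 2 set, the result holds two entries and only the second one is flagged as clearing its super-registers.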