[llvm-branch-commits] [llvm] [AMDGPU][Scheduler] Use MIR-level rematerializer in rematerialization stage (PR #189491)
Lucas Ramirez via llvm-branch-commits
llvm-branch-commits at lists.llvm.org
Mon Mar 30 14:46:28 PDT 2026
https://github.com/lucas-rami created https://github.com/llvm/llvm-project/pull/189491
This makes the scheduler's rematerialization stage use the target-independent rematerializer. Previously duplicated logic is deleted, and restrictions are put in place in the stage so that the same constraints as before apply to rematerializable registers (as the rematerializer is able to expose many more rematerialization opportunities than what the stage can track at the moment). Consequently, this change is not expected to improve performance overall, but it is a first step toward being able to use the rematerializer's more advanced capabilities during scheduling.
This is *not* an NFC for two reasons.
- Ties between two rematerialization candidates with otherwise equal scores are broken by their corresponding register's index handle in the rematerializer (previously, by the value of the pointer to their state object). This is determined by the rematerializer's register collection order, which differs from the stage's old register collection order. This is the cause of all test changes but one, and should not be detrimental to performance in real cases.
- To support rollback, the stage now uses the rematerializer's rollback listener instead of its previous ad-hoc method (setting the opcode of rematerialized MIs to a DBG_VALUE, and their registers to the sentinel). This is the source of test changes in `machine-scheduler-sink-trivial-remats-debug.mir`. The new rollback mechanism completely removes the behavior tested by `misched-remat-revert.ll`, so that test is deleted.
>From 878214f97873c5ee3aeb15915513b1c5c516a2c5 Mon Sep 17 00:00:00 2001
From: Lucas Ramirez <lucas.rami at proton.me>
Date: Mon, 30 Mar 2026 21:22:04 +0000
Subject: [PATCH] [AMDGPU][Scheduler] Use MIR-level rematerializer in
rematerialization stage
This makes the scheduler's rematerialization stage use the
target-independent rematerializer. Previously duplicated logic is
deleted, and restrictions are put in place in the stage so that the
same constraints as before apply to rematerializable registers (as the
rematerializer is able to expose many more rematerialization
opportunities than what the stage can track at the moment).
Consequently it is not expected that this change improves performance
overall, but it is a first step toward being able to use the
rematerializer's more advanced capabilities during scheduling.
This is *not* an NFC for two reasons.
- Ties between two rematerialization candidates with otherwise
equal scores are broken by their corresponding
register's index handle in the rematerializer (previously the pointer
to their state object's value). This is determined by the
rematerializer's register collection order, which is different from
the stage's old register collection order. This is the cause of all
test changes but one, and should not be detrimental to performance in
real cases.
- To support rollback, the stage now uses the rematerializer's rollback
listener instead of its previous ad-hoc method (setting the opcode of
rematerialized MIs to a DBG_VALUE, and their registers to the
sentinel). This is the source of test changes in
`machine-scheduler-sink-trivial-remats-debug.mir`. The new rollback
mechanism completely removes the behavior tested by
`misched-remat-revert.ll` so the test is deleted.
---
llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp | 393 ++----
llvm/lib/Target/AMDGPU/GCNSchedStrategy.h | 121 +-
...ine-scheduler-sink-trivial-remats-attr.mir | 38 +-
...ne-scheduler-sink-trivial-remats-debug.mir | 12 +-
.../CodeGen/AMDGPU/misched-remat-revert.ll | 577 ---------
.../AMDGPU/sched_mfma_rewrite_copies.mir | 1102 ++++++++---------
.../AMDGPU/sched_mfma_rewrite_cost.mir | 72 +-
.../AMDGPU/sched_mfma_rewrite_diff_types.mir | 34 +-
8 files changed, 778 insertions(+), 1571 deletions(-)
delete mode 100644 llvm/test/CodeGen/AMDGPU/misched-remat-revert.ll
diff --git a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
index 78b450c8814d9..274513a030b33 100644
--- a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
@@ -37,6 +37,7 @@
#include "llvm/CodeGen/MachineBranchProbabilityInfo.h"
#include "llvm/CodeGen/MachineOperand.h"
#include "llvm/CodeGen/RegisterClassInfo.h"
+#include "llvm/CodeGen/Rematerializer.h"
#include "llvm/MC/LaneBitmask.h"
#include "llvm/MC/MCSchedule.h"
#include "llvm/MC/TargetRegistry.h"
@@ -1452,23 +1453,6 @@ bool PreRARematStage::initGCNSchedStage() {
if (!GCNSchedStage::initGCNSchedStage() || DAG.Regions.size() <= 1)
return false;
- // Maps all MIs (except lone terminators, which are not part of any region) to
- // their parent region. Non-lone terminators are considered part of the region
- // they delimitate.
- DenseMap<MachineInstr *, unsigned> MIRegion(MF.getInstructionCount());
-
- // Before performing any IR modification record the parent region of each MI
- // and the parent MBB of each region.
- const unsigned NumRegions = DAG.Regions.size();
- for (unsigned I = 0; I < NumRegions; ++I) {
- RegionBoundaries Region = DAG.Regions[I];
- for (auto MI = Region.first; MI != Region.second; ++MI)
- MIRegion.insert({&*MI, I});
- MachineBasicBlock *ParentMBB = Region.first->getParent();
- if (Region.second != ParentMBB->end())
- MIRegion.insert({&*Region.second, I});
- }
-
#ifndef NDEBUG
auto PrintTargetRegions = [&]() -> void {
if (TargetRegions.none()) {
@@ -1479,31 +1463,6 @@ bool PreRARematStage::initGCNSchedStage() {
for (unsigned I : TargetRegions.set_bits())
dbgs() << REMAT_PREFIX << " [" << I << "] " << RPTargets[I] << '\n';
};
- auto PrintCandidate = [&](const ScoredRemat &Cand) -> Printable {
- return Printable([&, Cand](raw_ostream &OS) {
- // Concatenate all region numbers in which the register is unused and
- // live-through.
- const RematReg &Remat = *Cand.Remat;
- bool HasLiveThroughRegion = false;
- OS << '[' << Remat.DefRegion << " -";
- for (unsigned I = 0; I < NumRegions; ++I) {
- if (!Cand.UnpredictableRPSave[I]) {
- if (HasLiveThroughRegion) {
- OS << ',';
- } else {
- OS << "- ";
- HasLiveThroughRegion = true;
- }
- OS << I;
- }
- }
- if (HasLiveThroughRegion)
- OS << " -";
- OS << "-> " << Remat.UseRegion << "] ";
- Remat.DefMI->print(OS, /*IsStandalone=*/true, /*SkipOpers=*/false,
- /*SkipDebugLoc=*/false, /*AddNewLine=*/false);
- });
- };
#endif
// Set an objective for the stage based on current RP in each region.
@@ -1527,36 +1486,68 @@ bool PreRARematStage::initGCNSchedStage() {
PrintTargetRegions();
});
- // Collect all rematerializable registers in the function, then create a
- // corresponding scored rematerialization candidate for each one.
- if (!collectRematRegs(MIRegion)) {
+ // We need up-to-date live-out info. to query live-out register masks in
+ // regions containing rematerializable instructions.
+ DAG.RegionLiveOuts.buildLiveRegMap();
+
+ if (!Remater.analyze()) {
REMAT_DEBUG(dbgs() << "No rematerializable registers\n");
return false;
}
const ScoredRemat::FreqInfo FreqInfo(MF, DAG);
+
+ // Set of registers already marked for potential rematerialization; used to
+ // avoid rematerialization chains.
+ SmallSet<Register, 4> MarkedRegs;
+ auto IsMarkedForRemat = [&MarkedRegs](const MachineOperand &MO) -> bool {
+ return MO.isReg() && MarkedRegs.contains(MO.getReg());
+ };
+
+ // Collect candidates. We have more restrictions on what we can track here
+ // compared to the rematerializer.
SmallVector<ScoredRemat, 8> Candidates;
- Candidates.reserve(RematRegs.size());
SmallVector<unsigned> CandidateOrder, NewCandidateOrder;
- for (RematReg &Remat : RematRegs) {
- ScoredRemat &Candidate = Candidates.emplace_back(&Remat, FreqInfo, DAG);
- if (Candidate.update(TargetRegions, RPTargets, FreqInfo, !TargetOcc))
+ for (unsigned RegIdx = 0, E = Remater.getNumRegs(); RegIdx < E; ++RegIdx) {
+ const Rematerializer::Reg &CandReg = Remater.getReg(RegIdx);
+
+ // Single user only.
+ unsigned NumUsers = 0;
+ for (const auto &[_, RegionUses] : CandReg.Uses)
+ NumUsers += RegionUses.size();
+ if (NumUsers != 1)
+ continue;
+
+ // We further filter the registers that we can rematerialize based on our
+ // current tracking capabilities in the stage.
+ MachineInstr *UseMI = *CandReg.Uses.begin()->getSecond().begin();
+ const MachineOperand &UseMO = UseMI->getOperand(0);
+ if (IsMarkedForRemat(UseMO) ||
+ llvm::any_of(CandReg.DefMI->operands(), IsMarkedForRemat))
+ continue;
+
+ // Do not rematerialize an instruction if it uses registers that aren't
+ // available at its use. This ensures that we are not extending any live
+ // range while rematerializing.
+ SlotIndex UseIdx = DAG.LIS->getInstructionIndex(*UseMI).getRegSlot(true);
+ if (!VirtRegAuxInfo::allUsesAvailableAt(CandReg.DefMI, UseIdx, *DAG.LIS,
+ DAG.MRI, *DAG.TII))
+ continue;
+
+ MarkedRegs.insert(CandReg.getDefReg());
+ ScoredRemat &Cand = Candidates.emplace_back(RegIdx, FreqInfo, Remater, DAG);
+ if (Cand.update(TargetRegions, RPTargets, FreqInfo, !TargetOcc))
CandidateOrder.push_back(Candidates.size() - 1);
}
- REMAT_DEBUG({
- dbgs() << "Rematerializable registers:\n";
- for (const ScoredRemat &Cand : Candidates)
- dbgs() << REMAT_PREFIX << " " << PrintCandidate(Cand) << '\n';
- dbgs() << REMAT_PREFIX << "Region frequencies\n";
- for (auto [I, Freq] : enumerate(FreqInfo.Regions)) {
- dbgs() << REMAT_PREFIX << " [" << I << "] ";
- if (Freq)
- dbgs() << Freq;
- else
- dbgs() << "unknown ";
- dbgs() << " | " << *DAG.Regions[I].first;
- }
- });
+ if (TargetOcc) {
+ // Every rematerialization we do here is likely to move the instruction
+ // into a higher frequency region, increasing the total sum latency of the
+ // instruction itself. This is acceptable if we are eliminating a spill in
+ // the process, but when the goal is increasing occupancy we get nothing
+ // out of rematerialization if occupancy is not increased in the end; in
+ // such cases we want to roll back the rematerialization.
+ Rollback = std::make_unique<RollbackSupport>(Remater);
+ }
// Rematerialize registers in successive rounds until all RP targets are
// satisifed or until we run out of rematerialization candidates.
@@ -1575,7 +1566,7 @@ bool PreRARematStage::initGCNSchedStage() {
<< "Candidates with non-null score, in rematerialization order:\n";
for (const ScoredRemat &Cand : reverse(Candidates)) {
dbgs() << REMAT_PREFIX << " " << Cand.print() << " | "
- << PrintCandidate(Cand) << '\n';
+ << Remater.printRematReg(Cand.RegIdx) << '\n';
}
PrintTargetRegions();
});
@@ -1585,6 +1576,8 @@ bool PreRARematStage::initGCNSchedStage() {
// are no longer useful to decrease RP.
while (!CandidateOrder.empty()) {
const ScoredRemat &Cand = Candidates[CandidateOrder.back()];
+ const Rematerializer::Reg &Reg = Remater.getReg(Cand.RegIdx);
+
// When previous rematerializations in this round have already satisfied
// RP targets in all regions this rematerialization can impact, we have a
// good indication that our scores have diverged significantly from
@@ -1593,44 +1586,22 @@ bool PreRARematStage::initGCNSchedStage() {
// in at least one target region.
if (!Cand.maybeBeneficial(TargetRegions, RPTargets)) {
REMAT_DEBUG(dbgs() << "Interrupt round on stale score for "
- << Cand.print() << " | " << *Cand.Remat->DefMI);
+ << Cand.print() << " | "
+ << Remater.printRematReg(Cand.RegIdx));
break;
}
CandidateOrder.pop_back();
- RematReg &Remat = *Cand.Remat;
-
- // Remove the register from all regions where it is a live-in or live-out
- // and rematerialize it.
- REMAT_DEBUG(dbgs() << "** REMAT " << PrintCandidate(Cand) << '\n');
- removeFromLiveMaps(Remat.getReg(), Cand.LiveIn, Cand.LiveOut);
- MachineInstr *RematMI = Cand.rematerialize(DAG);
-
- // Every rematerialization we do here is likely to move the instruction
- // into a higher frequency region, increasing the total sum latency of the
- // instruction itself. This is acceptable if we are eliminating a spill in
- // the process, but when the goal is increasing occupancy we get nothing
- // out of rematerialization if occupancy is not increased in the end; in
- // such cases we want to roll back the rematerialization.
- if (TargetOcc) {
- RollbackInfo &Rollback =
- Rollbacks.emplace_back(&Remat, Cand.LiveIn, Cand.LiveOut);
- Rollback.RematMI = RematMI;
- // Make the original MI a debug value so that it does not influence
- // scheduling and replace all read registers with a sentinel register to
- // prevent operands to appear in use-lists of other MIs during LIS
- // updates. Store mappings between operand indices and original
- // registers for potential rollback.
- Remat.DefMI->setDesc(DAG.TII->get(TargetOpcode::DBG_VALUE));
- for (auto [Idx, MO] : enumerate(Remat.DefMI->operands())) {
- if (MO.isReg() && MO.readsReg()) {
- Rollback.RegMap.insert({Idx, MO.getReg()});
- MO.setReg(Register());
- }
- }
- } else {
- // Just delete the original instruction if it cannot be rolled back.
- DAG.deleteMI(Remat.DefRegion, Remat.DefMI);
+
+ // Remove the register from all regions where it is a live-in or live-out,
+ // then rematerialize the register.
+ REMAT_DEBUG(dbgs() << "** REMAT " << Remater.printRematReg(Cand.RegIdx)
+ << '\n');
+ removeFromLiveMaps(Reg.getDefReg(), Cand.LiveIn, Cand.LiveOut);
+ if (Rollback) {
+ Rollback->LiveMapUpdates.emplace_back(Cand.RegIdx, Cand.LiveIn,
+ Cand.LiveOut);
}
+ Cand.rematerialize(Remater);
// Adjust RP targets. The save is guaranteed in regions in which the
// register is live-through and unused but optimistic in all other regions
@@ -2825,82 +2796,6 @@ bool PreRARematStage::setObjective() {
return TargetRegions.any();
}
-bool PreRARematStage::collectRematRegs(
- const DenseMap<MachineInstr *, unsigned> &MIRegion) {
- // We need up-to-date live-out info. to query live-out register masks in
- // regions containing rematerializable instructions.
- DAG.RegionLiveOuts.buildLiveRegMap();
-
- // Set of registers already marked for potential remterialization; used to
- // avoid rematerialization chains.
- SmallSet<Register, 4> MarkedRegs;
- auto IsMarkedForRemat = [&MarkedRegs](const MachineOperand &MO) -> bool {
- return MO.isReg() && MarkedRegs.contains(MO.getReg());
- };
-
- // Identify rematerializable instructions in the function.
- for (unsigned I = 0, E = DAG.Regions.size(); I != E; ++I) {
- RegionBoundaries Bounds = DAG.Regions[I];
- for (auto MI = Bounds.first; MI != Bounds.second; ++MI) {
- // The instruction must be rematerializable.
- MachineInstr &DefMI = *MI;
- if (!isReMaterializable(DefMI))
- continue;
-
- // We only support rematerializing virtual registers with one
- // definition.
- Register Reg = DefMI.getOperand(0).getReg();
- if (!Reg.isVirtual() || !DAG.MRI.hasOneDef(Reg))
- continue;
-
- // We only care to rematerialize the instruction if it has a single
- // non-debug user in a different region.
- // FIXME: Allow rematerializations with multiple uses. This should be
- // relatively easy to support using the current cost model.
- MachineInstr *UseMI = DAG.MRI.getOneNonDBGUser(Reg);
- if (!UseMI)
- continue;
- auto UseRegion = MIRegion.find(UseMI);
- if (UseRegion == MIRegion.end() || UseRegion->second == I)
- continue;
-
- // Do not rematerialize an instruction if it uses or is used by an
- // instruction that we have designated for rematerialization.
- // FIXME: Allow for rematerialization chains: this requires 1. updating
- // remat points to account for uses that are rematerialized, and 2.
- // either rematerializing the candidates in careful ordering, or
- // deferring the MBB RP walk until the entire chain has been
- // rematerialized.
- const MachineOperand &UseMO = UseMI->getOperand(0);
- if (IsMarkedForRemat(UseMO) ||
- llvm::any_of(DefMI.operands(), IsMarkedForRemat))
- continue;
-
- // Do not rematerialize an instruction it it uses registers that aren't
- // available at its use. This ensures that we are not extending any live
- // range while rematerializing.
- SlotIndex UseIdx = DAG.LIS->getInstructionIndex(*UseMI).getRegSlot(true);
- if (!VirtRegAuxInfo::allUsesAvailableAt(&DefMI, UseIdx, *DAG.LIS, DAG.MRI,
- *DAG.TII))
- continue;
-
- // Add the instruction to the rematerializable list.
- MarkedRegs.insert(Reg);
- RematRegs.emplace_back(&DefMI, UseMI, DAG, MIRegion);
- }
- }
-
- return !RematRegs.empty();
-}
-
-PreRARematStage::RematReg::RematReg(
- MachineInstr *DefMI, MachineInstr *UseMI, GCNScheduleDAGMILive &DAG,
- const DenseMap<MachineInstr *, unsigned> &MIRegion)
- : DefMI(DefMI), UseMI(UseMI), DefRegion(MIRegion.at(DefMI)),
- UseRegion(MIRegion.at(UseMI)),
- Mask(DAG.RegionLiveOuts.getLiveRegsForRegionIdx(DefRegion).at(getReg())) {
-}
-
bool PreRARematStage::ScoredRemat::maybeBeneficial(
const BitVector &TargetRegions, ArrayRef<GCNRPTarget> RPTargets) const {
for (unsigned I : TargetRegions.set_bits()) {
@@ -2910,16 +2805,6 @@ bool PreRARematStage::ScoredRemat::maybeBeneficial(
return false;
}
-void PreRARematStage::ScoredRemat::insertMI(unsigned RegionIdx,
- MachineInstr *RematMI,
- GCNScheduleDAGMILive &DAG) const {
- RegionBoundaries &Bounds = DAG.Regions[RegionIdx];
- if (Bounds.first == std::next(MachineBasicBlock::iterator(RematMI)))
- Bounds.first = RematMI;
- DAG.LIS->InsertMachineInstrInMaps(*RematMI);
- DAG.LIS->createAndComputeVirtRegInterval(RematMI->getOperand(0).getReg());
-}
-
PreRARematStage::ScoredRemat::FreqInfo::FreqInfo(
MachineFunction &MF, const GCNScheduleDAGMILive &DAG) {
assert(DAG.MLI && "MLI not defined in DAG");
@@ -2951,11 +2836,16 @@ PreRARematStage::ScoredRemat::FreqInfo::FreqInfo(
}
}
-PreRARematStage::ScoredRemat::ScoredRemat(RematReg *Remat, const FreqInfo &Freq,
+PreRARematStage::ScoredRemat::ScoredRemat(RegisterIdx RegIdx,
+ const FreqInfo &Freq,
+ const Rematerializer &Remater,
GCNScheduleDAGMILive &DAG)
- : Remat(Remat), LiveIn(DAG.Regions.size()), LiveOut(DAG.Regions.size()),
+ : RegIdx(RegIdx), LiveIn(DAG.Regions.size()), LiveOut(DAG.Regions.size()),
Live(DAG.Regions.size()), UnpredictableRPSave(DAG.Regions.size()) {
- Register DefReg = Remat->getReg();
+ const Rematerializer::Reg &Reg = Remater.getReg(RegIdx);
+ Register DefReg = Reg.getDefReg();
+ assert(Reg.Uses.size() == 1 && "expected users in single region");
+ const unsigned UseRegion = Reg.Uses.begin()->first;
// Mark regions in which the rematerializable register is live.
for (unsigned I = 0, E = DAG.Regions.size(); I != E; ++I) {
@@ -2968,22 +2858,18 @@ PreRARematStage::ScoredRemat::ScoredRemat(RematReg *Remat, const FreqInfo &Freq,
// If the register is both unused and live-through in the region, the
// latter's RP is guaranteed to decrease.
- if (!LiveIn[I] || !LiveOut[I] || I == Remat->UseRegion)
+ if (!LiveIn[I] || !LiveOut[I] || I == UseRegion)
UnpredictableRPSave.set(I);
}
Live |= LiveIn;
Live |= LiveOut;
- RPSave.inc(DefReg, LaneBitmask::getNone(), Remat->Mask, DAG.MRI);
+ RPSave.inc(DefReg, LaneBitmask::getNone(), Reg.Mask, DAG.MRI);
// Get frequencies of defining and using regions. A rematerialization from the
// least frequent region to the most frequent region will yield the greatest
- // latency penalty and therefore should get minimum score. Reciprocally, a
- // rematerialization in the other direction should get maximum score. Default
- // to values that will yield the worst possible score given known frequencies
// in order to penalize rematerializations from or into regions whose
- // frequency is unknown.
- int64_t DefOrMin = std::max(Freq.Regions[Remat->DefRegion], Freq.MinFreq);
- int64_t UseOrMax = Freq.Regions[Remat->UseRegion];
+ int64_t DefOrMin = std::max(Freq.Regions[Reg.DefRegion], Freq.MinFreq);
+ int64_t UseOrMax = Freq.Regions[UseRegion];
if (!UseOrMax)
UseOrMax = Freq.MaxFreq;
FreqDiff = DefOrMin - UseOrMax;
@@ -3023,19 +2909,14 @@ bool PreRARematStage::ScoredRemat::update(const BitVector &TargetRegions,
return !hasNullScore();
}
-MachineInstr *
-PreRARematStage::ScoredRemat::rematerialize(GCNScheduleDAGMILive &DAG) const {
- const SIInstrInfo *TII = DAG.MF.getSubtarget<GCNSubtarget>().getInstrInfo();
- MachineInstr &DefMI = *Remat->DefMI;
- Register Reg = DefMI.getOperand(0).getReg();
- Register NewReg = DAG.MRI.cloneVirtualRegister(Reg);
-
- // Rematerialize the register in the region where it is used.
- MachineBasicBlock::iterator InsertPos = Remat->UseMI;
- TII->reMaterialize(*InsertPos->getParent(), InsertPos, NewReg, 0, DefMI);
- MachineInstr *RematMI = &*std::prev(InsertPos);
- Remat->UseMI->substituteRegister(Reg, NewReg, 0, *DAG.TRI);
- insertMI(Remat->UseRegion, RematMI, DAG);
+void PreRARematStage::ScoredRemat::rematerialize(
+ Rematerializer &Remater) const {
+ const Rematerializer::Reg &Reg = Remater.getReg(RegIdx);
+ Rematerializer::DependencyReuseInfo DRI;
+ for (const Rematerializer::Reg::Dependency &Dep : Reg.Dependencies)
+ DRI.reuse(Dep.RegIdx);
+ unsigned UseRegion = Reg.Uses.begin()->first;
+ Remater.rematerializeToRegion(RegIdx, UseRegion, DRI);
#ifdef EXPENSIVE_CHECKS
// All uses are known to be available / live at the remat point. Thus,
@@ -3065,13 +2946,6 @@ PreRARematStage::ScoredRemat::rematerialize(GCNScheduleDAGMILive &DAG) const {
}
}
#endif
- return RematMI;
-}
-
-void PreRARematStage::commitRematerializations() const {
- REMAT_DEBUG(dbgs() << "Commiting all rematerializations\n");
- for (const RollbackInfo &Rollback : Rollbacks)
- DAG.deleteMI(Rollback.Remat->DefRegion, Rollback.Remat->DefMI);
}
void PreRARematStage::updateRPTargets(const BitVector &Regions,
@@ -3103,24 +2977,6 @@ bool PreRARematStage::updateAndVerifyRPTargets(const BitVector &Regions) {
return TooOptimistic;
}
-// Copied from MachineLICM
-bool PreRARematStage::isReMaterializable(const MachineInstr &MI) {
- if (!DAG.TII->isReMaterializable(MI))
- return false;
-
- for (const MachineOperand &MO : MI.all_uses()) {
- // We can't remat physreg uses, unless it is a constant or an ignorable
- // use (e.g. implicit exec use on VALU instructions)
- if (MO.getReg().isPhysical()) {
- if (DAG.MRI.isConstantPhysReg(MO.getReg()) || DAG.TII->isIgnorableUse(MO))
- continue;
- return false;
- }
- }
-
- return true;
-}
-
void PreRARematStage::removeFromLiveMaps(Register Reg, const BitVector &LiveIn,
const BitVector &LiveOut) {
assert(LiveIn.size() == DAG.Regions.size() && "region num mismatch");
@@ -3152,28 +3008,8 @@ void PreRARematStage::finalizeGCNSchedStage() {
// When increasing occupancy, it is possible that re-scheduling is not able to
// achieve the target occupancy in all regions, in which case re-scheduling in
// all regions should be reverted.
- if (DAG.MinOccupancy >= *TargetOcc) {
- commitRematerializations();
+ if (DAG.MinOccupancy >= *TargetOcc)
return;
- }
-
- // It is possible that re-scheduling lowers occupancy over the one achieved
- // just through rematerializations, in which case we revert re-scheduling in
- // all regions but do not roll back rematerializations.
- const bool ShouldRollbackRemats = AchievedOcc < *TargetOcc;
-
- // When we both need to revert re-scheduling and rollback rematerializations,
- // restore rematerialized MIs' original state before reverting so that they
- // are treated as non-debug instructions by the revert logic.
- if (ShouldRollbackRemats) {
- for (const RollbackInfo &Rollback : Rollbacks) {
- const RematReg *Remat = Rollback.Remat;
- MachineInstr *RematMI = Rollback.RematMI;
- Rollback.Remat->DefMI->setDesc(DAG.TII->get(RematMI->getOpcode()));
- for (const auto &[MOIdx, Reg] : Rollback.RegMap)
- Remat->DefMI->getOperand(MOIdx).setReg(Reg);
- }
- }
// Revert re-scheduling in all affected regions.
for (const auto &[RegionIdx, OrigMIOrder, MaxPressure] : RegionReverts) {
@@ -3183,8 +3019,10 @@ void PreRARematStage::finalizeGCNSchedStage() {
modifyRegionSchedule(RegionIdx, OrigMIOrder);
}
- if (!ShouldRollbackRemats) {
- commitRematerializations();
+ // It is possible that re-scheduling lowers occupancy over the one achieved
+ // just through rematerializations, in which case we revert re-scheduling in
+ // all regions but do not roll back rematerializations.
+ if (AchievedOcc >= *TargetOcc) {
DAG.setTargetOccupancy(AchievedOcc);
return;
}
@@ -3192,35 +3030,16 @@ void PreRARematStage::finalizeGCNSchedStage() {
// Reset the target occupancy to what it was pre-rematerialization.
DAG.setTargetOccupancy(*TargetOcc - 1);
- // Finish rolling back rematerializations, then recompute pressure in all
+ // Roll back changes made by the stage, then recompute pressure in all
// affected regions.
REMAT_DEBUG(dbgs() << "==== ROLLBACK ====\n");
- DenseSet<Register> RecomputeLI;
- for (const RollbackInfo &Rollback : Rollbacks) {
- const RematReg *Remat = Rollback.Remat;
- MachineInstr *RematMI = Rollback.RematMI;
-
- // Switch back to using the original register and delete the
- // rematerialization.
- Register Reg = RematMI->getOperand(0).getReg();
- Register OriginalReg = Remat->DefMI->getOperand(0).getReg();
- Remat->UseMI->substituteRegister(Reg, OriginalReg, 0, *DAG.TRI);
- REMAT_DEBUG(dbgs() << '[' << Remat->UseRegion
- << "] Deleting rematerialization " << *RematMI);
- DAG.deleteMI(Remat->UseRegion, RematMI);
- addToLiveMaps(OriginalReg, Remat->Mask, Rollback.LiveIn, Rollback.LiveOut);
-
- // Regenerate intervals for all register operands of rematerialized MIs as
- // slot indices may have changed slightly from before re-scheduling.
- for (MachineOperand &MO : Rollback.Remat->DefMI->operands()) {
- if (MO.isReg() && MO.getReg().isVirtual())
- RecomputeLI.insert(MO.getReg());
- }
- }
- for (Register Reg : RecomputeLI) {
- DAG.LIS->removeInterval(Reg);
- DAG.LIS->createAndComputeVirtRegInterval(Reg);
+ assert(Rollback && "rollbacker should be defined");
+ Rollback->Listener.rollback(Remater);
+ for (const auto &[RegIdx, LiveIn, LiveOut] : Rollback->LiveMapUpdates) {
+ const Rematerializer::Reg &Reg = Remater.getReg(RegIdx);
+ addToLiveMaps(Reg.getDefReg(), Reg.Mask, LiveIn, LiveOut);
}
+
#ifdef EXPENSIVE_CHECKS
// In particular, we want to check for coherent MI/slot order in regions in
// which reverts and/or rollbacks may have happened.
@@ -3232,16 +3051,6 @@ void PreRARematStage::finalizeGCNSchedStage() {
GCNSchedStage::finalizeGCNSchedStage();
}
-void GCNScheduleDAGMILive::deleteMI(unsigned RegionIdx, MachineInstr *MI) {
- // It's not possible for the deleted instruction to be upper region boundary
- // since we don't delete region terminators.
- if (Regions[RegionIdx].first == MI)
- Regions[RegionIdx].first = std::next(MachineBasicBlock::iterator(MI));
- LIS->removeInterval(MI->getOperand(0).getReg());
- LIS->RemoveMachineInstrFromMaps(*MI);
- MI->eraseFromParent();
-}
-
void GCNScheduleDAGMILive::setTargetOccupancy(unsigned TargetOccupancy) {
MinOccupancy = TargetOccupancy;
if (MFI.getOccupancy() < TargetOccupancy)
diff --git a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
index b9c4dcdd8b7ac..1ca41ca7a374b 100644
--- a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
+++ b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
@@ -19,6 +19,7 @@
#include "llvm/CodeGen/MachineBlockFrequencyInfo.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineScheduler.h"
+#include "llvm/CodeGen/Rematerializer.h"
namespace llvm {
@@ -328,8 +329,6 @@ class GCNScheduleDAGMILive final : public ScheduleDAGMILive {
std::unique_ptr<GCNSchedStage> createSchedStage(GCNSchedStageID SchedStageID);
- void deleteMI(unsigned RegionIdx, MachineInstr *MI);
-
public:
GCNScheduleDAGMILive(MachineSchedContext *C,
std::unique_ptr<MachineSchedStrategy> S);
@@ -545,32 +544,14 @@ class ClusteredLowOccStage : public GCNSchedStage {
/// rematerialized.
class PreRARematStage : public GCNSchedStage {
private:
- /// A rematerializable register.
- struct RematReg {
- /// Single MI defining the rematerializable register.
- MachineInstr *DefMI;
- /// Single user of the rematerializable register.
- MachineInstr *UseMI;
- /// Defining and using regions.
- unsigned DefRegion, UseRegion;
- /// The rematerializable register's lane bitmask.
- LaneBitmask Mask;
-
- RematReg(MachineInstr *DefMI, MachineInstr *UseMI,
- GCNScheduleDAGMILive &DAG,
- const DenseMap<MachineInstr *, unsigned> &MIRegion);
-
- /// Returns the rematerializable register. Do not call after deleting the
- /// original defining instruction.
- Register getReg() const { return DefMI->getOperand(0).getReg(); }
- };
+ using RegisterIdx = Rematerializer::RegisterIdx;
/// A scored rematerialization candidate. Higher scores indicate more
/// beneficial rematerializations. A null score indicate the rematerialization
/// is not helpful to reduce RP in target regions.
struct ScoredRemat {
- /// The rematerializable register under consideration.
- RematReg *Remat;
+ /// The register index handle in the rematerializer.
+ RegisterIdx RegIdx;
/// Regions in which the register is live-in/live-out/live anywhere.
BitVector LiveIn, LiveOut, Live;
/// Subset of \ref Live regions in which the rematerialization is not
@@ -598,23 +579,17 @@ class PreRARematStage : public GCNSchedStage {
/// This only initializes state-independent characteristics of \p Remat, not
/// the actual score.
- ScoredRemat(RematReg *Remat, const FreqInfo &Freq,
- GCNScheduleDAGMILive &DAG);
+ ScoredRemat(RegisterIdx RegIdx, const FreqInfo &Freq,
+ const Rematerializer &Remater, GCNScheduleDAGMILive &DAG);
- /// Rematerializes the candidate at its use and returns the new MI.
- MachineInstr *rematerialize(GCNScheduleDAGMILive &DAG) const;
+ /// Rematerializes the candidate using the \p Remater.
+ void rematerialize(Rematerializer &Remater) const;
/// Determines whether this rematerialization may be beneficial in at least
/// one target region.
bool maybeBeneficial(const BitVector &TargetRegions,
ArrayRef<GCNRPTarget> RPTargets) const;
- /// Updates internal structures following a MI rematerialization. Part of
- /// the stage instead of the DAG because it makes assumptions that are
- /// specific to the rematerialization process.
- void insertMI(unsigned RegionIdx, MachineInstr *RematMI,
- GCNScheduleDAGMILive &DAG) const;
-
/// Rematerializes the candidate and returns the new MI. This removes the
/// rematerialized register from live-in/out lists in the \p DAG and updates
/// \p RPTargets in all affected regions. Regions in which RP savings are
@@ -644,12 +619,16 @@ class PreRARematStage : public GCNSchedStage {
return FreqDiff < O.FreqDiff;
if (RegionImpact != O.RegionImpact)
return RegionImpact < O.RegionImpact;
- // Break ties using pointer to rematerializable register. Rematerializable
- // registers are collected in instruction order so, within the same
- // region, this will prefer registers defined earlier that have longer
- // live ranges in their defining region (since the registers we consider
- // are always live-out in their defining region).
- return Remat > O.Remat;
+ // Break ties using register index handles. If the two registers are
+ // connected in some dependency DAG of rematerializable registers, this
+ // will tend to give a higher score to the register further from the
+ // dependency DAG's root. If the two registers are disconnected, this will
+ // give a higher score to the register with lower virtual register index.
+ // In general, within a region, this should prefer registers defined
+ // earlier that have longer live ranges in their defining region (since
+ // the registers we consider are always live-out in their defining
+ // region).
+ return RegIdx > O.RegIdx;
}
#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
@@ -686,28 +665,35 @@ class PreRARematStage : public GCNSchedStage {
/// rescheduled.
BitVector RescheduleRegions;
- /// List of rematerializable registers.
- SmallVector<RematReg> RematRegs;
-
- /// Holds enough information to rollback a rematerialization decision post
- /// re-scheduling.
- struct RollbackInfo {
- /// The rematerializable register under consideration.
- const RematReg *Remat;
- /// Regions in which the original register was live-in or live-out.
- BitVector LiveIn, LiveOut;
- /// The rematerialized MI replacing the original defining MI.
- MachineInstr *RematMI;
- /// Maps register machine operand indices to their original register.
- SmallDenseMap<unsigned, Register, 4> RegMap;
-
- RollbackInfo(const RematReg *Remat, const BitVector &LiveIn,
- const BitVector &LiveOut)
- : Remat(Remat), LiveIn(LiveIn), LiveOut(LiveOut) {}
+ /// Underlying utilities to identify and perform rematerializations.
+ Rematerializer Remater;
+
+ /// Support for rolling back rematerializations when they do not end up
+ /// being beneficial.
+ struct RollbackSupport {
+ struct LiveMapUpdate {
+ /// The register index handle in the rematerializer.
+ RegisterIdx RegIdx;
+ /// Regions in which the original register was live-in or live-out.
+ BitVector LiveIn, LiveOut;
+
+ LiveMapUpdate(RegisterIdx RegIdx, const BitVector &LiveIn,
+ const BitVector &LiveOut)
+ : RegIdx(RegIdx), LiveIn(LiveIn), LiveOut(LiveOut) {}
+ };
+
+ /// Rollback listener.
+ Rollbacker Listener;
+ /// Registers removed from live-maps along with bitvectors indicating the
+ /// regions in which they were live-ins and live-outs.
+ SmallVector<LiveMapUpdate> LiveMapUpdates;
+
+ /// Attaches the rollback listener to the rematerializer.
+ RollbackSupport(Rematerializer &Remater) { Remater.addListener(&Listener); }
};
- /// List of rematerializations to rollback if rematerialization does not end
- /// up being beneficial.
- SmallVector<RollbackInfo> Rollbacks;
+
+ /// Rollback support. Maintained through a unique pointer because it is
+ /// optional and needs to persist between stage initialization and
+ /// finalization.
+ std::unique_ptr<RollbackSupport> Rollback;
/// State of a region pre-re-scheduling but post-rematerializations that we
/// must keep to be able to revert re-scheduling effects.
@@ -748,18 +734,6 @@ class PreRARematStage : public GCNSchedStage {
/// but are actually not, and returns whether there were any such regions.
bool updateAndVerifyRPTargets(const BitVector &Regions);
- /// Collects all rematerializable registers and appends them to \ref
- /// RematRegs. \p MIRegion maps MIs to their region. Returns whether any
- /// rematerializable register was found.
- bool collectRematRegs(const DenseMap<MachineInstr *, unsigned> &MIRegion);
-
- /// Deletes all rematerialized MIs from the MIR when they were kept around for
- /// potential rollback.
- void commitRematerializations() const;
-
- /// Whether the MI is rematerializable
- bool isReMaterializable(const MachineInstr &MI);
-
/// Removes register \p Reg from the live-ins of regions set in \p LiveIn and
/// the live-outs of regions set in \p LiveOut.
void removeFromLiveMaps(Register Reg, const BitVector &LiveIn,
@@ -786,7 +760,8 @@ class PreRARematStage : public GCNSchedStage {
PreRARematStage(GCNSchedStageID StageID, GCNScheduleDAGMILive &DAG)
: GCNSchedStage(StageID, DAG), TargetRegions(DAG.Regions.size()),
- RescheduleRegions(DAG.Regions.size()) {
+ RescheduleRegions(DAG.Regions.size()),
+ Remater(MF, DAG.Regions, *DAG.LIS) {
const unsigned NumRegions = DAG.Regions.size();
RPTargets.reserve(NumRegions);
}
diff --git a/llvm/test/CodeGen/AMDGPU/machine-scheduler-sink-trivial-remats-attr.mir b/llvm/test/CodeGen/AMDGPU/machine-scheduler-sink-trivial-remats-attr.mir
index 1daa709ab6439..190ef2f6adcd7 100644
--- a/llvm/test/CodeGen/AMDGPU/machine-scheduler-sink-trivial-remats-attr.mir
+++ b/llvm/test/CodeGen/AMDGPU/machine-scheduler-sink-trivial-remats-attr.mir
@@ -910,11 +910,11 @@ body: |
; GFX90A-NEXT: [[DEF26:%[0-9]+]]:agpr_32 = IMPLICIT_DEF
; GFX90A-NEXT: [[DEF27:%[0-9]+]]:agpr_32 = IMPLICIT_DEF
; GFX90A-NEXT: [[DEF28:%[0-9]+]]:agpr_32 = IMPLICIT_DEF
- ; GFX90A-NEXT: [[V_CVT_I32_F64_e32_27:%[0-9]+]]:vgpr_32 = nofpexcept V_CVT_I32_F64_e32 28, implicit $exec, implicit $mode
- ; GFX90A-NEXT: [[V_CVT_I32_F64_e32_28:%[0-9]+]]:vgpr_32 = nofpexcept V_CVT_I32_F64_e32 29, implicit $exec, implicit $mode
- ; GFX90A-NEXT: [[V_CVT_I32_F64_e32_29:%[0-9]+]]:vgpr_32 = nofpexcept V_CVT_I32_F64_e32 30, implicit $exec, implicit $mode
- ; GFX90A-NEXT: [[V_CVT_I32_F64_e32_30:%[0-9]+]]:vgpr_32 = nofpexcept V_CVT_I32_F64_e32 31, implicit $exec, implicit $mode
- ; GFX90A-NEXT: [[V_CVT_I32_F64_e32_31:%[0-9]+]]:vgpr_32 = nofpexcept V_CVT_I32_F64_e32 64, implicit $exec, implicit $mode
+ ; GFX90A-NEXT: [[DEF29:%[0-9]+]]:agpr_32 = IMPLICIT_DEF
+ ; GFX90A-NEXT: [[DEF30:%[0-9]+]]:agpr_32 = IMPLICIT_DEF
+ ; GFX90A-NEXT: [[DEF31:%[0-9]+]]:agpr_32 = IMPLICIT_DEF
+ ; GFX90A-NEXT: [[DEF32:%[0-9]+]]:agpr_32 = IMPLICIT_DEF
+ ; GFX90A-NEXT: [[V_CVT_I32_F64_e32_27:%[0-9]+]]:vgpr_32 = nofpexcept V_CVT_I32_F64_e32 64, implicit $exec, implicit $mode
; GFX90A-NEXT: {{ $}}
; GFX90A-NEXT: bb.1:
; GFX90A-NEXT: S_NOP 0, implicit [[V_CVT_I32_F64_e32_]], implicit [[V_CVT_I32_F64_e32_1]], implicit [[V_CVT_I32_F64_e32_2]], implicit [[V_CVT_I32_F64_e32_3]], implicit [[V_CVT_I32_F64_e32_4]]
@@ -922,20 +922,20 @@ body: |
; GFX90A-NEXT: S_NOP 0, implicit [[V_CVT_I32_F64_e32_10]], implicit [[V_CVT_I32_F64_e32_11]], implicit [[V_CVT_I32_F64_e32_12]], implicit [[V_CVT_I32_F64_e32_13]], implicit [[V_CVT_I32_F64_e32_14]]
; GFX90A-NEXT: S_NOP 0, implicit [[V_CVT_I32_F64_e32_15]], implicit [[V_CVT_I32_F64_e32_16]], implicit [[V_CVT_I32_F64_e32_17]], implicit [[V_CVT_I32_F64_e32_18]], implicit [[V_CVT_I32_F64_e32_19]]
; GFX90A-NEXT: S_NOP 0, implicit [[V_CVT_I32_F64_e32_20]], implicit [[V_CVT_I32_F64_e32_21]], implicit [[V_CVT_I32_F64_e32_22]], implicit [[V_CVT_I32_F64_e32_23]], implicit [[V_CVT_I32_F64_e32_24]]
- ; GFX90A-NEXT: [[V_CVT_I32_F64_e32_32:%[0-9]+]]:vgpr_32 = nofpexcept V_CVT_I32_F64_e32 27, implicit $exec, implicit $mode
- ; GFX90A-NEXT: S_NOP 0, implicit [[V_CVT_I32_F64_e32_25]], implicit [[V_CVT_I32_F64_e32_26]], implicit [[V_CVT_I32_F64_e32_32]], implicit [[V_CVT_I32_F64_e32_27]], implicit [[V_CVT_I32_F64_e32_28]]
- ; GFX90A-NEXT: [[DEF29:%[0-9]+]]:agpr_32 = IMPLICIT_DEF
- ; GFX90A-NEXT: [[DEF30:%[0-9]+]]:agpr_32 = IMPLICIT_DEF
- ; GFX90A-NEXT: [[DEF31:%[0-9]+]]:agpr_32 = IMPLICIT_DEF
- ; GFX90A-NEXT: S_NOP 0, implicit [[V_CVT_I32_F64_e32_29]], implicit [[V_CVT_I32_F64_e32_30]], implicit [[DEF29]], implicit [[DEF30]], implicit [[DEF31]]
- ; GFX90A-NEXT: [[DEF32:%[0-9]+]]:agpr_32 = IMPLICIT_DEF
- ; GFX90A-NEXT: S_NOP 0, implicit [[DEF32]], implicit [[DEF]], implicit [[DEF1]], implicit [[DEF2]], implicit [[DEF3]]
- ; GFX90A-NEXT: S_NOP 0, implicit [[DEF4]], implicit [[DEF5]], implicit [[DEF6]], implicit [[DEF7]], implicit [[DEF8]]
- ; GFX90A-NEXT: S_NOP 0, implicit [[DEF9]], implicit [[DEF10]], implicit [[DEF11]], implicit [[DEF12]], implicit [[DEF13]]
- ; GFX90A-NEXT: S_NOP 0, implicit [[DEF14]], implicit [[DEF15]], implicit [[DEF16]], implicit [[DEF17]], implicit [[DEF18]]
- ; GFX90A-NEXT: S_NOP 0, implicit [[DEF19]], implicit [[DEF20]], implicit [[DEF21]], implicit [[DEF22]], implicit [[DEF23]]
- ; GFX90A-NEXT: S_NOP 0, implicit [[DEF24]], implicit [[DEF25]], implicit [[DEF26]], implicit [[DEF27]], implicit [[V_CVT_I32_F64_e32_31]]
- ; GFX90A-NEXT: S_NOP 0, implicit [[DEF28]]
+ ; GFX90A-NEXT: [[V_CVT_I32_F64_e32_28:%[0-9]+]]:vgpr_32 = nofpexcept V_CVT_I32_F64_e32 27, implicit $exec, implicit $mode
+ ; GFX90A-NEXT: [[V_CVT_I32_F64_e32_29:%[0-9]+]]:vgpr_32 = nofpexcept V_CVT_I32_F64_e32 28, implicit $exec, implicit $mode
+ ; GFX90A-NEXT: [[V_CVT_I32_F64_e32_30:%[0-9]+]]:vgpr_32 = nofpexcept V_CVT_I32_F64_e32 29, implicit $exec, implicit $mode
+ ; GFX90A-NEXT: S_NOP 0, implicit [[V_CVT_I32_F64_e32_25]], implicit [[V_CVT_I32_F64_e32_26]], implicit [[V_CVT_I32_F64_e32_28]], implicit [[V_CVT_I32_F64_e32_29]], implicit [[V_CVT_I32_F64_e32_30]]
+ ; GFX90A-NEXT: [[V_CVT_I32_F64_e32_31:%[0-9]+]]:vgpr_32 = nofpexcept V_CVT_I32_F64_e32 30, implicit $exec, implicit $mode
+ ; GFX90A-NEXT: [[V_CVT_I32_F64_e32_32:%[0-9]+]]:vgpr_32 = nofpexcept V_CVT_I32_F64_e32 31, implicit $exec, implicit $mode
+ ; GFX90A-NEXT: S_NOP 0, implicit [[V_CVT_I32_F64_e32_31]], implicit [[V_CVT_I32_F64_e32_32]], implicit [[DEF]], implicit [[DEF1]], implicit [[DEF2]]
+ ; GFX90A-NEXT: S_NOP 0, implicit [[DEF3]], implicit [[DEF4]], implicit [[DEF5]], implicit [[DEF6]], implicit [[DEF7]]
+ ; GFX90A-NEXT: S_NOP 0, implicit [[DEF8]], implicit [[DEF9]], implicit [[DEF10]], implicit [[DEF11]], implicit [[DEF12]]
+ ; GFX90A-NEXT: S_NOP 0, implicit [[DEF13]], implicit [[DEF14]], implicit [[DEF15]], implicit [[DEF16]], implicit [[DEF17]]
+ ; GFX90A-NEXT: S_NOP 0, implicit [[DEF18]], implicit [[DEF19]], implicit [[DEF20]], implicit [[DEF21]], implicit [[DEF22]]
+ ; GFX90A-NEXT: S_NOP 0, implicit [[DEF23]], implicit [[DEF24]], implicit [[DEF25]], implicit [[DEF26]], implicit [[DEF27]]
+ ; GFX90A-NEXT: S_NOP 0, implicit [[DEF28]], implicit [[DEF29]], implicit [[DEF30]], implicit [[DEF31]], implicit [[V_CVT_I32_F64_e32_27]]
+ ; GFX90A-NEXT: S_NOP 0, implicit [[DEF32]]
; GFX90A-NEXT: S_ENDPGM 0
bb.0:
successors: %bb.1
diff --git a/llvm/test/CodeGen/AMDGPU/machine-scheduler-sink-trivial-remats-debug.mir b/llvm/test/CodeGen/AMDGPU/machine-scheduler-sink-trivial-remats-debug.mir
index 85ffccaa90533..1e7e6e27c0ad5 100644
--- a/llvm/test/CodeGen/AMDGPU/machine-scheduler-sink-trivial-remats-debug.mir
+++ b/llvm/test/CodeGen/AMDGPU/machine-scheduler-sink-trivial-remats-debug.mir
@@ -23,8 +23,8 @@ body: |
; DEBUG: Machine code for function sink_and_inc_idx_when_skipping_small_region_1: IsSSA, NoPHIs, TracksLiveness
; DEBUG: [PreRARemat] Retrying function scheduling with new min. occupancy of 9 from rematerializing (original was 9, target was 10)
; DEBUG-NEXT: ********** MI Scheduling **********
- ; DEBUG-NEXT: sink_and_inc_idx_when_skipping_small_region_1:%bb.1
- ; DEBUG-NEXT: From: %23:vgpr_32 = nofpexcept DBG_VALUE 23, implicit $noreg, implicit $noreg
+ ; DEBUG-NEXT: sink_and_inc_idx_when_skipping_small_region_1:%bb.2
+ ; DEBUG-NEXT: From: %24:vgpr_32 = nofpexcept V_CVT_I32_F64_e32 24, implicit $exec, implicit $mode, implicit-def $m0
; DEBUG-NEXT: To: End RegionInstrs: 2
bb.0:
successors: %bb.1
@@ -95,9 +95,9 @@ body: |
; DEBUG: Machine code for function sink_and_inc_idx_when_skipping_small_regions_2: IsSSA, NoPHIs, TracksLiveness
; DEBUG: [PreRARemat] Retrying function scheduling with new min. occupancy of 10 from rematerializing (original was 9, target was 10)
; DEBUG-NEXT: ********** MI Scheduling **********
- ; DEBUG-NEXT: sink_and_inc_idx_when_skipping_small_regions_2:%bb.1
- ; DEBUG-NEXT: From: %23:vgpr_32 = nofpexcept DBG_VALUE 23, implicit $noreg, implicit $noreg
- ; DEBUG-NEXT: To: End RegionInstrs: 2
+ ; DEBUG-NEXT: sink_and_inc_idx_when_skipping_small_regions_2:%bb.2
+ ; DEBUG-NEXT: From: %24:vgpr_32 = nofpexcept V_CVT_I32_F64_e32 24, implicit $exec, implicit $mode, implicit-def $m0
+ ; DEBUG-NEXT: To: End RegionInstrs: 4
bb.0:
successors: %bb.1
@@ -169,7 +169,7 @@ machineFunctionInfo:
body: |
; DEBUG: Machine code for function test_occ_inc_revert_all_regions: IsSSA, NoPHIs, TracksLiveness
; DEBUG: [PreRARemat] Retrying function scheduling with new min. occupancy of 6 from rematerializing (original was 5, target was 6)
- ; DEBUG: [PreRARemat] Commiting all rematerializations
+ ; DEBUG-NOT: [PreRARemat] ==== ROLLBACK ====
bb.0:
successors: %bb.1
diff --git a/llvm/test/CodeGen/AMDGPU/misched-remat-revert.ll b/llvm/test/CodeGen/AMDGPU/misched-remat-revert.ll
deleted file mode 100644
index a588e99980c6e..0000000000000
--- a/llvm/test/CodeGen/AMDGPU/misched-remat-revert.ll
+++ /dev/null
@@ -1,577 +0,0 @@
-; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 6
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx942 -O3 -stop-after=machine-scheduler < %s | FileCheck %s
-
-; This testcase hit a situation where reverting scheduling after the scheduler's
-; rematerialization stage would cause incoherent MI and slot orders, hitting an
-; assert later on in the RP trackers. This is fixed by ensuring that all MIs that
-; should be alive at the end of the stage are marked non-debug before scheduling
-; is reverted, but is extremely sensitive to scheduling and rematerialization
-; decisions.
-
-%f8 = type { i8 }
-
- at shared = external addrspace(3) global [16384 x i8]
-
-define amdgpu_kernel void @test_revert_schedule(i32 %arg0, i32 %arg1, ptr addrspace(3) %p15, ptr addrspace(3) %lds, ptr addrspace(3) %arg, ptr addrspace(3) %p14, i32 %arg2, ptr addrspace(3) %arg3, ptr addrspace(3) %arg4, i32 %arg5, i32 %arg6, ptr addrspace(3) %p12, i32 %x7, ptr addrspace(3) %p7, i32 %a7, ptr addrspace(3) %arg7, i1 %loopcond, i32 %a5, i32 %a3, i32 %a4, i32 %a2, <4 x i8> %arg8, <4 x i8> %arg9) #0 {
- ; CHECK-LABEL: name: test_revert_schedule
- ; CHECK: bb.0.entry:
- ; CHECK-NEXT: successors: %bb.1(0x80000000)
- ; CHECK-NEXT: liveins: $vgpr0, $sgpr4_sgpr5
- ; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[COPY:%[0-9]+]]:sgpr_64(p4) = COPY $sgpr4_sgpr5
- ; CHECK-NEXT: [[COPY1:%[0-9]+]]:vgpr_32(s32) = COPY $vgpr0
- ; CHECK-NEXT: [[S_LOAD_DWORD_IMM:%[0-9]+]]:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM [[COPY]](p4), 64, 0 :: (dereferenceable invariant load (s32) from %ir.loopcond.kernarg.offset.align.down, align 16, addrspace 4)
- ; CHECK-NEXT: [[S_LOAD_DWORDX4_IMM:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[COPY]](p4), 0, 0 :: (dereferenceable invariant load (s128) from %ir.arg0.kernarg.offset1, addrspace 4)
- ; CHECK-NEXT: [[S_LOAD_DWORD_IMM1:%[0-9]+]]:sgpr_32 = S_LOAD_DWORD_IMM [[COPY]](p4), 16, 0 :: (dereferenceable invariant load (s32) from %ir.arg0.kernarg.offset1 + 16, align 16, addrspace 4)
- ; CHECK-NEXT: [[S_LOAD_DWORD_IMM2:%[0-9]+]]:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM [[COPY]](p4), 32, 0 :: (dereferenceable invariant load (s32) from %ir.arg4.kernarg.offset, align 16, addrspace 4)
- ; CHECK-NEXT: S_BITCMP1_B32 [[S_LOAD_DWORD_IMM]], 0, implicit-def $scc
- ; CHECK-NEXT: [[S_CSELECT_B64_:%[0-9]+]]:sreg_64_xexec = S_CSELECT_B64 -1, 0, implicit $scc
- ; CHECK-NEXT: [[V_AND_B32_e32_:%[0-9]+]]:vgpr_32 = V_AND_B32_e32 1, [[COPY1]](s32), implicit $exec
- ; CHECK-NEXT: [[V_LSHLREV_B32_e32_:%[0-9]+]]:vgpr_32 = V_LSHLREV_B32_e32 1, [[V_AND_B32_e32_]], implicit $exec
- ; CHECK-NEXT: [[V_AND_B32_e32_1:%[0-9]+]]:vgpr_32 = V_AND_B32_e32 3, [[COPY1]](s32), implicit $exec
- ; CHECK-NEXT: [[V_OR_B32_e32_:%[0-9]+]]:vgpr_32 = V_OR_B32_e32 20, [[V_AND_B32_e32_1]], implicit $exec
- ; CHECK-NEXT: [[V_OR_B32_e32_1:%[0-9]+]]:vgpr_32 = V_OR_B32_e32 21, [[V_AND_B32_e32_1]], implicit $exec
- ; CHECK-NEXT: [[V_XOR_B32_e32_:%[0-9]+]]:vgpr_32 = V_XOR_B32_e32 1, [[COPY1]](s32), implicit $exec
- ; CHECK-NEXT: [[V_XOR_B32_e32_1:%[0-9]+]]:vgpr_32 = V_XOR_B32_e32 1, [[V_AND_B32_e32_1]], implicit $exec
- ; CHECK-NEXT: [[V_AND_B32_e32_2:%[0-9]+]]:vgpr_32 = V_AND_B32_e32 31, [[COPY1]](s32), implicit $exec
- ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
- ; CHECK-NEXT: [[V_SUB_U32_e32_:%[0-9]+]]:vgpr_32 = V_SUB_U32_e32 0, [[V_XOR_B32_e32_1]], implicit $exec
- ; CHECK-NEXT: [[V_LSHLREV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_LSHLREV_B32_e32 1, [[V_SUB_U32_e32_]], implicit $exec
- ; CHECK-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec, implicit $exec
- ; CHECK-NEXT: undef [[S_MOV_B32_:%[0-9]+]].sub0:sgpr_128 = S_MOV_B32 0
- ; CHECK-NEXT: [[S_MOV_B32_:%[0-9]+]].sub1:sgpr_128 = COPY [[S_MOV_B32_]].sub0
- ; CHECK-NEXT: [[S_MOV_B32_:%[0-9]+]].sub2:sgpr_128 = COPY [[S_MOV_B32_]].sub0
- ; CHECK-NEXT: [[S_MOV_B32_:%[0-9]+]].sub3:sgpr_128 = COPY [[S_MOV_B32_]].sub0
- ; CHECK-NEXT: [[COPY2:%[0-9]+]]:av_128_align2 = COPY [[S_MOV_B32_]]
- ; CHECK-NEXT: [[V_MOV_B:%[0-9]+]]:vreg_64_align2 = V_MOV_B64_PSEUDO 0, implicit $exec
- ; CHECK-NEXT: [[V_MOV_B32_e32_2:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 12288, implicit $exec
- ; CHECK-NEXT: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[S_LOAD_DWORDX4_IMM]].sub0
- ; CHECK-NEXT: [[V_ADD_U32_e32_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[S_LOAD_DWORD_IMM2]], [[V_XOR_B32_e32_]], implicit $exec
- ; CHECK-NEXT: [[V_ADD_U32_e32_1:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[S_LOAD_DWORD_IMM2]], [[V_LSHLREV_B32_e32_1]], implicit $exec
- ; CHECK-NEXT: [[S_XOR_B64_:%[0-9]+]]:sreg_64_xexec = S_XOR_B64 [[S_CSELECT_B64_]], -1, implicit-def dead $scc
- ; CHECK-NEXT: [[V_CNDMASK_B32_e64_:%[0-9]+]]:vgpr_32 = V_CNDMASK_B32_e64 0, 0, 0, 1, [[S_XOR_B64_]], implicit $exec
- ; CHECK-NEXT: [[V_CMP_NE_U32_e64_:%[0-9]+]]:sreg_64_xexec = V_CMP_NE_U32_e64 1, [[V_CNDMASK_B32_e64_]], implicit $exec
- ; CHECK-NEXT: undef [[V_MOV_B32_e32_3:%[0-9]+]].sub1:vreg_64_align2 = V_MOV_B32_e32 0, implicit $exec
- ; CHECK-NEXT: undef [[V_MOV_B32_e32_4:%[0-9]+]].sub1:vreg_64_align2 = V_MOV_B32_e32 0, implicit $exec
- ; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: bb.1.loop:
- ; CHECK-NEXT: successors: %bb.2(0x04000000), %bb.1(0x7c000000)
- ; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[V_ADD_U32_e32_2:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[S_LOAD_DWORDX4_IMM]].sub3, [[V_MOV_B32_e32_1]], implicit $exec
- ; CHECK-NEXT: [[BUFFER_LOAD_DWORDX4_OFFSET:%[0-9]+]]:vreg_128_align2 = BUFFER_LOAD_DWORDX4_OFFSET [[S_MOV_B32_]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s128) from `ptr addrspace(8) null`, align 1, addrspace 8)
- ; CHECK-NEXT: [[BUFFER_LOAD_DWORDX4_OFFSET1:%[0-9]+]]:vreg_128_align2 = BUFFER_LOAD_DWORDX4_OFFSET [[S_MOV_B32_]], 0, 1, 0, 0, implicit $exec :: (dereferenceable load (s128) from `ptr addrspace(8) null`, align 1, addrspace 8)
- ; CHECK-NEXT: [[BUFFER_LOAD_DWORDX4_OFFEN:%[0-9]+]]:vreg_128_align2 = BUFFER_LOAD_DWORDX4_OFFEN [[V_MOV_B32_e32_2]], [[S_MOV_B32_]], 0, 2048, 0, 0, implicit $exec :: (dereferenceable load (s128) from `ptr addrspace(8) null`, align 1, addrspace 8)
- ; CHECK-NEXT: [[BUFFER_LOAD_DWORDX4_OFFEN1:%[0-9]+]]:vreg_128_align2 = BUFFER_LOAD_DWORDX4_OFFEN [[V_MOV_B32_e32_2]], [[S_MOV_B32_]], 0, 3072, 0, 0, implicit $exec :: (dereferenceable load (s128) from `ptr addrspace(8) null`, align 1, addrspace 8)
- ; CHECK-NEXT: DS_WRITE_B128_gfx9 [[V_ADD_U32_e32_2]], [[COPY2]], 0, 0, implicit $exec :: (store (s128) into %ir.p19, addrspace 3)
- ; CHECK-NEXT: S_WAITCNT 0
- ; CHECK-NEXT: [[BUFFER_LOAD_DWORDX4_OFFSET2:%[0-9]+]]:vreg_128_align2 = BUFFER_LOAD_DWORDX4_OFFSET [[S_MOV_B32_]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s128) from `ptr addrspace(8) null`, align 1, addrspace 8)
- ; CHECK-NEXT: [[BUFFER_LOAD_DWORDX4_OFFSET3:%[0-9]+]]:vreg_128_align2 = BUFFER_LOAD_DWORDX4_OFFSET [[S_MOV_B32_]], 0, 1, 0, 0, implicit $exec :: (dereferenceable load (s128) from `ptr addrspace(8) null`, align 1, addrspace 8)
- ; CHECK-NEXT: [[BUFFER_LOAD_DWORDX4_OFFEN2:%[0-9]+]]:vreg_128_align2 = BUFFER_LOAD_DWORDX4_OFFEN [[COPY3]], [[S_MOV_B32_]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s128) from `ptr addrspace(8) null`, align 1, addrspace 8)
- ; CHECK-NEXT: [[V_MOV_B32_e32_4:%[0-9]+]].sub0:vreg_64_align2 = COPY [[S_MOV_B32_]].sub0
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B32_e32_4]], [[V_MOV_B]], 0, 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MOV_B32_e32_3:%[0-9]+]].sub0:vreg_64_align2 = COPY [[S_MOV_B32_]].sub0
- ; CHECK-NEXT: undef [[DS_READ_B32_gfx9_:%[0-9]+]].sub0:vreg_64_align2 = DS_READ_B32_gfx9 [[V_MOV_B32_e32_]], 0, 0, implicit $exec :: (load (s32) from `ptr addrspace(3) null`, align 8, addrspace 3)
- ; CHECK-NEXT: undef [[V_MOV_B32_e32_4:%[0-9]+]].sub1:vreg_64_align2 = DS_READ_B32_gfx9 [[V_ADD_U32_e32_]], 0, 0, implicit $exec :: (load (s32) from %ir.sunkaddr, addrspace 3)
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B32_e32_3]], [[V_MOV_B]], [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: undef [[V_MOV_B32_e32_3:%[0-9]+]].sub1:vreg_64_align2 = DS_READ_B32_gfx9 [[V_ADD_U32_e32_1]], 0, 0, implicit $exec :: (load (s32) from %ir.sunkaddr2, addrspace 3)
- ; CHECK-NEXT: SCHED_GROUP_BARRIER 0, 0, 0
- ; CHECK-NEXT: $vcc = S_AND_B64 $exec, [[V_CMP_NE_U32_e64_]], implicit-def dead $scc
- ; CHECK-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = COPY [[V_AND_B32_e32_2]]
- ; CHECK-NEXT: S_CBRANCH_VCCNZ %bb.1, implicit $vcc
- ; CHECK-NEXT: S_BRANCH %bb.2
- ; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: bb.2.exit:
- ; CHECK-NEXT: [[V_MOV_B1:%[0-9]+]]:vreg_64_align2 = V_MOV_B64_PSEUDO 0, implicit $exec
- ; CHECK-NEXT: undef [[S_MOV_B32_1:%[0-9]+]].sub0:sgpr_128 = S_MOV_B32 0
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[V_MOV_B1]], 0, 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MOV_B32_e32_4:%[0-9]+]].sub0:vreg_64_align2 = COPY [[S_MOV_B32_1]].sub0
- ; CHECK-NEXT: undef [[S_MOV_B32_2:%[0-9]+]].sub0:sreg_64 = S_MOV_B32 16843009
- ; CHECK-NEXT: [[S_MOV_B32_2:%[0-9]+]].sub1:sreg_64 = COPY [[S_MOV_B32_2]].sub0
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B32_e32_4]], [[V_MOV_B1]], [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_2]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[COPY4:%[0-9]+]]:av_64_align2 = COPY [[S_MOV_B32_2]]
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_4:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[COPY4]], [[V_MOV_B1]], [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_3]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DS_READ_B32_gfx9_:%[0-9]+]].sub1:vreg_64_align2 = V_MOV_B32_e32 0, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_5:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[V_MOV_B1]], [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_4]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[S_LOAD_DWORD_IMM3:%[0-9]+]]:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM [[COPY]](p4), 24, 0 :: (dereferenceable invariant load (s32) from %ir.arg2.kernarg.offset, align 8, addrspace 4)
- ; CHECK-NEXT: [[S_LOAD_DWORD_IMM4:%[0-9]+]]:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM [[COPY]](p4), 40, 0 :: (dereferenceable invariant load (s32) from %ir.arg6.kernarg.offset, align 8, addrspace 4)
- ; CHECK-NEXT: [[S_LOAD_DWORD_IMM5:%[0-9]+]]:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM [[COPY]](p4), 48, 0 :: (dereferenceable invariant load (s32) from %ir.x7.kernarg.offset, align 16, addrspace 4)
- ; CHECK-NEXT: [[S_LOAD_DWORD_IMM6:%[0-9]+]]:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM [[COPY]](p4), 56, 0 :: (dereferenceable invariant load (s32) from %ir.a7.kernarg.offset, align 8, addrspace 4)
- ; CHECK-NEXT: [[S_LOAD_DWORD_IMM7:%[0-9]+]]:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM [[COPY]](p4), 72, 0 :: (dereferenceable invariant load (s32) from %ir.a3.kernarg.offset, align 8, addrspace 4)
- ; CHECK-NEXT: [[S_LOAD_DWORD_IMM8:%[0-9]+]]:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM [[COPY]](p4), 80, 0 :: (dereferenceable invariant load (s32) from %ir.a2.kernarg.offset, align 16, addrspace 4)
- ; CHECK-NEXT: early-clobber %119:sreg_64_xexec = S_LOAD_DWORDX2_IMM_ec [[COPY]](p4), 84, 0 :: (dereferenceable invariant load (s64) from %ir.arg8.kernarg.offset, align 4, addrspace 4)
- ; CHECK-NEXT: KILL [[COPY]](p4)
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_6:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[DS_READ_B32_gfx9_]], [[V_MOV_B1]], [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_5]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MOV_B32_e32_5:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 16384, implicit $exec
- ; CHECK-NEXT: [[V_MOV_B32_e32_6:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 20480, implicit $exec
- ; CHECK-NEXT: undef [[COPY5:%[0-9]+]].sub1:vreg_64_align2 = COPY [[DS_READ_B32_gfx9_]].sub1
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_7:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[V_MOV_B1]], [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_6]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: undef [[COPY6:%[0-9]+]].sub0:vreg_64_align2 = COPY [[DS_READ_B32_gfx9_]].sub1
- ; CHECK-NEXT: [[S_MOV_B32_1:%[0-9]+]].sub1:sgpr_128 = COPY [[S_MOV_B32_1]].sub0
- ; CHECK-NEXT: [[S_MOV_B32_1:%[0-9]+]].sub2:sgpr_128 = COPY [[S_MOV_B32_1]].sub0
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_8:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[V_MOV_B1]], [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_7]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[S_MOV_B32_1:%[0-9]+]].sub3:sgpr_128 = COPY [[S_MOV_B32_1]].sub0
- ; CHECK-NEXT: [[BUFFER_LOAD_DWORDX4_OFFEN3:%[0-9]+]]:vreg_128_align2 = BUFFER_LOAD_DWORDX4_OFFEN [[V_MOV_B32_e32_5]], [[S_MOV_B32_1]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s128) from `ptr addrspace(8) null`, align 1, addrspace 8)
- ; CHECK-NEXT: [[BUFFER_LOAD_DWORDX4_OFFEN4:%[0-9]+]]:vreg_128_align2 = BUFFER_LOAD_DWORDX4_OFFEN [[V_MOV_B32_e32_5]], [[S_MOV_B32_1]], 0, 1024, 0, 0, implicit $exec :: (dereferenceable load (s128) from `ptr addrspace(8) null`, align 1, addrspace 8)
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_9:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[BUFFER_LOAD_DWORDX4_OFFSET]].sub0_sub1, [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_8]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[COPY5:%[0-9]+]].sub0:vreg_64_align2 = COPY %119.sub0
- ; CHECK-NEXT: [[COPY6:%[0-9]+]].sub1:vreg_64_align2 = COPY %119.sub1
- ; CHECK-NEXT: [[COPY7:%[0-9]+]]:vgpr_32 = COPY [[S_LOAD_DWORD_IMM5]]
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_10:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[BUFFER_LOAD_DWORDX4_OFFSET]].sub0_sub1, [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_9]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[COPY8:%[0-9]+]]:vgpr_32 = COPY [[S_LOAD_DWORD_IMM3]]
- ; CHECK-NEXT: [[S_ASHR_I32_:%[0-9]+]]:sreg_32 = S_ASHR_I32 [[S_LOAD_DWORD_IMM4]], 31, implicit-def dead $scc
- ; CHECK-NEXT: [[COPY9:%[0-9]+]]:vgpr_32 = COPY [[S_LOAD_DWORD_IMM6]]
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_11:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[BUFFER_LOAD_DWORDX4_OFFSET1]].sub0_sub1, [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_10]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[BUFFER_LOAD_DWORDX4_OFFEN5:%[0-9]+]]:vreg_128_align2 = BUFFER_LOAD_DWORDX4_OFFEN [[COPY7]], [[S_MOV_B32_1]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s128) from `ptr addrspace(8) null`, align 1, addrspace 8)
- ; CHECK-NEXT: [[BUFFER_LOAD_DWORDX4_OFFEN6:%[0-9]+]]:vreg_128_align2 = BUFFER_LOAD_DWORDX4_OFFEN [[V_MOV_B32_e32_5]], [[S_MOV_B32_1]], 0, 3072, 0, 0, implicit $exec :: (dereferenceable load (s128) from `ptr addrspace(8) null`, align 1, addrspace 8)
- ; CHECK-NEXT: KILL [[COPY7]]
- ; CHECK-NEXT: KILL [[V_MOV_B32_e32_5]]
- ; CHECK-NEXT: [[S_LSHR_B32_:%[0-9]+]]:sreg_32 = S_LSHR_B32 [[S_ASHR_I32_]], 26, implicit-def dead $scc
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_12:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[BUFFER_LOAD_DWORDX4_OFFSET1]].sub2_sub3, [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_11]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[BUFFER_LOAD_DWORDX4_OFFEN7:%[0-9]+]]:vreg_128_align2 = BUFFER_LOAD_DWORDX4_OFFEN [[COPY9]], [[S_MOV_B32_1]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s128) from `ptr addrspace(8) null`, align 1, addrspace 8)
- ; CHECK-NEXT: [[BUFFER_LOAD_DWORDX4_OFFEN8:%[0-9]+]]:vreg_128_align2 = BUFFER_LOAD_DWORDX4_OFFEN [[V_MOV_B32_e32_6]], [[S_MOV_B32_1]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s128) from `ptr addrspace(8) null`, align 1, addrspace 8)
- ; CHECK-NEXT: KILL [[COPY9]]
- ; CHECK-NEXT: [[S_ADD_I32_:%[0-9]+]]:sreg_32 = S_ADD_I32 [[S_LOAD_DWORD_IMM4]], [[S_LSHR_B32_]], implicit-def dead $scc
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_13:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[V_MOV_B1]], [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_12]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[S_ASHR_I32_1:%[0-9]+]]:sreg_32 = S_ASHR_I32 [[S_ADD_I32_]], 6, implicit-def dead $scc
- ; CHECK-NEXT: [[S_LSHL_B32_:%[0-9]+]]:sreg_32 = S_LSHL_B32 [[S_ASHR_I32_1]], 1, implicit-def dead $scc
- ; CHECK-NEXT: [[S_OR_B32_:%[0-9]+]]:sreg_32 = S_OR_B32 [[S_LSHL_B32_]], 19456, implicit-def dead $scc
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_14:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[V_MOV_B1]], [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_13]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[COPY10:%[0-9]+]]:vgpr_32 = COPY [[S_OR_B32_]]
- ; CHECK-NEXT: [[BUFFER_LOAD_DWORDX4_OFFEN9:%[0-9]+]]:vreg_128_align2 = BUFFER_LOAD_DWORDX4_OFFEN [[COPY8]], [[S_MOV_B32_1]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s128) from `ptr addrspace(8) null`, align 1, addrspace 8)
- ; CHECK-NEXT: [[BUFFER_LOAD_DWORDX4_OFFEN10:%[0-9]+]]:vreg_128_align2 = BUFFER_LOAD_DWORDX4_OFFEN [[COPY10]], [[S_MOV_B32_1]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s128) from `ptr addrspace(8) null`, align 1, addrspace 8)
- ; CHECK-NEXT: KILL [[COPY10]]
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_15:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[V_MOV_B1]], [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_1]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[BUFFER_LOAD_DWORDX4_OFFSET4:%[0-9]+]]:vreg_128_align2 = BUFFER_LOAD_DWORDX4_OFFSET [[S_MOV_B32_1]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s128) from `ptr addrspace(8) null`, align 1, addrspace 8)
- ; CHECK-NEXT: [[BUFFER_LOAD_DWORDX4_OFFSET5:%[0-9]+]]:vreg_128_align2 = BUFFER_LOAD_DWORDX4_OFFSET [[S_MOV_B32_1]], 0, 1, 0, 0, implicit $exec :: (dereferenceable load (s128) from `ptr addrspace(8) null`, align 1, addrspace 8)
- ; CHECK-NEXT: [[COPY11:%[0-9]+]]:vgpr_32 = COPY [[S_LOAD_DWORD_IMM7]]
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_16:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[COPY5]], [[V_MOV_B1]], [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_15]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[COPY12:%[0-9]+]]:vgpr_32 = COPY [[S_LOAD_DWORD_IMM8]]
- ; CHECK-NEXT: [[BUFFER_LOAD_DWORDX4_OFFEN11:%[0-9]+]]:vreg_128_align2 = BUFFER_LOAD_DWORDX4_OFFEN [[COPY11]], [[S_MOV_B32_1]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s128) from `ptr addrspace(8) null`, align 1, addrspace 8)
- ; CHECK-NEXT: [[BUFFER_LOAD_DWORDX4_OFFEN12:%[0-9]+]]:vreg_128_align2 = BUFFER_LOAD_DWORDX4_OFFEN [[COPY12]], [[S_MOV_B32_1]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s128) from `ptr addrspace(8) null`, align 1, addrspace 8)
- ; CHECK-NEXT: KILL [[COPY11]]
- ; CHECK-NEXT: KILL [[COPY12]]
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_17:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[COPY6]], [[V_MOV_B1]], [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_16]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[BUFFER_LOAD_DWORDX4_OFFEN13:%[0-9]+]]:vreg_128_align2 = BUFFER_LOAD_DWORDX4_OFFEN [[V_MOV_B32_e32_6]], [[S_MOV_B32_1]], 0, 2048, 0, 0, implicit $exec :: (dereferenceable load (s128) from `ptr addrspace(8) null`, align 1, addrspace 8)
- ; CHECK-NEXT: [[BUFFER_LOAD_DWORDX4_OFFEN14:%[0-9]+]]:vreg_128_align2 = BUFFER_LOAD_DWORDX4_OFFEN [[V_MOV_B32_e32_6]], [[S_MOV_B32_1]], 0, 3072, 0, 0, implicit $exec :: (dereferenceable load (s128) from `ptr addrspace(8) null`, align 1, addrspace 8)
- ; CHECK-NEXT: KILL [[V_MOV_B32_e32_6]]
- ; CHECK-NEXT: [[COPY13:%[0-9]+]]:vgpr_32 = COPY [[S_LOAD_DWORDX4_IMM]].sub0
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_18:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[V_MOV_B1]], [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_17]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[BUFFER_LOAD_DWORDX4_OFFEN15:%[0-9]+]]:vreg_128_align2 = BUFFER_LOAD_DWORDX4_OFFEN [[COPY13]], [[S_MOV_B32_1]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s128) from `ptr addrspace(8) null`, align 1, addrspace 8)
- ; CHECK-NEXT: [[V_MOV_B32_e32_3:%[0-9]+]].sub0:vreg_64_align2 = COPY [[S_MOV_B32_1]].sub0
- ; CHECK-NEXT: [[V_OR_B32_e32_2:%[0-9]+]]:vgpr_32 = V_OR_B32_e32 [[S_LOAD_DWORDX4_IMM]].sub0, [[COPY1]](s32), implicit $exec
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_19:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[V_MOV_B1]], [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_18]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_OR_B32_e32_3:%[0-9]+]]:vgpr_32 = V_OR_B32_e32 1, [[COPY1]](s32), implicit $exec
- ; CHECK-NEXT: [[V_OR_B32_e32_4:%[0-9]+]]:vgpr_32 = V_OR_B32_e32 24, [[V_AND_B32_e32_1]], implicit $exec
- ; CHECK-NEXT: [[V_XOR_B32_e32_2:%[0-9]+]]:vgpr_32 = V_XOR_B32_e32 [[V_XOR_B32_e32_]], [[V_OR_B32_e32_3]], implicit $exec
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_20:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[V_MOV_B1]], [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_19]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_OR_B32_e32_5:%[0-9]+]]:vgpr_32 = V_OR_B32_e32 [[V_XOR_B32_e32_2]], [[V_XOR_B32_e32_1]], implicit $exec
- ; CHECK-NEXT: [[V_LSHLREV_B32_e32_2:%[0-9]+]]:vgpr_32 = V_LSHLREV_B32_e32 1, [[V_OR_B32_e32_5]], implicit $exec
- ; CHECK-NEXT: [[V_XOR_B32_e32_3:%[0-9]+]]:vgpr_32 = V_XOR_B32_e32 1, [[V_OR_B32_e32_]], implicit $exec
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_21:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B32_e32_3]], [[V_MOV_B1]], [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_20]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_OR_B32_e32_6:%[0-9]+]]:vgpr_32 = V_OR_B32_e32 [[S_LOAD_DWORDX4_IMM]].sub0, [[V_XOR_B32_e32_3]], implicit $exec
- ; CHECK-NEXT: [[V_XOR_B32_e32_4:%[0-9]+]]:vgpr_32 = V_XOR_B32_e32 1, [[V_OR_B32_e32_4]], implicit $exec
- ; CHECK-NEXT: [[V_LSHRREV_B32_e32_:%[0-9]+]]:vgpr_32 = V_LSHRREV_B32_e32 3, [[COPY1]](s32), implicit $exec
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_22:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[V_MOV_B1]], [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_21]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_LSHLREV_B32_e32_3:%[0-9]+]]:vgpr_32 = V_LSHLREV_B32_e32 1, [[COPY1]](s32), implicit $exec
- ; CHECK-NEXT: DS_WRITE_B128_gfx9 [[DS_READ_B32_gfx9_]].sub1, [[BUFFER_LOAD_DWORDX4_OFFSET2]], 0, 0, implicit $exec :: (store (s128) into `ptr addrspace(3) null`, addrspace 3)
- ; CHECK-NEXT: DS_WRITE_B128_gfx9 [[COPY8]], [[BUFFER_LOAD_DWORDX4_OFFEN2]], 0, 0, implicit $exec :: (store (s128) into %ir.p18, addrspace 3)
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_23:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[V_MOV_B1]], [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_22]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_OR_B32_e32_7:%[0-9]+]]:vgpr_32 = V_OR_B32_e32 1, [[V_AND_B32_e32_2]], implicit $exec
- ; CHECK-NEXT: [[V_ADD_U32_e32_3:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[S_LOAD_DWORD_IMM3]], [[V_OR_B32_e32_7]], implicit $exec
- ; CHECK-NEXT: DS_WRITE_B128_gfx9 [[V_ADD_U32_e32_3]], [[BUFFER_LOAD_DWORDX4_OFFSET3]], 512, 0, implicit $exec :: (store (s128) into %ir.9, addrspace 3)
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_24:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[V_MOV_B1]], [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_23]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[COPY14:%[0-9]+]]:vgpr_32 = COPY [[S_LOAD_DWORDX4_IMM]].sub2
- ; CHECK-NEXT: [[DS_READ_B32_gfx9_1:%[0-9]+]]:av_32 = DS_READ_B32_gfx9 [[COPY14]], 0, 0, implicit $exec :: (load (s32) from %ir.4, align 8, addrspace 3)
- ; CHECK-NEXT: [[V_ADD_U32_e32_4:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[S_LOAD_DWORDX4_IMM]].sub1, [[V_LSHLREV_B32_e32_1]], implicit $exec
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_25:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[V_MOV_B1]], [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_24]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_ADD_U32_e32_5:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[S_LOAD_DWORDX4_IMM]].sub1, [[V_OR_B32_e32_2]], implicit $exec
- ; CHECK-NEXT: undef [[DS_READ_B32_gfx9_2:%[0-9]+]].sub1:av_64_align2 = DS_READ_B32_gfx9 [[V_ADD_U32_e32_4]], 0, 0, implicit $exec :: (load (s32) from %ir.p10, addrspace 3)
- ; CHECK-NEXT: [[DS_READ_B64_gfx9_:%[0-9]+]]:av_64_align2 = DS_READ_B64_gfx9 [[V_ADD_U32_e32_5]], 0, 0, implicit $exec :: (load (s64) from %ir.p11, addrspace 3)
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_26:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[V_MOV_B1]], [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_25]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: undef [[DS_READ_B32_gfx9_3:%[0-9]+]].sub1:av_64_align2 = DS_READ_B32_gfx9 [[DS_READ_B32_gfx9_]].sub1, 0, 0, implicit $exec :: (load (s32) from `ptr addrspace(3) null`, align 8, addrspace 3)
- ; CHECK-NEXT: [[DS_READ_B32_gfx9_2:%[0-9]+]].sub0:av_64_align2 = COPY [[S_MOV_B32_1]].sub0
- ; CHECK-NEXT: undef [[COPY15:%[0-9]+]].sub0:av_64_align2 = COPY [[S_MOV_B32_1]].sub0
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_27:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[BUFFER_LOAD_DWORDX4_OFFEN]].sub0_sub1, [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_26]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[COPY15:%[0-9]+]].sub1:av_64_align2 = COPY [[DS_READ_B64_gfx9_]].sub1
- ; CHECK-NEXT: [[V_ADD_U32_e32_6:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[S_LOAD_DWORD_IMM1]], [[V_LSHLREV_B32_e32_]], implicit $exec
- ; CHECK-NEXT: undef [[DS_READ_B32_gfx9_4:%[0-9]+]].sub1:av_64_align2 = DS_READ_B32_gfx9 [[V_ADD_U32_e32_6]], 0, 0, implicit $exec :: (load (s32) from %ir.p4, addrspace 3)
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_28:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[BUFFER_LOAD_DWORDX4_OFFEN]].sub2_sub3, [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_27]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DS_READ_B32_gfx9_4:%[0-9]+]].sub0:av_64_align2 = COPY [[S_MOV_B32_1]].sub0
- ; CHECK-NEXT: [[V_ADD_U32_e32_7:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[S_LOAD_DWORDX4_IMM]].sub3, [[V_SUB_U32_e32_]], implicit $exec
- ; CHECK-NEXT: [[V_ADD_U32_e32_8:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[S_LOAD_DWORDX4_IMM]].sub1, [[V_ADD_U32_e32_7]], implicit $exec
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_29:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[BUFFER_LOAD_DWORDX4_OFFEN1]].sub0_sub1, [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_28]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_ADD_U32_e32_9:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[S_LOAD_DWORDX4_IMM]].sub3, [[V_OR_B32_e32_1]], implicit $exec
- ; CHECK-NEXT: [[V_ADD_U32_e32_10:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[S_LOAD_DWORDX4_IMM]].sub3, [[V_LSHLREV_B32_e32_]], implicit $exec
- ; CHECK-NEXT: undef [[DS_READ_B32_gfx9_5:%[0-9]+]].sub1:av_64_align2 = DS_READ_B32_gfx9 [[V_ADD_U32_e32_8]], 0, 0, implicit $exec :: (load (s32) from %ir.p8, addrspace 3)
- ; CHECK-NEXT: undef [[DS_READ_B32_gfx9_6:%[0-9]+]].sub1:av_64_align2 = DS_READ_B32_gfx9 [[V_ADD_U32_e32_9]], 0, 0, implicit $exec :: (load (s32) from %ir.sunkaddr3, addrspace 3)
- ; CHECK-NEXT: undef [[DS_READ_B32_gfx9_7:%[0-9]+]].sub1:av_64_align2 = DS_READ_B32_gfx9 [[V_ADD_U32_e32_10]], 0, 0, implicit $exec :: (load (s32) from %ir.p5, addrspace 3)
- ; CHECK-NEXT: [[DS_READ_B32_gfx9_3:%[0-9]+]].sub0:av_64_align2 = DS_READ_B32_gfx9 [[V_ADD_U32_e32_7]], 0, 0, implicit $exec :: (load (s32) from %ir.i85, align 8, addrspace 3)
- ; CHECK-NEXT: [[V_ADD_U32_e32_11:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[S_LOAD_DWORD_IMM2]], [[V_OR_B32_e32_6]], implicit $exec
- ; CHECK-NEXT: undef [[DS_READ_B32_gfx9_8:%[0-9]+]].sub0:av_64_align2 = DS_READ_B32_gfx9 [[V_ADD_U32_e32_11]], 0, 0, implicit $exec :: (load (s32) from %ir.p2, align 16, addrspace 3)
- ; CHECK-NEXT: undef [[DS_READ_B32_gfx9_9:%[0-9]+]].sub1:av_64_align2 = DS_READ_B32_gfx9 [[V_LSHLREV_B32_e32_2]], 8192, 0, implicit $exec :: (load (s32) from %ir.p13, addrspace 3)
- ; CHECK-NEXT: undef [[DS_READ_B32_gfx9_10:%[0-9]+]].sub0:av_64_align2 = DS_READ_B32_gfx9 [[V_XOR_B32_e32_4]], 8192, 0, implicit $exec :: (load (s32) from %ir.p9, align 16, addrspace 3)
- ; CHECK-NEXT: [[DS_READ_B32_gfx9_5:%[0-9]+]].sub0:av_64_align2 = COPY [[S_MOV_B32_1]].sub0
- ; CHECK-NEXT: [[DS_READ_B32_gfx9_6:%[0-9]+]].sub0:av_64_align2 = COPY [[S_MOV_B32_1]].sub0
- ; CHECK-NEXT: [[DS_READ_B32_gfx9_7:%[0-9]+]].sub0:av_64_align2 = COPY [[S_MOV_B32_1]].sub0
- ; CHECK-NEXT: [[DS_READ_B32_gfx9_9:%[0-9]+]].sub0:av_64_align2 = COPY [[S_MOV_B32_1]].sub0
- ; CHECK-NEXT: SCHED_BARRIER 0
- ; CHECK-NEXT: [[DS_READ_B64_gfx9_:%[0-9]+]].sub1:av_64_align2 = COPY [[DS_READ_B32_gfx9_1]]
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_30:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[COPY4]], [[BUFFER_LOAD_DWORDX4_OFFSET4]].sub0_sub1, [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_14]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DS_READ_B32_gfx9_8:%[0-9]+]].sub1:av_64_align2 = COPY [[S_MOV_B32_1]].sub0
- ; CHECK-NEXT: [[DS_READ_B32_gfx9_10:%[0-9]+]].sub1:av_64_align2 = COPY [[S_MOV_B32_1]].sub0
- ; CHECK-NEXT: [[V_MUL_LO_U32_e64_:%[0-9]+]]:vgpr_32 = V_MUL_LO_U32_e64 [[V_LSHRREV_B32_e32_]], [[S_LOAD_DWORDX4_IMM]].sub0, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_31:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[DS_READ_B32_gfx9_2]], [[BUFFER_LOAD_DWORDX4_OFFEN4]].sub0_sub1, 0, 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: undef [[V_AND_OR_B32_e64_:%[0-9]+]].sub0:vreg_64_align2 = V_AND_OR_B32_e64 [[V_LSHLREV_B32_e32_3]], 56, [[V_MUL_LO_U32_e64_]], implicit $exec
- ; CHECK-NEXT: [[V_AND_OR_B32_e64_1:%[0-9]+]]:vgpr_32 = V_AND_OR_B32_e64 [[COPY1]](s32), 48, [[V_AND_B32_e32_]], implicit $exec
- ; CHECK-NEXT: [[V_LSHLREV_B32_e32_4:%[0-9]+]]:vgpr_32 = V_LSHLREV_B32_e32 2, [[V_AND_OR_B32_e64_1]], implicit $exec
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_32:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[BUFFER_LOAD_DWORDX4_OFFSET4]].sub2_sub3, [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_30]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_33:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[COPY15]], [[V_MOV_B1]], [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_31]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_34:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[BUFFER_LOAD_DWORDX4_OFFEN5]].sub0_sub1, [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_32]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_35:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[DS_READ_B64_gfx9_]], [[V_MOV_B1]], [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_33]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_36:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[BUFFER_LOAD_DWORDX4_OFFEN5]].sub2_sub3, [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_34]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_37:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[BUFFER_LOAD_DWORDX4_OFFEN6]].sub0_sub1, [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_35]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_38:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[BUFFER_LOAD_DWORDX4_OFFEN9]].sub0_sub1, [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_36]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_39:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[DS_READ_B32_gfx9_4]], [[BUFFER_LOAD_DWORDX4_OFFEN6]].sub2_sub3, [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_37]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_40:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[BUFFER_LOAD_DWORDX4_OFFEN9]].sub2_sub3, [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_38]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_41:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[DS_READ_B32_gfx9_5]], [[BUFFER_LOAD_DWORDX4_OFFEN8]].sub0_sub1, [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_39]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_42:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[BUFFER_LOAD_DWORDX4_OFFEN10]].sub0_sub1, [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_40]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_43:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[DS_READ_B32_gfx9_3]], [[BUFFER_LOAD_DWORDX4_OFFEN8]].sub2_sub3, [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_41]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_44:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[BUFFER_LOAD_DWORDX4_OFFEN10]].sub2_sub3, [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_42]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_45:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[DS_READ_B32_gfx9_6]], [[BUFFER_LOAD_DWORDX4_OFFSET5]].sub0_sub1, [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_43]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_46:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[BUFFER_LOAD_DWORDX4_OFFEN7]].sub0_sub1, [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_44]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_47:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[BUFFER_LOAD_DWORDX4_OFFSET5]].sub2_sub3, [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_45]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_48:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[DS_READ_B32_gfx9_9]], [[BUFFER_LOAD_DWORDX4_OFFEN7]].sub2_sub3, [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_46]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_49:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[DS_READ_B32_gfx9_7]], [[V_MOV_B1]], [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_47]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_50:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[BUFFER_LOAD_DWORDX4_OFFEN11]].sub0_sub1, [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_48]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_51:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[BUFFER_LOAD_DWORDX4_OFFEN13]].sub2_sub3, [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_49]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_52:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[BUFFER_LOAD_DWORDX4_OFFEN11]].sub2_sub3, [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_50]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_53:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[BUFFER_LOAD_DWORDX4_OFFEN3]].sub0_sub1, [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_29]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_54:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[BUFFER_LOAD_DWORDX4_OFFEN14]].sub0_sub1, [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_51]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_55:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[DS_READ_B32_gfx9_10]], [[BUFFER_LOAD_DWORDX4_OFFEN12]].sub0_sub1, [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_52]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_56:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[DS_READ_B32_gfx9_8]], [[V_MOV_B1]], [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_53]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_57:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[BUFFER_LOAD_DWORDX4_OFFEN15]].sub2_sub3, 0, 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: DS_WRITE_B32_gfx9 [[DS_READ_B32_gfx9_]].sub1, [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_57]].sub0, 0, 0, implicit $exec :: (store (s32) into `ptr addrspace(3) null`, addrspace 3)
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_58:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[BUFFER_LOAD_DWORDX4_OFFEN14]].sub2_sub3, [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_54]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_59:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[BUFFER_LOAD_DWORDX4_OFFEN15]].sub0_sub1, [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_55]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: DS_WRITE_B32_gfx9 [[V_LSHLREV_B32_e32_4]], [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_59]].sub0, 0, 0, implicit $exec :: (store (s32) into %ir.p21, addrspace 3)
- ; CHECK-NEXT: DS_WRITE_B32_gfx9 [[DS_READ_B32_gfx9_]].sub1, [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_58]].sub0, 0, 0, implicit $exec :: (store (s32) into `ptr addrspace(3) null`, addrspace 3)
- ; CHECK-NEXT: [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_60:%[0-9]+]]:vreg_128_align2 = V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64 [[V_MOV_B1]], [[BUFFER_LOAD_DWORDX4_OFFEN13]].sub0_sub1, [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_56]], 0, 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: DS_WRITE_B32_gfx9 [[COPY14]], [[V_MFMA_F32_16X16X32_FP8_FP8_vgprcd_e64_60]].sub0, 0, 0, implicit $exec :: (store (s32) into %ir.4, addrspace 3)
- ; CHECK-NEXT: [[V_AND_OR_B32_e64_:%[0-9]+]].sub1:vreg_64_align2 = V_ASHRREV_I32_e32 31, [[V_AND_OR_B32_e64_]].sub0, implicit $exec
- ; CHECK-NEXT: [[V_LSHLREV_B64_e64_:%[0-9]+]]:vreg_64_align2 = V_LSHLREV_B64_e64 1, [[V_AND_OR_B32_e64_]], implicit $exec
- ; CHECK-NEXT: GLOBAL_ATOMIC_PK_ADD_BF16 [[V_LSHLREV_B64_e64_]], [[DS_READ_B32_gfx9_]].sub1, 0, 0, implicit $exec :: (load store syncscope("agent") monotonic (s32) on %ir.p20, addrspace 1)
- ; CHECK-NEXT: S_ENDPGM 0
-entry:
- %i = tail call i32 @llvm.amdgcn.workitem.id.x()
- %i10 = lshr i32 %i, 3
- %n6 = and i32 %i, 1
- %m3 = shl i32 %n6, 1
- %n4 = or i32 %i, %arg0
- %d2 = or i32 %i, 1
- %s = and i32 %i, 3
- %d = or i32 %s, 20
- %n = or i32 %d, 1
- %i11 = getelementptr i8, ptr addrspace(3) %lds, i32 %n
- %p6 = getelementptr %f8, ptr addrspace(3) %i11, i32 %m3
- %i12 = getelementptr i8, ptr addrspace(3) %p6, i32 4
- %d3 = or i32 %s, 24
- %x = xor i32 %i, 1
- %x2 = xor i32 %x, %d2
- %x6 = xor i32 1, %s
- %n2 = or i32 %x2, %x6
- %i13 = shl i32 %n2, 1
- %x4 = xor i32 1, %d
- %n3 = or i32 %x4, %arg0
- %x3 = xor i32 1, %d3
- %x5 = and i32 %i, 31
- br label %loop
-
-loop: ; preds = %loop, %entry
- %phi0 = phi i32 [ 0, %entry ], [ %x5, %loop ]
- %i14 = phi <4 x i8> [ zeroinitializer, %entry ], [ %i29, %loop ]
- %i15 = phi <4 x i8> [ zeroinitializer, %entry ], [ %i27, %loop ]
- %p19 = getelementptr %f8, ptr addrspace(3) %lds, i32 %phi0
- store <4 x i32> zeroinitializer, ptr addrspace(3) %p19, align 16
- %i16 = shufflevector <4 x i8> zeroinitializer, <4 x i8> %i15, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
- %ac = bitcast <16 x i8> %i16 to <2 x i64>
- %i17 = shufflevector <4 x i8> zeroinitializer, <4 x i8> %i14, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
- %ac2 = bitcast <16 x i8> %i17 to <2 x i64>
- %ae = extractelement <2 x i64> %ac, i64 0
- %ae2 = extractelement <2 x i64> %ac2, i64 0
- %i18 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 %ae, i64 0, <4 x float> zeroinitializer, i32 0, i32 0, i32 0)
- %i19 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 %ae2, i64 0, <4 x float> %i18, i32 0, i32 0, i32 0)
- %i20 = tail call <4 x i32> @llvm.amdgcn.raw.ptr.buffer.load.v4i32(ptr addrspace(8) null, i32 0, i32 0, i32 0)
- %i21 = tail call <4 x i32> @llvm.amdgcn.raw.ptr.buffer.load.v4i32(ptr addrspace(8) null, i32 1, i32 0, i32 0)
- %i22 = tail call <4 x i32> @llvm.amdgcn.raw.ptr.buffer.load.v4i32(ptr addrspace(8) null, i32 14336, i32 0, i32 0)
- %i23 = tail call <4 x i32> @llvm.amdgcn.raw.ptr.buffer.load.v4i32(ptr addrspace(8) null, i32 15360, i32 0, i32 0)
- tail call void @llvm.amdgcn.s.waitcnt(i32 0)
- %i24 = tail call <4 x i32> @llvm.amdgcn.raw.ptr.buffer.load.v4i32(ptr addrspace(8) null, i32 0, i32 0, i32 0)
- %i25 = tail call <4 x i32> @llvm.amdgcn.raw.ptr.buffer.load.v4i32(ptr addrspace(8) null, i32 %arg0, i32 0, i32 0)
- %i26 = tail call <4 x i32> @llvm.amdgcn.raw.ptr.buffer.load.v4i32(ptr addrspace(8) null, i32 1, i32 0, i32 0)
- %p3 = getelementptr i8, ptr addrspace(3) %arg4, i32 %x
- %i27 = load <4 x i8>, ptr addrspace(3) %p3, align 4
- %n5 = sub i32 0, %x6
- %i28 = shl i32 %n5, 1
- %p = getelementptr i8, ptr addrspace(3) %arg4, i32 %i28
- %i29 = load <4 x i8>, ptr addrspace(3) %p, align 4
- %i30 = load <4 x i8>, ptr addrspace(3) zeroinitializer, align 8
- tail call void @llvm.amdgcn.sched.group.barrier(i32 0, i32 0, i32 0)
- br i1 %loopcond, label %loop, label %exit
-
-exit: ; preds = %loop
- %i31 = shl i32 %i, 1
- %i32 = tail call <4 x i32> @llvm.amdgcn.raw.ptr.buffer.load.v4i32(ptr addrspace(8) null, i32 0, i32 0, i32 0)
- %i33 = tail call <4 x i32> @llvm.amdgcn.raw.ptr.buffer.load.v4i32(ptr addrspace(8) null, i32 16384, i32 0, i32 0)
- %i34 = tail call <4 x i32> @llvm.amdgcn.raw.ptr.buffer.load.v4i32(ptr addrspace(8) null, i32 17408, i32 0, i32 0)
- %i35 = tail call <4 x i32> @llvm.amdgcn.raw.ptr.buffer.load.v4i32(ptr addrspace(8) null, i32 %x7, i32 0, i32 0)
- %i36 = tail call <4 x i32> @llvm.amdgcn.raw.ptr.buffer.load.v4i32(ptr addrspace(8) null, i32 %arg2, i32 0, i32 0)
- %i37 = tail call <4 x i32> @llvm.amdgcn.raw.ptr.buffer.load.v4i32(ptr addrspace(8) null, i32 19456, i32 0, i32 0)
- %d4 = sdiv i32 %arg6, 64
- %m5 = shl i32 %d4, 1
- %a = or i32 19456, %m5
- %i38 = tail call <4 x i32> @llvm.amdgcn.raw.ptr.buffer.load.v4i32(ptr addrspace(8) null, i32 %a, i32 0, i32 0)
- %i39 = tail call <4 x i32> @llvm.amdgcn.raw.ptr.buffer.load.v4i32(ptr addrspace(8) null, i32 %a7, i32 0, i32 0)
- %i40 = tail call <4 x i32> @llvm.amdgcn.raw.ptr.buffer.load.v4i32(ptr addrspace(8) null, i32 20480, i32 0, i32 0)
- %i41 = tail call <4 x i32> @llvm.amdgcn.raw.ptr.buffer.load.v4i32(ptr addrspace(8) null, i32 1, i32 0, i32 0)
- %i42 = tail call <4 x i32> @llvm.amdgcn.raw.ptr.buffer.load.v4i32(ptr addrspace(8) null, i32 %a3, i32 0, i32 0)
- %i43 = tail call <4 x i32> @llvm.amdgcn.raw.ptr.buffer.load.v4i32(ptr addrspace(8) null, i32 %a2, i32 0, i32 0)
- %i44 = tail call <4 x i32> @llvm.amdgcn.raw.ptr.buffer.load.v4i32(ptr addrspace(8) null, i32 22528, i32 0, i32 0)
- %i45 = tail call <4 x i32> @llvm.amdgcn.raw.ptr.buffer.load.v4i32(ptr addrspace(8) null, i32 23552, i32 0, i32 0)
- %i46 = tail call <4 x i32> @llvm.amdgcn.raw.ptr.buffer.load.v4i32(ptr addrspace(8) null, i32 %arg0, i32 0, i32 0)
- store <4 x i32> %i24, ptr addrspace(3) zeroinitializer, align 16
- %p18 = getelementptr %f8, ptr addrspace(3) @shared, i32 %arg2
- store <4 x i32> %i25, ptr addrspace(3) %p18, align 16
- %p17 = getelementptr %f8, ptr addrspace(3) %p18, i32 512
- %i47 = or i32 %x5, 1
- %p16 = getelementptr %f8, ptr addrspace(3) %p17, i32 %i47
- store <4 x i32> %i26, ptr addrspace(3) %p16, align 16
- %i48 = load <4 x i8>, ptr addrspace(3) %p15, align 8
- %bc16 = bitcast <4 x i32> %i20 to <2 x i64>
- %be21 = extractelement <2 x i64> %bc16, i64 0
- %bc11 = bitcast <4 x i32> %i22 to <2 x i64>
- %be22 = extractelement <2 x i64> %bc11, i64 0
- %as6 = shufflevector <4 x i8> zeroinitializer, <4 x i8> %i27, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
- %ac7 = bitcast <16 x i8> %as6 to <2 x i64>
- %ae6 = extractelement <2 x i64> %ac7, i64 0
- %i49 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 0, <4 x float> zeroinitializer, i32 0, i32 0, i32 0)
- %i50 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 %ae6, i64 0, <4 x float> %i49, i32 0, i32 0, i32 0)
- %as4 = shufflevector <4 x i8> %i30, <4 x i8> zeroinitializer, <16 x i32> <i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
- %ac6 = bitcast <16 x i8> %as4 to <2 x i64>
- %ae7 = extractelement <2 x i64> %ac6, i64 1
- %i51 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 72340172838076673, i64 0, <4 x float> %i50, i32 0, i32 0, i32 0)
- %i52 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 0, <4 x float> %i51, i32 0, i32 0, i32 0)
- %i53 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 %ae7, i64 0, <4 x float> %i52, i32 0, i32 0, i32 0)
- %i54 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 0, <4 x float> %i53, i32 0, i32 0, i32 0)
- %i55 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 0, <4 x float> %i54, i32 0, i32 0, i32 0)
- %i56 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 %be21, <4 x float> %i55, i32 0, i32 0, i32 0)
- %bc4 = bitcast <4 x i32> %i21 to <2 x i64>
- %be29 = extractelement <2 x i64> %bc4, i64 1
- %be14 = extractelement <2 x i64> %bc4, i64 0
- %i57 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 %be21, <4 x float> %i56, i32 0, i32 0, i32 0)
- %i58 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 %be14, <4 x float> %i57, i32 0, i32 0, i32 0)
- %i59 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 %be29, <4 x float> %i58, i32 0, i32 0, i32 0)
- %i60 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 0, <4 x float> %i59, i32 0, i32 0, i32 0)
- %i61 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 0, <4 x float> %i60, i32 0, i32 0, i32 0)
- %as8 = shufflevector <4 x i8> zeroinitializer, <4 x i8> %arg9, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
- %ac9 = bitcast <16 x i8> %as8 to <2 x i64>
- %as13 = shufflevector <4 x i8> %arg8, <4 x i8> zeroinitializer, <16 x i32> <i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
- %ac18 = bitcast <16 x i8> %as13 to <2 x i64>
- %ae18 = extractelement <2 x i64> %ac18, i64 1
- %i62 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 0, <4 x float> %i19, i32 0, i32 0, i32 0)
- %ae9 = extractelement <2 x i64> %ac9, i64 0
- %i63 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 %ae18, i64 0, <4 x float> %i62, i32 0, i32 0, i32 0)
- %i64 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 %ae9, i64 0, <4 x float> %i63, i32 0, i32 0, i32 0)
- %i65 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 0, <4 x float> %i64, i32 0, i32 0, i32 0)
- %as5 = shufflevector <4 x i8> zeroinitializer, <4 x i8> %i29, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
- %ac8 = bitcast <16 x i8> %as5 to <2 x i64>
- %ae8 = extractelement <2 x i64> %ac8, i64 0
- %i66 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 0, <4 x float> %i65, i32 0, i32 0, i32 0)
- %i67 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 0, <4 x float> %i66, i32 0, i32 0, i32 0)
- %i68 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 %ae8, i64 0, <4 x float> %i67, i32 0, i32 0, i32 0)
- %i69 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 0, <4 x float> %i68, i32 0, i32 0, i32 0)
- %i70 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 0, <4 x float> %i69, i32 0, i32 0, i32 0)
- %i71 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 0, <4 x float> %i70, i32 0, i32 0, i32 0)
- %i72 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 0, <4 x float> %i71, i32 0, i32 0, i32 0)
- %i73 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 0, <4 x float> %i72, i32 0, i32 0, i32 0)
- %bc13 = bitcast <4 x i32> %i23 to <2 x i64>
- %i74 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 %be22, <4 x float> %i73, i32 0, i32 0, i32 0)
- %be20 = extractelement <2 x i64> %bc13, i64 0
- %be26 = extractelement <2 x i64> %bc11, i64 1
- %i75 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 %be26, <4 x float> %i74, i32 0, i32 0, i32 0)
- %i76 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 %be20, <4 x float> %i75, i32 0, i32 0, i32 0)
- %i77 = load <4 x i8>, ptr addrspace(3) zeroinitializer, align 8
- %i78 = getelementptr i8, ptr addrspace(3) @shared, i32 %i28
- %p10 = getelementptr %f8, ptr addrspace(3) %i78, i32 %arg1
- %i79 = load <4 x i8>, ptr addrspace(3) %p10, align 4
- %i80 = getelementptr i8, ptr addrspace(3) @shared, i32 %n4
- %p11 = getelementptr %f8, ptr addrspace(3) %i80, i32 %arg1
- %i81 = getelementptr i8, ptr addrspace(3) %p11, i32 4
- %i82 = load <4 x i8>, ptr addrspace(3) %i81, align 4
- %i83 = load <4 x i8>, ptr addrspace(3) %p11, align 8
- %p4 = getelementptr %f8, ptr addrspace(3) %arg, i32 %m3
- %i84 = load <4 x i8>, ptr addrspace(3) %p4, align 4
- %i85 = getelementptr i8, ptr addrspace(3) %lds, i32 %n5
- %p8 = getelementptr %f8, ptr addrspace(3) %i85, i32 %arg1
- %i86 = load <4 x i8>, ptr addrspace(3) %p8, align 4
- %i87 = load <4 x i8>, ptr addrspace(3) %i85, align 8
- %i88 = load <4 x i8>, ptr addrspace(3) %i11, align 4
- %p5 = getelementptr %f8, ptr addrspace(3) %lds, i32 %m3
- %i89 = load <4 x i8>, ptr addrspace(3) %p5, align 4
- %p13 = getelementptr i8, ptr addrspace(3) inttoptr (i32 8192 to ptr addrspace(3)), i32 %i13
- %i90 = load <4 x i8>, ptr addrspace(3) %p13, align 4
- %p2 = getelementptr i8, ptr addrspace(3) %arg4, i32 %n3
- %i91 = load <4 x i8>, ptr addrspace(3) %p2, align 16
- %p9 = getelementptr i8, ptr addrspace(3) inttoptr (i32 8192 to ptr addrspace(3)), i32 %x3
- %i92 = load <4 x i8>, ptr addrspace(3) %p9, align 16
- tail call void @llvm.amdgcn.sched.barrier(i32 0)
- %bc = bitcast <4 x i32> %i32 to <2 x i64>
- %be = extractelement <2 x i64> %bc, i64 1
- %as14 = shufflevector <4 x i8> zeroinitializer, <4 x i8> %i79, <16 x i32> <i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
- %bc18 = bitcast <4 x i32> %i35 to <2 x i64>
- %be30 = extractelement <2 x i64> %bc18, i64 0
- %ac17 = bitcast <16 x i8> %as14 to <2 x i64>
- %ae17 = extractelement <2 x i64> %ac17, i64 1
- %as16 = shufflevector <4 x i8> zeroinitializer, <4 x i8> %i82, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
- %as10 = shufflevector <4 x i8> %i83, <4 x i8> %i48, <16 x i32> <i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
- %ac15 = bitcast <16 x i8> %as16 to <2 x i64>
- %ae13 = extractelement <2 x i64> %ac15, i64 0
- %bc15 = bitcast <4 x i32> %i36 to <2 x i64>
- %be8 = extractelement <2 x i64> %bc15, i64 0
- %ac10 = bitcast <16 x i8> %as10 to <2 x i64>
- %ae10 = extractelement <2 x i64> %ac10, i64 1
- %as9 = shufflevector <4 x i8> zeroinitializer, <4 x i8> %i84, <16 x i32> <i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
- %bc7 = bitcast <4 x i32> %i38 to <2 x i64>
- %be7 = extractelement <2 x i64> %bc7, i64 0
- %ac16 = bitcast <16 x i8> %as9 to <2 x i64>
- %ae11 = extractelement <2 x i64> %ac16, i64 1
- %as11 = shufflevector <4 x i8> zeroinitializer, <4 x i8> %i86, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
- %as7 = shufflevector <4 x i8> %i87, <4 x i8> %i77, <16 x i32> <i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
- %ac11 = bitcast <16 x i8> %as11 to <2 x i64>
- %ae15 = extractelement <2 x i64> %ac11, i64 0
- %bc6 = bitcast <4 x i32> %i39 to <2 x i64>
- %be17 = extractelement <2 x i64> %bc6, i64 0
- %ac14 = bitcast <16 x i8> %as7 to <2 x i64>
- %ae12 = extractelement <2 x i64> %ac14, i64 1
- %as12 = shufflevector <4 x i8> zeroinitializer, <4 x i8> %i88, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
- %ac13 = bitcast <16 x i8> %as12 to <2 x i64>
- %ae16 = extractelement <2 x i64> %ac13, i64 0
- %bc14 = bitcast <4 x i32> %i42 to <2 x i64>
- %be28 = extractelement <2 x i64> %bc14, i64 0
- %as15 = shufflevector <4 x i8> zeroinitializer, <4 x i8> %i89, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
- %ac12 = bitcast <16 x i8> %as15 to <2 x i64>
- %ae14 = extractelement <2 x i64> %ac12, i64 0
- %bc12 = bitcast <4 x i32> %i43 to <2 x i64>
- %be19 = extractelement <2 x i64> %bc12, i64 0
- %bc2 = bitcast <4 x i32> %i46 to <2 x i64>
- %be15 = extractelement <2 x i64> %bc2, i64 1
- %i93 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 %be15, <4 x float> zeroinitializer, i32 0, i32 0, i32 0)
- %bc17 = bitcast <4 x i32> %i33 to <2 x i64>
- %be32 = extractelement <2 x i64> %bc17, i64 0
- %bc19 = bitcast <4 x i32> %i34 to <2 x i64>
- %be33 = extractelement <2 x i64> %bc19, i64 0
- %i94 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 %ae17, i64 %be33, <4 x float> zeroinitializer, i32 0, i32 0, i32 0)
- %i95 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 %ae13, i64 0, <4 x float> %i94, i32 0, i32 0, i32 0)
- %i96 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 %ae10, i64 0, <4 x float> %i95, i32 0, i32 0, i32 0)
- %bc5 = bitcast <4 x i32> %i37 to <2 x i64>
- %be10 = extractelement <2 x i64> %bc5, i64 0
- %i97 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 %be10, <4 x float> %i96, i32 0, i32 0, i32 0)
- %be12 = extractelement <2 x i64> %bc5, i64 1
- %i98 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 %ae11, i64 %be12, <4 x float> %i97, i32 0, i32 0, i32 0)
- %bc10 = bitcast <4 x i32> %i40 to <2 x i64>
- %be16 = extractelement <2 x i64> %bc10, i64 0
- %i99 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 %ae15, i64 %be16, <4 x float> %i98, i32 0, i32 0, i32 0)
- %be6 = extractelement <2 x i64> %bc10, i64 1
- %i100 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 %ae12, i64 %be6, <4 x float> %i99, i32 0, i32 0, i32 0)
- %bc8 = bitcast <4 x i32> %i41 to <2 x i64>
- %be3 = extractelement <2 x i64> %bc8, i64 0
- %i101 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 %ae16, i64 %be3, <4 x float> %i100, i32 0, i32 0, i32 0)
- %be11 = extractelement <2 x i64> %bc8, i64 1
- %i102 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 %be11, <4 x float> %i101, i32 0, i32 0, i32 0)
- %bc9 = bitcast <4 x i32> %i44 to <2 x i64>
- %i103 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 %ae14, i64 0, <4 x float> %i102, i32 0, i32 0, i32 0)
- %be18 = extractelement <2 x i64> %bc9, i64 1
- %i104 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 %be18, <4 x float> %i103, i32 0, i32 0, i32 0)
- %bc3 = bitcast <4 x i32> %i45 to <2 x i64>
- %be4 = extractelement <2 x i64> %bc3, i64 0
- %i105 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 %be4, <4 x float> %i104, i32 0, i32 0, i32 0)
- %be25 = extractelement <2 x i64> %bc3, i64 1
- %i106 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 %be25, <4 x float> %i105, i32 0, i32 0, i32 0)
- %be2 = extractelement <2 x i64> %bc, i64 0
- %i107 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 72340172838076673, i64 %be2, <4 x float> %i61, i32 0, i32 0, i32 0)
- %i108 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 %be, <4 x float> %i107, i32 0, i32 0, i32 0)
- %i109 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 %be30, <4 x float> %i108, i32 0, i32 0, i32 0)
- %be31 = extractelement <2 x i64> %bc18, i64 1
- %i110 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 %be31, <4 x float> %i109, i32 0, i32 0, i32 0)
- %i111 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 %be8, <4 x float> %i110, i32 0, i32 0, i32 0)
- %be23 = extractelement <2 x i64> %bc15, i64 1
- %i112 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 %be23, <4 x float> %i111, i32 0, i32 0, i32 0)
- %i113 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 %be7, <4 x float> %i112, i32 0, i32 0, i32 0)
- %be5 = extractelement <2 x i64> %bc7, i64 1
- %i114 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 %be5, <4 x float> %i113, i32 0, i32 0, i32 0)
- %as = shufflevector <4 x i8> zeroinitializer, <4 x i8> %i90, <16 x i32> <i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
- %i115 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 %be17, <4 x float> %i114, i32 0, i32 0, i32 0)
- %ac4 = bitcast <16 x i8> %as to <2 x i64>
- %ae4 = extractelement <2 x i64> %ac4, i64 1
- %be13 = extractelement <2 x i64> %bc6, i64 1
- %i116 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 %ae4, i64 %be13, <4 x float> %i115, i32 0, i32 0, i32 0)
- %as3 = shufflevector <4 x i8> %i91, <4 x i8> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
- %ac5 = bitcast <16 x i8> %as3 to <2 x i64>
- %ae3 = extractelement <2 x i64> %ac5, i64 0
- %i117 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 %be28, <4 x float> %i116, i32 0, i32 0, i32 0)
- %be9 = extractelement <2 x i64> %bc14, i64 1
- %i118 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 %be9, <4 x float> %i117, i32 0, i32 0, i32 0)
- %as2 = shufflevector <4 x i8> %i92, <4 x i8> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
- %ac3 = bitcast <16 x i8> %as2 to <2 x i64>
- %ae5 = extractelement <2 x i64> %ac3, i64 0
- %be27 = extractelement <2 x i64> %bc2, i64 0
- %i119 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 %ae5, i64 %be19, <4 x float> %i118, i32 0, i32 0, i32 0)
- %i120 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 %be27, <4 x float> %i119, i32 0, i32 0, i32 0)
- %i121 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 %be32, <4 x float> %i76, i32 0, i32 0, i32 0)
- %be24 = extractelement <2 x i64> %bc9, i64 0
- %i122 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 %ae3, i64 0, <4 x float> %i121, i32 0, i32 0, i32 0)
- %i123 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64 0, i64 %be24, <4 x float> %i122, i32 0, i32 0, i32 0)
- %n7 = and i32 %i10, 31
- %m4 = and i32 %i31, 56
- %m = mul i32 %n7, %arg0
- %a6 = or i32 %m, %m4
- %ce4 = extractelement <4 x float> %i93, i64 0
- %m2 = and i32 %i, 48
- %i124 = getelementptr float, ptr addrspace(3) @shared, i32 %m2
- %p21 = getelementptr float, ptr addrspace(3) %i124, i32 %n6
- store float %ce4, ptr addrspace(3) zeroinitializer, align 4
- %ce3 = extractelement <4 x float> %i120, i64 0
- store float %ce3, ptr addrspace(3) %p21, align 4
- %ce2 = extractelement <4 x float> %i106, i64 0
- store float %ce2, ptr addrspace(3) zeroinitializer, align 4
- %ce = extractelement <4 x float> %i123, i64 0
- store float %ce, ptr addrspace(3) %p15, align 4
- %sx = sext i32 %a6 to i64
- %p20 = getelementptr i16, ptr addrspace(1) null, i64 %sx
- %i125 = atomicrmw fadd ptr addrspace(1) %p20, <2 x bfloat> zeroinitializer syncscope("agent") monotonic, align 4
- ret void
-}
-
-; Function Attrs: convergent nocallback nocreateundeforpoison nofree nosync nounwind willreturn memory(none)
-declare <4 x float> @llvm.amdgcn.mfma.f32.16x16x32.fp8.fp8(i64, i64, <4 x float>, i32 immarg, i32 immarg, i32 immarg) #1
-
-; Function Attrs: nocallback nofree nosync nounwind willreturn memory(argmem: read)
-declare <4 x i32> @llvm.amdgcn.raw.ptr.buffer.load.v4i32(ptr addrspace(8) readonly captures(none), i32, i32, i32 immarg) #2
-
-; Function Attrs: nocallback nofree nounwind willreturn
-declare void @llvm.amdgcn.s.waitcnt(i32 immarg) #3
-
-; Function Attrs: convergent nocallback nofree nounwind willreturn
-declare void @llvm.amdgcn.sched.barrier(i32 immarg) #4
-
-; Function Attrs: convergent nocallback nofree nounwind willreturn
-declare void @llvm.amdgcn.sched.group.barrier(i32 immarg, i32 immarg, i32 immarg) #4
-
-; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
-declare noundef range(i32 0, 1024) i32 @llvm.amdgcn.workitem.id.x() #5
-
-attributes #0 = { "amdgpu-agpr-alloc"="0" "amdgpu-flat-work-group-size"="1,256" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="2" "target-cpu"="gfx942" }
-attributes #1 = { convergent nocallback nocreateundeforpoison nofree nosync nounwind willreturn memory(none) "target-cpu"="gfx942" }
-attributes #2 = { nocallback nofree nosync nounwind willreturn memory(argmem: read) "target-cpu"="gfx942" }
-attributes #3 = { nocallback nofree nounwind willreturn "target-cpu"="gfx942" }
-attributes #4 = { convergent nocallback nofree nounwind willreturn "target-cpu"="gfx942" }
-attributes #5 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) "target-cpu"="gfx942" }
diff --git a/llvm/test/CodeGen/AMDGPU/sched_mfma_rewrite_copies.mir b/llvm/test/CodeGen/AMDGPU/sched_mfma_rewrite_copies.mir
index 34c82cf2c73fb..4954c073f9134 100644
--- a/llvm/test/CodeGen/AMDGPU/sched_mfma_rewrite_copies.mir
+++ b/llvm/test/CodeGen/AMDGPU/sched_mfma_rewrite_copies.mir
@@ -227,25 +227,25 @@ body: |
; CHECK-NEXT: SCHED_BARRIER 0
; CHECK-NEXT: [[DEF11:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF12:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.2(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub0, [[DEF14]], implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub0, [[DEF15]], implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub0, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub0, [[DEF15]], implicit $exec
; CHECK-NEXT: SCHED_BARRIER 0
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF16]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF17]], [[DEF15]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]], [[V_ADD_U32_e32_1]]
+ ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF17]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF13]], [[DEF16]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]], [[V_ADD_U32_e32_1]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
@@ -323,25 +323,25 @@ body: |
; CHECK-NEXT: SCHED_BARRIER 0
; CHECK-NEXT: [[DEF11:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF12:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub1, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub1, [[DEF15]], implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.2(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
- ; CHECK-NEXT: [[V_ADD_U32_e32_1:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[DEF15]].sub1, [[V_ADD_U32_e32_]].sub0, implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_1:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[DEF16]].sub1, [[V_ADD_U32_e32_]].sub0, implicit $exec
; CHECK-NEXT: SCHED_BARRIER 0
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF16]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF17]], [[DEF15]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]], [[V_ADD_U32_e32_1]]
+ ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF17]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF13]], [[DEF16]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]], [[V_ADD_U32_e32_1]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
@@ -420,36 +420,36 @@ body: |
; CHECK-NEXT: SCHED_BARRIER 0
; CHECK-NEXT: [[DEF11:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF12:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: $scc = IMPLICIT_DEF
; CHECK-NEXT: S_CBRANCH_SCC1 %bb.2, implicit killed $scc
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub0, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub0, [[DEF15]], implicit $exec
; CHECK-NEXT: S_BRANCH %bb.3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub1, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub1, [[DEF15]], implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.3:
; CHECK-NEXT: successors: %bb.4(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.4:
; CHECK-NEXT: SCHED_BARRIER 0
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF16]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF17]], [[DEF15]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]]
+ ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF17]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF13]], [[DEF16]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
@@ -534,37 +534,37 @@ body: |
; CHECK-NEXT: SCHED_BARRIER 0
; CHECK-NEXT: [[DEF11:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF12:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: $scc = IMPLICIT_DEF
; CHECK-NEXT: S_CBRANCH_SCC1 %bb.2, implicit killed $scc
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub0, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub0, [[DEF15]], implicit $exec
; CHECK-NEXT: S_BRANCH %bb.3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub1, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub1, [[DEF15]], implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.3:
; CHECK-NEXT: successors: %bb.4(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.4:
- ; CHECK-NEXT: [[V_ADD_U32_e32_1:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[DEF15]].sub1, [[V_ADD_U32_e32_]].sub0, implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_1:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[DEF16]].sub1, [[V_ADD_U32_e32_]].sub0, implicit $exec
; CHECK-NEXT: SCHED_BARRIER 0
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF16]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF17]], [[DEF15]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]], [[V_ADD_U32_e32_1]]
+ ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF17]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF13]], [[DEF16]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]], [[V_ADD_U32_e32_1]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
@@ -652,30 +652,31 @@ body: |
; CHECK-NEXT: SCHED_BARRIER 0
; CHECK-NEXT: [[DEF11:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF12:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub1, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub1, [[DEF15]], implicit $exec
; CHECK-NEXT: $scc = IMPLICIT_DEF
; CHECK-NEXT: S_CBRANCH_SCC1 %bb.3, implicit killed $scc
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.2(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub0, [[DEF14]], implicit $exec
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_1]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF16]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF17]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_ADD_U32_e32_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_4:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_5:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub0, [[DEF15]], implicit $exec
+ ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_1]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF17]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_4:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_5:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
; CHECK-NEXT: successors: %bb.3(0x80000000)
@@ -683,11 +684,10 @@ body: |
; CHECK-NEXT: KILL [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_4]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_5]]
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.3:
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_2:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_]].sub0, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_2:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_]].sub0, [[DEF15]], implicit $exec
; CHECK-NEXT: SCHED_BARRIER 0
- ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF19]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF20]], [[DEF15]], [[DEF16]], [[DEF17]], [[DEF18]], [[V_ADD_U32_e32_]], [[V_ADD_U32_e32_2]]
+ ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF20]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF13]], [[DEF16]], [[DEF17]], [[DEF18]], [[DEF19]], [[V_ADD_U32_e32_]], [[V_ADD_U32_e32_2]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
@@ -780,30 +780,31 @@ body: |
; CHECK-NEXT: SCHED_BARRIER 0
; CHECK-NEXT: [[DEF11:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF12:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub0, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub0, [[DEF15]], implicit $exec
; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub1, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub1, [[DEF15]], implicit $exec
; CHECK-NEXT: $scc = IMPLICIT_DEF
; CHECK-NEXT: S_CBRANCH_SCC1 %bb.3, implicit killed $scc
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.2(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF16]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF17]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_ADD_U32_e32_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_4:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_5:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF17]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_4:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_5:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
; CHECK-NEXT: successors: %bb.3(0x80000000)
@@ -811,12 +812,11 @@ body: |
; CHECK-NEXT: KILL [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_4]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_5]]
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.3:
- ; CHECK-NEXT: [[V_ADD_U32_e32_2:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[DEF15]].sub1, [[V_ADD_U32_e32_]].sub0, implicit $exec
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_3:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_1]].sub0, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_2:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[DEF16]].sub1, [[V_ADD_U32_e32_]].sub0, implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_3:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_1]].sub0, [[DEF15]], implicit $exec
; CHECK-NEXT: SCHED_BARRIER 0
- ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF19]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF20]], [[DEF15]], [[DEF16]], [[DEF17]], [[DEF18]], [[V_ADD_U32_e32_1]], [[V_ADD_U32_e32_2]], [[V_ADD_U32_e32_3]]
+ ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF20]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF13]], [[DEF16]], [[DEF17]], [[DEF18]], [[DEF19]], [[V_ADD_U32_e32_1]], [[V_ADD_U32_e32_2]], [[V_ADD_U32_e32_3]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
@@ -911,26 +911,27 @@ body: |
; CHECK-NEXT: SCHED_BARRIER 0
; CHECK-NEXT: [[DEF11:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF12:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub1, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub1, [[DEF15]], implicit $exec
; CHECK-NEXT: $scc = IMPLICIT_DEF
; CHECK-NEXT: S_CBRANCH_SCC1 %bb.2, implicit killed $scc
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub0, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub0, [[DEF15]], implicit $exec
; CHECK-NEXT: S_BRANCH %bb.3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub1, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub1, [[DEF15]], implicit $exec
; CHECK-NEXT: S_BRANCH %bb.3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.3:
@@ -942,16 +943,16 @@ body: |
; CHECK-NEXT: bb.4:
; CHECK-NEXT: successors: %bb.5(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_1]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF16]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF17]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_ADD_U32_e32_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_4:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_5:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_1]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF17]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_4:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_5:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.5:
; CHECK-NEXT: successors: %bb.6(0x80000000)
@@ -959,11 +960,10 @@ body: |
; CHECK-NEXT: KILL [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_4]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_5]]
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.6:
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_2:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_]].sub0, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_2:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_]].sub0, [[DEF15]], implicit $exec
; CHECK-NEXT: SCHED_BARRIER 0
- ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF19]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF20]], [[DEF15]], [[DEF16]], [[DEF17]], [[DEF18]], [[V_ADD_U32_e32_]], [[V_ADD_U32_e32_2]]
+ ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF20]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF13]], [[DEF16]], [[DEF17]], [[DEF18]], [[DEF19]], [[V_ADD_U32_e32_]], [[V_ADD_U32_e32_2]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
@@ -1068,26 +1068,27 @@ body: |
; CHECK-NEXT: SCHED_BARRIER 0
; CHECK-NEXT: [[DEF11:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF12:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub1, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub1, [[DEF15]], implicit $exec
; CHECK-NEXT: $scc = IMPLICIT_DEF
; CHECK-NEXT: S_CBRANCH_SCC1 %bb.2, implicit killed $scc
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub0, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub0, [[DEF15]], implicit $exec
; CHECK-NEXT: S_BRANCH %bb.3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub1, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub1, [[DEF15]], implicit $exec
; CHECK-NEXT: S_BRANCH %bb.3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.3:
@@ -1099,16 +1100,16 @@ body: |
; CHECK-NEXT: bb.4:
; CHECK-NEXT: successors: %bb.5(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_1]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF16]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF17]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_ADD_U32_e32_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_4:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_5:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_1]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF17]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_4:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_5:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.5:
; CHECK-NEXT: successors: %bb.6(0x80000000)
@@ -1116,12 +1117,11 @@ body: |
; CHECK-NEXT: KILL [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_4]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_5]]
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.6:
- ; CHECK-NEXT: [[V_ADD_U32_e32_2:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[DEF15]].sub1, [[V_ADD_U32_e32_1]].sub0, implicit $exec
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_3:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_]].sub0, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_2:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[DEF16]].sub1, [[V_ADD_U32_e32_1]].sub0, implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_3:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_]].sub0, [[DEF15]], implicit $exec
; CHECK-NEXT: SCHED_BARRIER 0
- ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF19]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF20]], [[DEF15]], [[DEF16]], [[DEF17]], [[DEF18]], [[V_ADD_U32_e32_]], [[V_ADD_U32_e32_2]], [[V_ADD_U32_e32_3]]
+ ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF20]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF13]], [[DEF16]], [[DEF17]], [[DEF18]], [[DEF19]], [[V_ADD_U32_e32_]], [[V_ADD_U32_e32_2]], [[V_ADD_U32_e32_3]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
@@ -1227,18 +1227,19 @@ body: |
; CHECK-NEXT: SCHED_BARRIER 0
; CHECK-NEXT: [[DEF11:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF12:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.2(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub0, [[DEF14]], implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub0, [[DEF15]], implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
; CHECK-NEXT: successors: %bb.4(0x40000000), %bb.3(0x40000000)
@@ -1249,18 +1250,17 @@ body: |
; CHECK-NEXT: bb.3:
; CHECK-NEXT: successors: %bb.5(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub1, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub1, [[DEF15]], implicit $exec
; CHECK-NEXT: S_BRANCH %bb.5
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.4:
; CHECK-NEXT: successors: %bb.5(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub0, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub0, [[DEF15]], implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.5:
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF16]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF17]], [[DEF15]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]], [[V_ADD_U32_e32_1]]
+ ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF17]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF13]], [[DEF16]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]], [[V_ADD_U32_e32_1]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
@@ -1348,18 +1348,19 @@ body: |
; CHECK-NEXT: SCHED_BARRIER 0
; CHECK-NEXT: [[DEF11:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF12:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub1, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub1, [[DEF15]], implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.2(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
; CHECK-NEXT: successors: %bb.4(0x40000000), %bb.3(0x40000000)
@@ -1370,22 +1371,21 @@ body: |
; CHECK-NEXT: bb.3:
; CHECK-NEXT: successors: %bb.5(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub1, [[DEF14]], implicit $exec
- ; CHECK-NEXT: [[V_ADD_U32_e32_1:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub0, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub1, [[DEF15]], implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_1:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub0, [[DEF15]], implicit $exec
; CHECK-NEXT: S_BRANCH %bb.5
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.4:
; CHECK-NEXT: successors: %bb.5(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub0, [[DEF14]], implicit $exec
- ; CHECK-NEXT: [[V_ADD_U32_e32_1:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub1, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub0, [[DEF15]], implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_1:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub1, [[DEF15]], implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.5:
- ; CHECK-NEXT: [[V_ADD_U32_e32_2:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[DEF15]].sub1, [[V_ADD_U32_e32_]].sub0, implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_2:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[DEF16]].sub1, [[V_ADD_U32_e32_]].sub0, implicit $exec
; CHECK-NEXT: SCHED_BARRIER 0
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF16]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF17]], [[DEF15]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]], [[V_ADD_U32_e32_1]], [[V_ADD_U32_e32_2]]
+ ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF17]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF13]], [[DEF16]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]], [[V_ADD_U32_e32_1]], [[V_ADD_U32_e32_2]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
@@ -1477,30 +1477,31 @@ body: |
; CHECK-NEXT: SCHED_BARRIER 0
; CHECK-NEXT: [[DEF11:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF12:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: $scc = IMPLICIT_DEF
; CHECK-NEXT: S_CBRANCH_SCC1 %bb.2, implicit killed $scc
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub0, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub0, [[DEF15]], implicit $exec
; CHECK-NEXT: S_BRANCH %bb.3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub1, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub1, [[DEF15]], implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.3:
; CHECK-NEXT: successors: %bb.4(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.4:
; CHECK-NEXT: successors: %bb.6(0x40000000), %bb.5(0x40000000)
@@ -1511,20 +1512,19 @@ body: |
; CHECK-NEXT: bb.5:
; CHECK-NEXT: successors: %bb.7(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub1, [[DEF14]], implicit $exec
- ; CHECK-NEXT: [[V_ADD_U32_e32_1:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub0, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub1, [[DEF15]], implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_1:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub0, [[DEF15]], implicit $exec
; CHECK-NEXT: S_BRANCH %bb.7
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.6:
; CHECK-NEXT: successors: %bb.7(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub0, [[DEF14]], implicit $exec
- ; CHECK-NEXT: [[V_ADD_U32_e32_1:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub1, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub0, [[DEF15]], implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_1:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub1, [[DEF15]], implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.7:
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF16]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF17]], [[DEF15]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]], [[V_ADD_U32_e32_1]]
+ ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF17]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF13]], [[DEF16]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]], [[V_ADD_U32_e32_1]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
@@ -1622,30 +1622,31 @@ body: |
; CHECK-NEXT: SCHED_BARRIER 0
; CHECK-NEXT: [[DEF11:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF12:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: $scc = IMPLICIT_DEF
; CHECK-NEXT: S_CBRANCH_SCC1 %bb.2, implicit killed $scc
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub0, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub0, [[DEF15]], implicit $exec
; CHECK-NEXT: S_BRANCH %bb.3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub1, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub1, [[DEF15]], implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.3:
; CHECK-NEXT: successors: %bb.4(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.4:
; CHECK-NEXT: successors: %bb.6(0x40000000), %bb.5(0x40000000)
@@ -1656,22 +1657,21 @@ body: |
; CHECK-NEXT: bb.5:
; CHECK-NEXT: successors: %bb.7(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub1, [[DEF14]], implicit $exec
- ; CHECK-NEXT: [[V_ADD_U32_e32_1:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub0, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub1, [[DEF15]], implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_1:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub0, [[DEF15]], implicit $exec
; CHECK-NEXT: S_BRANCH %bb.7
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.6:
; CHECK-NEXT: successors: %bb.7(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub0, [[DEF14]], implicit $exec
- ; CHECK-NEXT: [[V_ADD_U32_e32_1:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub1, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub0, [[DEF15]], implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_1:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub1, [[DEF15]], implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.7:
- ; CHECK-NEXT: [[V_ADD_U32_e32_2:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[DEF15]].sub1, [[V_ADD_U32_e32_]].sub0, implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_2:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[DEF16]].sub1, [[V_ADD_U32_e32_]].sub0, implicit $exec
; CHECK-NEXT: SCHED_BARRIER 0
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF16]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF17]], [[DEF15]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]], [[V_ADD_U32_e32_1]], [[V_ADD_U32_e32_2]]
+ ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF17]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF13]], [[DEF16]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]], [[V_ADD_U32_e32_1]], [[V_ADD_U32_e32_2]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
@@ -1770,36 +1770,37 @@ body: |
; CHECK-NEXT: SCHED_BARRIER 0
; CHECK-NEXT: [[DEF11:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF12:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub1, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub1, [[DEF15]], implicit $exec
; CHECK-NEXT: $scc = IMPLICIT_DEF
; CHECK-NEXT: S_CBRANCH_SCC1 %bb.4, implicit killed $scc
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.2(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub0, [[DEF14]], implicit $exec
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_1]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF16]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF17]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_ADD_U32_e32_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_4:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_5:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_6:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_7:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_8:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_9:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_10:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_11:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub0, [[DEF15]], implicit $exec
+ ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_1]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF17]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_4:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_5:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_6:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_7:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_8:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_9:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_10:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_11:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
; CHECK-NEXT: successors: %bb.3(0x80000000)
@@ -1815,18 +1816,17 @@ body: |
; CHECK-NEXT: bb.4:
; CHECK-NEXT: successors: %bb.5(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: dead undef [[V_ADD_U32_e32_2:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_]].sub1, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: dead undef [[V_ADD_U32_e32_2:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_]].sub1, [[DEF15]], implicit $exec
; CHECK-NEXT: S_BRANCH %bb.5
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.5:
; CHECK-NEXT: successors: %bb.6(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_3:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_]].sub0, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_3:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_]].sub0, [[DEF15]], implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.6:
- ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF19]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF20]], [[DEF15]], [[DEF16]], [[DEF17]], [[DEF18]], [[V_ADD_U32_e32_]], [[V_ADD_U32_e32_3]]
+ ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF20]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF13]], [[DEF16]], [[DEF17]], [[DEF18]], [[DEF19]], [[V_ADD_U32_e32_]], [[V_ADD_U32_e32_3]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
@@ -1936,30 +1936,31 @@ body: |
; CHECK-NEXT: SCHED_BARRIER 0
; CHECK-NEXT: [[DEF11:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF12:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub0, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub0, [[DEF15]], implicit $exec
; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub1, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub1, [[DEF15]], implicit $exec
; CHECK-NEXT: $scc = IMPLICIT_DEF
; CHECK-NEXT: S_CBRANCH_SCC1 %bb.4, implicit killed $scc
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.2(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF16]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF17]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_ADD_U32_e32_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_4:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_5:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF17]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_4:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_5:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
; CHECK-NEXT: successors: %bb.3(0x80000000)
@@ -1975,22 +1976,21 @@ body: |
; CHECK-NEXT: bb.4:
; CHECK-NEXT: successors: %bb.6(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_2:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_1]].sub1, [[DEF14]], implicit $exec
- ; CHECK-NEXT: [[V_ADD_U32_e32_2:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_1]].sub0, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_2:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_1]].sub1, [[DEF15]], implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_2:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_1]].sub0, [[DEF15]], implicit $exec
; CHECK-NEXT: S_BRANCH %bb.6
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.5:
; CHECK-NEXT: successors: %bb.6(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_2:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_1]].sub0, [[DEF14]], implicit $exec
- ; CHECK-NEXT: [[V_ADD_U32_e32_2:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_1]].sub1, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_2:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_1]].sub0, [[DEF15]], implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_2:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_1]].sub1, [[DEF15]], implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.6:
- ; CHECK-NEXT: [[V_ADD_U32_e32_3:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[DEF15]].sub1, [[V_ADD_U32_e32_]].sub0, implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_3:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[DEF16]].sub1, [[V_ADD_U32_e32_]].sub0, implicit $exec
; CHECK-NEXT: SCHED_BARRIER 0
- ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF19]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF20]], [[DEF15]], [[DEF16]], [[DEF17]], [[DEF18]], [[V_ADD_U32_e32_1]], [[V_ADD_U32_e32_2]], [[V_ADD_U32_e32_3]]
+ ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF20]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF13]], [[DEF16]], [[DEF17]], [[DEF18]], [[DEF19]], [[V_ADD_U32_e32_1]], [[V_ADD_U32_e32_2]], [[V_ADD_U32_e32_3]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
@@ -2096,26 +2096,27 @@ body: |
; CHECK-NEXT: SCHED_BARRIER 0
; CHECK-NEXT: [[DEF11:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF12:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub1, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub1, [[DEF15]], implicit $exec
; CHECK-NEXT: $scc = IMPLICIT_DEF
; CHECK-NEXT: S_CBRANCH_SCC1 %bb.2, implicit killed $scc
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub0, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub0, [[DEF15]], implicit $exec
; CHECK-NEXT: S_BRANCH %bb.3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub1, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub1, [[DEF15]], implicit $exec
; CHECK-NEXT: S_BRANCH %bb.3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.3:
@@ -2127,16 +2128,16 @@ body: |
; CHECK-NEXT: bb.4:
; CHECK-NEXT: successors: %bb.5(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_1]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF16]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF17]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_ADD_U32_e32_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_4:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_5:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_1]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF17]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_4:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_5:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.5:
; CHECK-NEXT: successors: %bb.6(0x80000000)
@@ -2152,20 +2153,19 @@ body: |
; CHECK-NEXT: bb.7:
; CHECK-NEXT: successors: %bb.9(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_2:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_]].sub1, [[DEF14]], implicit $exec
- ; CHECK-NEXT: [[V_ADD_U32_e32_2:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_]].sub0, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_2:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_]].sub1, [[DEF15]], implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_2:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_]].sub0, [[DEF15]], implicit $exec
; CHECK-NEXT: S_BRANCH %bb.9
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.8:
; CHECK-NEXT: successors: %bb.9(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_2:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_]].sub0, [[DEF14]], implicit $exec
- ; CHECK-NEXT: [[V_ADD_U32_e32_2:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_]].sub1, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_2:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_]].sub0, [[DEF15]], implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_2:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_]].sub1, [[DEF15]], implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.9:
- ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF19]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF20]], [[DEF15]], [[DEF16]], [[DEF17]], [[DEF18]], [[V_ADD_U32_e32_]], [[V_ADD_U32_e32_2]]
+ ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF20]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF13]], [[DEF16]], [[DEF17]], [[DEF18]], [[DEF19]], [[V_ADD_U32_e32_]], [[V_ADD_U32_e32_2]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
@@ -2281,26 +2281,27 @@ body: |
; CHECK-NEXT: SCHED_BARRIER 0
; CHECK-NEXT: [[DEF11:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF12:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub1, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub1, [[DEF15]], implicit $exec
; CHECK-NEXT: $scc = IMPLICIT_DEF
; CHECK-NEXT: S_CBRANCH_SCC1 %bb.2, implicit killed $scc
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub0, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub0, [[DEF15]], implicit $exec
; CHECK-NEXT: S_BRANCH %bb.3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub1, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub1, [[DEF15]], implicit $exec
; CHECK-NEXT: S_BRANCH %bb.3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.3:
@@ -2312,10 +2313,10 @@ body: |
; CHECK-NEXT: bb.4:
; CHECK-NEXT: successors: %bb.5(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_1]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF16]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF17]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_ADD_U32_e32_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_1]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF17]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.5:
; CHECK-NEXT: successors: %bb.7(0x40000000), %bb.6(0x40000000)
@@ -2326,22 +2327,21 @@ body: |
; CHECK-NEXT: bb.6:
; CHECK-NEXT: successors: %bb.8(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_2:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_]].sub1, [[DEF14]], implicit $exec
- ; CHECK-NEXT: [[V_ADD_U32_e32_2:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_]].sub0, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_2:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_]].sub1, [[DEF15]], implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_2:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_]].sub0, [[DEF15]], implicit $exec
; CHECK-NEXT: S_BRANCH %bb.8
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.7:
; CHECK-NEXT: successors: %bb.8(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_2:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_]].sub0, [[DEF14]], implicit $exec
- ; CHECK-NEXT: [[V_ADD_U32_e32_2:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_]].sub1, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_2:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_]].sub0, [[DEF15]], implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_2:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_]].sub1, [[DEF15]], implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.8:
- ; CHECK-NEXT: [[V_ADD_U32_e32_3:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[DEF15]].sub1, [[V_ADD_U32_e32_1]].sub0, implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_3:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[DEF16]].sub1, [[V_ADD_U32_e32_1]].sub0, implicit $exec
; CHECK-NEXT: SCHED_BARRIER 0
- ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF19]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF20]], [[DEF15]], [[DEF16]], [[DEF17]], [[DEF18]], [[V_ADD_U32_e32_]], [[V_ADD_U32_e32_2]], [[V_ADD_U32_e32_3]]
+ ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF20]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF13]], [[DEF16]], [[DEF17]], [[DEF18]], [[DEF19]], [[V_ADD_U32_e32_]], [[V_ADD_U32_e32_2]], [[V_ADD_U32_e32_3]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
@@ -2453,23 +2453,25 @@ body: |
; CHECK-NEXT: [[COPY:%[0-9]+]]:areg_128_align2 = COPY [[DS_READ_B128_gfx9_]]
; CHECK-NEXT: [[DEF12:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF13:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: $scc = IMPLICIT_DEF
; CHECK-NEXT: S_CBRANCH_SCC1 %bb.3, implicit killed $scc
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.2(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[COPY:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[COPY:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[COPY:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[COPY:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64_:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64_1:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64_2:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64_3:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64_4:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64_5:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64_:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64_1:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64_2:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64_3:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64_4:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64_5:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
; CHECK-NEXT: successors: %bb.3(0x80000000)
@@ -2486,10 +2488,8 @@ body: |
; CHECK-NEXT: [[COPY7:%[0-9]+]]:vreg_128_align2 = COPY [[COPY]]
; CHECK-NEXT: DS_WRITE_B128_gfx9 [[DEF11]], [[COPY7]], 0, 0, implicit $exec
; CHECK-NEXT: SCHED_BARRIER 0
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF15]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF16]], [[DEF17]], [[COPY7]]
+ ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF17]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF14]], [[DEF16]], [[COPY7]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
@@ -2583,24 +2583,25 @@ body: |
; CHECK-NEXT: [[COPY:%[0-9]+]]:areg_128_align2 = COPY [[DS_READ_B128_gfx9_]]
; CHECK-NEXT: [[DEF12:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF13:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: $scc = IMPLICIT_DEF
; CHECK-NEXT: S_CBRANCH_SCC1 %bb.3, implicit killed $scc
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.2(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[COPY:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[COPY:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[COPY:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[COPY:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64_:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64_1:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64_2:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64_3:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64_4:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64_5:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64_:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64_1:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64_2:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64_3:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64_4:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64_5:%[0-9]+]]:areg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_e64 [[DEF12]], [[DEF13]], [[COPY]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
; CHECK-NEXT: successors: %bb.3(0x80000000)
@@ -2615,11 +2616,10 @@ body: |
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.3:
; CHECK-NEXT: [[COPY7:%[0-9]+]]:vreg_128_align2 = COPY [[COPY]]
- ; CHECK-NEXT: [[V_ADD_U32_e32_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[DEF15]].sub1, [[COPY7]].sub0, implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[DEF16]].sub1, [[COPY7]].sub0, implicit $exec
; CHECK-NEXT: SCHED_BARRIER 0
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF16]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF17]], [[DEF15]], [[COPY7]], [[V_ADD_U32_e32_]]
+ ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF17]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF14]], [[DEF16]], [[COPY7]], [[V_ADD_U32_e32_]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
@@ -2710,36 +2710,36 @@ body: |
; CHECK-NEXT: SCHED_BARRIER 0
; CHECK-NEXT: [[DEF11:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF12:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: $scc = IMPLICIT_DEF
; CHECK-NEXT: S_CBRANCH_SCC1 %bb.2, implicit killed $scc
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = DS_READ_B128_gfx9 [[DEF14]], 0, 0, implicit $exec
+ ; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = DS_READ_B128_gfx9 [[DEF15]], 0, 0, implicit $exec
; CHECK-NEXT: S_BRANCH %bb.3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = DS_READ_B128_gfx9 [[DEF14]], 128, 0, implicit $exec
+ ; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = DS_READ_B128_gfx9 [[DEF15]], 128, 0, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.3:
; CHECK-NEXT: successors: %bb.4(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DS_READ_B128_gfx9_]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DS_READ_B128_gfx9_]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DS_READ_B128_gfx9_]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DS_READ_B128_gfx9_]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DS_READ_B128_gfx9_]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DS_READ_B128_gfx9_]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DS_READ_B128_gfx9_]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DS_READ_B128_gfx9_]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.4:
; CHECK-NEXT: SCHED_BARRIER 0
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF15]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF16]], [[DEF17]], [[DS_READ_B128_gfx9_]]
+ ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF17]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF13]], [[DEF16]], [[DS_READ_B128_gfx9_]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
@@ -2825,37 +2825,37 @@ body: |
; CHECK-NEXT: SCHED_BARRIER 0
; CHECK-NEXT: [[DEF11:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF12:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: $scc = IMPLICIT_DEF
; CHECK-NEXT: S_CBRANCH_SCC1 %bb.2, implicit killed $scc
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = DS_READ_B128_gfx9 [[DEF14]], 0, 0, implicit $exec
+ ; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = DS_READ_B128_gfx9 [[DEF15]], 0, 0, implicit $exec
; CHECK-NEXT: S_BRANCH %bb.3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = DS_READ_B128_gfx9 [[DEF14]], 128, 0, implicit $exec
+ ; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = DS_READ_B128_gfx9 [[DEF15]], 128, 0, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.3:
; CHECK-NEXT: successors: %bb.4(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DS_READ_B128_gfx9_]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DS_READ_B128_gfx9_]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.4:
- ; CHECK-NEXT: DS_WRITE_B128_gfx9 [[DEF14]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]], 0, 0, implicit $exec
+ ; CHECK-NEXT: DS_WRITE_B128_gfx9 [[DEF15]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]], 0, 0, implicit $exec
; CHECK-NEXT: SCHED_BARRIER 0
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF15]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF16]], [[DEF17]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]]
+ ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF17]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF13]], [[DEF16]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
@@ -2944,28 +2944,28 @@ body: |
; CHECK-NEXT: [[DS_READ_B128_gfx9_1:%[0-9]+]]:vreg_128_align2 = DS_READ_B128_gfx9 [[DEF11]], 128, 0, implicit $exec
; CHECK-NEXT: [[DEF12:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF13:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: $scc = IMPLICIT_DEF
; CHECK-NEXT: S_CBRANCH_SCC1 %bb.2, implicit killed $scc
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.2(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DS_READ_B128_gfx9_]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF15]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF16]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DS_READ_B128_gfx9_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF17]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DS_READ_B128_gfx9_]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF17]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF18]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DS_READ_B128_gfx9_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF19]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
; CHECK-NEXT: DS_WRITE_B128_gfx9 [[DEF11]], [[DS_READ_B128_gfx9_1]], 0, 0, implicit $exec
; CHECK-NEXT: SCHED_BARRIER 0
- ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF18]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF19]], [[DEF20]], [[DEF15]], [[DEF16]], [[DEF17]], [[DS_READ_B128_gfx9_1]]
+ ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF20]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF14]], [[DEF16]], [[DEF17]], [[DEF18]], [[DEF19]], [[DS_READ_B128_gfx9_1]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
@@ -3053,29 +3053,29 @@ body: |
; CHECK-NEXT: [[DS_READ_B128_gfx9_1:%[0-9]+]]:vreg_128_align2 = DS_READ_B128_gfx9 [[DEF11]], 128, 0, implicit $exec
; CHECK-NEXT: [[DEF12:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF13:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: $scc = IMPLICIT_DEF
; CHECK-NEXT: S_CBRANCH_SCC1 %bb.2, implicit killed $scc
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.2(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DS_READ_B128_gfx9_]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF15]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF16]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DS_READ_B128_gfx9_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF17]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DS_READ_B128_gfx9_]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF17]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF18]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DS_READ_B128_gfx9_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF19]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
; CHECK-NEXT: DS_WRITE_B128_gfx9 [[DEF11]], [[DS_READ_B128_gfx9_]], 0, 0, implicit $exec
; CHECK-NEXT: DS_WRITE_B128_gfx9 [[DEF11]], [[DS_READ_B128_gfx9_1]], 128, 0, implicit $exec
; CHECK-NEXT: SCHED_BARRIER 0
- ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF18]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF19]], [[DEF20]], [[DEF15]], [[DEF16]], [[DEF17]], [[DS_READ_B128_gfx9_1]]
+ ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF20]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF14]], [[DEF16]], [[DEF17]], [[DEF18]], [[DEF19]], [[DS_READ_B128_gfx9_1]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
@@ -3163,10 +3163,12 @@ body: |
; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = DS_READ_B128_gfx9 [[DEF11]], 0, 0, implicit $exec
; CHECK-NEXT: [[DEF12:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF13:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: $scc = IMPLICIT_DEF
; CHECK-NEXT: S_CBRANCH_SCC1 %bb.2, implicit killed $scc
; CHECK-NEXT: {{ $}}
@@ -3191,18 +3193,16 @@ body: |
; CHECK-NEXT: bb.4:
; CHECK-NEXT: successors: %bb.5(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DS_READ_B128_gfx9_1]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF15]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF16]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF17]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DS_READ_B128_gfx9_1]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF17]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF18]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF19]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.5:
; CHECK-NEXT: DS_WRITE_B128_gfx9 [[DEF11]], [[DS_READ_B128_gfx9_]], 128, 0, implicit $exec
; CHECK-NEXT: SCHED_BARRIER 0
- ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF18]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF19]], [[DEF20]], [[DEF15]], [[DEF16]], [[DEF17]], [[DS_READ_B128_gfx9_]]
+ ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF20]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF14]], [[DEF16]], [[DEF17]], [[DEF18]], [[DEF19]], [[DS_READ_B128_gfx9_]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
@@ -3299,10 +3299,12 @@ body: |
; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = DS_READ_B128_gfx9 [[DEF11]], 0, 0, implicit $exec
; CHECK-NEXT: [[DEF12:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF13:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: $scc = IMPLICIT_DEF
; CHECK-NEXT: S_CBRANCH_SCC1 %bb.2, implicit killed $scc
; CHECK-NEXT: {{ $}}
@@ -3327,19 +3329,17 @@ body: |
; CHECK-NEXT: bb.4:
; CHECK-NEXT: successors: %bb.5(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DS_READ_B128_gfx9_1]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF15]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF16]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF17]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DS_READ_B128_gfx9_1]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF17]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF18]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF19]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.5:
; CHECK-NEXT: DS_WRITE_B128_gfx9 [[DEF11]], [[DS_READ_B128_gfx9_1]], 128, 0, implicit $exec
; CHECK-NEXT: DS_WRITE_B128_gfx9 [[DEF11]], [[DS_READ_B128_gfx9_]], 384, 0, implicit $exec
; CHECK-NEXT: SCHED_BARRIER 0
- ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF18]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF19]], [[DEF20]], [[DEF15]], [[DEF16]], [[DEF17]], [[DS_READ_B128_gfx9_]]
+ ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF20]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF14]], [[DEF16]], [[DEF17]], [[DEF18]], [[DEF19]], [[DS_READ_B128_gfx9_]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
@@ -3439,15 +3439,17 @@ body: |
; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = DS_READ_B128_gfx9 [[DEF11]], 0, 0, implicit $exec
; CHECK-NEXT: [[DEF12:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF13:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.2(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DS_READ_B128_gfx9_]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DS_READ_B128_gfx9_]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
; CHECK-NEXT: successors: %bb.4(0x40000000), %bb.3(0x40000000)
@@ -3467,10 +3469,8 @@ body: |
; CHECK-NEXT: DS_WRITE_B128_gfx9 [[DEF11]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]], 128, 0, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.5:
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF15]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF16]], [[DEF17]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]]
+ ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF17]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF14]], [[DEF16]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
@@ -3559,15 +3559,17 @@ body: |
; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = DS_READ_B128_gfx9 [[DEF11]], 0, 0, implicit $exec
; CHECK-NEXT: [[DEF12:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF13:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.2(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DS_READ_B128_gfx9_]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DS_READ_B128_gfx9_]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
; CHECK-NEXT: successors: %bb.4(0x40000000), %bb.3(0x40000000)
@@ -3591,10 +3593,8 @@ body: |
; CHECK-NEXT: bb.5:
; CHECK-NEXT: DS_WRITE_B128_gfx9 [[DEF11]], [[DS_READ_B128_gfx9_]], 0, 0, implicit $exec
; CHECK-NEXT: SCHED_BARRIER 0
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF15]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF16]], [[DEF17]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]]
+ ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF17]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF14]], [[DEF16]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
@@ -3687,29 +3687,31 @@ body: |
; CHECK-NEXT: SCHED_BARRIER 0
; CHECK-NEXT: [[DEF11:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF12:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: $scc = IMPLICIT_DEF
; CHECK-NEXT: S_CBRANCH_SCC1 %bb.2, implicit killed $scc
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = DS_READ_B128_gfx9 [[DEF14]], 0, 0, implicit $exec
+ ; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = DS_READ_B128_gfx9 [[DEF15]], 0, 0, implicit $exec
; CHECK-NEXT: S_BRANCH %bb.3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = DS_READ_B128_gfx9 [[DEF14]], 256, 0, implicit $exec
+ ; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = DS_READ_B128_gfx9 [[DEF15]], 256, 0, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.3:
; CHECK-NEXT: successors: %bb.4(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DS_READ_B128_gfx9_]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DS_READ_B128_gfx9_]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.4:
; CHECK-NEXT: successors: %bb.6(0x40000000), %bb.5(0x40000000)
@@ -3720,21 +3722,19 @@ body: |
; CHECK-NEXT: bb.5:
; CHECK-NEXT: successors: %bb.7(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: DS_WRITE_B32_gfx9 [[DEF14]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub0, 0, 0, implicit $exec
- ; CHECK-NEXT: DS_WRITE_B32_gfx9 [[DEF14]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub1, 128, 0, implicit $exec
+ ; CHECK-NEXT: DS_WRITE_B32_gfx9 [[DEF15]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub0, 0, 0, implicit $exec
+ ; CHECK-NEXT: DS_WRITE_B32_gfx9 [[DEF15]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub1, 128, 0, implicit $exec
; CHECK-NEXT: S_BRANCH %bb.7
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.6:
; CHECK-NEXT: successors: %bb.7(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: DS_WRITE_B32_gfx9 [[DEF14]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub1, 0, 0, implicit $exec
- ; CHECK-NEXT: DS_WRITE_B32_gfx9 [[DEF14]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub0, 128, 0, implicit $exec
+ ; CHECK-NEXT: DS_WRITE_B32_gfx9 [[DEF15]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub1, 0, 0, implicit $exec
+ ; CHECK-NEXT: DS_WRITE_B32_gfx9 [[DEF15]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub0, 128, 0, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.7:
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF15]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF16]], [[DEF17]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]]
+ ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF17]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF13]], [[DEF16]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
@@ -3833,29 +3833,31 @@ body: |
; CHECK-NEXT: SCHED_BARRIER 0
; CHECK-NEXT: [[DEF11:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF12:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: $scc = IMPLICIT_DEF
; CHECK-NEXT: S_CBRANCH_SCC1 %bb.2, implicit killed $scc
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = DS_READ_B128_gfx9 [[DEF14]], 0, 0, implicit $exec
+ ; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = DS_READ_B128_gfx9 [[DEF15]], 0, 0, implicit $exec
; CHECK-NEXT: S_BRANCH %bb.3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = DS_READ_B128_gfx9 [[DEF14]], 256, 0, implicit $exec
+ ; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = DS_READ_B128_gfx9 [[DEF15]], 256, 0, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.3:
; CHECK-NEXT: successors: %bb.4(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DS_READ_B128_gfx9_]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DS_READ_B128_gfx9_]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.4:
; CHECK-NEXT: successors: %bb.6(0x40000000), %bb.5(0x40000000)
@@ -3866,23 +3868,21 @@ body: |
; CHECK-NEXT: bb.5:
; CHECK-NEXT: successors: %bb.7(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: DS_WRITE_B32_gfx9 [[DEF14]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub0, 0, 0, implicit $exec
- ; CHECK-NEXT: DS_WRITE_B32_gfx9 [[DEF14]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub1, 128, 0, implicit $exec
+ ; CHECK-NEXT: DS_WRITE_B32_gfx9 [[DEF15]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub0, 0, 0, implicit $exec
+ ; CHECK-NEXT: DS_WRITE_B32_gfx9 [[DEF15]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub1, 128, 0, implicit $exec
; CHECK-NEXT: S_BRANCH %bb.7
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.6:
; CHECK-NEXT: successors: %bb.7(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: DS_WRITE_B32_gfx9 [[DEF14]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub1, 0, 0, implicit $exec
- ; CHECK-NEXT: DS_WRITE_B32_gfx9 [[DEF14]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub0, 128, 0, implicit $exec
+ ; CHECK-NEXT: DS_WRITE_B32_gfx9 [[DEF15]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub1, 0, 0, implicit $exec
+ ; CHECK-NEXT: DS_WRITE_B32_gfx9 [[DEF15]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub0, 128, 0, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.7:
- ; CHECK-NEXT: DS_WRITE_B32_gfx9 [[DEF14]], [[DS_READ_B128_gfx9_]].sub0, 256, 0, implicit $exec
+ ; CHECK-NEXT: DS_WRITE_B32_gfx9 [[DEF15]], [[DS_READ_B128_gfx9_]].sub0, 256, 0, implicit $exec
; CHECK-NEXT: SCHED_BARRIER 0
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF15]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF16]], [[DEF17]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]]
+ ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF17]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF13]], [[DEF16]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
@@ -3984,20 +3984,22 @@ body: |
; CHECK-NEXT: [[DS_READ_B128_gfx9_1:%[0-9]+]]:vreg_128_align2 = DS_READ_B128_gfx9 [[DEF11]], 256, 0, implicit $exec
; CHECK-NEXT: [[DEF12:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF13:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: $scc = IMPLICIT_DEF
; CHECK-NEXT: S_CBRANCH_SCC1 %bb.3, implicit killed $scc
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.2(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DS_READ_B128_gfx9_]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF15]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF16]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DS_READ_B128_gfx9_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF17]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DS_READ_B128_gfx9_]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF17]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF18]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DS_READ_B128_gfx9_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF19]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
; CHECK-NEXT: successors: %bb.4(0x40000000), %bb.3(0x40000000)
@@ -4017,10 +4019,8 @@ body: |
; CHECK-NEXT: DS_WRITE_B128_gfx9 [[DEF11]], [[DS_READ_B128_gfx9_1]], 256, 0, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.5:
- ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF18]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF19]], [[DEF20]], [[DEF15]], [[DEF16]], [[DEF17]], [[DS_READ_B128_gfx9_1]]
+ ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF20]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF14]], [[DEF16]], [[DEF17]], [[DEF18]], [[DEF19]], [[DS_READ_B128_gfx9_1]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
@@ -4118,20 +4118,22 @@ body: |
; CHECK-NEXT: [[DS_READ_B128_gfx9_1:%[0-9]+]]:vreg_128_align2 = DS_READ_B128_gfx9 [[DEF11]], 256, 0, implicit $exec
; CHECK-NEXT: [[DEF12:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF13:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: $scc = IMPLICIT_DEF
; CHECK-NEXT: S_CBRANCH_SCC1 %bb.3, implicit killed $scc
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.2(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DS_READ_B128_gfx9_]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF15]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF16]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DS_READ_B128_gfx9_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF17]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DS_READ_B128_gfx9_]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF17]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF18]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DS_READ_B128_gfx9_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF19]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
; CHECK-NEXT: successors: %bb.4(0x40000000), %bb.3(0x40000000)
@@ -4155,10 +4157,8 @@ body: |
; CHECK-NEXT: bb.5:
; CHECK-NEXT: DS_WRITE_B128_gfx9 [[DEF11]], [[DS_READ_B128_gfx9_]], 0, 0, implicit $exec
; CHECK-NEXT: SCHED_BARRIER 0
- ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF18]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF19]], [[DEF20]], [[DEF15]], [[DEF16]], [[DEF17]], [[DS_READ_B128_gfx9_1]]
+ ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF20]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF14]], [[DEF16]], [[DEF17]], [[DEF18]], [[DEF19]], [[DS_READ_B128_gfx9_1]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
@@ -4259,10 +4259,12 @@ body: |
; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = DS_READ_B128_gfx9 [[DEF11]], 0, 0, implicit $exec
; CHECK-NEXT: [[DEF12:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF13:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: $scc = IMPLICIT_DEF
; CHECK-NEXT: S_CBRANCH_SCC1 %bb.2, implicit killed $scc
; CHECK-NEXT: {{ $}}
@@ -4287,10 +4289,10 @@ body: |
; CHECK-NEXT: bb.4:
; CHECK-NEXT: successors: %bb.5(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DS_READ_B128_gfx9_1]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF15]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF16]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF17]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DS_READ_B128_gfx9_1]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF17]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF18]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF19]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.5:
; CHECK-NEXT: successors: %bb.7(0x40000000), %bb.6(0x40000000)
@@ -4312,10 +4314,8 @@ body: |
; CHECK-NEXT: DS_WRITE_B32_gfx9 [[DEF11]], [[DS_READ_B128_gfx9_]].sub0, 128, 0, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.8:
- ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF18]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF19]], [[DEF20]], [[DEF15]], [[DEF16]], [[DEF17]], [[DS_READ_B128_gfx9_]]
+ ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF20]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF14]], [[DEF16]], [[DEF17]], [[DEF18]], [[DEF19]], [[DS_READ_B128_gfx9_]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
@@ -4424,10 +4424,12 @@ body: |
; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = DS_READ_B128_gfx9 [[DEF11]], 0, 0, implicit $exec
; CHECK-NEXT: [[DEF12:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF13:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: $scc = IMPLICIT_DEF
; CHECK-NEXT: S_CBRANCH_SCC1 %bb.2, implicit killed $scc
; CHECK-NEXT: {{ $}}
@@ -4452,10 +4454,10 @@ body: |
; CHECK-NEXT: bb.4:
; CHECK-NEXT: successors: %bb.5(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DS_READ_B128_gfx9_1]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF15]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF16]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF17]], 4, 4, [[DEF14]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DS_READ_B128_gfx9_1]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF17]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF18]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF12]], [[DEF13]], [[DEF19]], 4, 4, [[DEF15]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.5:
; CHECK-NEXT: successors: %bb.7(0x40000000), %bb.6(0x40000000)
@@ -4479,10 +4481,8 @@ body: |
; CHECK-NEXT: bb.8:
; CHECK-NEXT: DS_WRITE_B128_gfx9 [[DEF11]], [[DS_READ_B128_gfx9_1]], 256, 0, implicit $exec
; CHECK-NEXT: SCHED_BARRIER 0
- ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF18]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF19]], [[DEF20]], [[DEF15]], [[DEF16]], [[DEF17]], [[DS_READ_B128_gfx9_]]
+ ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF20]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF14]], [[DEF16]], [[DEF17]], [[DEF18]], [[DEF19]], [[DS_READ_B128_gfx9_]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
@@ -4591,25 +4591,25 @@ body: |
; CHECK-NEXT: SCHED_BARRIER 0
; CHECK-NEXT: [[DEF11:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF12:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.2(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub0, [[DEF14]], implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub0, [[DEF15]], implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
- ; CHECK-NEXT: DS_WRITE_B128_gfx9 [[DEF14]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]], 0, 0, implicit $exec
+ ; CHECK-NEXT: DS_WRITE_B128_gfx9 [[DEF15]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]], 0, 0, implicit $exec
; CHECK-NEXT: SCHED_BARRIER 0
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF16]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF17]], [[DEF15]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]]
+ ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF17]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF13]], [[DEF16]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
@@ -4689,26 +4689,27 @@ body: |
; CHECK-NEXT: SCHED_BARRIER 0
; CHECK-NEXT: [[DEF11:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF12:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub1, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub1, [[DEF15]], implicit $exec
; CHECK-NEXT: $scc = IMPLICIT_DEF
; CHECK-NEXT: S_CBRANCH_SCC1 %bb.2, implicit killed $scc
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub0, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub0, [[DEF15]], implicit $exec
; CHECK-NEXT: S_BRANCH %bb.3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[V_ADD_U32_e32_1:%[0-9]+]]:vreg_128_align2 = DS_READ_B128_gfx9 [[DEF14]], 0, 0, implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_1:%[0-9]+]]:vreg_128_align2 = DS_READ_B128_gfx9 [[DEF15]], 0, 0, implicit $exec
; CHECK-NEXT: S_BRANCH %bb.3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.3:
@@ -4720,10 +4721,10 @@ body: |
; CHECK-NEXT: bb.4:
; CHECK-NEXT: successors: %bb.5(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_1]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF16]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF17]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_ADD_U32_e32_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_1]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF17]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.5:
; CHECK-NEXT: successors: %bb.7(0x40000000), %bb.6(0x40000000)
@@ -4734,21 +4735,20 @@ body: |
; CHECK-NEXT: bb.6:
; CHECK-NEXT: successors: %bb.8(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: DS_WRITE_B128_gfx9 [[DEF14]], [[V_ADD_U32_e32_]], 0, 0, implicit $exec
- ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: DS_WRITE_B128_gfx9 [[DEF15]], [[V_ADD_U32_e32_]], 0, 0, implicit $exec
+ ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: S_BRANCH %bb.8
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.7:
; CHECK-NEXT: successors: %bb.8(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[DEF19:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_]].sub0, [[DEF14]], implicit $exec
- ; CHECK-NEXT: [[DEF19:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_]].sub1, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[DEF20:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_]].sub0, [[DEF15]], implicit $exec
+ ; CHECK-NEXT: [[DEF20:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[V_ADD_U32_e32_]].sub1, [[DEF15]], implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.8:
; CHECK-NEXT: SCHED_BARRIER 0
- ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF21:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF20]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF21]], [[DEF15]], [[DEF16]], [[DEF17]], [[DEF18]], [[V_ADD_U32_e32_]], [[DEF19]]
+ ; CHECK-NEXT: [[DEF21:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF21]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF13]], [[DEF16]], [[DEF17]], [[DEF18]], [[DEF19]], [[V_ADD_U32_e32_]], [[DEF20]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
@@ -4856,30 +4856,31 @@ body: |
; CHECK-NEXT: SCHED_BARRIER 0
; CHECK-NEXT: [[DEF11:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF12:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub0, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub0, [[DEF15]], implicit $exec
; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub1, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub1, [[DEF15]], implicit $exec
; CHECK-NEXT: $scc = IMPLICIT_DEF
; CHECK-NEXT: S_CBRANCH_SCC1 %bb.3, implicit killed $scc
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.2(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF16]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF17]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_ADD_U32_e32_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_4:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_5:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF17]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_4:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_5:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF19]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
; CHECK-NEXT: successors: %bb.3(0x80000000)
@@ -4887,12 +4888,11 @@ body: |
; CHECK-NEXT: KILL [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_4]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_5]]
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.3:
- ; CHECK-NEXT: [[V_ADD_U32_e32_2:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[DEF15]].sub1, [[V_ADD_U32_e32_]].sub0, implicit $exec
- ; CHECK-NEXT: DS_WRITE_B128_gfx9 [[DEF14]], [[V_ADD_U32_e32_1]], 0, 0, implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_2:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[DEF16]].sub1, [[V_ADD_U32_e32_]].sub0, implicit $exec
+ ; CHECK-NEXT: DS_WRITE_B128_gfx9 [[DEF15]], [[V_ADD_U32_e32_1]], 0, 0, implicit $exec
; CHECK-NEXT: SCHED_BARRIER 0
- ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF19]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF20]], [[DEF15]], [[DEF16]], [[DEF17]], [[DEF18]], [[V_ADD_U32_e32_1]], [[V_ADD_U32_e32_2]]
+ ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF20]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF13]], [[DEF16]], [[DEF17]], [[DEF18]], [[DEF19]], [[V_ADD_U32_e32_1]], [[V_ADD_U32_e32_2]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
@@ -4987,30 +4987,31 @@ body: |
; CHECK-NEXT: SCHED_BARRIER 0
; CHECK-NEXT: [[DEF11:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF12:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: $scc = IMPLICIT_DEF
; CHECK-NEXT: S_CBRANCH_SCC1 %bb.2, implicit killed $scc
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = DS_READ_B128_gfx9 [[DEF14]], 0, 0, implicit $exec
+ ; CHECK-NEXT: [[DS_READ_B128_gfx9_:%[0-9]+]]:vreg_128_align2 = DS_READ_B128_gfx9 [[DEF15]], 0, 0, implicit $exec
; CHECK-NEXT: S_BRANCH %bb.3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[DS_READ_B128_gfx9_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub1, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[DS_READ_B128_gfx9_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub1, [[DEF15]], implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.3:
; CHECK-NEXT: successors: %bb.4(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DS_READ_B128_gfx9_]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DS_READ_B128_gfx9_]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.4:
; CHECK-NEXT: successors: %bb.6(0x40000000), %bb.5(0x40000000)
@@ -5021,21 +5022,20 @@ body: |
; CHECK-NEXT: bb.5:
; CHECK-NEXT: successors: %bb.7(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub1, [[DEF14]], implicit $exec
- ; CHECK-NEXT: [[V_ADD_U32_e32_:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub0, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub1, [[DEF15]], implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]].sub0, [[DEF15]], implicit $exec
; CHECK-NEXT: S_BRANCH %bb.7
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.6:
; CHECK-NEXT: successors: %bb.7(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: DS_WRITE_B128_gfx9 [[DEF14]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]], 0, 0, implicit $exec
+ ; CHECK-NEXT: DS_WRITE_B128_gfx9 [[DEF15]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]], 0, 0, implicit $exec
; CHECK-NEXT: [[V_ADD_U32_e32_:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.7:
- ; CHECK-NEXT: [[V_ADD_U32_e32_1:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[DS_READ_B128_gfx9_]].sub0, [[DEF14]], implicit $exec
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF16]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF17]], [[DEF15]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]], [[V_ADD_U32_e32_]], [[V_ADD_U32_e32_1]]
+ ; CHECK-NEXT: [[V_ADD_U32_e32_1:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[DS_READ_B128_gfx9_]].sub0, [[DEF15]], implicit $exec
+ ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF17]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF13]], [[DEF16]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_3]], [[V_ADD_U32_e32_]], [[V_ADD_U32_e32_1]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
@@ -5133,47 +5133,47 @@ body: |
; CHECK-NEXT: SCHED_BARRIER 0
; CHECK-NEXT: [[DEF11:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF12:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: dead [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF15:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: dead [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: $scc = IMPLICIT_DEF
; CHECK-NEXT: S_CBRANCH_SCC1 %bb.2, implicit killed $scc
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: S_BRANCH %bb.3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
; CHECK-NEXT: successors: %bb.4(0x40000000), %bb.3(0x40000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF15]].sub0, [[DEF14]], implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF16]].sub0, [[DEF15]], implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_ADD_U32_e32_]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: $scc = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: S_CBRANCH_SCC1 %bb.4, implicit killed $scc
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.3:
; CHECK-NEXT: successors: %bb.4(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF17]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF17]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF17]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
- ; CHECK-NEXT: [[DEF17:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF17]], 4, 4, [[DEF13]].sub0, [[DEF14]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF11]], [[DEF12]], [[DEF18]], 4, 4, [[DEF14]].sub0, [[DEF15]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.4:
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF17]].sub0, [[DEF14]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF18]].sub0, [[DEF15]], implicit $exec
; CHECK-NEXT: SCHED_BARRIER 0
- ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF18]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF19]], [[DEF15]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], [[DEF17]], [[V_ADD_U32_e32_1]]
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF19]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF13]], [[DEF16]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_1]], [[V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64_2]], [[DEF18]], [[V_ADD_U32_e32_1]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
diff --git a/llvm/test/CodeGen/AMDGPU/sched_mfma_rewrite_cost.mir b/llvm/test/CodeGen/AMDGPU/sched_mfma_rewrite_cost.mir
index 1cb6bdaf72666..bf50a81979876 100644
--- a/llvm/test/CodeGen/AMDGPU/sched_mfma_rewrite_cost.mir
+++ b/llvm/test/CodeGen/AMDGPU/sched_mfma_rewrite_cost.mir
@@ -55,25 +55,26 @@ body: |
; CHECK-NEXT: S_NOP 0, implicit-def %13
; CHECK-NEXT: [[DEF10:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
; CHECK-NEXT: SCHED_BARRIER 0
- ; CHECK-NEXT: [[DEF11:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF12:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF11:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF12:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: dead [[DEF14:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: dead [[DEF15:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: dead undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF12]].sub1, [[DEF11]], implicit $exec
+ ; CHECK-NEXT: dead [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: dead undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF13]].sub1, [[DEF12]], implicit $exec
; CHECK-NEXT: $scc = IMPLICIT_DEF
; CHECK-NEXT: S_CBRANCH_SCC1 %bb.2, implicit killed $scc
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF12]].sub0, [[DEF11]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF13]].sub0, [[DEF12]], implicit $exec
; CHECK-NEXT: S_BRANCH %bb.3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF12]].sub1, [[DEF11]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF13]].sub1, [[DEF12]], implicit $exec
; CHECK-NEXT: S_BRANCH %bb.3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.3:
@@ -85,10 +86,10 @@ body: |
; CHECK-NEXT: bb.4:
; CHECK-NEXT: successors: %bb.5(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF17:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF16]], [[DEF17]], [[V_ADD_U32_e32_1]], 4, 4, [[DEF18]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF18:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF17]], [[DEF18]], [[V_ADD_U32_e32_1]], 4, 4, [[DEF19]].sub0, [[DEF12]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.5:
; CHECK-NEXT: successors: %bb.7(0x40000000), %bb.6(0x40000000)
@@ -99,8 +100,8 @@ body: |
; CHECK-NEXT: bb.6:
; CHECK-NEXT: successors: %bb.8(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_2:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF13]].sub1, [[DEF11]], implicit $exec
- ; CHECK-NEXT: dead undef [[V_ADD_U32_e32_2:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[DEF13]].sub0, [[DEF11]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_2:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF14]].sub1, [[DEF12]], implicit $exec
+ ; CHECK-NEXT: dead undef [[V_ADD_U32_e32_2:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[DEF14]].sub0, [[DEF12]], implicit $exec
; CHECK-NEXT: S_BRANCH %bb.8
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.7:
@@ -112,21 +113,20 @@ body: |
; CHECK-NEXT: bb.8:
; CHECK-NEXT: successors: %bb.9(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_3:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF13]].sub0, [[DEF11]], implicit $exec
- ; CHECK-NEXT: dead undef [[V_ADD_U32_e32_3:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[DEF13]].sub1, [[DEF11]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_3:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF14]].sub0, [[DEF12]], implicit $exec
+ ; CHECK-NEXT: dead undef [[V_ADD_U32_e32_3:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[DEF14]].sub1, [[DEF12]], implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.9:
; CHECK-NEXT: successors: %bb.10(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_4:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF13]].sub1, [[DEF13]].sub0, implicit $exec
- ; CHECK-NEXT: [[V_ADD_U32_e32_4:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[DEF13]].sub2, [[DEF11]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_4:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF14]].sub1, [[DEF14]].sub0, implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_4:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[DEF14]].sub2, [[DEF12]], implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.10:
- ; CHECK-NEXT: [[V_ADD_U32_e32_5:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[DEF12]].sub1, [[DEF13]].sub0, implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_5:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[DEF13]].sub1, [[DEF14]].sub0, implicit $exec
; CHECK-NEXT: SCHED_BARRIER 0
- ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF19]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF20]], [[DEF12]], [[DEF13]], [[V_ADD_U32_e32_4]], [[V_ADD_U32_e32_5]]
+ ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF20]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF11]], [[DEF13]], [[DEF14]], [[V_ADD_U32_e32_4]], [[V_ADD_U32_e32_5]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
@@ -239,25 +239,26 @@ body: |
; CHECK-NEXT: S_NOP 0, implicit-def %13
; CHECK-NEXT: [[DEF10:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
; CHECK-NEXT: SCHED_BARRIER 0
- ; CHECK-NEXT: [[DEF11:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF12:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF11:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF12:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: dead [[DEF14:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: dead [[DEF15:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: dead undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF12]].sub1, [[DEF11]], implicit $exec
+ ; CHECK-NEXT: dead [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: dead undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF13]].sub1, [[DEF12]], implicit $exec
; CHECK-NEXT: $scc = IMPLICIT_DEF
; CHECK-NEXT: S_CBRANCH_SCC1 %bb.2, implicit killed $scc
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF12]].sub0, [[DEF11]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF13]].sub0, [[DEF12]], implicit $exec
; CHECK-NEXT: S_BRANCH %bb.3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF12]].sub1, [[DEF11]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF13]].sub1, [[DEF12]], implicit $exec
; CHECK-NEXT: S_BRANCH %bb.3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.3:
@@ -269,10 +270,10 @@ body: |
; CHECK-NEXT: bb.4:
; CHECK-NEXT: successors: %bb.5(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF17:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF16]], [[DEF17]], [[V_ADD_U32_e32_1]], 4, 4, [[DEF18]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF18:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_128_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF17]], [[DEF18]], [[V_ADD_U32_e32_1]], 4, 4, [[DEF19]].sub0, [[DEF12]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.5:
; CHECK-NEXT: successors: %bb.7(0x40000000), %bb.6(0x40000000)
@@ -283,22 +284,21 @@ body: |
; CHECK-NEXT: bb.6:
; CHECK-NEXT: successors: %bb.8(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_2:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF13]].sub1, [[DEF11]], implicit $exec
- ; CHECK-NEXT: [[V_ADD_U32_e32_2:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[DEF13]].sub0, [[DEF11]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_2:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF14]].sub1, [[DEF12]], implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_2:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[DEF14]].sub0, [[DEF12]], implicit $exec
; CHECK-NEXT: S_BRANCH %bb.8
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.7:
; CHECK-NEXT: successors: %bb.8(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_2:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF13]].sub0, [[DEF11]], implicit $exec
- ; CHECK-NEXT: [[V_ADD_U32_e32_2:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[DEF13]].sub1, [[DEF11]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_2:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF14]].sub0, [[DEF12]], implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_2:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[DEF14]].sub1, [[DEF12]], implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.8:
- ; CHECK-NEXT: [[V_ADD_U32_e32_3:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[DEF12]].sub1, [[V_ADD_U32_e32_1]].sub0, implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_3:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[DEF13]].sub1, [[V_ADD_U32_e32_1]].sub0, implicit $exec
; CHECK-NEXT: SCHED_BARRIER 0
- ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF19]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF20]], [[DEF12]], [[DEF13]], [[V_ADD_U32_e32_2]], [[V_ADD_U32_e32_3]]
+ ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF20]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF11]], [[DEF13]], [[DEF14]], [[V_ADD_U32_e32_2]], [[V_ADD_U32_e32_3]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0:
diff --git a/llvm/test/CodeGen/AMDGPU/sched_mfma_rewrite_diff_types.mir b/llvm/test/CodeGen/AMDGPU/sched_mfma_rewrite_diff_types.mir
index ec9b1b2079d0d..19612060d5733 100644
--- a/llvm/test/CodeGen/AMDGPU/sched_mfma_rewrite_diff_types.mir
+++ b/llvm/test/CodeGen/AMDGPU/sched_mfma_rewrite_diff_types.mir
@@ -44,25 +44,26 @@ body: |
; CHECK-NEXT: S_NOP 0, implicit-def %13
; CHECK-NEXT: [[DEF10:%[0-9]+]]:vreg_1024_align2 = IMPLICIT_DEF
; CHECK-NEXT: SCHED_BARRIER 0
- ; CHECK-NEXT: [[DEF11:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF12:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF11:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF12:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF13:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: dead [[DEF14:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF14:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: dead [[DEF15:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: dead undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF12]].sub1, [[DEF11]], implicit $exec
+ ; CHECK-NEXT: dead [[DEF16:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: dead undef [[V_ADD_U32_e32_:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF13]].sub1, [[DEF12]], implicit $exec
; CHECK-NEXT: $scc = IMPLICIT_DEF
; CHECK-NEXT: S_CBRANCH_SCC1 %bb.2, implicit killed $scc
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF12]].sub0, [[DEF11]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF13]].sub0, [[DEF12]], implicit $exec
; CHECK-NEXT: S_BRANCH %bb.3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:
; CHECK-NEXT: successors: %bb.3(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF12]].sub1, [[DEF11]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_1:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF13]].sub1, [[DEF12]], implicit $exec
; CHECK-NEXT: S_BRANCH %bb.3
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.3:
@@ -74,10 +75,10 @@ body: |
; CHECK-NEXT: bb.4:
; CHECK-NEXT: successors: %bb.5(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[DEF16:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
; CHECK-NEXT: [[DEF17:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF18:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF10:%[0-9]+]].sub0_sub1_sub2_sub3:vreg_1024_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF16]], [[DEF17]], [[V_ADD_U32_e32_1]], 4, 4, [[DEF18]].sub0, [[DEF11]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[DEF18:%[0-9]+]]:av_128_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[DEF10:%[0-9]+]].sub0_sub1_sub2_sub3:vreg_1024_align2 = contract nofpexcept V_MFMA_SCALE_F32_16X16X128_F8F6F4_f4_f4_vgprcd_e64 [[DEF17]], [[DEF18]], [[V_ADD_U32_e32_1]], 4, 4, [[DEF19]].sub0, [[DEF12]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.5:
; CHECK-NEXT: successors: %bb.7(0x40000000), %bb.6(0x40000000)
@@ -88,22 +89,21 @@ body: |
; CHECK-NEXT: bb.6:
; CHECK-NEXT: successors: %bb.8(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_2:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF13]].sub1, [[DEF11]], implicit $exec
- ; CHECK-NEXT: [[V_ADD_U32_e32_2:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[DEF13]].sub0, [[DEF11]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_2:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF14]].sub1, [[DEF12]], implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_2:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[DEF14]].sub0, [[DEF12]], implicit $exec
; CHECK-NEXT: S_BRANCH %bb.8
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.7:
; CHECK-NEXT: successors: %bb.8(0x80000000)
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: undef [[V_ADD_U32_e32_2:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF13]].sub0, [[DEF11]], implicit $exec
- ; CHECK-NEXT: [[V_ADD_U32_e32_2:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[DEF13]].sub1, [[DEF11]], implicit $exec
+ ; CHECK-NEXT: undef [[V_ADD_U32_e32_2:%[0-9]+]].sub0:vreg_128_align2 = V_ADD_U32_e32 [[DEF14]].sub0, [[DEF12]], implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_2:%[0-9]+]].sub1:vreg_128_align2 = V_ADD_U32_e32 [[DEF14]].sub1, [[DEF12]], implicit $exec
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.8:
- ; CHECK-NEXT: [[V_ADD_U32_e32_3:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[DEF12]].sub1, [[V_ADD_U32_e32_1]].sub0, implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e32_3:%[0-9]+]]:vgpr_32 = V_ADD_U32_e32 [[DEF13]].sub1, [[V_ADD_U32_e32_1]].sub0, implicit $exec
; CHECK-NEXT: SCHED_BARRIER 0
- ; CHECK-NEXT: [[DEF19:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
- ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_128_align2 = IMPLICIT_DEF
- ; CHECK-NEXT: KILL [[DEF19]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF20]], [[DEF12]], [[DEF13]], [[V_ADD_U32_e32_2]], [[V_ADD_U32_e32_3]]
+ ; CHECK-NEXT: [[DEF20:%[0-9]+]]:vreg_1024 = IMPLICIT_DEF
+ ; CHECK-NEXT: KILL [[DEF20]], [[DEF]], [[DEF1]], [[DEF2]], [[DEF3]], [[DEF4]], [[DEF5]], [[DEF6]], [[DEF7]], [[DEF8]], [[DEF9]], [[DEF10]], [[DEF11]], [[DEF13]], [[DEF14]], [[V_ADD_U32_e32_2]], [[V_ADD_U32_e32_3]]
; CHECK-NEXT: S_NOP 0, implicit %12, implicit %13
; CHECK-NEXT: S_ENDPGM 0
bb.0: