[llvm] r286766 - AMDGPU: Implement SGPR spilling with scalar stores
Arsenault, Matthew via llvm-commits
llvm-commits at lists.llvm.org
Wed Nov 16 19:14:31 PST 2016
I think this is caused by the optimization in eliminateRedundantSpills. It decides a spill is redundant and replaces the store opcode with KILL. The KILL no longer needs the implicit def of m0 that was added to the spill pseudo, but nothing removes that operand, which is what trips the MI->allDefsAreDead() assertion below.
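For what it's worth, here is a minimal sketch of the kind of cleanup that seems to be missing (a hypothetical helper, not code from the tree): once eliminateRedundantSpills switches the store's opcode to KILL, the implicit defs the target attached to the spill pseudo (the implicit-def of m0 here) could be stripped so that LiveRangeEdit::eliminateDeadDef really does see an all-dead instruction:

    #include "llvm/CodeGen/MachineInstr.h"

    // Hypothetical illustration only: drop any implicit register defs left on an
    // instruction after it has been downgraded to TargetOpcode::KILL, e.g. the
    // implicit-def of m0 added by storeRegToStackSlot for SMEM spills.
    static void stripImplicitDefsFromKill(llvm::MachineInstr &MI) {
      for (int i = MI.getNumOperands() - 1; i >= 0; --i) {
        const llvm::MachineOperand &MO = MI.getOperand(i);
        if (MO.isReg() && MO.isImplicit() && MO.isDef())
          MI.RemoveOperand(i);
      }
    }

Whether something like this belongs in InlineSpiller or behind a target hook is a separate question; the sketch just shows where the stale operand would need to go away.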
________________________________
From: Nicolai Hähnle <nhaehnle at gmail.com>
Sent: Tuesday, November 15, 2016 2:18:03 AM
To: Arsenault, Matthew; llvm-commits at lists.llvm.org
Subject: Re: [llvm] r286766 - AMDGPU: Implement SGPR spilling with scalar stores
Hi Matt,
this change causes a regression. Compiling the attached shader with
llc -march=amdgcn -mattr=+vgpr-spilling -mcpu=tonga
triggers an assertion:
llc: /home/nha/amd/llvm/llvm/lib/CodeGen/LiveRangeEdit.cpp:248: void
llvm::LiveRangeEdit::eliminateDeadDef(llvm::MachineInstr*,
llvm::LiveRangeEdit::ToShrinkSet&, llvm::AliasAnalysis*): Assertion
`MI->allDefsAreDead() && "Def isn't really dead"' failed.
Any idea why?
Thanks,
Nicolai
On 13.11.2016 19:20, Matt Arsenault via llvm-commits wrote:
> Author: arsenm
> Date: Sun Nov 13 12:20:54 2016
> New Revision: 286766
>
> URL: http://llvm.org/viewvc/llvm-project?rev=286766&view=rev
> Log:
> AMDGPU: Implement SGPR spilling with scalar stores
>
> This avoids the nasty problems caused by using
> memory instructions that read the exec mask while
> spilling / restoring registers used for control flow
> masking, but only on VI, where these scalar stores were added.
>
> Currently this always uses scalar stores when the option is enabled,
> but it may be better to still try to spill to a VGPR first
> and use scalar stores only on the fallback memory path.
>
> The cache also needs to be flushed before wave termination
> if a scalar store is used.
>
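(For readers skimming the patch: the sequence below is roughly what this produces, pieced together from the test updates further down. Register numbers and offsets are illustrative only, not fixed by the patch.)

    s_add_u32 m0, s91, 0x100                 ; scratch wave offset + frame offset
    s_buffer_store_dword s12, s[92:95], m0   ; SGPR spill via SMEM
    ...
    s_add_u32 m0, s91, 0x100
    s_buffer_load_dword s12, s[92:95], m0    ; reload
    s_waitcnt lgkmcnt(0)
    ...
    s_dcache_wb                              ; flush the scalar cache before the wave exits
    s_endpgm
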
> Added:
> llvm/trunk/test/CodeGen/MIR/AMDGPU/scalar-store-cache-flush.mir
> Modified:
> llvm/trunk/lib/Target/AMDGPU/SIInsertWaits.cpp
> llvm/trunk/lib/Target/AMDGPU/SIInstrInfo.cpp
> llvm/trunk/lib/Target/AMDGPU/SIRegisterInfo.cpp
> llvm/trunk/test/CodeGen/AMDGPU/attr-amdgpu-num-sgpr.ll
> llvm/trunk/test/CodeGen/AMDGPU/basic-branch.ll
> llvm/trunk/test/CodeGen/AMDGPU/si-spill-sgpr-stack.ll
> llvm/trunk/test/CodeGen/AMDGPU/spill-m0.ll
>
> Modified: llvm/trunk/lib/Target/AMDGPU/SIInsertWaits.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AMDGPU/SIInsertWaits.cpp?rev=286766&r1=286765&r2=286766&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Target/AMDGPU/SIInsertWaits.cpp (original)
> +++ llvm/trunk/lib/Target/AMDGPU/SIInsertWaits.cpp Sun Nov 13 12:20:54 2016
> @@ -532,6 +532,7 @@ bool SIInsertWaits::runOnMachineFunction
> TRI = &TII->getRegisterInfo();
> MRI = &MF.getRegInfo();
> IV = getIsaVersion(ST->getFeatureBits());
> + const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
>
> HardwareLimits.Named.VM = getVmcntBitMask(IV);
> HardwareLimits.Named.EXP = getExpcntBitMask(IV);
> @@ -543,20 +544,27 @@ bool SIInsertWaits::runOnMachineFunction
> LastOpcodeType = OTHER;
> LastInstWritesM0 = false;
> IsFlatOutstanding = false;
> - ReturnsVoid = MF.getInfo<SIMachineFunctionInfo>()->returnsVoid();
> + ReturnsVoid = MFI->returnsVoid();
>
> memset(&UsedRegs, 0, sizeof(UsedRegs));
> memset(&DefinedRegs, 0, sizeof(DefinedRegs));
>
> SmallVector<MachineInstr *, 4> RemoveMI;
> + SmallVector<MachineBasicBlock *, 4> EndPgmBlocks;
> +
> + bool HaveScalarStores = false;
>
> for (MachineFunction::iterator BI = MF.begin(), BE = MF.end();
> BI != BE; ++BI) {
>
> MachineBasicBlock &MBB = *BI;
> +
> for (MachineBasicBlock::iterator I = MBB.begin(), E = MBB.end();
> I != E; ++I) {
>
> + if (!HaveScalarStores && TII->isScalarStore(*I))
> + HaveScalarStores = true;
> +
> if (ST->getGeneration() <= SISubtarget::SEA_ISLANDS) {
> // There is a hardware bug on CI/SI where SMRD instruction may corrupt
> // vccz bit, so when we detect that an instruction may read from a
> @@ -625,12 +633,45 @@ bool SIInsertWaits::runOnMachineFunction
>
> pushInstruction(MBB, I, Increment);
> handleSendMsg(MBB, I);
> +
> + if (I->getOpcode() == AMDGPU::S_ENDPGM ||
> + I->getOpcode() == AMDGPU::SI_RETURN)
> + EndPgmBlocks.push_back(&MBB);
> }
>
> // Wait for everything at the end of the MBB
> Changes |= insertWait(MBB, MBB.getFirstTerminator(), LastIssued);
> }
>
> + if (HaveScalarStores) {
> + // If scalar writes are used, the cache must be flushed or else the next
> + // wave to reuse the same scratch memory can be clobbered.
> + //
> + // Insert s_dcache_wb at wave termination points if there were any scalar
> + // stores, and only if the cache hasn't already been flushed. This could be
> + // improved by looking across blocks for flushes in postdominating blocks
> + // from the stores but an explicitly requested flush is probably very rare.
> + for (MachineBasicBlock *MBB : EndPgmBlocks) {
> + bool SeenDCacheWB = false;
> +
> + for (MachineBasicBlock::iterator I = MBB->begin(), E = MBB->end();
> + I != E; ++I) {
> +
> + if (I->getOpcode() == AMDGPU::S_DCACHE_WB)
> + SeenDCacheWB = true;
> + else if (TII->isScalarStore(*I))
> + SeenDCacheWB = false;
> +
> + // FIXME: It would be better to insert this before a waitcnt if any.
> + if ((I->getOpcode() == AMDGPU::S_ENDPGM ||
> + I->getOpcode() == AMDGPU::SI_RETURN) && !SeenDCacheWB) {
> + Changes = true;
> + BuildMI(*MBB, I, I->getDebugLoc(), TII->get(AMDGPU::S_DCACHE_WB));
> + }
> + }
> + }
> + }
> +
> for (MachineInstr *I : RemoveMI)
> I->eraseFromParent();
>
>
> Modified: llvm/trunk/lib/Target/AMDGPU/SIInstrInfo.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AMDGPU/SIInstrInfo.cpp?rev=286766&r1=286765&r2=286766&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Target/AMDGPU/SIInstrInfo.cpp (original)
> +++ llvm/trunk/lib/Target/AMDGPU/SIInstrInfo.cpp Sun Nov 13 12:20:54 2016
> @@ -539,7 +539,7 @@ void SIInstrInfo::storeRegToStackSlot(Ma
> MRI.constrainRegClass(SrcReg, &AMDGPU::SReg_32_XM0RegClass);
> }
>
> - BuildMI(MBB, MI, DL, OpDesc)
> + MachineInstrBuilder Spill = BuildMI(MBB, MI, DL, OpDesc)
> .addReg(SrcReg, getKillRegState(isKill)) // data
> .addFrameIndex(FrameIndex) // addr
> .addMemOperand(MMO)
> @@ -549,6 +549,11 @@ void SIInstrInfo::storeRegToStackSlot(Ma
> // needing them, and need to ensure that the reserved registers are
> // correctly handled.
>
> + if (ST.hasScalarStores()) {
> + // m0 is used for offset to scalar stores if used to spill.
> + Spill.addReg(AMDGPU::M0, RegState::ImplicitDefine);
> + }
> +
> return;
> }
>
> @@ -638,12 +643,17 @@ void SIInstrInfo::loadRegFromStackSlot(M
> MRI.constrainRegClass(DestReg, &AMDGPU::SReg_32_XM0RegClass);
> }
>
> - BuildMI(MBB, MI, DL, OpDesc, DestReg)
> + MachineInstrBuilder Spill = BuildMI(MBB, MI, DL, OpDesc, DestReg)
> .addFrameIndex(FrameIndex) // addr
> .addMemOperand(MMO)
> .addReg(MFI->getScratchRSrcReg(), RegState::Implicit)
> .addReg(MFI->getScratchWaveOffsetReg(), RegState::Implicit);
>
> + if (ST.hasScalarStores()) {
> + // m0 is used for offset to scalar stores if used to spill.
> + Spill.addReg(AMDGPU::M0, RegState::ImplicitDefine);
> + }
> +
> return;
> }
>
>
> Modified: llvm/trunk/lib/Target/AMDGPU/SIRegisterInfo.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AMDGPU/SIRegisterInfo.cpp?rev=286766&r1=286765&r2=286766&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Target/AMDGPU/SIRegisterInfo.cpp (original)
> +++ llvm/trunk/lib/Target/AMDGPU/SIRegisterInfo.cpp Sun Nov 13 12:20:54 2016
> @@ -24,6 +24,12 @@
>
> using namespace llvm;
>
> +static cl::opt<bool> EnableSpillSGPRToSMEM(
> + "amdgpu-spill-sgpr-to-smem",
> + cl::desc("Use scalar stores to spill SGPRs if supported by subtarget"),
> + cl::init(true));
> +
> +
> static bool hasPressureSet(const int *PSets, unsigned PSetID) {
> for (unsigned i = 0; PSets[i] != -1; ++i) {
> if (PSets[i] == (int)PSetID)
> @@ -475,18 +481,21 @@ void SIRegisterInfo::buildSpillLoadStore
> void SIRegisterInfo::spillSGPR(MachineBasicBlock::iterator MI,
> int Index,
> RegScavenger *RS) const {
> - MachineFunction *MF = MI->getParent()->getParent();
> - MachineRegisterInfo &MRI = MF->getRegInfo();
> MachineBasicBlock *MBB = MI->getParent();
> - SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>();
> - MachineFrameInfo &FrameInfo = MF->getFrameInfo();
> + MachineFunction *MF = MBB->getParent();
> + MachineRegisterInfo &MRI = MF->getRegInfo();
> const SISubtarget &ST = MF->getSubtarget<SISubtarget>();
> const SIInstrInfo *TII = ST.getInstrInfo();
> - const DebugLoc &DL = MI->getDebugLoc();
>
> unsigned NumSubRegs = getNumSubRegsForSpillOp(MI->getOpcode());
> unsigned SuperReg = MI->getOperand(0).getReg();
> bool IsKill = MI->getOperand(0).isKill();
> + const DebugLoc &DL = MI->getDebugLoc();
> +
> + SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>();
> + MachineFrameInfo &FrameInfo = MF->getFrameInfo();
> +
> + bool SpillToSMEM = ST.hasScalarStores() && EnableSpillSGPRToSMEM;
>
> // SubReg carries the "Kill" flag when SubReg == SuperReg.
> unsigned SubKillState = getKillRegState((NumSubRegs == 1) && IsKill);
> @@ -494,6 +503,55 @@ void SIRegisterInfo::spillSGPR(MachineBa
> unsigned SubReg = NumSubRegs == 1 ?
> SuperReg : getSubReg(SuperReg, getSubRegFromChannel(i));
>
> + if (SpillToSMEM) {
> + if (SuperReg == AMDGPU::M0) {
> + assert(NumSubRegs == 1);
> + unsigned CopyM0
> + = MRI.createVirtualRegister(&AMDGPU::SReg_32_XM0RegClass);
> +
> + BuildMI(*MBB, MI, DL, TII->get(AMDGPU::COPY), CopyM0)
> + .addReg(AMDGPU::M0, getKillRegState(IsKill));
> +
> + // The real spill now kills the temp copy.
> + SubReg = SuperReg = CopyM0;
> + IsKill = true;
> + }
> +
> + int64_t FrOffset = FrameInfo.getObjectOffset(Index);
> + unsigned Size = FrameInfo.getObjectSize(Index);
> + unsigned Align = FrameInfo.getObjectAlignment(Index);
> + MachinePointerInfo PtrInfo
> + = MachinePointerInfo::getFixedStack(*MF, Index);
> + MachineMemOperand *MMO
> + = MF->getMachineMemOperand(PtrInfo, MachineMemOperand::MOStore,
> + Size, Align);
> +
> + unsigned OffsetReg = AMDGPU::M0;
> + // Add i * 4 wave offset.
> + //
> + // SMEM instructions only support a single offset, so increment the wave
> + // offset.
> +
> + int64_t Offset = ST.getWavefrontSize() * (FrOffset + 4 * i);
> + if (Offset != 0) {
> + BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_ADD_U32), OffsetReg)
> + .addReg(MFI->getScratchWaveOffsetReg())
> + .addImm(Offset);
> + } else {
> + BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_MOV_B32), OffsetReg)
> + .addReg(MFI->getScratchWaveOffsetReg());
> + }
> +
> + BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_BUFFER_STORE_DWORD_SGPR))
> + .addReg(SubReg, getKillRegState(IsKill)) // sdata
> + .addReg(MFI->getScratchRSrcReg()) // sbase
> + .addReg(OffsetReg) // soff
> + .addImm(0) // glc
> + .addMemOperand(MMO);
> +
> + continue;
> + }
> +
> struct SIMachineFunctionInfo::SpilledReg Spill =
> MFI->getSpilledReg(MF, Index, i);
> if (Spill.hasReg()) {
> @@ -520,10 +578,9 @@ void SIRegisterInfo::spillSGPR(MachineBa
> // it are fixed.
> } else {
> // Spill SGPR to a frame index.
> - // FIXME we should use S_STORE_DWORD here for VI.
> -
> // TODO: Should VI try to spill to VGPR and then spill to SMEM?
> unsigned TmpReg = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
> + // TODO: Should VI try to spill to VGPR and then spill to SMEM?
>
> MachineInstrBuilder Mov
> = BuildMI(*MBB, MI, DL, TII->get(AMDGPU::V_MOV_B32_e32), TmpReg)
> @@ -575,6 +632,7 @@ void SIRegisterInfo::restoreSGPR(Machine
>
> unsigned NumSubRegs = getNumSubRegsForSpillOp(MI->getOpcode());
> unsigned SuperReg = MI->getOperand(0).getReg();
> + bool SpillToSMEM = ST.hasScalarStores() && EnableSpillSGPRToSMEM;
>
> // m0 is not allowed as with readlane/writelane, so a temporary SGPR and
> // extra copy is needed.
> @@ -584,10 +642,44 @@ void SIRegisterInfo::restoreSGPR(Machine
> SuperReg = MRI.createVirtualRegister(&AMDGPU::SReg_32_XM0RegClass);
> }
>
> + int64_t FrOffset = FrameInfo.getObjectOffset(Index);
> +
> for (unsigned i = 0, e = NumSubRegs; i < e; ++i) {
> unsigned SubReg = NumSubRegs == 1 ?
> SuperReg : getSubReg(SuperReg, getSubRegFromChannel(i));
>
> + if (SpillToSMEM) {
> + unsigned Size = FrameInfo.getObjectSize(Index);
> + unsigned Align = FrameInfo.getObjectAlignment(Index);
> + MachinePointerInfo PtrInfo
> + = MachinePointerInfo::getFixedStack(*MF, Index);
> + MachineMemOperand *MMO
> + = MF->getMachineMemOperand(PtrInfo, MachineMemOperand::MOLoad,
> + Size, Align);
> +
> + unsigned OffsetReg = AMDGPU::M0;
> +
> + // Add i * 4 offset
> + int64_t Offset = ST.getWavefrontSize() * (FrOffset + 4 * i);
> + if (Offset != 0) {
> + BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_ADD_U32), OffsetReg)
> + .addReg(MFI->getScratchWaveOffsetReg())
> + .addImm(Offset);
> + } else {
> + BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_MOV_B32), OffsetReg)
> + .addReg(MFI->getScratchWaveOffsetReg());
> + }
> +
> + BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_BUFFER_LOAD_DWORD_SGPR), SubReg)
> + .addReg(MFI->getScratchRSrcReg()) // sbase
> + .addReg(OffsetReg) // soff
> + .addImm(0) // glc
> + .addMemOperand(MMO)
> + .addReg(MI->getOperand(0).getReg(), RegState::ImplicitDefine);
> +
> + continue;
> + }
> +
> SIMachineFunctionInfo::SpilledReg Spill
> = MFI->getSpilledReg(MF, Index, i);
>
>
> Modified: llvm/trunk/test/CodeGen/AMDGPU/attr-amdgpu-num-sgpr.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AMDGPU/attr-amdgpu-num-sgpr.ll?rev=286766&r1=286765&r2=286766&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/AMDGPU/attr-amdgpu-num-sgpr.ll (original)
> +++ llvm/trunk/test/CodeGen/AMDGPU/attr-amdgpu-num-sgpr.ll Sun Nov 13 12:20:54 2016
> @@ -1,16 +1,20 @@
> -; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=fiji -verify-machineinstrs < %s | FileCheck %s
> +; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=fiji -amdgpu-spill-sgpr-to-smem=0 -verify-machineinstrs < %s | FileCheck -check-prefix=TOSGPR -check-prefix=ALL %s
> +; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=fiji -amdgpu-spill-sgpr-to-smem=1 -verify-machineinstrs < %s | FileCheck -check-prefix=TOSMEM -check-prefix=ALL %s
>
> -; CHECK-LABEL: {{^}}max_14_sgprs:
> +; If spilling to smem, additional registers are used for the resource
> +; descriptor.
> +
> +; ALL-LABEL: {{^}}max_14_sgprs:
>
> ; FIXME: Should be able to skip this copying of the private segment
> ; buffer because all the SGPR spills are to VGPRs.
>
> -; CHECK: s_mov_b64 s[6:7], s[2:3]
> -; CHECK: s_mov_b64 s[4:5], s[0:1]
> -
> -; CHECK: SGPRBlocks: 1
> -; CHECK: NumSGPRsForWavesPerEU: 14
> +; ALL: s_mov_b64 s[6:7], s[2:3]
> +; ALL: s_mov_b64 s[4:5], s[0:1]
> +; ALL: SGPRBlocks: 1
> +; ALL: NumSGPRsForWavesPerEU: 14
> define void @max_14_sgprs(i32 addrspace(1)* %out1,
> +
> i32 addrspace(1)* %out2,
> i32 addrspace(1)* %out3,
> i32 addrspace(1)* %out4,
> @@ -31,7 +35,7 @@ define void @max_14_sgprs(i32 addrspace(
> ; ---------------------
> ; total: 14
>
> -; + reserved vcc, flat_scratch = 18
> +; + reserved vcc, xnack, flat_scratch = 20
>
> ; Because we can't handle re-using the last few input registers as the
> ; special vcc etc. registers (as well as decide to not use the unused
> @@ -40,14 +44,14 @@ define void @max_14_sgprs(i32 addrspace(
>
> ; ALL-LABEL: {{^}}max_12_sgprs_14_input_sgprs:
> ; TOSGPR: SGPRBlocks: 2
> -; TOSGPR: NumSGPRsForWavesPerEU: 18
> +; TOSGPR: NumSGPRsForWavesPerEU: 20
>
> ; TOSMEM: s_mov_b64 s[6:7], s[2:3]
> -; TOSMEM: s_mov_b32 s9, s13
> ; TOSMEM: s_mov_b64 s[4:5], s[0:1]
> +; TOSMEM: s_mov_b32 s3, s13
>
> ; TOSMEM: SGPRBlocks: 2
> -; TOSMEM: NumSGPRsForWavesPerEU: 18
> +; TOSMEM: NumSGPRsForWavesPerEU: 20
> define void @max_12_sgprs_14_input_sgprs(i32 addrspace(1)* %out1,
> i32 addrspace(1)* %out2,
> i32 addrspace(1)* %out3,
> @@ -79,12 +83,12 @@ define void @max_12_sgprs_14_input_sgprs
> ; ; swapping the order the registers are copied from what normally
> ; ; happens.
>
> -; TOSMEM: s_mov_b64 s[6:7], s[2:3]
> -; TOSMEM: s_mov_b64 s[4:5], s[0:1]
> -; TOSMEM: s_mov_b32 s3, s11
> +; TOSMEM: s_mov_b32 s5, s11
> +; TOSMEM: s_add_u32 m0, s5,
> +; TOSMEM: s_buffer_store_dword vcc_lo, s[0:3], m0
>
> -; ALL: SGPRBlocks: 1
> -; ALL: NumSGPRsForWavesPerEU: 16
> +; ALL: SGPRBlocks: 2
> +; ALL: NumSGPRsForWavesPerEU: 18
> define void @max_12_sgprs_12_input_sgprs(i32 addrspace(1)* %out1,
> i32 addrspace(1)* %out2,
> i32 addrspace(1)* %out3,
>
> Modified: llvm/trunk/test/CodeGen/AMDGPU/basic-branch.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AMDGPU/basic-branch.ll?rev=286766&r1=286765&r2=286766&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/AMDGPU/basic-branch.ll (original)
> +++ llvm/trunk/test/CodeGen/AMDGPU/basic-branch.ll Sun Nov 13 12:20:54 2016
> @@ -1,5 +1,5 @@
> ; RUN: llc -O0 -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCNNOOPT -check-prefix=GCN %s
> -; RUN: llc -O0 -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=GCNNOOPT -check-prefix=GCN %s
> +; RUN: llc -O0 -march=amdgcn -mcpu=tonga -amdgpu-spill-sgpr-to-smem=0 -verify-machineinstrs < %s | FileCheck -check-prefix=GCNNOOPT -check-prefix=GCN %s
> ; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCNOPT -check-prefix=GCN %s
> ; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=GCNOPT -check-prefix=GCN %s
>
>
> Modified: llvm/trunk/test/CodeGen/AMDGPU/si-spill-sgpr-stack.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AMDGPU/si-spill-sgpr-stack.ll?rev=286766&r1=286765&r2=286766&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/AMDGPU/si-spill-sgpr-stack.ll (original)
> +++ llvm/trunk/test/CodeGen/AMDGPU/si-spill-sgpr-stack.ll Sun Nov 13 12:20:54 2016
> @@ -1,14 +1,44 @@
> -; RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs < %s | FileCheck %s
> +; RUN: llc -march=amdgcn -mcpu=fiji -amdgpu-spill-sgpr-to-smem=0 -verify-machineinstrs < %s | FileCheck -check-prefix=ALL -check-prefix=SGPR %s
> +; RUN: llc -march=amdgcn -mcpu=fiji -amdgpu-spill-sgpr-to-smem=1 -verify-machineinstrs < %s | FileCheck -check-prefix=ALL -check-prefix=SMEM %s
>
> ; Make sure this doesn't crash.
> -; CHECK: {{^}}test:
> +; ALL-LABEL: {{^}}test:
> +; ALL: s_mov_b32 s92, SCRATCH_RSRC_DWORD0
> +; ALL: s_mov_b32 s91, s3
> +
> ; Make sure we are handling hazards correctly.
> -; CHECK: buffer_load_dword [[VHI:v[0-9]+]], off, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offset:12
> -; CHECK-NEXT: s_waitcnt vmcnt(0)
> -; CHECK-NEXT: v_readfirstlane_b32 s[[HI:[0-9]+]], [[VHI]]
> -; CHECK-NEXT: s_nop 4
> -; CHECK-NEXT: buffer_store_dword v0, off, s[0:[[HI]]{{\]}}, 0
> -; CHECK: s_endpgm
> +; SGPR: buffer_load_dword [[VHI:v[0-9]+]], off, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offset:12
> +; SGPR-NEXT: s_waitcnt vmcnt(0)
> +; SGPR-NEXT: v_readfirstlane_b32 s[[HI:[0-9]+]], [[VHI]]
> +; SGPR-NEXT: s_nop 4
> +; SGPR-NEXT: buffer_store_dword v0, off, s[0:[[HI]]{{\]}}, 0
> +
> +
> +; Make sure scratch wave offset register is correctly incremented and
> +; then restored.
> +; SMEM: s_mov_b32 m0, s91{{$}}
> +; SMEM: s_buffer_store_dword s{{[0-9]+}}, s[92:95], m0 ; 16-byte Folded Spill
> +; SMEM: s_add_u32 m0, s91, 0x100{{$}}
> +; SMEM: s_buffer_store_dword s{{[0-9]+}}, s[92:95], m0 ; 16-byte Folded Spill
> +; SMEM: s_add_u32 m0, s91, 0x200{{$}}
> +; SMEM: s_buffer_store_dword s{{[0-9]+}}, s[92:95], m0 ; 16-byte Folded Spill
> +; SMEM: s_add_u32 m0, s91, 0x300{{$}}
> +; SMEM: s_buffer_store_dword s{{[0-9]+}}, s[92:95], m0 ; 16-byte Folded Spill
> +
> +
> +; SMEM: s_mov_b32 m0, s91{{$}}
> +; SMEM: s_buffer_load_dword s{{[0-9]+}}, s[92:95], m0 ; 16-byte Folded Reload
> +; SMEM: s_add_u32 m0, s91, 0x100{{$}}
> +; SMEM: s_waitcnt lgkmcnt(0)
> +; SMEM: s_buffer_load_dword s{{[0-9]+}}, s[92:95], m0 ; 16-byte Folded Reload
> +; SMEM: s_add_u32 m0, s91, 0x200{{$}}
> +; SMEM: s_waitcnt lgkmcnt(0)
> +; SMEM: s_buffer_load_dword s{{[0-9]+}}, s[92:95], m0 ; 16-byte Folded Reload
> +; SMEM: s_add_u32 m0, s91, 0x300{{$}}
> +; SMEM: s_waitcnt lgkmcnt(0)
> +; SMEM: s_buffer_load_dword s{{[0-9]+}}, s[92:95], m0 ; 16-byte Folded Reload
> +
> +; ALL: s_endpgm
> define void @test(i32 addrspace(1)* %out, i32 %in) {
> call void asm sideeffect "", "~{SGPR0_SGPR1_SGPR2_SGPR3_SGPR4_SGPR5_SGPR6_SGPR7}" ()
> call void asm sideeffect "", "~{SGPR8_SGPR9_SGPR10_SGPR11_SGPR12_SGPR13_SGPR14_SGPR15}" ()
>
> Modified: llvm/trunk/test/CodeGen/AMDGPU/spill-m0.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AMDGPU/spill-m0.ll?rev=286766&r1=286765&r2=286766&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/AMDGPU/spill-m0.ll (original)
> +++ llvm/trunk/test/CodeGen/AMDGPU/spill-m0.ll Sun Nov 13 12:20:54 2016
> @@ -1,12 +1,13 @@
> ; RUN: llc -O0 -amdgpu-spill-sgpr-to-vgpr=1 -march=amdgcn -mattr=+vgpr-spilling -verify-machineinstrs < %s | FileCheck -check-prefix=TOVGPR -check-prefix=GCN %s
> -; RUN: llc -O0 -amdgpu-spill-sgpr-to-vgpr=1 -march=amdgcn -mcpu=tonga -mattr=+vgpr-spilling -verify-machineinstrs < %s | FileCheck -check-prefix=TOVGPR -check-prefix=GCN %s
> +; RUN: llc -O0 -amdgpu-spill-sgpr-to-vgpr=1 -amdgpu-spill-sgpr-to-smem=0 -march=amdgcn -mcpu=tonga -mattr=+vgpr-spilling -verify-machineinstrs < %s | FileCheck -check-prefix=TOVGPR -check-prefix=GCN %s
> ; RUN: llc -O0 -amdgpu-spill-sgpr-to-vgpr=0 -march=amdgcn -mattr=+vgpr-spilling -verify-machineinstrs < %s | FileCheck -check-prefix=TOVMEM -check-prefix=GCN %s
> -; RUN: llc -O0 -amdgpu-spill-sgpr-to-vgpr=0 -march=amdgcn -mattr=+vgpr-spilling -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=TOVMEM -check-prefix=GCN %s
> +; RUN: llc -O0 -amdgpu-spill-sgpr-to-vgpr=0 -amdgpu-spill-sgpr-to-smem=0 -march=amdgcn -mcpu=tonga -mattr=+vgpr-spilling -verify-machineinstrs < %s | FileCheck -check-prefix=TOVMEM -check-prefix=GCN %s
> +; RUN: llc -O0 -amdgpu-spill-sgpr-to-vgpr=0 -amdgpu-spill-sgpr-to-smem=1 -march=amdgcn -mcpu=tonga -mattr=+vgpr-spilling -verify-machineinstrs < %s | FileCheck -check-prefix=TOSMEM -check-prefix=GCN %s
>
> ; XXX - Why does it like to use vcc?
>
> ; GCN-LABEL: {{^}}spill_m0:
> -; TOSMEM: s_mov_b32 s88, SCRATCH_RSRC_DWORD0
> +; TOSMEM: s_mov_b32 s84, SCRATCH_RSRC_DWORD0
>
> ; GCN: s_cmp_lg_u32
>
> @@ -16,6 +17,13 @@
> ; TOVMEM: v_mov_b32_e32 [[SPILL_VREG:v[0-9]+]], m0
> ; TOVMEM: buffer_store_dword [[SPILL_VREG]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} ; 4-byte Folded Spill
> ; TOVMEM: s_waitcnt vmcnt(0)
> +
> +; TOSMEM: s_mov_b32 vcc_hi, m0
> +; TOSMEM: s_mov_b32 m0, s3{{$}}
> +; TOSMEM-NOT: vcc_hi
> +; TOSMEM: s_buffer_store_dword vcc_hi, s[84:87], m0 ; 4-byte Folded Spill
> +; TOSMEM: s_waitcnt lgkmcnt(0)
> +
> ; GCN: s_cbranch_scc1 [[ENDIF:BB[0-9]+_[0-9]+]]
>
> ; GCN: [[ENDIF]]:
> @@ -27,6 +35,11 @@
> ; TOVMEM: v_readfirstlane_b32 vcc_hi, [[RELOAD_VREG]]
> ; TOVMEM: s_mov_b32 m0, vcc_hi
>
> +; TOSMEM: s_mov_b32 m0, s3{{$}}
> +; TOSMEM: s_buffer_load_dword vcc_hi, s[84:87], m0 ; 4-byte Folded Reload
> +; TOSMEM-NOT: vcc_hi
> +; TOSMEM: s_mov_b32 m0, vcc_hi
> +
> ; GCN: s_add_i32 m0, m0, 1
> define void @spill_m0(i32 %cond, i32 addrspace(1)* %out) #0 {
> entry:
> @@ -48,6 +61,8 @@ endif:
>
> ; GCN-LABEL: {{^}}spill_m0_lds:
> ; GCN-NOT: v_readlane_b32 m0
> +; GCN-NOT: s_buffer_store_dword m0
> +; GCN-NOT: s_buffer_load_dword m0
> define amdgpu_ps void @spill_m0_lds(<16 x i8> addrspace(2)* inreg, <16 x i8> addrspace(2)* inreg, <32 x i8> addrspace(2)* inreg, i32 inreg) #0 {
> main_body:
> %4 = call float @llvm.SI.fs.constant(i32 0, i32 0, i32 %3)
>
> Added: llvm/trunk/test/CodeGen/MIR/AMDGPU/scalar-store-cache-flush.mir
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/MIR/AMDGPU/scalar-store-cache-flush.mir?rev=286766&view=auto
> ==============================================================================
> --- llvm/trunk/test/CodeGen/MIR/AMDGPU/scalar-store-cache-flush.mir (added)
> +++ llvm/trunk/test/CodeGen/MIR/AMDGPU/scalar-store-cache-flush.mir Sun Nov 13 12:20:54 2016
> @@ -0,0 +1,173 @@
> +# RUN: llc -march=amdgcn -run-pass si-insert-waits %s -o - | FileCheck %s
> +
> +--- |
> + define void @basic_insert_dcache_wb() {
> + ret void
> + }
> +
> + define void @explicit_flush_after() {
> + ret void
> + }
> +
> + define void @explicit_flush_before() {
> + ret void
> + }
> +
> + define void @no_scalar_store() {
> + ret void
> + }
> +
> + define void @multi_block_store() {
> + bb0:
> + br i1 undef, label %bb1, label %bb2
> +
> + bb1:
> + ret void
> +
> + bb2:
> + ret void
> + }
> +
> + define void @one_block_store() {
> + bb0:
> + br i1 undef, label %bb1, label %bb2
> +
> + bb1:
> + ret void
> +
> + bb2:
> + ret void
> + }
> +
> + define amdgpu_ps float @si_return() {
> + ret float undef
> + }
> +
> +...
> +---
> +# CHECK-LABEL: name: basic_insert_dcache_wb
> +# CHECK: bb.0:
> +# CHECK-NEXT: S_STORE_DWORD
> +# CHECK-NEXT: S_DCACHE_WB
> +# CHECK-NEXT: S_ENDPGM
> +
> +name: basic_insert_dcache_wb
> +tracksRegLiveness: false
> +
> +body: |
> + bb.0:
> + S_STORE_DWORD_SGPR undef %sgpr2, undef %sgpr0_sgpr1, undef %m0, 0
> + S_ENDPGM
> +...
> +---
> +# Already has an explicitly requested flush after the last store.
> +# CHECK-LABEL: name: explicit_flush_after
> +# CHECK: bb.0:
> +# CHECK-NEXT: S_STORE_DWORD
> +# CHECK-NEXT: S_DCACHE_WB
> +# CHECK-NEXT: S_ENDPGM
> +
> +name: explicit_flush_after
> +tracksRegLiveness: false
> +
> +body: |
> + bb.0:
> + S_STORE_DWORD_SGPR undef %sgpr2, undef %sgpr0_sgpr1, undef %m0, 0
> + S_DCACHE_WB
> + S_ENDPGM
> +...
> +---
> +# Already has an explicitly requested flush before the last store.
> +# CHECK-LABEL: name: explicit_flush_before
> +# CHECK: bb.0:
> +# CHECK-NEXT: S_DCACHE_WB
> +# CHECK-NEXT: S_STORE_DWORD
> +# CHECK-NEXT: S_DCACHE_WB
> +# CHECK-NEXT: S_ENDPGM
> +
> +name: explicit_flush_before
> +tracksRegLiveness: false
> +
> +body: |
> + bb.0:
> + S_DCACHE_WB
> + S_STORE_DWORD_SGPR undef %sgpr2, undef %sgpr0_sgpr1, undef %m0, 0
> + S_ENDPGM
> +...
> +---
> +# CHECK-LABEL: no_scalar_store
> +# CHECK: bb.0
> +# CHECK-NEXT: S_ENDPGM
> +name: no_scalar_store
> +tracksRegLiveness: false
> +
> +body: |
> + bb.0:
> + S_ENDPGM
> +...
> +
> +# CHECK-LABEL: name: multi_block_store
> +# CHECK: bb.0:
> +# CHECK-NEXT: S_STORE_DWORD
> +# CHECK-NEXT: S_DCACHE_WB
> +# CHECK-NEXT: S_ENDPGM
> +
> +# CHECK: bb.1:
> +# CHECK-NEXT: S_STORE_DWORD
> +# CHECK-NEXT: S_DCACHE_WB
> +# CHECK-NEXT: S_ENDPGM
> +
> +name: multi_block_store
> +tracksRegLiveness: false
> +
> +body: |
> + bb.0:
> + S_STORE_DWORD_SGPR undef %sgpr2, undef %sgpr0_sgpr1, undef %m0, 0
> + S_ENDPGM
> +
> + bb.1:
> + S_STORE_DWORD_SGPR undef %sgpr4, undef %sgpr6_sgpr7, undef %m0, 0
> + S_ENDPGM
> +...
> +...
> +
> +# This one should be able to omit the flush in the storeless block but
> +# this isn't handled now.
> +
> +# CHECK-LABEL: name: one_block_store
> +# CHECK: bb.0:
> +# CHECK-NEXT: S_DCACHE_WB
> +# CHECK-NEXT: S_ENDPGM
> +
> +# CHECK: bb.1:
> +# CHECK-NEXT: S_STORE_DWORD
> +# CHECK-NEXT: S_DCACHE_WB
> +# CHECK-NEXT: S_ENDPGM
> +
> +name: one_block_store
> +tracksRegLiveness: false
> +
> +body: |
> + bb.0:
> + S_ENDPGM
> +
> + bb.1:
> + S_STORE_DWORD_SGPR undef %sgpr4, undef %sgpr6_sgpr7, undef %m0, 0
> + S_ENDPGM
> +...
> +---
> +# CHECK-LABEL: name: si_return
> +# CHECK: bb.0:
> +# CHECK-NEXT: S_STORE_DWORD
> +# CHECK-NEXT: S_WAITCNT
> +# CHECK-NEXT: S_DCACHE_WB
> +# CHECK-NEXT: SI_RETURN
> +
> +name: si_return
> +tracksRegLiveness: false
> +
> +body: |
> + bb.0:
> + S_STORE_DWORD_SGPR undef %sgpr2, undef %sgpr0_sgpr1, undef %m0, 0
> + SI_RETURN undef %vgpr0
> +...
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>