[PATCH] D62614: Fix for the OCL/LC to failure on some OCLPerf tests
Alexander via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Jun 3 08:01:12 PDT 2019
alex-t marked 2 inline comments as done.
alex-t added inline comments.
================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:4174-4180
+ unsigned Reg = Op.getReg();
+ MachineInstr *DefMI = MRI.getUniqueVRegDef(Reg);
+ if (DefMI && DefMI->getParent() != InsertBB &&
+ MDT->dominates(DefMI->getParent(), InsertBB)) {
+ InsertBB = DefMI->getParent();
+ Insert = InsertBB->getFirstTerminator();
+ }
----------------
nhaehnle wrote:
> Can you give an example of why this is necessary? Assuming we have LCSSA phis, I don't understand the reason for this.
The copy may be placed to the block produced by the edge splitting like below:
the copy of %14 in the example below will be placed to the bb.4.Flow that is already out of the loop body.
bb.2.Flow7:
; predecessors: %bb.0, %bb.4
successors: %bb.5(0x80000000); %bb.5(100.00%)
**%7:vgpr_32 **= PHI %108:vgpr_32, %bb.0, **%14:sreg_32_xm0,** %bb.4
SI_END_CF %6:sreg_64, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
S_BRANCH %bb.5
bb.3.for.body:
; predecessors: %bb.1, %bb.3
successors: %bb.4(0x04000000), %bb.3(0x7c000000); %bb.4(3.12%), %bb.3(96.88%)
%14:sreg_32_xm0 = S_ADD_I32 %10:sreg_32_xm0, killed %86:sreg_32_xm0, implicit-def dead $scc
SI_LOOP %15:sreg_64, %bb.3, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
S_BRANCH %bb.4
bb.4.Flow:
; predecessors: %bb.3
successors: %bb.2(0x80000000); %bb.2(100.00%)
SI_END_CF %15:sreg_64, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
S_BRANCH %bb.2
bb.5.for.end:
; predecessors: %bb.2
%16:vgpr_32 = PHI **%7:vgpr_32, %bb.2**
================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:5969-5979
+bool SIInstrInfo::shouldSink(const MachineInstr &MI) const {
+ const MachineRegisterInfo &MRI = MI.getParent()->getParent()->getRegInfo();
+ if (MI.isCopy()) {
+ unsigned SrcReg = MI.getOperand(1).getReg();
+ unsigned DstReg = MI.getOperand(0).getReg();
+ if (RI.isSGPRReg(MRI, SrcReg) && !RI.isSGPRReg(MRI, DstReg)) {
+ return false;
----------------
nhaehnle wrote:
> shouldSink should never be used for correctness. See the comment:
> ```
> /// MachineSink determines on its own whether the instruction is safe to sink;
> /// this gives the target a hook to override the default behavior with regards
> /// to which instructions should be sunk.
> ```
> I suspect the right fix is to ensure that COPY instructions inserted by legalizeGenericOperands are marked with an implicit use of EXEC when a VGPR is involved.
Comment says: "this gives the target a hook to override the default behavior with regards
to which instructions should be sunk.". So, I don't understand why it "should never be used for correctness".
Also, adding extra operands to the copies leads Peephole optimizer to crash in assert
Assertion failed: Def->getNumOperands() == 2 && "Invalid number of operands", file llvm\lib\CodeGen\PeepholeOptimizer.cpp, line 1811
The Si Fix VGPR Copies pass adds implicit uses of EXEC later on...
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D62614/new/
https://reviews.llvm.org/D62614
More information about the llvm-commits
mailing list