[PATCH] D131959: [AMDGPU] Fix SDST operand of V_DIV_SCALE to always be VCC
Pierre van Houtryve via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Aug 30 07:16:32 PDT 2022
Pierre-vh added a comment.
So for the weird codegen in `frem.ll`, the issue is from the `addLiveIn` call I think.
In fact my changes to the DAGISel aren't needed I believe, but in practice there are segfaults in the following case: When a `DIV_FMAS` uses the result of `DIV_SCALE`.
In such cases, the following happens:
t92: f32,i1 = DIV_SCALE nofpexcept t59, t57, t59
ISEL: Starting selection on root node: t102: f32 = DIV_FMAS nofpexcept t101, t97, t100, t92:1
ISEL: Starting pattern match
Initial Opcode index to 456907
TypeSwitch[f32] from 456914 to 456917
Skipped scope entry (due to false predicate) at index 456919, continuing at 456953
Creating new node: t127: ch,glue = CopyToReg t0, Register:i1 $vcc_lo, t92:1
Morphed node: t102: f32 = V_DIV_FMAS_F32_e64 nofpexcept TargetConstant:i32<0>, t101, TargetConstant:i32<0>, t97, TargetConstant:i32<0>, t100, TargetConstant:i1<0>, TargetConstant:i32<0>, t127:1
ISEL: Match complete!
Then the following MIR is emitted.
%23:vgpr_32 = nofpexcept V_DIV_SCALE_F32_e64 0, %16:vgpr_32, 0, %17:vgpr_32, 0, %16:vgpr_32, 0, 0, implicit-def $vcc, implicit $mode, implicit $exec
%24:sreg_32 = COPY $vcc
$vcc_lo = COPY %24:sreg_32
%29:vgpr_32 = nofpexcept V_DIV_FMAS_F32_e64 0, killed %28:vgpr_32, 0, %22:vgpr_32, 0, %27:vgpr_32, 0, 0, implicit $mode, implicit $vcc, implicit $exec
And `SIInstrInfo.cpp` asserts when lowering the copy:
real copy: renamable $vcc_lo = COPY $vcc
llc: ../lib/Target/AMDGPU/SIInstrInfo.cpp:762: virtual void llvm::SIInstrInfo::copyPhysReg(llvm::MachineBasicBlock &, MachineBasicBlock::iterator, const llvm::DebugLoc &, llvm::MCRegister, llvm::MCRegister, bool) const: Assertion `AMDGPU::VGPR_32RegClass.contains(SrcReg)' failed.
I feel like that copy shouldn't be there in the first place, but it seems like the DAGISel is adding it automatically.
The pattern used to select the FMAS is:
class DivFmasPat<ValueType vt, Instruction inst, Register CondReg> : GCNPat<
(AMDGPUdiv_fmas (vt (VOP3Mods vt:$src0, i32:$src0_modifiers)),
(vt (VOP3Mods vt:$src1, i32:$src1_modifiers)),
(vt (VOP3Mods vt:$src2, i32:$src2_modifiers)),
(i1 CondReg)),
(inst $src0_modifiers, $src0, $src1_modifiers, $src1, $src2_modifiers, $src2)
>;
let WaveSizePredicate = isWave64 in {
def : DivFmasPat<f32, V_DIV_FMAS_F32_e64, VCC>;
def : DivFmasPat<f64, V_DIV_FMAS_F64_e64, VCC>;
}
let WaveSizePredicate = isWave32 in {
def : DivFmasPat<f32, V_DIV_FMAS_F32_e64, VCC_LO>;
def : DivFmasPat<f64, V_DIV_FMAS_F64_e64, VCC_LO>;
}
Does anyone know where the issue lies? I'm a bit stuck there at the moment.
Is it the pattern that shouldn't be using VCC_LO like that? Or a missing case in `SIInstrInfo.cpp` ?
Maybe the register allocator should be more careful and transform the COPY into an EXTRACT_SUBREG when allocating `%24` to `vcc`?
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D131959/new/
https://reviews.llvm.org/D131959
More information about the llvm-commits
mailing list