[PATCH] D131959: [AMDGPU] Fix SDST operand of V_DIV_SCALE to always be VCC

Tue Aug 30 07:16:32 PDT 2022

Pierre-vh added a comment.

So for the weird codegen in `frem.ll`, the issue is from the `addLiveIn` call I think.
In fact my changes to the DAGISel aren't needed I believe, but in practice there are segfaults in the following case: When a `DIV_FMAS` uses the result of `DIV_SCALE`.
In such cases, the following happens:

  t92: f32,i1 = DIV_SCALE nofpexcept t59, t57, t59

  ISEL: Starting selection on root node: t102: f32 = DIV_FMAS nofpexcept t101, t97, t100, t92:1
  ISEL: Starting pattern match
    Initial Opcode index to 456907
    TypeSwitch[f32] from 456914 to 456917
    Skipped scope entry (due to false predicate) at index 456919, continuing at 456953
  Creating new node: t127: ch,glue = CopyToReg t0, Register:i1 $vcc_lo, t92:1
    Morphed node: t102: f32 = V_DIV_FMAS_F32_e64 nofpexcept TargetConstant:i32<0>, t101, TargetConstant:i32<0>, t97, TargetConstant:i32<0>, t100, TargetConstant:i1<0>, TargetConstant:i32<0>, t127:1
  ISEL: Match complete!

Then the following MIR is emitted.

  %23:vgpr_32 = nofpexcept V_DIV_SCALE_F32_e64 0, %16:vgpr_32, 0, %17:vgpr_32, 0, %16:vgpr_32, 0, 0, implicit-def $vcc, implicit $mode, implicit $exec
  %24:sreg_32 = COPY $vcc
  $vcc_lo = COPY %24:sreg_32
  %29:vgpr_32 = nofpexcept V_DIV_FMAS_F32_e64 0, killed %28:vgpr_32, 0, %22:vgpr_32, 0, %27:vgpr_32, 0, 0, implicit $mode, implicit $vcc, implicit $exec

And `SIInstrInfo.cpp` asserts when lowering the copy:

  real copy:   renamable $vcc_lo = COPY $vcc
  llc: ../lib/Target/AMDGPU/SIInstrInfo.cpp:762: virtual void llvm::SIInstrInfo::copyPhysReg(llvm::MachineBasicBlock &, MachineBasicBlock::iterator, const llvm::DebugLoc &, llvm::MCRegister, llvm::MCRegister, bool) const: Assertion `AMDGPU::VGPR_32RegClass.contains(SrcReg)' failed.

I feel like that copy shouldn't be there in the first place, but it seems like the DAGISel is adding it automatically.
The pattern used to select the FMAS is:

  class DivFmasPat<ValueType vt, Instruction inst, Register CondReg> : GCNPat<
    (AMDGPUdiv_fmas (vt (VOP3Mods vt:$src0, i32:$src0_modifiers)),
                    (vt (VOP3Mods vt:$src1, i32:$src1_modifiers)),
                    (vt (VOP3Mods vt:$src2, i32:$src2_modifiers)),
                    (i1 CondReg)),
    (inst $src0_modifiers, $src0, $src1_modifiers, $src1, $src2_modifiers, $src2)
  >;

  let WaveSizePredicate = isWave64 in {
  def : DivFmasPat<f32, V_DIV_FMAS_F32_e64, VCC>;
  def : DivFmasPat<f64, V_DIV_FMAS_F64_e64, VCC>;
  }

  let WaveSizePredicate = isWave32 in {
  def : DivFmasPat<f32, V_DIV_FMAS_F32_e64, VCC_LO>;
  def : DivFmasPat<f64, V_DIV_FMAS_F64_e64, VCC_LO>;
  }

Does anyone know where the issue lies? I'm a bit stuck there at the moment.
Is it the pattern that shouldn't be using VCC_LO like that? Or a missing case in `SIInstrInfo.cpp` ?
Maybe the register allocator should be more careful and transform the COPY into an EXTRACT_SUBREG when allocating `%24` to `vcc`?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131959/new/

https://reviews.llvm.org/D131959