[llvm] [AMDGPU][True16][CodeGen]Support V2S copy with True16 flow (PR #118037)

Wed Jan 15 02:59:46 PST 2025

================
@@ -1075,10 +1075,25 @@ void SIFixSGPRCopies::lowerVGPR2SGPRCopies(MachineFunction &MF) {
         TRI->getRegClassForOperandReg(*MRI, MI->getOperand(1));
     size_t SrcSize = TRI->getRegSizeInBits(*SrcRC);
     if (SrcSize == 16) {
-      // HACK to handle possible 16bit VGPR source
-      auto MIB = BuildMI(*MBB, MI, MI->getDebugLoc(),
-                         TII->get(AMDGPU::V_READFIRSTLANE_B32), DstReg);
-      MIB.addReg(SrcReg, 0, AMDGPU::NoSubRegister);
+      assert(MF.getSubtarget<GCNSubtarget>().useRealTrue16Insts() &&
+             "We do not expect to see 16-bit copies from VGPR to SGPR unless "
+             "we have 16-bit VGPRs");
+      assert(MRI->getRegClass(DstReg) == &AMDGPU::SGPR_LO16RegClass ||
+             MRI->getRegClass(DstReg) == &AMDGPU::SReg_32RegClass);
+      // There is no V_READFIRSTLANE_B16, so widen the destination scalar
+      // value to 32 bits
+      MRI->setRegClass(DstReg, &AMDGPU::SGPR_32RegClass);
----------------
arsenm wrote:

Could also just query the register class from V_READFIRSTLANE_B32's operand info 

https://github.com/llvm/llvm-project/pull/118037