[PATCH] D74937: [AMDGPU] Implement copyPhysReg for 16 bit subregs

Thu Feb 20 17:20:05 PST 2020

arsenm added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:701
+
+    BuildMI(MBB, MI, DL, get(AMDGPU::V_PACK_B32_F16), DestReg)
+      .addImm((!SrcLow && DstLow) ? SISrcMods::OP_SEL_0 : 0) // src0_mods
----------------
rampitec wrote:
> arsenm wrote:
> > V_PACK_B32_F16 has some FP flushing properties and is not suitable for a copy. I think you have to do essentially what D74740 does
> I cannot do it here, I would need to scavenge a physreg for a mask, either if I use v_perm_b32 (if available) or v_bfi_b32... In fact I do not see a good instruction to do it if v_pack_b32 does not work.
Yes, there are definitely missing instructions to handle this well. I think you can use V_ALIGNBIT_B32 without an extra register in a subset of cases

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74937/new/

https://reviews.llvm.org/D74937