[PATCH] D74937: [AMDGPU] Implement copyPhysReg for 16 bit subregs

Thu Feb 20 17:11:04 PST 2020

rampitec marked an inline comment as done.
rampitec added a comment.

In D74937#1885659 <https://reviews.llvm.org/D74937#1885659>, @arsenm wrote:

> I don't think copies of these should ever be produced (at least for the high half) since the high half is not really addressable, and only appears that way to some instructions. Where are copies coming from?

First, hi16 registers are the destination of the load_hi instructions. Second, RA can happily copy any register to any other, so for sanity copyPhysReg needs to know how to copy every register.

================
Comment at: llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:701
+
+    BuildMI(MBB, MI, DL, get(AMDGPU::V_PACK_B32_F16), DestReg)
+      .addImm((!SrcLow && DstLow) ? SISrcMods::OP_SEL_0 : 0) // src0_mods
----------------
arsenm wrote:
> V_PACK_B32_F16 has some FP flushing properties and is not suitable for a copy. I think you have to do essentially what D74740 does
I cannot do that here: I would need to scavenge a physreg for a mask, whether I use v_perm_b32 (if available) or v_bfi_b32. In fact, I do not see a good instruction for this if v_pack_b32 does not work.
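
To make the mask requirement concrete, here is a minimal host-side sketch of a bit-exact 16-bit half copy built on a bitfield insert. It is not code from this patch: `bfi` and `copyHalf` are hypothetical names, and `bfi` only models the documented v_bfi_b32 semantics, D = (S0 & S1) | (~S0 & S2). On hardware the mask would have to live in a register, and the lo/hi realignment would need its own instruction or operand selection.

```cpp
#include <cassert>
#include <cstdint>

// Models v_bfi_b32: for every bit, take `a` where `mask` is 1 and `b` elsewhere.
constexpr uint32_t bfi(uint32_t mask, uint32_t a, uint32_t b) {
  return (mask & a) | (~mask & b);
}

// Hypothetical helper: copy one 16-bit half of `src` into one half of `dst`,
// leaving the other half of `dst` untouched. Purely bitwise, so f16
// denormals and NaN payloads are never altered.
constexpr uint32_t copyHalf(uint32_t dst, bool dstLow, uint32_t src, bool srcLow) {
  // Extract the requested source half into the low 16 bits.
  const uint32_t half = srcLow ? (src & 0xFFFFu) : (src >> 16);
  // Move it to the position of the destination half.
  const uint32_t aligned = dstLow ? half : (half << 16);
  // This constant is the value that would need a scavenged register.
  const uint32_t mask = dstLow ? 0x0000FFFFu : 0xFFFF0000u;
  return bfi(mask, aligned, dst);
}

// Small self-check covering a cross-half copy and a denormal bit pattern.
inline void selfCheck() {
  assert(copyHalf(0xAAAABBBBu, /*dstLow=*/true, 0x11112222u, /*srcLow=*/false) == 0xAAAA1111u);
  assert(copyHalf(0x00000000u, /*dstLow=*/true, 0x00000001u, /*srcLow=*/true) == 0x00000001u);
}
```

The second check uses 0x0001, the smallest f16 denormal: a bitwise copy preserves it, while a packing instruction that flushes denormals might not.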

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74937/new/

https://reviews.llvm.org/D74937