[PATCH] D74937: [AMDGPU] Implement copyPhysReg for 16 bit subregs

Stanislav Mekhanoshin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Feb 20 17:23:22 PST 2020


rampitec marked an inline comment as done.
rampitec added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:701
+
+    BuildMI(MBB, MI, DL, get(AMDGPU::V_PACK_B32_F16), DestReg)
+      .addImm((!SrcLow && DstLow) ? SISrcMods::OP_SEL_0 : 0) // src0_mods
----------------
arsenm wrote:
> rampitec wrote:
> > arsenm wrote:
> > > V_PACK_B32_F16 has some FP flushing properties and is not suitable for a copy. I think you have to do essentially what D74740 does.
> > I cannot do it here: I would need to scavenge a physreg for a mask, whether I use v_perm_b32 (if available) or v_bfi_b32. In fact, I do not see a good instruction for this if v_pack_b32 does not work.
> Yes, there are definitely missing instructions to handle this well. I think you can use V_ALIGNBIT_B32 without an extra register in a subset of cases.
It does not work for the most needed case: copying low to low. In fact, it does not help at all.


https://reviews.llvm.org/D74937
