[PATCH] D90847: [AMDGPU] Fix v3f16 interaction with image store workaround

Sebastian Neubauer via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Nov 10 09:04:46 PST 2020


sebastian-ne added inline comments.


================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.1d.d16.ll:540
+; GFX8-PACKED-NEXT:    s_mov_b32 s7, s9
+; GFX8-PACKED-NEXT:    image_load v[0:1], v0, s[0:7] dmask:0x7 unorm d16
+; GFX8-PACKED-NEXT:    s_mov_b32 s0, 0xffff
----------------
rdomingu wrote:
> AFAIK, gfx810 can't pack v3f16 when d16 is set because of a hardware bug. So this instruction should use 3 vgpr (not 2 vgpr).
As far as I know, only image stores are affected and this is an image_load, so the workaround does not need to be applied.
If loads are also affected, we need to change the condition here: https://reviews.llvm.org/D81172#C2200035NL3550


================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.1d.d16.ll:562
+; GFX9-NEXT:    s_mov_b32 s7, s9
+; GFX9-NEXT:    image_load v[0:1], v0, s[0:7] dmask:0x7 unorm d16
+; GFX9-NEXT:    v_mov_b32_e32 v2, 0xffff
----------------
rdomingu wrote:
> gfx9 has the same hardware bug. So this instruction should use 3 vgpr (not 2 vgpr).
Not a store, so this should be ok.


================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.store.2d.d16.ll:125
+; GFX81-NEXT:    v_mov_b32_e32 v5, v4
+; GFX81-NEXT:    image_store v[2:5], v[0:1], s[0:7] dmask:0x7 unorm d16
 ; GFX81-NEXT:    s_endpgm
----------------
rdomingu wrote:
> This instruction should use 3 vgpr (not 4 vgpr).
Thanks, fixed


================
Comment at: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.d16.dim.ll:79
+; PACKED: image_load v[0:1], v[0:2], s[0:7] dmask:0x7 unorm d16
+; GFX81: image_load v[0:1], v[0:2], s[0:7] dmask:0x7 unorm d16
+; GFX10: image_load v[0:1], v[0:2], s[0:7] dmask:0x7 dim:SQ_RSRC_IMG_3D unorm d16{{$}}
----------------
rdomingu wrote:
> This instruction should use 3 vgpr (not 2 vgpr).
Not a store, so this should be ok.


================
Comment at: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.d16.dim.ll:115
+; PACKED: image_store v[2:3], v[0:1], s[0:7] dmask:0x7 unorm d16
+; GFX81: image_store v[2:3], v[0:1], s[0:7] dmask:0x7 unorm d16
+; GFX10: image_store v[2:3], v[0:1], s[0:7] dmask:0x7 dim:SQ_RSRC_IMG_2D unorm d16{{$}}
----------------
rdomingu wrote:
> This is instruction should use 3 vgpr (not 2 vgpr).
You’re right, I fixed it.
PACKED is gfx9, so 2 vgprs is fine there.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90847/new/

https://reviews.llvm.org/D90847



More information about the llvm-commits mailing list