[llvm] [AMDGPU] Enable unaligned scratch accesses (PR #110219)
Matt Arsenault via llvm-commits
llvm-commits at lists.llvm.org
Mon Sep 30 08:57:44 PDT 2024
================
@@ -1273,11 +1273,54 @@ define void @store_load_i64_unaligned(ptr addrspace(5) nocapture %arg) {
; GFX9-LABEL: store_load_i64_unaligned:
; GFX9: ; %bb.0: ; %bb
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-NEXT: v_mov_b32_e32 v1, 15
-; GFX9-NEXT: v_mov_b32_e32 v2, 0
-; GFX9-NEXT: scratch_store_dwordx2 v0, v[1:2], off
+; GFX9-NEXT: v_mov_b32_e32 v4, 15
+; GFX9-NEXT: v_add_u32_e32 v1, 4, v0
+; GFX9-NEXT: v_add_u32_e32 v2, 2, v0
+; GFX9-NEXT: v_add_u32_e32 v3, 1, v0
+; GFX9-NEXT: scratch_store_byte v0, v4, off
; GFX9-NEXT: s_waitcnt vmcnt(0)
-; GFX9-NEXT: scratch_load_dwordx2 v[0:1], v0, off glc
+; GFX9-NEXT: v_mov_b32_e32 v4, 0
+; GFX9-NEXT: v_add_u32_e32 v6, 6, v0
+; GFX9-NEXT: scratch_store_byte v3, v4, off
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: v_add_u32_e32 v5, 3, v0
+; GFX9-NEXT: scratch_store_byte v2, v4, off
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: scratch_store_byte v5, v4, off
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: v_add_u32_e32 v7, 5, v0
+; GFX9-NEXT: scratch_store_byte v1, v4, off
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: scratch_store_byte v7, v4, off
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: v_add_u32_e32 v8, 7, v0
+; GFX9-NEXT: scratch_store_byte v6, v4, off
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: scratch_store_byte v8, v4, off
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: scratch_load_ubyte v4, v0, off glc
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: ; kill: killed $vgpr6
+; GFX9-NEXT: ; kill: killed $vgpr1
+; GFX9-NEXT: ; kill: killed $vgpr3
+; GFX9-NEXT: ; kill: killed $vgpr5
+; GFX9-NEXT: ; kill: killed $vgpr8
+; GFX9-NEXT: ; kill: killed $vgpr7
+; GFX9-NEXT: ; kill: killed $vgpr2
+; GFX9-NEXT: ; kill: killed $vgpr0
+; GFX9-NEXT: scratch_load_ubyte v4, v3, off glc
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: scratch_load_ubyte v4, v2, off glc
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: scratch_load_ubyte v4, v5, off glc
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: scratch_load_ubyte v4, v1, off glc
+; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: scratch_load_ubyte v4, v7, off glc
----------------
arsenm wrote:
These tests should probably be switched to using an amdhsa triple in a precommit, so we still end up using the aligned accesses. We probably want dedicated aligned and unaligned cases in the future.
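The following is a minimal sketch of what the suggested dedicated aligned and unaligned cases might look like under an amdhsa triple. It assumes the flat-scratch configuration implied by the `scratch_*` instructions in the GFX9 output above. The RUN line and the function names (`store_load_i64_align8`, `store_load_i64_align1`) are hypothetical and not taken from the PR. The CHECK lines are deliberately left out: the sketch assumes they would be regenerated with `llvm/utils/update_llc_test_checks.py` rather than written by hand.

```llvm
; Hypothetical sketch: the RUN line and function names are assumptions, not the
; contents of the real test file. FileCheck needs GFX9 lines to pass, so generate
; them with llvm/utils/update_llc_test_checks.py before running the test.
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -mattr=+enable-flat-scratch < %s | FileCheck --check-prefix=GFX9 %s

; Naturally aligned (align 8) i64 scratch access: the "aligned" case, intended to
; keep exercising the wide dwordx2 lowering.
define void @store_load_i64_align8(ptr addrspace(5) nocapture %arg) {
bb:
  store volatile i64 15, ptr addrspace(5) %arg, align 8
  %load = load volatile i64, ptr addrspace(5) %arg, align 8
  ret void
}

; Byte-aligned (align 1) i64 scratch access: the "unaligned" case, whose lowering
; depends on which unaligned-access features the subtarget ends up with.
define void @store_load_i64_align1(ptr addrspace(5) nocapture %arg) {
bb:
  store volatile i64 15, ptr addrspace(5) %arg, align 1
  %load = load volatile i64, ptr addrspace(5) %arg, align 1
  ret void
}
```

Keeping the two alignments as separate functions means that a later change to the unaligned-scratch default should show up as a diff only in the `align 1` function, while the `align 8` function continues to guard the wide access path.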
https://github.com/llvm/llvm-project/pull/110219