[llvm] [AMDGPU][SILoadStoreOptimizer] Try to find common base for L/Ss with 0 offset (PR #71126)
Matt Arsenault via llvm-commits
llvm-commits at lists.llvm.org
Thu Nov 2 17:13:01 PDT 2023
================
@@ -212,3 +212,126 @@ body: |
%13:vreg_64 = REG_SEQUENCE %9, %subreg.sub0, %11, %subreg.sub1
GLOBAL_STORE_DWORD %13, %0.sub1, 0, 0, implicit $exec
...
+---
+
+# GFX9-LABEL: name: offset0_rebased
+# GFX9: [[REG_SEQUENCE2:%[0-9]+]]:vreg_64 = REG_SEQUENCE %{{[0-9]+}}, %subreg.sub0, %{{[0-9]+}}, %subreg.sub1
+# GFX9: [[GLOBAL_LOAD_DWORDX4_:%[0-9]+]]:vreg_128_align2 = GLOBAL_LOAD_DWORDX4 [[REG_SEQUENCE2]], -3072, 0
+# GFX9: [[GLOBAL_LOAD_DWORDX4_1:%[0-9]+]]:vreg_128_align2 = GLOBAL_LOAD_DWORDX4 [[REG_SEQUENCE2]], 0, 0
+# GFX9: [[GLOBAL_LOAD_DWORDX4_2:%[0-9]+]]:vreg_128_align2 = GLOBAL_LOAD_DWORDX4 [[REG_SEQUENCE2]], -4096, 0
+
+name: offset0_rebased
+body: |
+ bb.0.entry:
+ liveins: $vgpr0, $sgpr4_sgpr5, $sgpr6
+ %3:sgpr_32 = COPY $sgpr6
+ %2:sgpr_64(p4) = COPY $sgpr4_sgpr5
+ %0:vgpr_32(s32) = COPY $vgpr0
+ %5:sreg_64_xexec = S_LOAD_DWORDX2_IMM %2:sgpr_64(p4), 0, 0
+ %6:sreg_64_xexec = S_LOAD_DWORDX2_IMM %2:sgpr_64(p4), 8, 0
+ %8:vgpr_32 = V_LSHL_ADD_U32_e64 %3:sgpr_32, 8, %0:vgpr_32(s32), implicit $exec
+ %90:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
+ %89:vreg_64_align2 = REG_SEQUENCE %8:vgpr_32, %subreg.sub0, killed %90:vgpr_32, %subreg.sub1
+ %13:vreg_64_align2 = V_LSHLREV_B64_e64 4, %89:vreg_64_align2, implicit $exec
+ %91:vgpr_32, %93:sreg_64_xexec = V_ADD_CO_U32_e64 %5.sub0:sreg_64_xexec, %13.sub0:vreg_64_align2, 0, implicit $exec
+ %99:vgpr_32 = COPY %5.sub1:sreg_64_xexec
+ %92:vgpr_32, dead %94:sreg_64_xexec = V_ADDC_U32_e64 %99:vgpr_32, %13.sub1:vreg_64_align2, killed %93:sreg_64_xexec, 0, implicit $exec
+ %22:vgpr_32 = V_LSHLREV_B32_e64 2, %0:vgpr_32(s32), implicit $exec
+ %27:sreg_32 = S_MOV_B32 1024
+ %28:vgpr_32, %29:sreg_64_xexec = V_ADD_CO_U32_e64 %27:sreg_32, %91:vgpr_32, 0, implicit $exec
+ %32:vgpr_32, %33:sreg_64 = V_ADDC_U32_e64 0, %92:vgpr_32, killed %29:sreg_64_xexec, 0, implicit $exec
+ %35:vreg_64_align2 = REG_SEQUENCE killed %28:vgpr_32, %subreg.sub0, killed %32:vgpr_32, %subreg.sub1
+ %36:vreg_128_align2 = GLOBAL_LOAD_DWORDX4 killed %35:vreg_64_align2, 0, 0, implicit $exec
+ %50:sreg_32 = S_MOV_B32 4096
+ %51:vgpr_32, %52:sreg_64_xexec = V_ADD_CO_U32_e64 %50:sreg_32, %91:vgpr_32, 0, implicit $exec
+ %53:vgpr_32, %54:sreg_64 = V_ADDC_U32_e64 0, %92:vgpr_32, killed %52:sreg_64_xexec, 0, implicit $exec
+ %56:vreg_64_align2 = REG_SEQUENCE killed %51:vgpr_32, %subreg.sub0, killed %53:vgpr_32, %subreg.sub1
+ %57:vreg_128_align2 = GLOBAL_LOAD_DWORDX4 killed %56:vreg_64_align2, 0, 0, implicit $exec
+ %15:vreg_64_align2 = REG_SEQUENCE %91:vgpr_32, %subreg.sub0, %92:vgpr_32, %subreg.sub1
+ %16:vreg_128_align2 = GLOBAL_LOAD_DWORDX4 %15:vreg_64_align2, 0, 0, implicit $exec
+...
+---
+
+# GFX9-LABEL: name: offset0_base
+# GFX9: [[REG_SEQUENCE2:%[0-9]+]]:vreg_64 = REG_SEQUENCE %{{[0-9]+}}, %subreg.sub0, %{{[0-9]+}}, %subreg.sub1
+# GFX9: [[GLOBAL_LOAD_DWORDX4_:%[0-9]+]]:vreg_128_align2 = GLOBAL_LOAD_DWORDX4 [[REG_SEQUENCE2]], 2048, 0
+# GFX9: [[GLOBAL_LOAD_DWORDX4_1:%[0-9]+]]:vreg_128_align2 = GLOBAL_LOAD_DWORDX4 [[REG_SEQUENCE2]], 3072, 0
+# GFX9: [[GLOBAL_LOAD_DWORDX4_2:%[0-9]+]]:vreg_128_align2 = GLOBAL_LOAD_DWORDX4 [[REG_SEQUENCE2]], 0, 0
+
+name: offset0_base
+body: |
+ bb.0.entry:
+ liveins: $vgpr0, $sgpr4_sgpr5, $sgpr6
+ %3:sgpr_32 = COPY $sgpr6
+ %2:sgpr_64(p4) = COPY $sgpr4_sgpr5
+ %0:vgpr_32(s32) = COPY $vgpr0
----------------
arsenm wrote:
Remove the LLTs, and can you run this through -run-pass=none to clean up the register numbers?
https://github.com/llvm/llvm-project/pull/71126
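For reference, the cleanup the reviewer asks for is usually done by round-tripping the MIR through llc with no passes, which renumbers the virtual registers consecutively. A sketch of that workflow (the triple and file names here are illustrative, not from the patch):

```shell
# Round-trip the test through llc without running any passes;
# -run-pass=none parses and re-serializes the MIR, renumbering
# virtual registers (%89, %90, ... become consecutive).
# Triple and file names are examples only.
llc -mtriple=amdgcn -run-pass=none offset0.mir -o offset0.renumbered.mir
```

The CHECK lines then need to be regenerated against the renumbered output.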