[llvm] [AMDGPU] SIPeepholeSDWA: Add REG_SEQUENCE support (PR #133087)
Frederik Harwath via llvm-commits
llvm-commits at lists.llvm.org
Wed Apr 2 01:51:59 PDT 2025
================
@@ -0,0 +1,55 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -stop-after=si-peephole-sdwa -o - %s | FileCheck %s
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -start-before=si-peephole-sdwa -o - %s | FileCheck -check-prefix=ASM %s
+---
+name: sdwa_reg_sequence
+tracksRegLiveness: true
+body: |
+ bb.0:
+ liveins: $vgpr0
+
+ ; ASM-LABEL: ; %bb.0:
+ ; ASM-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+ ; ASM-NEXT: v_add_u32_e32 v1, 10, v0
+ ; ASM-NEXT: v_add_u32_e32 v0, 20, v0
+ ; ASM-NEXT: v_add_co_u32_sdwa v0, vcc, v1, v0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_0
+ ; ASM-NEXT: v_addc_co_u32_e64 v1, s[0:1], 0, 0, vcc
+ ; ASM-NEXT: global_store_dwordx2 v[0:1], v[0:1], off
+ ; ASM-NEXT: s_endpgm
+
+ ; CHECK-LABEL: name: sdwa_reg_sequence
+ ; CHECK: liveins: $vgpr0
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+ ; CHECK-NEXT: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], 10, 0, implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e64_1:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], 20, 0, implicit $exec
+ ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
+ ; CHECK-NEXT: [[REG_SEQUENCE:%[0-9]+]]:vreg_64 = REG_SEQUENCE [[V_ADD_U32_e64_]], %subreg.sub0, [[V_MOV_B32_e32_]], %subreg.sub1
+ ; CHECK-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 255
+ ; CHECK-NEXT: [[V_AND_B32_e64_:%[0-9]+]]:vgpr_32 = V_AND_B32_e64 [[V_ADD_U32_e64_1]], killed [[S_MOV_B32_]], implicit $exec
+ ; CHECK-NEXT: [[REG_SEQUENCE1:%[0-9]+]]:vreg_64 = REG_SEQUENCE [[V_AND_B32_e64_]], %subreg.sub0, [[V_MOV_B32_e32_]], %subreg.sub1
+ ; CHECK-NEXT: [[V_ADD_CO_U32_sdwa:%[0-9]+]]:vgpr_32 = V_ADD_CO_U32_sdwa 0, [[REG_SEQUENCE]].sub0, 0, [[V_ADD_U32_e64_1]], 0, 6, 0, 6, 0, implicit-def $vcc, implicit $exec
+ ; CHECK-NEXT: [[V_ADDC_U32_e64_:%[0-9]+]]:vgpr_32, dead [[V_ADDC_U32_e64_1:%[0-9]+]]:sreg_64_xexec = V_ADDC_U32_e64 0, 0, $vcc, 0, implicit $exec
+ ; CHECK-NEXT: [[REG_SEQUENCE2:%[0-9]+]]:vreg_64 = REG_SEQUENCE [[V_ADD_CO_U32_sdwa]], %subreg.sub0, [[V_ADDC_U32_e64_]], %subreg.sub1
+ ; CHECK-NEXT: [[DEF:%[0-9]+]]:sreg_64 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]]:vreg_64 = COPY [[DEF]]
+ ; CHECK-NEXT: GLOBAL_STORE_DWORDX2 killed [[COPY1]], killed [[REG_SEQUENCE2]], 0, 0, implicit $exec :: (store (s64), addrspace 1)
+ ; CHECK-NEXT: S_ENDPGM 0
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: .1.entry:
+ %0:vgpr_32 = COPY $vgpr0
+ %1:vgpr_32 = V_ADD_U32_e64 %0, 10, 0, implicit $exec
+ %2:vgpr_32 = V_ADD_U32_e64 %0, 20, 0, implicit $exec
+ %3:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
+ %4:vreg_64 = REG_SEQUENCE %1, %subreg.sub0, %3, %subreg.sub1
+ %5:sreg_32 = S_MOV_B32 255
+ %6:vgpr_32 = V_AND_B32_e64 killed %2, killed %5, implicit $exec
+ %7:vreg_64 = REG_SEQUENCE %6, %subreg.sub0, %3, %subreg.sub1
+ %8:vgpr_32, %9:sreg_64_xexec = V_ADD_CO_U32_e64 %4.sub0, %7.sub0, 0, implicit $exec
+ %10:vgpr_32, dead %11:sreg_64_xexec = V_ADDC_U32_e64 0, 0, killed %9, 0, implicit $exec
+ %12:vreg_64 = REG_SEQUENCE %8, %subreg.sub0, %10, %subreg.sub1
+ %13:sreg_64 = IMPLICIT_DEF
+ %14:vreg_64 = COPY %13
+ GLOBAL_STORE_DWORDX2 killed %14, killed %12, 0, 0, implicit $exec :: (store (s64), addrspace 1)
+ S_ENDPGM 0
----------------
frederik-h wrote:
> Add some tests [...] where you need to compose subregisters?
Right now the code from this PR does not support "chains of REG_SEQUENCEs", but it should be possible to add support for them if you think it is necessary.
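
For illustration, a chain of REG_SEQUENCEs could look like the fragment below. This is a hypothetical sketch, not taken from the PR or its tests: the register `%15` and the `vreg_128` wrapper are invented for the example, while `%3`, `%4`, `%6`, and `%7` reuse the names from the test above.

```
  %6:vgpr_32 = V_AND_B32_e64 killed %2, killed %5, implicit $exec
  %7:vreg_64 = REG_SEQUENCE %6, %subreg.sub0, %3, %subreg.sub1
  ; Hypothetical outer REG_SEQUENCE that places %7 in the upper half of a wider register.
  %15:vreg_128 = REG_SEQUENCE %4, %subreg.sub0_sub1, %7, %subreg.sub2_sub3
  ; The use reads %15.sub2, which is sub0 of %7, which in turn is %6.
  %8:vgpr_32, %9:sreg_64_xexec = V_ADD_CO_U32_e64 %4.sub0, %15.sub2, 0, implicit $exec
```

In this sketch, reaching the V_AND_B32_e64 behind `%15.sub2` would mean mapping `sub2` of `%15` to `sub0` of `%7` and then looking through the inner REG_SEQUENCE. That is presumably the kind of subregister composition the review comment asks about.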
https://github.com/llvm/llvm-project/pull/133087