[llvm] [AMDGPU] SIPeepholeSDWA: Handle V_CNDMASK_B32_e64 (PR #137930)
Matt Arsenault via llvm-commits
llvm-commits at lists.llvm.org
Thu May 1 04:08:14 PDT 2025
================
@@ -0,0 +1,35 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc %s -mtriple=amdgcn -mcpu=gfx900 -run-pass=si-peephole-sdwa -o - | FileCheck -check-prefix=gfx9 %s
+
+# Test conversion of V_CNDMASK_B32 to VOPC for enabling further conversion to SDWA.
+# For this, the definition of the src2 carry-in operand must be changed to write
+# to VCC.
+
+---
+name: v_vselect_v2bf16
+tracksRegLiveness: true
+body: |
+
+ bb.0:
+ liveins: $vgpr0, $vgpr1
+
+ ; gfx9-LABEL: name: v_vselect_v2bf16
+ ; gfx9: liveins: $vgpr0, $vgpr1
+ ; gfx9-NEXT: {{ $}}
+ ; gfx9-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+ ; gfx9-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
+ ; gfx9-NEXT: $vcc = V_CMP_EQ_U32_e64 killed [[COPY1]], 1, implicit $exec
+ ; gfx9-NEXT: [[V_LSHRREV_B32_e64_:%[0-9]+]]:vgpr_32 = V_LSHRREV_B32_e64 16, [[COPY]], implicit $exec
+ ; gfx9-NEXT: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
+ ; gfx9-NEXT: [[V_CNDMASK_B32_sdwa:%[0-9]+]]:vgpr_32 = V_CNDMASK_B32_sdwa 0, [[V_MOV_B32_e32_]], 0, [[COPY]], 0, 6, 0, 6, 5, implicit $vcc, implicit $exec
+ ; gfx9-NEXT: $vgpr0 = COPY [[V_CNDMASK_B32_sdwa]]
+ ; gfx9-NEXT: SI_RETURN implicit $vgpr0
+ %1:vgpr_32 = COPY $vgpr0
+ %2:vgpr_32 = COPY $vgpr1
+ %3:sreg_64_xexec = V_CMP_EQ_U32_e64 killed %2, 1, implicit $exec
+ %4:vgpr_32 = V_LSHRREV_B32_e64 16, %1, implicit $exec
+ %5:vgpr_32 = V_CNDMASK_B32_e64 0, 0, 0, %4, killed %3, implicit $exec
+ $vgpr0 = COPY %5
+ SI_RETURN implicit $vgpr0
+
+...
----------------
arsenm wrote:
This should have both one-use and multi-use tests. Also add negative tests where source modifiers are used and where there is a live vcc use. The undef condition, where getVRegDef would fail, should be covered as well.
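
As a rough illustration only, some of the requested inputs might look like the sketch below. It is not taken from the PR: the function names are hypothetical, the operand layout is copied from the V_CNDMASK_B32_e64 in the test above, and the modifier value 1 is assumed to encode NEG. CHECK lines are left out on purpose, since they would be regenerated with utils/update_mir_test_checks.py. The live vcc case is not shown.

```mir
# Hypothetical multi-use case: the compare result %3 feeds two selects.
---
name: cndmask_multi_use
tracksRegLiveness: true
body: |
  bb.0:
    liveins: $vgpr0, $vgpr1

    %1:vgpr_32 = COPY $vgpr0
    %2:vgpr_32 = COPY $vgpr1
    %3:sreg_64_xexec = V_CMP_EQ_U32_e64 killed %2, 1, implicit $exec
    %4:vgpr_32 = V_LSHRREV_B32_e64 16, %1, implicit $exec
    %5:vgpr_32 = V_CNDMASK_B32_e64 0, 0, 0, %4, %3, implicit $exec
    %6:vgpr_32 = V_CNDMASK_B32_e64 0, 0, 0, %1, killed %3, implicit $exec
    $vgpr0 = COPY %5
    $vgpr1 = COPY %6
    SI_RETURN implicit $vgpr0, implicit $vgpr1

...

# Hypothetical negative case: src1 carries a source modifier (assumed NEG = 1).
---
name: cndmask_src_modifiers
tracksRegLiveness: true
body: |
  bb.0:
    liveins: $vgpr0, $vgpr1

    %1:vgpr_32 = COPY $vgpr0
    %2:vgpr_32 = COPY $vgpr1
    %3:sreg_64_xexec = V_CMP_EQ_U32_e64 killed %2, 1, implicit $exec
    %4:vgpr_32 = V_LSHRREV_B32_e64 16, %1, implicit $exec
    %5:vgpr_32 = V_CNDMASK_B32_e64 0, 0, 1, %4, killed %3, implicit $exec
    $vgpr0 = COPY %5
    SI_RETURN implicit $vgpr0

...

# Hypothetical undef case: the condition has no defining instruction,
# so a getVRegDef lookup on it would return nothing.
---
name: cndmask_undef_cond
tracksRegLiveness: true
body: |
  bb.0:
    liveins: $vgpr0

    %1:vgpr_32 = COPY $vgpr0
    %4:vgpr_32 = V_LSHRREV_B32_e64 16, %1, implicit $exec
    %5:vgpr_32 = V_CNDMASK_B32_e64 0, 0, 0, %4, undef %3:sreg_64_xexec, implicit $exec
    $vgpr0 = COPY %5
    SI_RETURN implicit $vgpr0

...
```

These would presumably be appended to the same test file so they share its RUN line. Whether each case is actually left unconverted by the pass is something the regenerated checks would have to confirm.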
https://github.com/llvm/llvm-project/pull/137930