[llvm] [AMDGPU] SIPeepholeSDWA: Handle V_CNDMASK_B32_e64 (PR #137930)

Thu May 1 04:08:14 PDT 2025

================
@@ -0,0 +1,35 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc %s -mtriple=amdgcn -mcpu=gfx900 -run-pass=si-peephole-sdwa -o - | FileCheck -check-prefix=gfx9 %s
+
+# Test conversion of V_CNDMASK_B32 to VOPC for enabling further conversion to SDWA.
+# For this, the definition of the src2 carry-in operand must be changed to write
+# to VCC.
+
+---
+name:            v_vselect_v2bf16
+tracksRegLiveness: true
+body:             |
+
+  bb.0:
+    liveins: $vgpr0, $vgpr1
+
+    ; gfx9-LABEL: name: v_vselect_v2bf16
+    ; gfx9: liveins: $vgpr0, $vgpr1
+    ; gfx9-NEXT: {{  $}}
+    ; gfx9-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+    ; gfx9-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
+    ; gfx9-NEXT: $vcc = V_CMP_EQ_U32_e64 killed [[COPY1]], 1, implicit $exec
+    ; gfx9-NEXT: [[V_LSHRREV_B32_e64_:%[0-9]+]]:vgpr_32 = V_LSHRREV_B32_e64 16, [[COPY]], implicit $exec
+    ; gfx9-NEXT: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
+    ; gfx9-NEXT: [[V_CNDMASK_B32_sdwa:%[0-9]+]]:vgpr_32 = V_CNDMASK_B32_sdwa 0, [[V_MOV_B32_e32_]], 0, [[COPY]], 0, 6, 0, 6, 5, implicit $vcc, implicit $exec
+    ; gfx9-NEXT: $vgpr0 = COPY [[V_CNDMASK_B32_sdwa]]
+    ; gfx9-NEXT: SI_RETURN implicit $vgpr0
+    %1:vgpr_32 = COPY $vgpr0
+    %2:vgpr_32 = COPY $vgpr1
+    %3:sreg_64_xexec = V_CMP_EQ_U32_e64 killed %2, 1, implicit $exec
+    %4:vgpr_32 = V_LSHRREV_B32_e64 16, %1, implicit $exec
+    %5:vgpr_32 = V_CNDMASK_B32_e64 0, 0, 0, %4, killed %3, implicit $exec
+    $vgpr0 = COPY %5
+    SI_RETURN implicit $vgpr0
+
+...
----------------
arsenm wrote:

This should have both one-use and multi-use tests. It also needs negative tests where source modifiers are used and where there is a live VCC use, plus the undef condition case where getVRegDef would fail.
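
As a rough illustration of the undef condition case, a minimal input sketch follows. The function name `v_cndmask_undef_cond` and the register numbering are hypothetical, and the CHECK lines are deliberately left out since they would be regenerated with utils/update_mir_test_checks.py. The assumption is that the pass should leave the V_CNDMASK_B32_e64 alone (and not crash) because the condition operand has no defining instruction:

```
---
name:            v_cndmask_undef_cond
tracksRegLiveness: true
body:             |

  bb.0:
    liveins: $vgpr0

    ; Hypothetical negative test: the src2 condition is undef, so it has no def.
    %1:vgpr_32 = COPY $vgpr0
    %2:vgpr_32 = V_LSHRREV_B32_e64 16, %1, implicit $exec
    %3:vgpr_32 = V_CNDMASK_B32_e64 0, 0, 0, %2, undef %4:sreg_64_xexec, implicit $exec
    $vgpr0 = COPY %3
    SI_RETURN implicit $vgpr0

...
```

The same skeleton could be adapted for the other cases, for example by setting a non-zero source modifier operand or by keeping a later reader of $vcc alive across the V_CNDMASK_B32_e64.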

https://github.com/llvm/llvm-project/pull/137930