[llvm] [AMDGPU] Fold multiple aligned v_mov_b32 to v_mov_b64 on gfx942 (PR #138843)

Wed May 7 03:52:49 PDT 2025

================
@@ -2127,6 +2128,99 @@ bool SIFoldOperandsImpl::tryFoldOMod(MachineInstr &MI) {
   return true;
 }
 
+// gfx942+ can use V_MOV_B64 for materializing constant immediates.
+// For example:
+// %0:vgpr_32 = V_MOV_B32 0, implicit $exec
+// %1:vreg_64_align2 = REG_SEQUENCE %0, %subreg.sub0, %0, %subreg.sub1
+//  ->
+// %1:vreg_64_align2 = V_MOV_B64_PSEUDO 0, implicit $exec
+bool SIFoldOperandsImpl::tryFoldImmRegSequence(MachineInstr &MI) {
+  assert(MI.isRegSequence());
+  auto Reg = MI.getOperand(0).getReg();
+  const TargetRegisterClass *DefRC = MRI->getRegClass(Reg);
+
+  if (!ST->hasMovB64() || !TRI->isVGPR(*MRI, Reg) ||
+      !MRI->hasOneNonDBGUse(Reg) || !TRI->isProperlyAlignedRC(*DefRC))
----------------
arsenm wrote:

Better to check if the register is compatible with the register class from the instruction definition 

https://github.com/llvm/llvm-project/pull/138843