[llvm] [AMDGPU] Emit b32 movs if (a)v_mov_b64_pseudo dest vgprs are misaligned (PR #160547)
Janek van Oirschot via llvm-commits
llvm-commits at lists.llvm.org
Wed Sep 24 10:53:43 PDT 2025
JanekvO wrote:
> > machine-cp would then allow the misaligned vgpr pair to be copy-propagated into a V_MOV_B64_PSEUDO, which requires align2.
>
> That doesn't sound right - if V_MOV_B64_PSEUDO uses aligned register classes then machine-cp should not do this, because it should be checking register class constraints?
The machine-cp run in question happens after register allocation (RA), so it operates on physical registers:
```
# Machine code for function _ZN6thrust23THRUST_200805_400100_NS11hip_rocprim14__parallel_for6kernelILj256ENS1_10for_each_fINS0_10device_ptrINS0_4pairIiN12_GLOBAL__N_15EntryEEEEENS0_6detail16wrapped_functionINSB_23allocator_traits_detail24construct1_via_allocatorINS0_16device_allocatorIS9_EEEEvEEEEmLj1EEEvT0_T1_SL_: NoPHIs, TracksLiveness, NoVRegs, TiedOpsRewritten, TracksDebugUserValues
Function Live Ins: $sgpr4_sgpr5
0B bb.0.entry:
liveins: $sgpr4_sgpr5
32B renamable $sgpr0_sgpr1 = S_LOAD_DWORDX2_IMM killed renamable $sgpr4_sgpr5, 0, 0 :: (dereferenceable invariant load (s64) from %ir..kernarg.offset1, align 16, addrspace 4)
48B renamable $vgpr4 = AV_MOV_B32_IMM_PSEUDO 0, implicit $exec
80B renamable $vgpr0_vgpr1 = AV_MOV_B64_IMM_PSEUDO 0, implicit $exec
96B renamable $vgpr2_vgpr3 = COPY killed renamable $sgpr0_sgpr1
128B renamable $vgpr5_vgpr6 = COPY killed renamable $vgpr0_vgpr1
144B FLAT_STORE_DWORDX3 killed renamable $vgpr2_vgpr3, killed renamable $vgpr4_vgpr5_vgpr6, 0, 0, implicit $exec, implicit $flat_scr :: (store (s96) into %ir..load, align 4)
160B S_ENDPGM 0
# End machine code for function _ZN6thrust23THRUST_200805_400100_NS11hip_rocprim14__parallel_for6kernelILj256ENS1_10for_each_fINS0_10device_ptrINS0_4pairIiN12_GLOBAL__N_15EntryEEEEENS0_6detail16wrapped_functionINSB_23allocator_traits_detail24construct1_via_allocatorINS0_16device_allocatorIS9_EEEEvEEEEmLj1EEEvT0_T1_SL_.
# *** IR Dump After Machine Copy Propagation Pass (machine-cp) ***:
# Machine code for function _ZN6thrust23THRUST_200805_400100_NS11hip_rocprim14__parallel_for6kernelILj256ENS1_10for_each_fINS0_10device_ptrINS0_4pairIiN12_GLOBAL__N_15EntryEEEEENS0_6detail16wrapped_functionINSB_23allocator_traits_detail24construct1_via_allocatorINS0_16device_allocatorIS9_EEEEvEEEEmLj1EEEvT0_T1_SL_: NoPHIs, TracksLiveness, NoVRegs, TiedOpsRewritten, TracksDebugUserValues
Function Live Ins: $sgpr4_sgpr5
bb.0.entry:
liveins: $sgpr4_sgpr5
renamable $sgpr0_sgpr1 = S_LOAD_DWORDX2_IMM killed renamable $sgpr4_sgpr5, 0, 0 :: (dereferenceable invariant load (s64) from %ir..kernarg.offset1, align 16, addrspace 4)
renamable $vgpr4 = AV_MOV_B32_IMM_PSEUDO 0, implicit $exec
renamable $vgpr5_vgpr6 = AV_MOV_B64_IMM_PSEUDO 0, implicit $exec
renamable $vgpr2_vgpr3 = COPY killed renamable $sgpr0_sgpr1
FLAT_STORE_DWORDX3 killed renamable $vgpr2_vgpr3, killed renamable $vgpr4_vgpr5_vgpr6, 0, 0, implicit $exec, implicit $flat_scr :: (store (s96) into %ir..load, align 4)
S_ENDPGM 0
```
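The rewrite between the two dumps can be reproduced with a toy Python model. This is not LLVM's MachineCopyPropagation code: `Inst`, its hypothetical `align2` flag and `backward_propagate` are illustrative names, and the model assumes the COPY source is killed and read by nothing else. Without an alignment check, the 64-bit move is renamed to the odd-aligned `$vgpr5_vgpr6`. With the kind of register-class check the reviewer expected, the COPY is kept.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Inst:
    opcode: str
    dst: Tuple[int, ...]                   # VGPR numbers written, e.g. (0, 1) for $vgpr0_vgpr1
    src: Optional[Tuple[int, ...]] = None  # VGPRs read by a COPY; None for immediate moves
    align2: bool = False                   # hypothetical: dst must start at an even VGPR


def is_aligned(regs: Tuple[int, ...]) -> bool:
    return regs[0] % 2 == 0


def backward_propagate(block: List[Inst], check_alignment: bool) -> List[Inst]:
    """Fold `dst = COPY src` by renaming the earlier def of `src` to `dst` (toy model)."""
    out = list(block)
    i = 0
    while i < len(out):
        inst = out[i]
        if inst.opcode == "COPY":
            for j in range(i - 1, -1, -1):
                d = out[j]
                if d.dst == inst.src:
                    if check_alignment and d.align2 and not is_aligned(inst.dst):
                        break  # renaming would violate the def's alignment constraint
                    out[j] = replace(d, dst=inst.dst)  # def now writes the COPY destination
                    del out[i]                          # the COPY is no longer needed
                    i -= 1
                    break
                if set(d.dst) & (set(inst.src) | set(inst.dst)):
                    break  # src or dst touched in between: give up
        i += 1
    return out


# The VGPR part of the kernel in the dumps (the SGPR-to-VGPR copy is omitted).
block = [
    Inst("AV_MOV_B32_IMM_PSEUDO", (4,)),
    Inst("AV_MOV_B64_IMM_PSEUDO", (0, 1), align2=True),
    Inst("COPY", (5, 6), src=(0, 1)),
]
unchecked = backward_propagate(block, check_alignment=False)  # mirrors the machine-cp output
checked = backward_propagate(block, check_alignment=True)     # the COPY survives
```

In the unchecked run the block shrinks to the 32-bit move into VGPR 4 plus a 64-bit move into VGPRs 5 and 6, matching the post-machine-cp dump.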
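In the two dumps, machine-cp folds away `renamable $vgpr5_vgpr6 = COPY killed renamable $vgpr0_vgpr1` by rewriting the defining AV_MOV_B64_IMM_PSEUDO to write `$vgpr5_vgpr6` directly. That pair is misaligned because it starts at an odd VGPR. The COPY itself originates from the REG_SEQUENCE emitted by si-load-store-opt, `%13:vreg_96_align2 = REG_SEQUENCE killed %9:vgpr_32, %subreg.sub0, killed %11:vreg_64_align2, %subreg.sub1_sub2`, whose alignment constraints already contradict each other: if %13 starts at an even VGPR, its sub1_sub2 starts at an odd one, so %11 can never occupy that slot as an aligned vreg_64_align2 pair, and RA has to insert a COPY into the odd-aligned subregister (the COPY seen above).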
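The alignment contradiction, and the fallback named in the PR title, can be checked with some index arithmetic. This sketch assumes VGPR tuples are identified by their first register, that an `_align2` class requires that register to be even, and that `sub1_sub2` of a 96-bit tuple starts one register after `sub0`. `expand_mov_b64` is a hypothetical illustration of "emit b32 movs if the destination is misaligned"; it returns schematic tuples and is not the patch's actual lowering code.

```python
def subreg_start(tuple_start: int, offset: int) -> int:
    """First VGPR of a sub-register that begins `offset` registers into a tuple."""
    return tuple_start + offset


def satisfies_align2(first_vgpr: int) -> bool:
    """Assumed *_align2 rule: the register tuple must start at an even VGPR."""
    return first_vgpr % 2 == 0


# %13:vreg_96_align2 = REG_SEQUENCE %9, %subreg.sub0, %11, %subreg.sub1_sub2
# sub1_sub2 begins one register after sub0, so for every even start of %13
# the slot for %11:vreg_64_align2 begins at an odd VGPR.
slot_aligned = [satisfies_align2(subreg_start(start, 1)) for start in range(0, 16, 2)]


def expand_mov_b64(dst_first: int, imm: int):
    """Hypothetical lowering of a 64-bit immediate move into a VGPR pair.

    Returns schematic (kind, registers, value) tuples, not real MIR.
    """
    if satisfies_align2(dst_first):
        return [("b64", (dst_first, dst_first + 1), imm)]
    # Misaligned pair: write each half separately; sub0 gets the low 32 bits.
    return [
        ("b32", dst_first, imm & 0xFFFFFFFF),
        ("b32", dst_first + 1, (imm >> 32) & 0xFFFFFFFF),
    ]
```

For the dump above, %13 landed in `$vgpr4_vgpr5_vgpr6`, so `subreg_start(4, 1)` is 5, and `expand_mov_b64(5, 0)` yields two 32-bit moves to VGPRs 5 and 6 instead of one misaligned 64-bit move.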
https://github.com/llvm/llvm-project/pull/160547