[llvm] [AMDGPU] A SCHED_BARRIER in a bundle should not prevent other SCHED_BARRIERs from being considered (PR #152627)
Yoonseo Choi via llvm-commits
llvm-commits at lists.llvm.org
Fri Aug 8 09:52:43 PDT 2025
================
@@ -2508,7 +2508,9 @@ bool SchedGroup::canAddSU(SUnit &SU) const {
++E;
// Return true if all of the bundled MIs can be added to this group.
- return std::all_of(B, E, [this](MachineInstr &MI) { return canAddMI(MI); });
+ return std::all_of(B, E, [this](MachineInstr &MI) {
+ return (MI.isMetaInstruction()) || canAddMI(MI);
----------------
yoonseoch wrote:
Making canAddMI() return true for a metaInstruction in effect came up with more intrusive changes to existing lit tests. I got about 14 lit-test failures from CodeGen/AMDGPU, mostly on tests of iglp_opts and sched-barrier, and sched-group-barriers.
For example, llvm.amdgcn.sched.group.barrier.ll,
```
// original sequence in the test
385 define amdgpu_kernel void @test_sched_group_barrier_pipeline_alternating_READ_VALU_WRITE(ptr addrspace(1) noalias %in,
386 ; GCN-LABEL: test_sched_group_barrier_pipeline_alternating_READ_VALU_WRITE:
387 ; GCN: ; %bb.0:
388 ; GCN-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x24
389 ; GCN-NEXT: v_and_b32_e32 v0, 0x3ff, v0
390 ; GCN-NEXT: v_lshlrev_b32_e32 v16, 7, v0
391 ; GCN-NEXT: s_waitcnt lgkmcnt(0)
392 ; GCN-NEXT: global_load_dwordx4 v[12:15], v16, s[0:1] offset:32
393 ; GCN-NEXT: ; kill: killed $sgpr0_sgpr1
394 ; GCN-NEXT: global_load_dwordx4 v[0:3], v16, s[0:1]
395 ; GCN-NEXT: ; sched_group_barrier mask(0x00000020) size(1) SyncID(0)
396 ; GCN-NEXT: ; sched_group_barrier mask(0x00000002) size(2) SyncID(0)
397 ; GCN-NEXT: ; sched_group_barrier mask(0x00000040) size(1) SyncID(0)
398 ; GCN-NEXT: ; sched_group_barrier mask(0x00000020) size(1) SyncID(0)
399 ; GCN-NEXT: s_waitcnt vmcnt(1)
400 ; GCN-NEXT: v_mul_lo_u32 v13, v13, v13
401 ; GCN-NEXT: v_mul_lo_u32 v12, v12, v12
402 ; GCN-NEXT: v_mul_lo_u32 v15, v15, v15
403 ; GCN-NEXT: v_mul_lo_u32 v14, v14, v14
404 ; GCN-NEXT: s_waitcnt vmcnt(0)
405 ; GCN-NEXT: v_mul_lo_u32 v3, v3, v3
406 ; GCN-NEXT: v_mul_lo_u32 v2, v2, v2
407 ; GCN-NEXT: global_store_dwordx4 v16, v[12:15], s[2:3] offset:32
408 ; GCN-NEXT: global_load_dwordx4 v[4:7], v16, s[0:1] offset:48
409 ; GCN-NEXT: v_mul_lo_u32 v1, v1, v1
// sequence after making canAddMI() return true on a meta-instruction
385 define amdgpu_kernel void @test_sched_group_barrier_pipeline_alternating_READ_VALU_WRITE(ptr addrspace(1) noalias %in
386 ; GCN-LABEL: test_sched_group_barrier_pipeline_alternating_READ_VALU_WRITE:
387 ; GCN: ; %bb.0:
388 ; GCN-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x24
389 ; GCN-NEXT: v_and_b32_e32 v0, 0x3ff, v0
390 ; GCN-NEXT: v_lshlrev_b32_e32 v16, 7, v0
391 ; GCN-NEXT: ; kill: killed $sgpr0_sgpr1
392 ; GCN-NEXT: s_waitcnt lgkmcnt(0)
393 ; GCN-NEXT: global_load_dwordx4 v[12:15], v16, s[0:1] offset:32
394 ; GCN-NEXT: ; sched_group_barrier mask(0x00000020) size(1) SyncID(0)
395 ; GCN-NEXT: ; sched_group_barrier mask(0x00000002) size(2) SyncID(0)
396 ; GCN-NEXT: s_waitcnt vmcnt(0)
397 ; GCN-NEXT: v_mul_lo_u32 v13, v13, v13
398 ; GCN-NEXT: v_mul_lo_u32 v12, v12, v12
399 ; GCN-NEXT: v_mul_lo_u32 v15, v15, v15
400 ; GCN-NEXT: v_mul_lo_u32 v14, v14, v14
401 ; GCN-NEXT: global_store_dwordx4 v16, v[12:15], s[2:3] offset:32
402 ; GCN-NEXT: global_load_dwordx4 v[0:3], v16, s[0:1]
403 ; GCN-NEXT: ; sched_group_barrier mask(0x00000040) size(1) SyncID(0)
404 ; GCN-NEXT: ; sched_group_barrier mask(0x00000020) size(1) SyncID(0)
405 ; GCN-NEXT: s_waitcnt vmcnt(0)
406 ; GCN-NEXT: v_mul_lo_u32 v3, v3, v3
407 ; GCN-NEXT: v_mul_lo_u32 v2, v2, v2
408 ; GCN-NEXT: v_mul_lo_u32 v1, v1, v1
```
I think keeping the original logic of canAddMI() is safer, given that it is also used for a single, non-bundled MI.
I can still simplify the logic by moving the check into the existing scan of the bundle range, avoiding the separate all_of pass.
https://github.com/llvm/llvm-project/pull/152627