[clang] [llvm] [AMDGPU] Extend __builtin_amdgcn_sched_group_barrier to support rules. (PR #85304)
Matt Arsenault via cfe-commits
cfe-commits at lists.llvm.org
Wed May 8 09:20:46 PDT 2024
================
@@ -1284,7 +1284,29 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
| ``// 5 MFMA``
| ``__builtin_amdgcn_sched_group_barrier(8, 5, 0)``
- llvm.amdgcn.iglp_opt An **experimental** intrinsic for instruction group level parallelism. The intrinsic
+ llvm.amdgcn.sched.group.barrier.rule It has the same behavior as sched.group.barrier, except the intrinsic includes a fourth argument:
+
+ - RuleMask : The bitmask of rules which are applied to the SchedGroup.
+
+ The RuleMask is handled as a 64-bit integer, so 64 rules are encodable with a single mask.
+
+ Users can access the intrinsic by specifying the optional fourth argument in the sched_group_barrier builtin:
+
+ | ``// 1 VMEM read invoking rules 1 and 2``
+ | ``__builtin_amdgcn_sched_group_barrier(32, 1, 0, 3)``
+
+ Currently available rules are:
+ - 0x0000: No rule.
+ - 0x0001: Instructions in the SchedGroup must not write to the same register
+ that a previously occurring V_CNDMASK_B32_e64 reads from.
+ - 0x0002: Instructions in the SchedGroup must not write to the same register
+ that a previously occurring V_PERM_B32_e64 reads from.
+ - 0x0004: Instructions in the SchedGroup must require data produced by a
+ V_CNDMASK_B32_e64.
+ - 0x0008: Instructions in the SchedGroup must require data produced by a
+ V_PERM_B32_e64.
+
----------------
arsenm wrote:
These scheduling rules seem way too specific, especially since they single out specific instruction encodings by their internal pseudoinstruction names.
https://github.com/llvm/llvm-project/pull/85304
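
For readers following the quoted hunk, here is a minimal sketch of how the proposed RuleMask encoding composes. It only illustrates the bit arithmetic: the `3` in the documented example is the OR of rules 0x0001 and 0x0002. The macro and function names below are hypothetical (the patch documents only the raw bit values), and the four-argument builtin call appears only in a comment because it exists only in the proposed patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the rule bits listed in the patch; the patch
 * itself documents only the numeric values. */
#define RULE_NONE                 UINT64_C(0x0000)
#define RULE_NO_WRITE_CNDMASK_SRC UINT64_C(0x0001)
#define RULE_NO_WRITE_PERM_SRC    UINT64_C(0x0002)
#define RULE_USES_CNDMASK_RESULT  UINT64_C(0x0004)
#define RULE_USES_PERM_RESULT     UINT64_C(0x0008)

/* OR together a list of rule bits into one 64-bit RuleMask. */
static uint64_t combine_rules(const uint64_t *rules, size_t count) {
  uint64_t mask = RULE_NONE;
  for (size_t i = 0; i < count; ++i)
    mask |= rules[i];
  return mask;
}

/* Report whether a given rule bit is set in a RuleMask. */
static int rule_enabled(uint64_t mask, uint64_t rule) {
  return (mask & rule) != 0;
}

/* As proposed in the patch (not compiled here), the documented example
 *   __builtin_amdgcn_sched_group_barrier(32, 1, 0, 3)
 * passes 3 == (RULE_NO_WRITE_CNDMASK_SRC | RULE_NO_WRITE_PERM_SRC). */
```

Because each rule occupies one bit, any combination of rules fits in a single mask, which is also why the rule set is capped at 64 entries.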