[clang] [llvm] [AMDGPU] Extend __builtin_amdgcn_sched_group_barrier to support rules. (PR #85304)

Wed May 8 09:20:46 PDT 2024

================
@@ -1284,7 +1284,29 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
                                                    |  ``// 5 MFMA``
                                                    |  ``__builtin_amdgcn_sched_group_barrier(8, 5, 0)``
 
-  llvm.amdgcn.iglp_opt                             An **experimental** intrinsic for instruction group level parallelism. The intrinsic
+  llvm.amdgcn.sched.group.barrier.rule             It has the same behavior as ``llvm.amdgcn.sched.group.barrier``, except that the intrinsic takes a fourth argument:
+
+                                                   - RuleMask: A bitmask of the rules to apply to the SchedGroup.
+
+                                                   The RuleMask is a 64-bit integer, so up to 64 rules can be encoded in a single mask, one rule per bit.
+
+                                                   Users can access the intrinsic by passing the optional fourth argument to the ``__builtin_amdgcn_sched_group_barrier`` builtin:
+
+                                                   |  ``// 1 VMEM read, applying rules 0x0001 and 0x0002``
+                                                   |  ``__builtin_amdgcn_sched_group_barrier(32, 1, 0, 3)``
+
+                                                   Currently available rules are:
+
+                                                   - 0x0000: No rule.
+                                                   - 0x0001: Instructions in the SchedGroup must not write to the same register
+                                                     that a previously occurring V_CNDMASK_B32_e64 reads from.
+                                                   - 0x0002: Instructions in the SchedGroup must not write to the same register
+                                                     that a previously occurring V_PERM_B32_e64 reads from.
+                                                   - 0x0004: Instructions in the SchedGroup must require data produced by a
+                                                     V_CNDMASK_B32_e64.
+                                                   - 0x0008: Instructions in the SchedGroup must require data produced by a
+                                                     V_PERM_B32_e64.
+
----------------
arsenm wrote:

These scheduling rules seem way too specific, especially since they point at specific instruction encodings by their internal pseudoinstruction names.

https://github.com/llvm/llvm-project/pull/85304