[PATCH] D128158: [AMDGPU] Add amdgcn_sched_group_barrier builtin

Jeffrey Byrnes via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Jul 1 11:45:44 PDT 2022


jrbyrnes added a comment.

Hey Austin -- I like the removal of canAddMIs. In the original design, I was leaving open the possibility for users to pass in canAddMIs rather than a mask / SchedGroup name, but it looks like that isn't the direction we're going, and the classification functions defined in a general canAddMI make things easier.

I see this is a WIP, but I've added some thoughts I had from reading it over. I may have more as I use the design for my patch.



================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:199
+  // SchedGroupMask of instructions that should be barred.
+  SchedGroupMask invertSchedBarrierMask(SchedGroupMask Mask) const;
+
----------------
I find it confusing that SchedBarrier uses inversion while SchedGroupBarrier doesn't.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:306
+bool SchedGroup::isFull() const {
+  return MaxSize.hasValue() && Collection.size() >= *MaxSize;
+}
----------------
As in the update to IGroupLP.cpp in trunk, it seems we are not supposed to use hasValue anymore.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:349
+  add(InitSU);
+  assert(MaxSize.hasValue());
+  (*MaxSize)++;
----------------
Not possible to have unsized groups?


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:445
+  // initialized all of the SCHED_GROUP_BARRIER SchedGroups.
+  addSchedGroupBarrierEdges();
 }
----------------
If both types of barriers are present, the SchedBarriers are handled first. However, if there is a conflict between a SchedBarrier and a SchedGroupBarrier, should the SchedBarrier always get priority? Maybe SchedBarrier should only handle groups not present in SchedGroupBarrier?


================
Comment at: llvm/test/CodeGen/AMDGPU/sched-group-barrier-pre-RA.mir:104
+    GLOBAL_STORE_DWORD_SADDR %1, %13, %0, 512, 0, implicit $exec :: (store (s32) into %ir.out, !noalias !0, addrspace 1)
+    ; 1 VMEM_READ
+    SCHED_GROUP_BARRIER 32, 1, 0
----------------
I think you are aware of this issue, but the mutation's ability to match the pipeline depends on which instructions go into which group (when an instruction can be mapped to multiple groups).

If we had SchedGroups: 2 VMEM_READ, 1 VALU, 1 MFMA, 2 VMEM_READ

and initial schedule: VMEMR, VALU, VMEMR, MFMA, VMEMR, with a dependency between the middle VMEMR and the MFMA,

then initSchedGroup will add the middle VMEMR to the last VMEM_READ group, but we could get a more accurate pipeline by adding it to the first group.




Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128158/new/

https://reviews.llvm.org/D128158


