[llvm] [AMDGPU] Introduce iglp_opt(2): Generalized exp/mfma interleaving for select kernels (PR #81342)

Mon Feb 19 11:48:17 PST 2024

================
@@ -902,6 +904,921 @@ void MFMASmallGemmOpt::applyIGLPStrategy(
         SchedGroupMask::MFMA, 1, PipelineSyncID, DAG, TII);
     SG->initSchedGroup(SyncedInstrs[SG->getSyncID()]);
   }
+
+  return true;
+}
+
+class MFMAExpInterleaveOpt final : public IGLPStrategy {
+private:
+  SmallVector<SUnit *, 4> MFMAChainSeeds;
+  // Compute the heuristics for the pipeline, returning whether or not the DAG
+  // is well formatted for the mutation
+  bool analyzeDAG(const SIInstrInfo *TII);
+
+  /// Whether or not the instruction is a transitive predecessor of an MFMA
+  /// instruction
+  class IsPipeExp final : public InstructionRule {
+  public:
+    bool apply(const SUnit *SU, const ArrayRef<SUnit *> Collection,
+               SmallVectorImpl<SchedGroup> &SyncPipe) override {
+
+      auto DAG = SyncPipe[0].DAG;
+      auto TII = SyncPipe[0].TII;
+
+      if (Cache->empty()) {
----------------
jrbyrnes wrote:

This rule is currently used only in MFMAExpInterleaveOpt, which is applied only if analyzeDAG finds that the DAG meets certain criteria. One of these criteria is that there exists at least one dependency chain in which a V_EXP is a transitive predecessor of a V_MFMA.
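
To make the criterion concrete, here is a minimal sketch of a "V_EXP is a transitive predecessor of a V_MFMA" check on a toy DAG. It is not the code from this PR: `Kind`, `Node`, `reachesMFMA`, and `hasExpToMFMAChain` are hypothetical stand-ins for illustration and do not use SUnit or ScheduleDAG. It assumes every successor index is a valid index into the node vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an instruction class; not LLVM's opcode enum.
enum class Kind { Exp, MFMA, Other };

// Hypothetical stand-in for a scheduling unit: a kind plus the indices of
// its successors in the toy DAG.
struct Node {
  Kind K;
  std::vector<std::size_t> Succs;
};

// Returns true if some MFMA node is reachable from Start by following one or
// more successor edges, i.e. Start is a transitive predecessor of an MFMA.
bool reachesMFMA(const std::vector<Node> &DAG, std::size_t Start) {
  assert(Start < DAG.size() && "node index out of range");
  std::vector<bool> Seen(DAG.size(), false);
  std::vector<std::size_t> Worklist(DAG[Start].Succs);
  while (!Worklist.empty()) {
    std::size_t N = Worklist.back();
    Worklist.pop_back();
    if (Seen[N])
      continue;
    Seen[N] = true;
    if (DAG[N].K == Kind::MFMA)
      return true;
    for (std::size_t S : DAG[N].Succs)
      Worklist.push_back(S);
  }
  return false;
}

// Returns true if at least one Exp node is a transitive predecessor of an
// MFMA node, which is the criterion described above.
bool hasExpToMFMAChain(const std::vector<Node> &DAG) {
  for (std::size_t I = 0; I < DAG.size(); ++I)
    if (DAG[I].K == Kind::Exp && reachesMFMA(DAG, I))
      return true;
  return false;
}
```

For a chain Exp -> Other -> MFMA, `hasExpToMFMAChain` returns true; for an Exp and an MFMA with no path between them, it returns false.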

Perhaps it would be a better design to have analyzeDAG produce an "IGLPDAGInfo" instance that controls the behavior of the pipeline and rules. Either way, in the current implementation we won't run into the problem where we are repeatedly looking for a non-existent MFMA.
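
As a rough illustration of that alternative, the analysis step could return an optional info object that the strategy and its rules then consult. This is only a sketch of the idea, not a proposed patch: the fields of `IGLPDAGInfo`, and the names `ToyOp`, `analyzeToyDAG`, and `applyToyStrategy`, are all hypothetical. The toy criterion (at least one EXP and one MFMA) is a deliberate simplification of the dependency-chain criterion.

```cpp
#include <optional>
#include <vector>

// Hypothetical stand-in for an instruction class; not LLVM's opcode enum.
enum class ToyOp { Exp, MFMA, Other };

// Hypothetical result of the analysis. The fields are assumptions chosen for
// illustration only.
struct IGLPDAGInfo {
  unsigned ExpCount = 0;
  unsigned MFMACount = 0;
};

// Stand-in for analyzeDAG: returns an info object only when the (simplified)
// criteria hold, and std::nullopt otherwise.
std::optional<IGLPDAGInfo> analyzeToyDAG(const std::vector<ToyOp> &Ops) {
  IGLPDAGInfo Info;
  for (ToyOp Op : Ops) {
    if (Op == ToyOp::Exp)
      ++Info.ExpCount;
    else if (Op == ToyOp::MFMA)
      ++Info.MFMACount;
  }
  if (Info.ExpCount == 0 || Info.MFMACount == 0)
    return std::nullopt;
  return Info;
}

// Stand-in for applyIGLPStrategy: the pipeline is built only when the
// analysis produced an info object, so rules never search for an MFMA that
// the analysis has not already confirmed exists.
bool applyToyStrategy(const std::vector<ToyOp> &Ops) {
  std::optional<IGLPDAGInfo> Info = analyzeToyDAG(Ops);
  if (!Info)
    return false;
  // Rules would read Info->ExpCount and Info->MFMACount here.
  return true;
}
```

With this shape, the guarantee that an MFMA exists is carried by the info object itself rather than implied by the order in which analyzeDAG and the rules happen to run.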

Parsing bottom-up, we find the last MFMA, which falls into the third stage of the CK pipeline (see below).

https://github.com/llvm/llvm-project/pull/81342