[PATCH] D128270: [AMDGPU] New AMDGPUInsertDelayAlu pass

Wed Jun 22 17:54:31 PDT 2022

arsenm added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInsertDelayAlu.cpp:162
+
+#ifndef NDEBUG
+    void dump() const {
----------------
Should check LLVM_ENABLE_DUMP instead

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInsertDelayAlu.cpp:217
+        Order.push_back(I);
+      llvm::sort(Order, [](const auto &A, const auto &B) {
+        return A->first < B->first;
----------------
Bad autos, I'm not even sure what the type is supposed to be here

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInsertDelayAlu.cpp:301-302
+
+    LLVM_DEBUG(dbgs() << "  State at start of BB" << MBB.getNumber() << "\n");
+    LLVM_DEBUG(State.dump(TRI));
+
----------------
Can merge these into one LLVM_DEBUG. Also should use printMBBReference

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInsertDelayAlu.cpp:313-320
+      // Ignore some more instructions that do not generate any code.
+      // FIXME: should these be marked as isMetaInstruction?
+      switch (MI.getOpcode()) {
+      case AMDGPU::SI_MASKED_UNREACHABLE:
+      case AMDGPU::SI_RETURN_TO_EPILOG:
+      case AMDGPU::WAVE_BARRIER:
+        continue;
----------------
This is D128313 (except for SI_RETURN_TO_EPILOG which I'm not sure about since it's a bit weird)

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInsertDelayAlu.cpp:323-328
+      DelayType Type =
+          (TSFlags & SIInstrFlags::TRANS)
+              ? TRANS
+              : (TSFlags & SIInstrFlags::VALU)
+                    ? VALU
+                    : (TSFlags & SIInstrFlags::SALU) ? SALU : OTHER;
----------------
Should extract this into a separate predicate function

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInsertDelayAlu.cpp:330-332
+      if (TSFlags & (SIInstrFlags::DS | SIInstrFlags::EXP | SIInstrFlags::FLAT |
+                     SIInstrFlags::MIMG | SIInstrFlags::MTBUF |
+                     SIInstrFlags::MUBUF) ||
----------------
Ditto

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInsertDelayAlu.cpp:345-348
+            // One of the operands of the writelane is also the output operand.
+            // This creates the insertion of redundant delays. Hence, we have to
+            // ignore this operand.
+            if (MI.getOpcode() == AMDGPU::V_WRITELANE_B32 && Op.isTied())
----------------
Why does the opcode need special casing here? Why not every tied operand?

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInsertDelayAlu.cpp:385-386
+
+      LLVM_DEBUG(dbgs() << "  State after " << MI);
+      LLVM_DEBUG(State.dump(TRI));
+    }
----------------
One LLVM_DEBUG

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128270/new/

https://reviews.llvm.org/D128270