[PATCH] D128270: [AMDGPU] New AMDGPUInsertDelayAlu pass
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jun 22 17:54:31 PDT 2022
arsenm added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInsertDelayAlu.cpp:162
+
+#ifndef NDEBUG
+ void dump() const {
----------------
Should check LLVM_ENABLE_DUMP instead
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInsertDelayAlu.cpp:217
+ Order.push_back(I);
+ llvm::sort(Order, [](const auto &A, const auto &B) {
+ return A->first < B->first;
----------------
Bad autos, I'm not even sure what the type is supposed to be here
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInsertDelayAlu.cpp:301-302
+
+ LLVM_DEBUG(dbgs() << " State at start of BB" << MBB.getNumber() << "\n");
+ LLVM_DEBUG(State.dump(TRI));
+
----------------
Can merge these into one LLVM_DEBUG. Also should use printMBBReference
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInsertDelayAlu.cpp:313-320
+ // Ignore some more instructions that do not generate any code.
+ // FIXME: should these be marked as isMetaInstruction?
+ switch (MI.getOpcode()) {
+ case AMDGPU::SI_MASKED_UNREACHABLE:
+ case AMDGPU::SI_RETURN_TO_EPILOG:
+ case AMDGPU::WAVE_BARRIER:
+ continue;
----------------
This is D128313 (except for SI_RETURN_TO_EPILOG which I'm not sure about since it's a bit weird)
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInsertDelayAlu.cpp:323-328
+ DelayType Type =
+ (TSFlags & SIInstrFlags::TRANS)
+ ? TRANS
+ : (TSFlags & SIInstrFlags::VALU)
+ ? VALU
+ : (TSFlags & SIInstrFlags::SALU) ? SALU : OTHER;
----------------
Should extract this into a separate predicate function
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInsertDelayAlu.cpp:330-332
+ if (TSFlags & (SIInstrFlags::DS | SIInstrFlags::EXP | SIInstrFlags::FLAT |
+ SIInstrFlags::MIMG | SIInstrFlags::MTBUF |
+ SIInstrFlags::MUBUF) ||
----------------
Ditto
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInsertDelayAlu.cpp:345-348
+ // One of the operands of the writelane is also the output operand.
+ // This creates the insertion of redundant delays. Hence, we have to
+ // ignore this operand.
+ if (MI.getOpcode() == AMDGPU::V_WRITELANE_B32 && Op.isTied())
----------------
Why does the opcode need special casing here? Why not every tied operand?
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInsertDelayAlu.cpp:385-386
+
+ LLVM_DEBUG(dbgs() << " State after " << MI);
+ LLVM_DEBUG(State.dump(TRI));
+ }
----------------
One LLVM_DEBUG
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D128270/new/
https://reviews.llvm.org/D128270
More information about the llvm-commits
mailing list