[PATCH] D71089: [AMDGPU] Optimizing unnecessary copies for REG_SEQUENCE PHI operand. Also fixes rocBLAS error

Alexander via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Dec 6 10:31:25 PST 2019


alex-t marked an inline comment as done.
alex-t added a comment.

In D71089#1772921 <https://reviews.llvm.org/D71089#1772921>, @arsenm wrote:

> Definitely needs a test. I'm not sure I understand the fix here. This is some kind of workaround? I would be happier with the subreg fix happening first. Removing unnecessary copies should not be relevant to the correctness fix?


Agree. The reason to try submitting this first is that change in lib/CodeGen/MachineSink.cpp that fixes the issue will likely take a long time to be approved.
This issue with this https://reviews.llvm.org/D68635 patch breaks one of the rocBLAS kernels correctness because of sinking subreg definition out from the superreg use.
This in order creates several duplicated P1 <https://reviews.llvm.org/P1> bugs.

Removing unnecessary copies just covers the IR pattern that drives PostRA Machine Sink mad.

If I could to speedup the PostRA Machine Sink fix review somehow, I prefer to submit it first and work on the copy elimination patch refactoring to make it more elegant.

The PostRA Machine Sink fix is here: https://reviews.llvm.org/D71132



================
Comment at: llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:4274
 
 void SIInstrInfo::legalizeGenericOperand(MachineBasicBlock &InsertMBB,
                                          MachineBasicBlock::iterator I,
----------------
arsenm wrote:
> I think the legalization done here should only touch the local instruction, and not be scanning through copy chains. This seems like it's trying to avoid a failure of the PeepholeOptimizer?
Initially I had intention to generalize your code that handles single COPY user of REG_SEQUENCE (SIFixSGPRCopies.cpp).  Then I realized that the pattern I'm chasing to eliminate is only related to the PHIs.
Namely, it is the single-use chain of copies that connects REG_SEQUENCE that produce the value to the PHI that consumes it. The chain is possibly going through the several BBs.
That's why I opted for the backward search from the PHI up to the REG_SEQUENCE .
I changed the walk in the runOnMachineFunction main loop to post order to ensure that PHI is processed first, before the REG_SEQUENCE.
I agree that generalizing foldVGPRCopyIntoRegSequence  would be more neat.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71089/new/

https://reviews.llvm.org/D71089





More information about the llvm-commits mailing list