[PATCH] D88081: [AMDGPU] Move WQM Pass after MI Scheduler

Nicolai Hähnle via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Oct 14 16:04:46 PDT 2020


nhaehnle added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp:315-332
+      // Does this def cover whole register?
+      const MachineOperand &DstOp = MI->getOperand(0);
+      bool DefinesFullReg = false;
+      if (DstOp.isReg()) {
+        DefinesFullReg = (DstOp.getReg() == Reg) &&
+                         (!DstOp.getSubReg() || DstOp.getSubReg() == SubReg ||
+                          DstOp.isUndef());
----------------
critson wrote:
> nhaehnle wrote:
> > I don't understand the logic here. Why the special treatment of operand 0?
> Are you saying I should iterate MI->defs() instead?
> The code here is intended to mark all instructions defining parts of the specified register.
> If this is a partial register write then we need to follow the input values to mark the other instructions.
> Are you saying I should iterate MI->defs() instead?

Well, there's the question of whether you need to follow implicit defs as well. I just don't see why you're treating operand 0 differently from others.


================
Comment at: llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp:868-884
+      // Running after control flow lowering we must fix up control flow
+      // to consider modified EXEC mask.
+      if (PrevMI && MI.isTerminator() && BI.OutNeeds == StateExact) {
+        if (MI.getOpcode() == XorTermrOpc &&
+            MI.getOperand(0).getReg() == Exec &&
+            PrevMI->getOpcode() == OrSaveExecOpc) {
+          assert(!MRI->isSSA());
----------------
critson wrote:
> nhaehnle wrote:
> > This looks suspicious. Can you please explain what is happening here?
> This is analogous to the code above that modifies the SI_ELSE to make it respect the EXEC mask -- it is a very special case match and fix up based on current code generation so I do not like it either.
> If we make SI_ELSE always respect modifications to the EXEC mask then this can go away.  Then perhaps we add a late peephole to clear up some of the unnecessary instructions when they are not required.
> Do you have an opinion on this?
We really shouldn't rely on such details of current code generation here.

I think you're on to something. Digging into this more... SI_ELSE is currently lowered as:
```
    s_or_saveexec_bNN dst, src
    ...
    s_xor_bNN_term exec, exec, dst
```
What if we instead lowered it as:
```
    s_or_saveexec_bNN tmp, src
    ...
    s_and_bNN_term dst, exec, tmp
    s_xor_bNN_term exec, exec, dst
```
One of the OptimizeExecMasking passes can then just remove the s_and_bNN_term if there is no modification of exec in the middle.

I think I'd be happy about that solution. In practice, the scan backwards from s_and_bNN isn't that expensive and I believe it's required anyway. Thoughts?


================
Comment at: llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp:1018
+          LIS->removeInterval(Reg);
+          DefMI->getOperand(0).setReg(Reg);
+          MI->removeFromParent();
----------------
critson wrote:
> nhaehnle wrote:
> > This is incorrect, there could be other users of the value. Just keep the simpler case below.
> There should not be other users of the value, it is a kill?
> 
> I am not going to fight to keep this, but we would benefit from more late clean up of unnecessary copies.
> I guess this ties into some of the things I am touching on in D89187, so the follow up to that might solve this.
I appreciate the desire to remove some unnecessary copies, but let's first figure out whether this one is correct.

Specifically, I thought LRQ.isKill() only means that the use of SrcReg in MI is the _last_ use. There could be other uses of the same definition of SrcReg that come earlier, right?

So maybe you could still eliminate the copy here if you updated those other uses as well. I would still ask you to keep things simpler here for this change and see if you can find a good place to eliminate this kind of copy separately in a dedicated pass. This code is quite difficult to follow even without this.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88081/new/

https://reviews.llvm.org/D88081



More information about the llvm-commits mailing list