[PATCH] D88081: [AMDGPU] Move WQM Pass after MI Scheduler

Thu Oct 22 08:41:24 PDT 2020

nhaehnle accepted this revision.
nhaehnle added a comment.
This revision is now accepted and ready to land.

LGTM

================
Comment at: llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp:315-332
+      // Does this def cover whole register?
+      const MachineOperand &DstOp = MI->getOperand(0);
+      bool DefinesFullReg = false;
+      if (DstOp.isReg()) {
+        DefinesFullReg = (DstOp.getReg() == Reg) &&
+                         (!DstOp.getSubReg() || DstOp.getSubReg() == SubReg ||
+                          DstOp.isUndef());
----------------
critson wrote:
> nhaehnle wrote:
> > critson wrote:
> > > nhaehnle wrote:
> > > > critson wrote:
> > > > > nhaehnle wrote:
> > > > > > I don't understand the logic here. Why the special treatment of operand 0?
> > > > > Are you saying I should iterate MI->defs() instead?
> > > > > The code here is intended to mark all instructions defining parts of the specified register.
> > > > > If this is a partial register write then we need to follow the input values to mark the other instructions.
> > > > > Are you saying I should iterate MI->defs() instead?
> > > > 
> > > > Well, there's the question of whether you need to follow implicit defs as well. I just don't see why you're treating operand 0 differently from others.
> > > I don't think I am intentionally treating operand 0 different, this was just written to inspect the definition of the instruction and should be based on defs().
> > > 
> > > The point of this loop is to follow the chain of all definitions of a register (or parts of it), and mark each instruction involved.  The idea is it stops when the entire register has been defined. (Or rather it needs to keep going if the definitions are only partial.)  For that reason we should also look at implicit defs, as the first whole register implicit def should be a valid stopping point.
> > Well the code treats operand 0 specially, because there's **literally** a `getOperand(0)` in there, with a hard-coded `0`. If the intention is to treat all defs in the same way (which I think it should be), then why not have a single homogenous loop over operands?
> > 
> > I do understand the point about partial defs, that makes sense and it's not what I'm worried about.
> I have fixed this to iterate all operands looking for appropriate defs to follow.
Thanks!

================
Comment at: llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp:1018
+          LIS->removeInterval(Reg);
+          DefMI->getOperand(0).setReg(Reg);
+          MI->removeFromParent();
----------------
critson wrote:
> nhaehnle wrote:
> > critson wrote:
> > > nhaehnle wrote:
> > > > critson wrote:
> > > > > nhaehnle wrote:
> > > > > > This is incorrect, there could be other users of the value. Just keep the simpler case below.
> > > > > There should not be other users of the value, it is a kill?
> > > > > 
> > > > > I am not going to fight to keep this, but we would benefit from more late clean up of unnecessary copies.
> > > > > I guess this ties into some of the things I am touching on in D89187, so the follow up to that might solve this.
> > > > I appreciate the desire to remove some unnecessary copies, but let's first figure out whether this one is correct.
> > > > 
> > > > Specifically, I thought LRQ.isKill() only means that the use of SrcReg in MI is the _last_ use. There could be other uses of the same definition of SrcReg that come earlier, right?
> > > > 
> > > > So maybe you could still eliminate the copy here if you updated those other uses as well. I would still ask you to keep things simpler here for this change and see if you can find a good place to eliminate this kind of copy separately in a dedicated pass. This code is quite difficult to follow even without this.
> > > OK, yep I see that there /could/ be other users, although in practice I had not encountered them.
> > > I will work on cleaning this up in a later pass.
> > What do you mean by cleaning this up in a later pass? The goal should be to keep the MachineIR an accurate representation of the program at all times.
> MachineIR is accurate.
> My point is because the pass now runs later there is nothing to optimise away trivial copies it introduces when lowering WWM operations.
> See cruft this adds in atomic tests, e.g. atomic_optimizations_buffer.ll
> 
> 
> 
> 
> 
Ah, I see. Maybe that cleanup could be done as a follow-up change.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88081/new/

https://reviews.llvm.org/D88081