[PATCH] D88081: [AMDGPU] Move WQM Pass after MI Scheduler
Carl Ritson via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Oct 16 05:05:25 PDT 2020
critson marked 5 inline comments as done.
critson added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp:315-332
+ // Does this def cover whole register?
+ const MachineOperand &DstOp = MI->getOperand(0);
+ bool DefinesFullReg = false;
+ if (DstOp.isReg()) {
+ DefinesFullReg = (DstOp.getReg() == Reg) &&
+ (!DstOp.getSubReg() || DstOp.getSubReg() == SubReg ||
+ DstOp.isUndef());
----------------
nhaehnle wrote:
> critson wrote:
> > nhaehnle wrote:
> > > I don't understand the logic here. Why the special treatment of operand 0?
> > Are you saying I should iterate MI->defs() instead?
> > The code here is intended to mark all instructions defining parts of the specified register.
> > If this is a partial register write then we need to follow the input values to mark the other instructions.
> > Are you saying I should iterate MI->defs() instead?
>
> Well, there's the question of whether you need to follow implicit defs as well. I just don't see why you're treating operand 0 differently from others.
I don't think I am intentionally treating operand 0 different, this was just written to inspect the definition of the instruction and should be based on defs().
The point of this loop is to follow the chain of all definitions of a register (or parts of it), and mark each instruction involved. The idea is it stops when the entire register has been defined. (Or rather it needs to keep going if the definitions are only partial.) For that reason we should also look at implicit defs, as the first whole register implicit def should be a valid stopping point.
================
Comment at: llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp:868-884
+ // Running after control flow lowering we must fix up control flow
+ // to consider modified EXEC mask.
+ if (PrevMI && MI.isTerminator() && BI.OutNeeds == StateExact) {
+ if (MI.getOpcode() == XorTermrOpc &&
+ MI.getOperand(0).getReg() == Exec &&
+ PrevMI->getOpcode() == OrSaveExecOpc) {
+ assert(!MRI->isSSA());
----------------
nhaehnle wrote:
> critson wrote:
> > nhaehnle wrote:
> > > This looks suspicious. Can you please explain what is happening here?
> > This is analogous to the code above that modifies the SI_ELSE to make it respect the EXEC mask -- it is a very special case match and fix up based on current code generation so I do not like it either.
> > If we make SI_ELSE always respect modifications to the EXEC mask then this can go away. Then perhaps we add a late peephole to clear up some of the unnecessary instructions when they are not required.
> > Do you have an opinion on this?
> We really shouldn't rely on such details of current code generation here.
>
> I think you're on to something. Digging into this more... SI_ELSE is currently lowered as:
> ```
> s_or_saveexec_bNN dst, src
> ...
> s_xor_bNN_term exec, exec, dst
> ```
> What if we instead lowered it as:
> ```
> s_or_saveexec_bNN tmp, src
> ...
> s_and_bNN_term dst, exec, tmp
> s_xor_bNN_term exec, exec, dst
> ```
> One of the OptimizeExecMasking passes can then just remove the s_and_bNN_term if there is no modification of exec in the middle.
>
> I think I'd be happy about that solution. In practice, the scan backwards from s_and_bNN isn't that expensive and I believe it's required anyway. Thoughts?
Yes, your solution is what I was trying to suggest.
Your instead case is what happens when we do MI.getOperand(3).setImm(1).
I will put it in as a separate Phabricator review shortly to simplify SI_ELSE lowering and optimise out the unnecessary s_and in OptimizeExecMasking.
================
Comment at: llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp:1018
+ LIS->removeInterval(Reg);
+ DefMI->getOperand(0).setReg(Reg);
+ MI->removeFromParent();
----------------
nhaehnle wrote:
> critson wrote:
> > nhaehnle wrote:
> > > This is incorrect, there could be other users of the value. Just keep the simpler case below.
> > There should not be other users of the value, it is a kill?
> >
> > I am not going to fight to keep this, but we would benefit from more late clean up of unnecessary copies.
> > I guess this ties into some of the things I am touching on in D89187, so the follow up to that might solve this.
> I appreciate the desire to remove some unnecessary copies, but let's first figure out whether this one is correct.
>
> Specifically, I thought LRQ.isKill() only means that the use of SrcReg in MI is the _last_ use. There could be other uses of the same definition of SrcReg that come earlier, right?
>
> So maybe you could still eliminate the copy here if you updated those other uses as well. I would still ask you to keep things simpler here for this change and see if you can find a good place to eliminate this kind of copy separately in a dedicated pass. This code is quite difficult to follow even without this.
OK, yep I see that there /could/ be other users, although in practice I had not encountered them.
I will work on cleaning this up in a later pass.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D88081/new/
https://reviews.llvm.org/D88081
More information about the llvm-commits
mailing list