[PATCH] D54882: [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos

Stanislav Mekhanoshin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Nov 27 14:42:43 PST 2018


rampitec added a comment.

In D54882#1310543 <https://reviews.llvm.org/D54882#1310543>, @ronlieb wrote:

> In D54882#1309928 <https://reviews.llvm.org/D54882#1309928>, @rampitec wrote:
>
> > In D54882#1309888 <https://reviews.llvm.org/D54882#1309888>, @ronlieb wrote:
> >
> > > In D54882#1309711 <https://reviews.llvm.org/D54882#1309711>, @rampitec wrote:
> > >
> > > > In D54882#1309708 <https://reviews.llvm.org/D54882#1309708>, @ronlieb wrote:
> > > >
> > > > > Adding the shrink pass just before Peephole SDWA does not help the lit test,  it made no difference.
> > > > >  I think at this point i should proceed with adding the MIR test to verify when we cannot fold.
> > > >
> > > >
> > > > Do you know why was it unable to shrink these instructions?
> > >
> > >
> > > SIInstrInfo::splitScalar64BitAddSub converts the S_ADD_U64_PSEUDO into the two  add instructions which use the SReg_64_XEXECRegClass instead of VCC.
> > >  Later when SIShrinkInstructions::runOnMachineFunction pass runs, it sees that the Carry regs are not VCC and simply marks them with a hint to later convert to VCC ,
> > >  and then continues without doing a transformation.
> > >
> > >   if (SDst) {
> > >     if (SDst->getReg() != AMDGPU::VCC) {
> > >       if (TargetRegisterInfo::isVirtualRegister(SDst->getReg()))
> > >         MRI.setRegAllocationHint(SDst->getReg(), 0, AMDGPU::VCC);
> > >       continue;
> > >     }
> > >   
> > >     // All of the instructions with carry outs also have an SGPR input in
> > >     // src2.
> > >     if (Src2 && Src2->getReg() != AMDGPU::VCC) {
> > >       if (TargetRegisterInfo::isVirtualRegister(Src2->getReg()))
> > >         MRI.setRegAllocationHint(Src2->getReg(), 0, AMDGPU::VCC);
> > >   
> > >       continue;
> > >     }
> > >   }
> >
> >
> > OK, that makes sense. It just leaves it to post-RA shrink that way. Probably shrink pass could do the same, but it is not desirable as it limits scheduling opportunities.
> >
> > But I see the problem in your code now: you do not check that vcc is not clobbered or used in between of two instructions.
> >  I also think you need to shrink both instructions, otherwise you have carry-in of addc and carry-out of add in different registers, which just happen to be allocated to the same vcc. Note, that isConvertibleToSDWA() returning true does not guarantee final sdwa conversion, so you can end up with vop3 form for the first instruction anyway.
>
>
> i think the following code does make sure that there are no intervening uses, i can also strengthen it to make sure the defining instruction of the CarryIn is the first ADD instruction.
>  +  if (!MRI->hasOneUse(CarryIn->getReg()) || !MRI->use_empty(CarryOut->getReg()))
>  +    return false;


It does not. It only checks that original sreg is not used. However you are replacing original carry sreg with vcc by shrinking instruction, and you do not check vcc uses.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D54882/new/

https://reviews.llvm.org/D54882





More information about the llvm-commits mailing list