[PATCH] D56289: Added single use check to ShrinkDemandedConstant

Mon Jan 7 13:14:36 PST 2019

rampitec added a comment.

In D56289#1348541 <https://reviews.llvm.org/D56289#1348541>, @rampitec wrote:

> In D56289#1348529 <https://reviews.llvm.org/D56289#1348529>, @rampitec wrote:
>
> > In D56289#1348172 <https://reviews.llvm.org/D56289#1348172>, @fhahn wrote:
> >
> > > I am seeing a regression with this change on AArch64, where we do not use shrinked constants, e.g. for and/or if they have multiple users. I am not entirely sure what the problem is this patch is solving, as I am not familiar with AMDGPU. Is it a case where shrinking the constant leads to shrinking the related `Op` and in the end this introduces unnecessary conversion code, as multiple users require different bitwidths?
> > >
> > > If that is the case, I am not sure if this is the best place to fix the issue. IIUC, shrinkDemandedBits just shrinks the constant of the passed in `Op`, but not the width of the result type of `Op`. So it seems to me that shrinking the constant should be independent of the users of `Op`.
> >
> >
> > Here is DAG exposing the problem:
> >
> >   t33: i32 = or t42, Constant:i32<-2147483647>
> >           t37: f32 = bitcast t33
> >           t66: f32 = CVT_F32_UBYTE0 t33
> >         t38: f32 = fadd t37, t66
> >   
> >   
> >
> > The node t66 only needs a low byte from the t33 "or" node. The other use of "or" is bitcast and subsequently t38 fadd which needs full dword.
> >  The "Op" argument of ShrinkDemandedConstant() is t33 "or" and Demanded = 0xff as needed for t66.
> >
> > If optimization succeeds it will replace "or t42, 0x80000001" with "or t42, 1". However, this "or" has another use which needs the full range of the value.
> >  That is a bug not to take other uses into account.
> >
> > It is unfortunate this lead to regression on AArch64 side. Are you sure a similar issue can never happen with AArch64?
> >  Theoretically that is possible to move a single use check past targetShrinkDemandedConstant() call which actually do the job for AArch I presume.
> >  I think it should correct lack of optimization in that specific case, but I am not sure there will be no correctness issues elsewhere. Here is what I am speaking about:
> >
> >   --- a/lib/CodeGen/SelectionDAG/TargetLowering.cpp
> >   +++ b/lib/CodeGen/SelectionDAG/TargetLowering.cpp
> >   @@ -350,13 +350,13 @@ bool TargetLowering::ShrinkDemandedConstant(SDValue Op, const APInt &Demanded,
> >      SDLoc DL(Op);
> >      unsigned Opcode = Op.getOpcode();
> >  
> >   -  if (!Op.hasOneUse())
> >   -    return false;
> >   -
> >      // Do target-specific constant optimization.
> >      if (targetShrinkDemandedConstant(Op, Demanded, TLO))
> >        return TLO.New.getNode();
> >  
> >   +  if (!Op.hasOneUse())
> >   +    return false;
> >   +
> >      // FIXME: ISD::SELECT, ISD::SELECT_CC
> >      switch (Opcode) {
> >      default:
> >  
> >
>
>
> In fact the bit width of all operations is the same. It just t66 only known to read low byte from its operand. I believe it worth nothing t66 is custom AMDGPU node.
>  Consider t66 is a shift left by 24:
>
>   t33: i32 = or t42, Constant:i32<-2147483647>
>           t66: i32 = shl t33, 24
>           t38: i32 = add t33, t66
>   
>
> There is nothing target specific here. t66 also only known to read low 8 bits of t33 and that is not unreasonable to call ShrinkDemandedConstant on t33 with Demanded = 0xff. It will expose the same bug regardless of target.

After several hours trying to come up with a testcase which would fail with any other node or target I found none.
I have moved the check into the target: D56406 <https://reviews.llvm.org/D56406>. Maybe later a check will be needed somewhere in the common code.

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D56289/new/

https://reviews.llvm.org/D56289