[llvm] r301990 - [PowerPC, DAGCombiner] Fold a << (b % (sizeof(a) * 8)) back to a single instruction

Wed May 10 11:13:16 PDT 2017

On Tue, May 9, 2017 at 3:43 PM Friedman, Eli <efriedma at codeaurora.org>
wrote:

> On 5/9/2017 2:09 PM, Tim Shen wrote:
>
> On Tue, May 9, 2017 at 12:45 PM Friedman, Eli <efriedma at codeaurora.org>
> wrote:
>
>> On 5/9/2017 12:19 PM, Tim Shen wrote:
>>
>> Sorry for missing the email. :/
>>
>> On Wed, May 3, 2017 at 11:43 AM Friedman, Eli via llvm-commits <
>> llvm-commits at lists.llvm.org> wrote:
>>
>>> On 5/2/2017 5:07 PM, Tim Shen via llvm-commits wrote:
>>> > Author: timshen
>>> > Date: Tue May  2 19:07:02 2017
>>> > New Revision: 301990
>>> >
>>> > URL: http://llvm.org/viewvc/llvm-project?rev=301990&view=rev
>>> > Log:
>>> > [PowerPC, DAGCombiner] Fold a << (b % (sizeof(a) * 8)) back to a
>>> single instruction
>>> >
>>> > Summary:
>>> > This is the corresponding llvm change to D28037 to ensure no
>>> performance
>>> > regression.
>>> >
>>> > Reviewers: bogner, kbarton, hfinkel, iteratee, echristo
>>> >
>>> > Subscribers: nemanjai, llvm-commits
>>> >
>>> > Differential Revision: https://reviews.llvm.org/D28329
>>> >
>>> > Modified:
>>> >      llvm/trunk/include/llvm/Target/TargetLowering.h
>>> >      llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>>> >      llvm/trunk/lib/Target/PowerPC/PPCISelLowering.h
>>> >      llvm/trunk/test/CodeGen/PowerPC/shift_mask.ll
>>> >
>>> > Modified: llvm/trunk/include/llvm/Target/TargetLowering.h
>>> > URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Target/TargetLowering.h?rev=301990&r1=301989&r2=301990&view=diff
>>> >
>>> ==============================================================================
>>> > --- llvm/trunk/include/llvm/Target/TargetLowering.h (original)
>>> > +++ llvm/trunk/include/llvm/Target/TargetLowering.h Tue May  2
>>> 19:07:02 2017
>>> > @@ -2061,6 +2061,14 @@ public:
>>> >       return false;
>>> >     }
>>> >
>>> > +  // Return true if the instruction that performs a << b actually
>>> performs
>>> > +  // a << (b % (sizeof(a) * 8)).
>>> > +  virtual bool supportsModuloShift(ISD::NodeType Inst, EVT
>>> ReturnType) const {
>>> > +    assert((Inst == ISD::SHL || Inst == ISD::SRA || Inst == ISD::SRL)
>>> &&
>>> > +           "Expect a shift instruction");
>>> > +    return false;
>>> > +  }
>>> > +
>>> >
>>>  //===--------------------------------------------------------------------===//
>>> >     // Runtime Library hooks
>>> >     //
>>> >
>>> > Modified: llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>>> > URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp?rev=301990&r1=301989&r2=301990&view=diff
>>> >
>>> ==============================================================================
>>> > --- llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (original)
>>> > +++ llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp Tue May  2
>>> 19:07:02 2017
>>> > @@ -5294,6 +5294,17 @@ SDValue DAGCombiner::visitSHL(SDNode *N)
>>> >       }
>>> >     }
>>> >
>>> > +  // If the target supports masking y in (shl, y),
>>> > +  // fold (shl x, (and y, ((1 << numbits(x)) - 1))) -> (shl x, y)
>>> > +  if (TLI.isOperationLegal(ISD::SHL, VT) &&
>>> > +      TLI.supportsModuloShift(ISD::SHL, VT) && N1->getOpcode() ==
>>> ISD::AND) {
>>> > +    if (ConstantSDNode *Mask =
>>> isConstOrConstSplat(N1->getOperand(1))) {
>>> > +      if (Mask->getZExtValue() == OpSizeInBits - 1) {
>>> > +        return DAG.getNode(ISD::SHL, SDLoc(N), VT, N0,
>>> N1->getOperand(0));
>>> > +      }
>>> > +    }
>>> > +  }
>>> > +
>>> >     ConstantSDNode *N1C = isConstOrConstSplat(N1);
>>> >
>>> >     // fold (shl c1, c2) -> c1<<c2
>>> > @@ -5492,6 +5503,17 @@ SDValue DAGCombiner::visitSRA(SDNode *N)
>>> >     EVT VT = N0.getValueType();
>>> >     unsigned OpSizeInBits = VT.getScalarSizeInBits();
>>> >
>>> > +  // If the target supports masking y in (sra, y),
>>> > +  // fold (sra x, (and y, ((1 << numbits(x)) - 1))) -> (sra x, y)
>>> > +  if (TLI.isOperationLegal(ISD::SRA, VT) &&
>>> > +      TLI.supportsModuloShift(ISD::SRA, VT) && N1->getOpcode() ==
>>> ISD::AND) {
>>> > +    if (ConstantSDNode *Mask =
>>> isConstOrConstSplat(N1->getOperand(1))) {
>>> > +      if (Mask->getZExtValue() == OpSizeInBits - 1) {
>>> > +        return DAG.getNode(ISD::SRA, SDLoc(N), VT, N0,
>>> N1->getOperand(0));
>>> > +      }
>>> > +    }
>>> > +  }
>>> > +
>>> >     // Arithmetic shifting an all-sign-bit value is a no-op.
>>> >     if (DAG.ComputeNumSignBits(N0) == OpSizeInBits)
>>> >       return N0;
>>> > @@ -5650,6 +5672,17 @@ SDValue DAGCombiner::visitSRL(SDNode *N)
>>> >     EVT VT = N0.getValueType();
>>> >     unsigned OpSizeInBits = VT.getScalarSizeInBits();
>>> >
>>> > +  // If the target supports masking y in (srl, y),
>>> > +  // fold (srl x, (and y, ((1 << numbits(x)) - 1))) -> (srl x, y)
>>> > +  if (TLI.isOperationLegal(ISD::SRL, VT) &&
>>> > +      TLI.supportsModuloShift(ISD::SRL, VT) && N1->getOpcode() ==
>>> ISD::AND) {
>>> > +    if (ConstantSDNode *Mask =
>>> isConstOrConstSplat(N1->getOperand(1))) {
>>> > +      if (Mask->getZExtValue() == OpSizeInBits - 1) {
>>> > +        return DAG.getNode(ISD::SRL, SDLoc(N), VT, N0,
>>> N1->getOperand(0));
>>> > +      }
>>> > +    }
>>> > +  }
>>>
>>> I see three problems here:
>>>
>>> 1. It needs to be documented.
>>>
>>
>> I wrote comments on the transformation and the TLI hook. What kind of
>> documentation do you expect?
>>
>>
>> A comment in ISDOpcodes.h.
>>
>>
>>
>>
>>> 2. This isn't consistent with other transforms done by DAGCombine, e.g.
>>> constant folding won't honor this.
>>>
>>
>> How so? Although it runs before constant folding (e.g. (shl c1, c2) ->
>> c1<<c2), constant folding will pick it up during the next iteration, won't
>> it?
>>
>>
>> Suppose you have "SHL(1, AND(y, 31))".  We fold it to "SHL(1, y)" because
>> supportsModuloShift is true.  Then we figure out that y is 32.  Then we
>> constant-fold "SHL(1, 32)" to zero (because that's the default behavior for
>> APInt).
>>
>> And that's only the one I thought of off the top of my head; there are
>> probably other transforms which assume the shift amount is in range.
>>
>
> Good point. I think we should write patterns in .td files as you suggested.
>
> The only problem I got is that, I don't know how to write a working
> pattern for the following DAG:
>
> ; %rem = and <16 x i8> %b, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8
> 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>
> ; %shl = shl <16 x i8> %a, %rem
> Initial selection DAG: BB#0 'test010:'
> SelectionDAG has 12 nodes:
>   t0: ch = EntryToken
>       t2: v16i8,ch = CopyFromReg t0, Register:v16i8 %vreg0
>         t4: v16i8,ch = CopyFromReg t0, Register:v16i8 %vreg1
>         t6: v16i8 = BUILD_VECTOR Constant:i8<7>, Constant:i8<7>,
> Constant:i8<7>, Constant:i8<7>, Constant:i8<7>, Constant:i8<7>,
> Constant:i8<7>, Constant:i8<7>, Constant:i8<7>, Constant:i8<7>,
> Constant:i8<7>, Constant:i8<7>, Constant:i8<7>, Constant:i8<7>,
> Constant:i8<7>, Constant:i8<7>
>       t7: v16i8 = and t4, t6
>     t8: v16i8 = shl t2, t7
>   t10: ch,glue = CopyToReg t0, Register:v16i8 %V2, t8
>   t11: ch = PPCISD::RET_FLAG t10, Register:v16i8 %V2, t10:1
>
> I tried the patterns:
> def : Pat<(v16i8 (shl v16i8:$vA,
>                  (and v16i8:$vB,
>                       (v16i8 (build_vector (i8 7), (i8 7), (i8 7), (i8 7),
>                                            (i8 7), (i8 7), (i8 7), (i8 7),
>                                            (i8 7), (i8 7), (i8 7), (i8 7),
>                                            (i8 7), (i8 7), (i8 7), (i8
> 7)))))),
>           (v16i8 (VSLB $vA, $vB))>;
>
> and
> def : Pat<(v16i8 (shl v16i8:$vA, (and v16i8:$vB, (v16i8 7)))),
>           (v16i8 (VSLB $vA, $vB))>;
>
> but they don't match the code above.
>
> Do you see anywhere wrong in my pattern?
>
>
> Are you sure you're trying to match the right pattern?  Try looking at the
> "Optimized legalized selection DAG".  (It looks like legalization inserts
> some bitcasts.)
>

I prefer not to pattern match legalized DAG, since as you noticed, there
are bitcasts. Ideally we only want to match the high level representation.

That said, I tried matching legalized v4i32 and v8i16, they all work.
However, for v2i64 in particular, PPC legalization inserts a load from
constant pool, which is very unpleasant to match:

Legalized selection DAG: BB#0 'test010:'
SelectionDAG has 19 nodes:
  t0: ch = EntryToken
      t2: v2i64,ch = CopyFromReg t0, Register:v2i64 %vreg0
            t4: v2i64,ch = CopyFromReg t0, Register:v2i64 %vreg1
          t12: v4i32 = bitcast t4
                t23: i64,ch = PPCISD::TOC_ENTRY<LD8[GOT]>
TargetConstantPool:i64<<2 x i64> <i64 63, i64 63>> 0, Register:i64 %X2
              t19: v2f64,ch = load<LD16[ConstantPool]> t0, t23, undef:i64
            t20: v2i64 = bitcast t19
          t13: v4i32 = bitcast t20
        t14: v4i32 = and t12, t13
      t15: v2i64 = bitcast t14
    t8: v2i64 = shl t2, t15
  t10: ch,glue = CopyToReg t0, Register:v2i64 %V2, t8
  t11: ch = PPCISD::RET_FLAG t10, Register:v2i64 %V2, t10:1

Is it possible to write a pattern for a pre-legalization DAG? If not, I
think we can still do it in the combiner, but instead of combining `(shl a,
(and b, 31))` to `(shl a, b)`, we can combine it to `(PPCISD::VEC_SHL a,
b)`. What do you think?

>
>
> -Eli
>
> --
> Employee of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170510/b9da097c/attachment.html>