[llvm] r301990 - [PowerPC, DAGCombiner] Fold a << (b % (sizeof(a) * 8)) back to a single instruction

Tue May 9 15:43:09 PDT 2017

On 5/9/2017 2:09 PM, Tim Shen wrote:
> On Tue, May 9, 2017 at 12:45 PM Friedman, Eli <efriedma at codeaurora.org 
> <mailto:efriedma at codeaurora.org>> wrote:
>
>     On 5/9/2017 12:19 PM, Tim Shen wrote:
>>     Sorry for missing the email. :/
>>
>>     On Wed, May 3, 2017 at 11:43 AM Friedman, Eli via llvm-commits
>>     <llvm-commits at lists.llvm.org
>>     <mailto:llvm-commits at lists.llvm.org>> wrote:
>>
>>         On 5/2/2017 5:07 PM, Tim Shen via llvm-commits wrote:
>>         > Author: timshen
>>         > Date: Tue May  2 19:07:02 2017
>>         > New Revision: 301990
>>         >
>>         > URL: http://llvm.org/viewvc/llvm-project?rev=301990&view=rev
>>         > Log:
>>         > [PowerPC, DAGCombiner] Fold a << (b % (sizeof(a) * 8)) back
>>         to a single instruction
>>         >
>>         > Summary:
>>         > This is the corresponding llvm change to D28037 to ensure
>>         no performance
>>         > regression.
>>         >
>>         > Reviewers: bogner, kbarton, hfinkel, iteratee, echristo
>>         >
>>         > Subscribers: nemanjai, llvm-commits
>>         >
>>         > Differential Revision: https://reviews.llvm.org/D28329
>>         >
>>         > Modified:
>>         > llvm/trunk/include/llvm/Target/TargetLowering.h
>>         > llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>>         > llvm/trunk/lib/Target/PowerPC/PPCISelLowering.h
>>         > llvm/trunk/test/CodeGen/PowerPC/shift_mask.ll
>>         >
>>         > Modified: llvm/trunk/include/llvm/Target/TargetLowering.h
>>         > URL:
>>         http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Target/TargetLowering.h?rev=301990&r1=301989&r2=301990&view=diff
>>         >
>>         ==============================================================================
>>         > --- llvm/trunk/include/llvm/Target/TargetLowering.h (original)
>>         > +++ llvm/trunk/include/llvm/Target/TargetLowering.h Tue
>>         May  2 19:07:02 2017
>>         > @@ -2061,6 +2061,14 @@ public:
>>         >       return false;
>>         >     }
>>         >
>>         > +  // Return true if the instruction that performs a << b
>>         actually performs
>>         > +  // a << (b % (sizeof(a) * 8)).
>>         > +  virtual bool supportsModuloShift(ISD::NodeType Inst, EVT
>>         ReturnType) const {
>>         > +    assert((Inst == ISD::SHL || Inst == ISD::SRA || Inst
>>         == ISD::SRL) &&
>>         > +           "Expect a shift instruction");
>>         > +    return false;
>>         > +  }
>>         > +
>>         >
>>          //===--------------------------------------------------------------------===//
>>         >     // Runtime Library hooks
>>         >     //
>>         >
>>         > Modified: llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>>         > URL:
>>         http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp?rev=301990&r1=301989&r2=301990&view=diff
>>         >
>>         ==============================================================================
>>         > --- llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>>         (original)
>>         > +++ llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp Tue
>>         May  2 19:07:02 2017
>>         > @@ -5294,6 +5294,17 @@ SDValue DAGCombiner::visitSHL(SDNode *N)
>>         >       }
>>         >     }
>>         >
>>         > +  // If the target supports masking y in (shl, y),
>>         > +  // fold (shl x, (and y, ((1 << numbits(x)) - 1))) ->
>>         (shl x, y)
>>         > +  if (TLI.isOperationLegal(ISD::SHL, VT) &&
>>         > +      TLI.supportsModuloShift(ISD::SHL, VT) &&
>>         N1->getOpcode() == ISD::AND) {
>>         > +    if (ConstantSDNode *Mask =
>>         isConstOrConstSplat(N1->getOperand(1))) {
>>         > +      if (Mask->getZExtValue() == OpSizeInBits - 1) {
>>         > +        return DAG.getNode(ISD::SHL, SDLoc(N), VT, N0,
>>         N1->getOperand(0));
>>         > +      }
>>         > +    }
>>         > +  }
>>         > +
>>         >     ConstantSDNode *N1C = isConstOrConstSplat(N1);
>>         >
>>         >     // fold (shl c1, c2) -> c1<<c2
>>         > @@ -5492,6 +5503,17 @@ SDValue DAGCombiner::visitSRA(SDNode *N)
>>         >     EVT VT = N0.getValueType();
>>         >     unsigned OpSizeInBits = VT.getScalarSizeInBits();
>>         >
>>         > +  // If the target supports masking y in (sra, y),
>>         > +  // fold (sra x, (and y, ((1 << numbits(x)) - 1))) ->
>>         (sra x, y)
>>         > +  if (TLI.isOperationLegal(ISD::SRA, VT) &&
>>         > +      TLI.supportsModuloShift(ISD::SRA, VT) &&
>>         N1->getOpcode() == ISD::AND) {
>>         > +    if (ConstantSDNode *Mask =
>>         isConstOrConstSplat(N1->getOperand(1))) {
>>         > +      if (Mask->getZExtValue() == OpSizeInBits - 1) {
>>         > +        return DAG.getNode(ISD::SRA, SDLoc(N), VT, N0,
>>         N1->getOperand(0));
>>         > +      }
>>         > +    }
>>         > +  }
>>         > +
>>         >     // Arithmetic shifting an all-sign-bit value is a no-op.
>>         >     if (DAG.ComputeNumSignBits(N0) == OpSizeInBits)
>>         >       return N0;
>>         > @@ -5650,6 +5672,17 @@ SDValue DAGCombiner::visitSRL(SDNode *N)
>>         >     EVT VT = N0.getValueType();
>>         >     unsigned OpSizeInBits = VT.getScalarSizeInBits();
>>         >
>>         > +  // If the target supports masking y in (srl, y),
>>         > +  // fold (srl x, (and y, ((1 << numbits(x)) - 1))) ->
>>         (srl x, y)
>>         > +  if (TLI.isOperationLegal(ISD::SRL, VT) &&
>>         > +      TLI.supportsModuloShift(ISD::SRL, VT) &&
>>         N1->getOpcode() == ISD::AND) {
>>         > +    if (ConstantSDNode *Mask =
>>         isConstOrConstSplat(N1->getOperand(1))) {
>>         > +      if (Mask->getZExtValue() == OpSizeInBits - 1) {
>>         > +        return DAG.getNode(ISD::SRL, SDLoc(N), VT, N0,
>>         N1->getOperand(0));
>>         > +      }
>>         > +    }
>>         > +  }
>>
>>         I see three problems here:
>>
>>         1. It needs to be documented.
>>
>>
>>     I wrote comments on the transformation and the TLI hook. What
>>     kind of documentation do you expect?
>
>     A comment in ISDOpcodes.h.
>
>
>>         2. This isn't consistent with other transforms done by
>>         DAGCombine, e.g.
>>         constant folding won't honor this.
>>
>>
>>     How so? Although it runs before constant folding (e.g. (shl c1,
>>     c2) -> c1<<c2), constant folding will pick it up during the next
>>     iteration, won't it?
>
>     Suppose you have "SHL(1, AND(y, 31))".  We fold it to "SHL(1, y)"
>     because supportsModuloShift is true.  Then we figure out that y is
>     32.  Then we constant-fold "SHL(1, 32)" to zero (because that's
>     the default behavior for APInt).
>
>     And that's only the one I thought of off the top of my head; there
>     are probably other transforms which assume the shift amount is in
>     range.
>
>
> Good point. I think we should write patterns in .td files as you 
> suggested.
>
> The only problem I got is that, I don't know how to write a working 
> pattern for the following DAG:
>
> ; %rem = and <16 x i8> %b, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, 
> i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>
> ; %shl = shl <16 x i8> %a, %rem
> Initial selection DAG: BB#0 'test010:'
> SelectionDAG has 12 nodes:
>   t0: ch = EntryToken
>       t2: v16i8,ch = CopyFromReg t0, Register:v16i8 %vreg0
>         t4: v16i8,ch = CopyFromReg t0, Register:v16i8 %vreg1
>         t6: v16i8 = BUILD_VECTOR Constant:i8<7>, Constant:i8<7>, 
> Constant:i8<7>, Constant:i8<7>, Constant:i8<7>, Constant:i8<7>, 
> Constant:i8<7>, Constant:i8<7>, Constant:i8<7>, Constant:i8<7>, 
> Constant:i8<7>, Constant:i8<7>, Constant:i8<7>, Constant:i8<7>, 
> Constant:i8<7>, Constant:i8<7>
>       t7: v16i8 = and t4, t6
>     t8: v16i8 = shl t2, t7
>   t10: ch,glue = CopyToReg t0, Register:v16i8 %V2, t8
>   t11: ch = PPCISD::RET_FLAG t10, Register:v16i8 %V2, t10:1
>
> I tried the patterns:
> def : Pat<(v16i8 (shl v16i8:$vA,
>                  (and v16i8:$vB,
>                       (v16i8 (build_vector (i8 7), (i8 7), (i8 7), (i8 7),
>                                            (i8 7), (i8 7), (i8 7), (i8 7),
>                                            (i8 7), (i8 7), (i8 7), (i8 7),
>                                            (i8 7), (i8 7), (i8 7), (i8 
> 7)))))),
>           (v16i8 (VSLB $vA, $vB))>;
>
> and
> def : Pat<(v16i8 (shl v16i8:$vA, (and v16i8:$vB, (v16i8 7)))),
>           (v16i8 (VSLB $vA, $vB))>;
>
> but they don't match the code above.
>
> Do you see anywhere wrong in my pattern?

Are you sure you're trying to match the right pattern?  Try looking at 
the "Optimized legalized selection DAG".  (It looks like legalization 
inserts some bitcasts.)

-Eli

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170509/479dff90/attachment.html>