[llvm] r301990 - [PowerPC, DAGCombiner] Fold a << (b % (sizeof(a) * 8)) back to a single instruction

Wed May 10 12:06:00 PDT 2017

On 5/10/2017 11:13 AM, Tim Shen wrote:
> On Tue, May 9, 2017 at 3:43 PM Friedman, Eli <efriedma at codeaurora.org 
> <mailto:efriedma at codeaurora.org>> wrote:
>
>     On 5/9/2017 2:09 PM, Tim Shen wrote:
>>     On Tue, May 9, 2017 at 12:45 PM Friedman, Eli
>>     <efriedma at codeaurora.org <mailto:efriedma at codeaurora.org>> wrote:
>>
>>         On 5/9/2017 12:19 PM, Tim Shen wrote:
>>>         Sorry for missing the email. :/
>>>
>>>         On Wed, May 3, 2017 at 11:43 AM Friedman, Eli via
>>>         llvm-commits <llvm-commits at lists.llvm.org
>>>         <mailto:llvm-commits at lists.llvm.org>> wrote:
>>>
>>>             On 5/2/2017 5:07 PM, Tim Shen via llvm-commits wrote:
>>>             > Author: timshen
>>>             > Date: Tue May  2 19:07:02 2017
>>>             > New Revision: 301990
>>>             >
>>>             > URL:
>>>             http://llvm.org/viewvc/llvm-project?rev=301990&view=rev
>>>             > Log:
>>>             > [PowerPC, DAGCombiner] Fold a << (b % (sizeof(a) * 8))
>>>             back to a single instruction
>>>             >
>>>             > Summary:
>>>             > This is the corresponding llvm change to D28037 to
>>>             ensure no performance
>>>             > regression.
>>>             >
>>>             > Reviewers: bogner, kbarton, hfinkel, iteratee, echristo
>>>             >
>>>             > Subscribers: nemanjai, llvm-commits
>>>             >
>>>             > Differential Revision: https://reviews.llvm.org/D28329
>>>             >
>>>             > Modified:
>>>             > llvm/trunk/include/llvm/Target/TargetLowering.h
>>>             > llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>>>             > llvm/trunk/lib/Target/PowerPC/PPCISelLowering.h
>>>             > llvm/trunk/test/CodeGen/PowerPC/shift_mask.ll
>>>             >
>>>             > Modified: llvm/trunk/include/llvm/Target/TargetLowering.h
>>>             > URL:
>>>             http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Target/TargetLowering.h?rev=301990&r1=301989&r2=301990&view=diff
>>>             >
>>>             ==============================================================================
>>>             > --- llvm/trunk/include/llvm/Target/TargetLowering.h
>>>             (original)
>>>             > +++ llvm/trunk/include/llvm/Target/TargetLowering.h
>>>             Tue May  2 19:07:02 2017
>>>             > @@ -2061,6 +2061,14 @@ public:
>>>             >       return false;
>>>             >     }
>>>             >
>>>             > +  // Return true if the instruction that performs a
>>>             << b actually performs
>>>             > +  // a << (b % (sizeof(a) * 8)).
>>>             > +  virtual bool supportsModuloShift(ISD::NodeType
>>>             Inst, EVT ReturnType) const {
>>>             > +    assert((Inst == ISD::SHL || Inst == ISD::SRA ||
>>>             Inst == ISD::SRL) &&
>>>             > +           "Expect a shift instruction");
>>>             > +    return false;
>>>             > +  }
>>>             > +
>>>             >
>>>              //===--------------------------------------------------------------------===//
>>>             >     // Runtime Library hooks
>>>             >     //
>>>             >
>>>             > Modified:
>>>             llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>>>             > URL:
>>>             http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp?rev=301990&r1=301989&r2=301990&view=diff
>>>             >
>>>             ==============================================================================
>>>             > ---
>>>             llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>>>             (original)
>>>             > +++
>>>             llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp Tue
>>>             May  2 19:07:02 2017
>>>             > @@ -5294,6 +5294,17 @@ SDValue
>>>             DAGCombiner::visitSHL(SDNode *N)
>>>             >       }
>>>             >     }
>>>             >
>>>             > +  // If the target supports masking y in (shl, y),
>>>             > +  // fold (shl x, (and y, ((1 << numbits(x)) - 1)))
>>>             -> (shl x, y)
>>>             > +  if (TLI.isOperationLegal(ISD::SHL, VT) &&
>>>             > + TLI.supportsModuloShift(ISD::SHL, VT) &&
>>>             N1->getOpcode() == ISD::AND) {
>>>             > +    if (ConstantSDNode *Mask =
>>>             isConstOrConstSplat(N1->getOperand(1))) {
>>>             > +      if (Mask->getZExtValue() == OpSizeInBits - 1) {
>>>             > +        return DAG.getNode(ISD::SHL, SDLoc(N), VT,
>>>             N0, N1->getOperand(0));
>>>             > +      }
>>>             > +    }
>>>             > +  }
>>>             > +
>>>             >     ConstantSDNode *N1C = isConstOrConstSplat(N1);
>>>             >
>>>             >     // fold (shl c1, c2) -> c1<<c2
>>>             > @@ -5492,6 +5503,17 @@ SDValue
>>>             DAGCombiner::visitSRA(SDNode *N)
>>>             >     EVT VT = N0.getValueType();
>>>             >     unsigned OpSizeInBits = VT.getScalarSizeInBits();
>>>             >
>>>             > +  // If the target supports masking y in (sra, y),
>>>             > +  // fold (sra x, (and y, ((1 << numbits(x)) - 1)))
>>>             -> (sra x, y)
>>>             > +  if (TLI.isOperationLegal(ISD::SRA, VT) &&
>>>             > + TLI.supportsModuloShift(ISD::SRA, VT) &&
>>>             N1->getOpcode() == ISD::AND) {
>>>             > +    if (ConstantSDNode *Mask =
>>>             isConstOrConstSplat(N1->getOperand(1))) {
>>>             > +      if (Mask->getZExtValue() == OpSizeInBits - 1) {
>>>             > +        return DAG.getNode(ISD::SRA, SDLoc(N), VT,
>>>             N0, N1->getOperand(0));
>>>             > +      }
>>>             > +    }
>>>             > +  }
>>>             > +
>>>             >     // Arithmetic shifting an all-sign-bit value is a
>>>             no-op.
>>>             >     if (DAG.ComputeNumSignBits(N0) == OpSizeInBits)
>>>             >       return N0;
>>>             > @@ -5650,6 +5672,17 @@ SDValue
>>>             DAGCombiner::visitSRL(SDNode *N)
>>>             >     EVT VT = N0.getValueType();
>>>             >     unsigned OpSizeInBits = VT.getScalarSizeInBits();
>>>             >
>>>             > +  // If the target supports masking y in (srl, y),
>>>             > +  // fold (srl x, (and y, ((1 << numbits(x)) - 1)))
>>>             -> (srl x, y)
>>>             > +  if (TLI.isOperationLegal(ISD::SRL, VT) &&
>>>             > + TLI.supportsModuloShift(ISD::SRL, VT) &&
>>>             N1->getOpcode() == ISD::AND) {
>>>             > +    if (ConstantSDNode *Mask =
>>>             isConstOrConstSplat(N1->getOperand(1))) {
>>>             > +      if (Mask->getZExtValue() == OpSizeInBits - 1) {
>>>             > +        return DAG.getNode(ISD::SRL, SDLoc(N), VT,
>>>             N0, N1->getOperand(0));
>>>             > +      }
>>>             > +    }
>>>             > +  }
>>>
>>>             I see three problems here:
>>>
>>>             1. It needs to be documented.
>>>
>>>
>>>         I wrote comments on the transformation and the TLI hook.
>>>         What kind of documentation do you expect?
>>
>>         A comment in ISDOpcodes.h.
>>
>>
>>>             2. This isn't consistent with other transforms done by
>>>             DAGCombine, e.g.
>>>             constant folding won't honor this.
>>>
>>>
>>>         How so? Although it runs before constant folding (e.g. (shl
>>>         c1, c2) -> c1<<c2), constant folding will pick it up during
>>>         the next iteration, won't it?
>>
>>         Suppose you have "SHL(1, AND(y, 31))".  We fold it to "SHL(1,
>>         y)" because supportsModuloShift is true.  Then we figure out
>>         that y is 32.  Then we constant-fold "SHL(1, 32)" to zero
>>         (because that's the default behavior for APInt).
>>
>>         And that's only the one I thought of off the top of my head;
>>         there are probably other transforms which assume the shift
>>         amount is in range.
>>
>>
>>     Good point. I think we should write patterns in .td files as you
>>     suggested.
>>
>>     The only problem I got is that, I don't know how to write a
>>     working pattern for the following DAG:
>>
>>     ; %rem = and <16 x i8> %b, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,
>>     i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>
>>     ; %shl = shl <16 x i8> %a, %rem
>>     Initial selection DAG: BB#0 'test010:'
>>     SelectionDAG has 12 nodes:
>>       t0: ch = EntryToken
>>           t2: v16i8,ch = CopyFromReg t0, Register:v16i8 %vreg0
>>             t4: v16i8,ch = CopyFromReg t0, Register:v16i8 %vreg1
>>             t6: v16i8 = BUILD_VECTOR Constant:i8<7>, Constant:i8<7>,
>>     Constant:i8<7>, Constant:i8<7>, Constant:i8<7>, Constant:i8<7>,
>>     Constant:i8<7>, Constant:i8<7>, Constant:i8<7>, Constant:i8<7>,
>>     Constant:i8<7>, Constant:i8<7>, Constant:i8<7>, Constant:i8<7>,
>>     Constant:i8<7>, Constant:i8<7>
>>           t7: v16i8 = and t4, t6
>>         t8: v16i8 = shl t2, t7
>>       t10: ch,glue = CopyToReg t0, Register:v16i8 %V2, t8
>>       t11: ch = PPCISD::RET_FLAG t10, Register:v16i8 %V2, t10:1
>>
>>     I tried the patterns:
>>     def : Pat<(v16i8 (shl v16i8:$vA,
>>                      (and v16i8:$vB,
>>                           (v16i8 (build_vector (i8 7), (i8 7), (i8
>>     7), (i8 7),
>>      (i8 7), (i8 7), (i8 7), (i8 7),
>>      (i8 7), (i8 7), (i8 7), (i8 7),
>>      (i8 7), (i8 7), (i8 7), (i8 7)))))),
>>               (v16i8 (VSLB $vA, $vB))>;
>>
>>     and
>>     def : Pat<(v16i8 (shl v16i8:$vA, (and v16i8:$vB, (v16i8 7)))),
>>               (v16i8 (VSLB $vA, $vB))>;
>>
>>     but they don't match the code above.
>>
>>     Do you see anywhere wrong in my pattern?
>
>     Are you sure you're trying to match the right pattern?  Try
>     looking at the "Optimized legalized selection DAG".  (It looks
>     like legalization inserts some bitcasts.)
>
>
> I prefer not to pattern match legalized DAG, since as you noticed, 
> there are bitcasts. Ideally we only want to match the high level 
> representation.
>
> That said, I tried matching legalized v4i32 and v8i16, they all work. 
> However, for v2i64 in particular, PPC legalization inserts a load from 
> constant pool, which is very unpleasant to match:
>
> Legalized selection DAG: BB#0 'test010:'
> SelectionDAG has 19 nodes:
>   t0: ch = EntryToken
>       t2: v2i64,ch = CopyFromReg t0, Register:v2i64 %vreg0
>             t4: v2i64,ch = CopyFromReg t0, Register:v2i64 %vreg1
>           t12: v4i32 = bitcast t4
>                 t23: i64,ch = PPCISD::TOC_ENTRY<LD8[GOT]> 
> TargetConstantPool:i64<<2 x i64> <i64 63, i64 63>> 0, Register:i64 %X2
>               t19: v2f64,ch = load<LD16[ConstantPool]> t0, t23, undef:i64
>             t20: v2i64 = bitcast t19
>           t13: v4i32 = bitcast t20
>         t14: v4i32 = and t12, t13
>       t15: v2i64 = bitcast t14
>     t8: v2i64 = shl t2, t15
>   t10: ch,glue = CopyToReg t0, Register:v2i64 %V2, t8
>   t11: ch = PPCISD::RET_FLAG t10, Register:v2i64 %V2, t10:1
>
> Is it possible to write a pattern for a pre-legalization DAG? If not, 
> I think we can still do it in the combiner, but instead of combining 
> `(shl a, (and b, 31))` to `(shl a, b)`, we can combine it to 
> `(PPCISD::VEC_SHL a, b)`. What do you think?

Patterns only apply after legalization.   You can write a 
target-specific DAGCombine, like you've outlined.

-Eli

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170510/8970c651/attachment.html>