[llvm] r301990 - [PowerPC, DAGCombiner] Fold a << (b % (sizeof(a) * 8)) back to a single instruction

Tue May 9 14:09:20 PDT 2017

On Tue, May 9, 2017 at 12:45 PM Friedman, Eli <efriedma at codeaurora.org>
wrote:

> On 5/9/2017 12:19 PM, Tim Shen wrote:
>
> Sorry for missing the email. :/
>
> On Wed, May 3, 2017 at 11:43 AM Friedman, Eli via llvm-commits <
> llvm-commits at lists.llvm.org> wrote:
>
>> On 5/2/2017 5:07 PM, Tim Shen via llvm-commits wrote:
>> > Author: timshen
>> > Date: Tue May  2 19:07:02 2017
>> > New Revision: 301990
>> >
>> > URL: http://llvm.org/viewvc/llvm-project?rev=301990&view=rev
>> > Log:
>> > [PowerPC, DAGCombiner] Fold a << (b % (sizeof(a) * 8)) back to a single
>> instruction
>> >
>> > Summary:
>> > This is the corresponding llvm change to D28037 to ensure no performance
>> > regression.
>> >
>> > Reviewers: bogner, kbarton, hfinkel, iteratee, echristo
>> >
>> > Subscribers: nemanjai, llvm-commits
>> >
>> > Differential Revision: https://reviews.llvm.org/D28329
>> >
>> > Modified:
>> >      llvm/trunk/include/llvm/Target/TargetLowering.h
>> >      llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>> >      llvm/trunk/lib/Target/PowerPC/PPCISelLowering.h
>> >      llvm/trunk/test/CodeGen/PowerPC/shift_mask.ll
>> >
>> > Modified: llvm/trunk/include/llvm/Target/TargetLowering.h
>> > URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Target/TargetLowering.h?rev=301990&r1=301989&r2=301990&view=diff
>> >
>> ==============================================================================
>> > --- llvm/trunk/include/llvm/Target/TargetLowering.h (original)
>> > +++ llvm/trunk/include/llvm/Target/TargetLowering.h Tue May  2 19:07:02
>> 2017
>> > @@ -2061,6 +2061,14 @@ public:
>> >       return false;
>> >     }
>> >
>> > +  // Return true if the instruction that performs a << b actually
>> performs
>> > +  // a << (b % (sizeof(a) * 8)).
>> > +  virtual bool supportsModuloShift(ISD::NodeType Inst, EVT ReturnType)
>> const {
>> > +    assert((Inst == ISD::SHL || Inst == ISD::SRA || Inst == ISD::SRL)
>> &&
>> > +           "Expect a shift instruction");
>> > +    return false;
>> > +  }
>> > +
>> >
>>  //===--------------------------------------------------------------------===//
>> >     // Runtime Library hooks
>> >     //
>> >
>> > Modified: llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>> > URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp?rev=301990&r1=301989&r2=301990&view=diff
>> >
>> ==============================================================================
>> > --- llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (original)
>> > +++ llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp Tue May  2
>> 19:07:02 2017
>> > @@ -5294,6 +5294,17 @@ SDValue DAGCombiner::visitSHL(SDNode *N)
>> >       }
>> >     }
>> >
>> > +  // If the target supports masking y in (shl, y),
>> > +  // fold (shl x, (and y, ((1 << numbits(x)) - 1))) -> (shl x, y)
>> > +  if (TLI.isOperationLegal(ISD::SHL, VT) &&
>> > +      TLI.supportsModuloShift(ISD::SHL, VT) && N1->getOpcode() ==
>> ISD::AND) {
>> > +    if (ConstantSDNode *Mask = isConstOrConstSplat(N1->getOperand(1)))
>> {
>> > +      if (Mask->getZExtValue() == OpSizeInBits - 1) {
>> > +        return DAG.getNode(ISD::SHL, SDLoc(N), VT, N0,
>> N1->getOperand(0));
>> > +      }
>> > +    }
>> > +  }
>> > +
>> >     ConstantSDNode *N1C = isConstOrConstSplat(N1);
>> >
>> >     // fold (shl c1, c2) -> c1<<c2
>> > @@ -5492,6 +5503,17 @@ SDValue DAGCombiner::visitSRA(SDNode *N)
>> >     EVT VT = N0.getValueType();
>> >     unsigned OpSizeInBits = VT.getScalarSizeInBits();
>> >
>> > +  // If the target supports masking y in (sra, y),
>> > +  // fold (sra x, (and y, ((1 << numbits(x)) - 1))) -> (sra x, y)
>> > +  if (TLI.isOperationLegal(ISD::SRA, VT) &&
>> > +      TLI.supportsModuloShift(ISD::SRA, VT) && N1->getOpcode() ==
>> ISD::AND) {
>> > +    if (ConstantSDNode *Mask = isConstOrConstSplat(N1->getOperand(1)))
>> {
>> > +      if (Mask->getZExtValue() == OpSizeInBits - 1) {
>> > +        return DAG.getNode(ISD::SRA, SDLoc(N), VT, N0,
>> N1->getOperand(0));
>> > +      }
>> > +    }
>> > +  }
>> > +
>> >     // Arithmetic shifting an all-sign-bit value is a no-op.
>> >     if (DAG.ComputeNumSignBits(N0) == OpSizeInBits)
>> >       return N0;
>> > @@ -5650,6 +5672,17 @@ SDValue DAGCombiner::visitSRL(SDNode *N)
>> >     EVT VT = N0.getValueType();
>> >     unsigned OpSizeInBits = VT.getScalarSizeInBits();
>> >
>> > +  // If the target supports masking y in (srl, y),
>> > +  // fold (srl x, (and y, ((1 << numbits(x)) - 1))) -> (srl x, y)
>> > +  if (TLI.isOperationLegal(ISD::SRL, VT) &&
>> > +      TLI.supportsModuloShift(ISD::SRL, VT) && N1->getOpcode() ==
>> ISD::AND) {
>> > +    if (ConstantSDNode *Mask = isConstOrConstSplat(N1->getOperand(1)))
>> {
>> > +      if (Mask->getZExtValue() == OpSizeInBits - 1) {
>> > +        return DAG.getNode(ISD::SRL, SDLoc(N), VT, N0,
>> N1->getOperand(0));
>> > +      }
>> > +    }
>> > +  }
>>
>> I see three problems here:
>>
>> 1. It needs to be documented.
>>
>
> I wrote comments on the transformation and the TLI hook. What kind of
> documentation do you expect?
>
>
> A comment in ISDOpcodes.h.
>
>
>
>
>> 2. This isn't consistent with other transforms done by DAGCombine, e.g.
>> constant folding won't honor this.
>>
>
> How so? Although it runs before constant folding (e.g. (shl c1, c2) ->
> c1<<c2), constant folding will pick it up during the next iteration, won't
> it?
>
>
> Suppose you have "SHL(1, AND(y, 31))".  We fold it to "SHL(1, y)" because
> supportsModuloShift is true.  Then we figure out that y is 32.  Then we
> constant-fold "SHL(1, 32)" to zero (because that's the default behavior for
> APInt).
>
> And that's only the one I thought of off the top of my head; there are
> probably other transforms which assume the shift amount is in range.
>

Good point. I think we should write patterns in .td files as you suggested.

The only problem I got is that, I don't know how to write a working pattern
for the following DAG:

; %rem = and <16 x i8> %b, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,
i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>
; %shl = shl <16 x i8> %a, %rem
Initial selection DAG: BB#0 'test010:'
SelectionDAG has 12 nodes:
  t0: ch = EntryToken
      t2: v16i8,ch = CopyFromReg t0, Register:v16i8 %vreg0
        t4: v16i8,ch = CopyFromReg t0, Register:v16i8 %vreg1
        t6: v16i8 = BUILD_VECTOR Constant:i8<7>, Constant:i8<7>,
Constant:i8<7>, Constant:i8<7>, Constant:i8<7>, Constant:i8<7>,
Constant:i8<7>, Constant:i8<7>, Constant:i8<7>, Constant:i8<7>,
Constant:i8<7>, Constant:i8<7>, Constant:i8<7>, Constant:i8<7>,
Constant:i8<7>, Constant:i8<7>
      t7: v16i8 = and t4, t6
    t8: v16i8 = shl t2, t7
  t10: ch,glue = CopyToReg t0, Register:v16i8 %V2, t8
  t11: ch = PPCISD::RET_FLAG t10, Register:v16i8 %V2, t10:1

I tried the patterns:
def : Pat<(v16i8 (shl v16i8:$vA,
                 (and v16i8:$vB,
                      (v16i8 (build_vector (i8 7), (i8 7), (i8 7), (i8 7),
                                           (i8 7), (i8 7), (i8 7), (i8 7),
                                           (i8 7), (i8 7), (i8 7), (i8 7),
                                           (i8 7), (i8 7), (i8 7), (i8
7)))))),
          (v16i8 (VSLB $vA, $vB))>;

and
def : Pat<(v16i8 (shl v16i8:$vA, (and v16i8:$vB, (v16i8 7)))),
          (v16i8 (VSLB $vA, $vB))>;

but they don't match the code above.

Do you see anywhere wrong in my pattern?

Thanks!

>
>
>
>
>> 3. I'm not sure we want to do this; are we actually getting any benefit
>> here over letting targets pattern-match the pattern?
>>
>
> At least for both PowerPC and ARM, they have the modulo behavior:
>
> For PowerPC see the OpenPower ISA 3.0.
>
> For ARM, I found this:
>
> https://developer.arm.com/docs/den0024/latest/6-the-a64-instruction-set/62-data-processing-instructions/623-shift-operations
>
> Ideally someone can turn it on for ARM, but until now I didn't know ARM's
> behavior.
>
>
>
> That wasn't what I was trying to say.  I mean, what benefit does this have
> that you don't get if you just write a pattern like "def : Pat<(shl
> x:v4i32, (and y:v4i32, (splat 31)), (VSLW x, y)>;".
>
> I'm also concerned that most people will assume ISD::SHL has the same
> semantics as LLVM IR shl for out-of-range shifts, so future changes are
> likely to break this.
>
>
> -Eli
>
> --
> Employee of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170509/346191c7/attachment-0001.html>