[llvm] r301990 - [PowerPC, DAGCombiner] Fold a << (b % (sizeof(a) * 8)) back to a single instruction
Friedman, Eli via llvm-commits
llvm-commits at lists.llvm.org
Wed May 10 12:06:00 PDT 2017
On 5/10/2017 11:13 AM, Tim Shen wrote:
> On Tue, May 9, 2017 at 3:43 PM Friedman, Eli <efriedma at codeaurora.org
> <mailto:efriedma at codeaurora.org>> wrote:
>
> On 5/9/2017 2:09 PM, Tim Shen wrote:
>> On Tue, May 9, 2017 at 12:45 PM Friedman, Eli
>> <efriedma at codeaurora.org <mailto:efriedma at codeaurora.org>> wrote:
>>
>> On 5/9/2017 12:19 PM, Tim Shen wrote:
>>> Sorry for missing the email. :/
>>>
>>> On Wed, May 3, 2017 at 11:43 AM Friedman, Eli via
>>> llvm-commits <llvm-commits at lists.llvm.org
>>> <mailto:llvm-commits at lists.llvm.org>> wrote:
>>>
>>> On 5/2/2017 5:07 PM, Tim Shen via llvm-commits wrote:
>>> > Author: timshen
>>> > Date: Tue May 2 19:07:02 2017
>>> > New Revision: 301990
>>> >
>>> > URL:
>>> http://llvm.org/viewvc/llvm-project?rev=301990&view=rev
>>> > Log:
>>> > [PowerPC, DAGCombiner] Fold a << (b % (sizeof(a) * 8))
>>> back to a single instruction
>>> >
>>> > Summary:
>>> > This is the corresponding llvm change to D28037 to
>>> ensure no performance
>>> > regression.
>>> >
>>> > Reviewers: bogner, kbarton, hfinkel, iteratee, echristo
>>> >
>>> > Subscribers: nemanjai, llvm-commits
>>> >
>>> > Differential Revision: https://reviews.llvm.org/D28329
>>> >
>>> > Modified:
>>> > llvm/trunk/include/llvm/Target/TargetLowering.h
>>> > llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>>> > llvm/trunk/lib/Target/PowerPC/PPCISelLowering.h
>>> > llvm/trunk/test/CodeGen/PowerPC/shift_mask.ll
>>> >
>>> > Modified: llvm/trunk/include/llvm/Target/TargetLowering.h
>>> > URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Target/TargetLowering.h?rev=301990&r1=301989&r2=301990&view=diff
>>> >
>>> ==============================================================================
>>> > --- llvm/trunk/include/llvm/Target/TargetLowering.h
>>> (original)
>>> > +++ llvm/trunk/include/llvm/Target/TargetLowering.h
>>> Tue May 2 19:07:02 2017
>>> > @@ -2061,6 +2061,14 @@ public:
>>> > return false;
>>> > }
>>> >
>>> > + // Return true if the instruction that performs a
>>> << b actually performs
>>> > + // a << (b % (sizeof(a) * 8)).
>>> > + virtual bool supportsModuloShift(ISD::NodeType
>>> Inst, EVT ReturnType) const {
>>> > + assert((Inst == ISD::SHL || Inst == ISD::SRA ||
>>> Inst == ISD::SRL) &&
>>> > + "Expect a shift instruction");
>>> > + return false;
>>> > + }
>>> > +
>>> >
>>> //===--------------------------------------------------------------------===//
>>> > // Runtime Library hooks
>>> > //
>>> >
>>> > Modified:
>>> llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>>> > URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp?rev=301990&r1=301989&r2=301990&view=diff
>>> >
>>> ==============================================================================
>>> > ---
>>> llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>>> (original)
>>> > +++
>>> llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp Tue
>>> May 2 19:07:02 2017
>>> > @@ -5294,6 +5294,17 @@ SDValue
>>> DAGCombiner::visitSHL(SDNode *N)
>>> > }
>>> > }
>>> >
>>> > + // If the target supports masking y in (shl, y),
>>> > + // fold (shl x, (and y, ((1 << numbits(x)) - 1)))
>>> -> (shl x, y)
>>> > + if (TLI.isOperationLegal(ISD::SHL, VT) &&
>>> > + TLI.supportsModuloShift(ISD::SHL, VT) &&
>>> N1->getOpcode() == ISD::AND) {
>>> > + if (ConstantSDNode *Mask =
>>> isConstOrConstSplat(N1->getOperand(1))) {
>>> > + if (Mask->getZExtValue() == OpSizeInBits - 1) {
>>> > + return DAG.getNode(ISD::SHL, SDLoc(N), VT,
>>> N0, N1->getOperand(0));
>>> > + }
>>> > + }
>>> > + }
>>> > +
>>> > ConstantSDNode *N1C = isConstOrConstSplat(N1);
>>> >
>>> > // fold (shl c1, c2) -> c1<<c2
>>> > @@ -5492,6 +5503,17 @@ SDValue
>>> DAGCombiner::visitSRA(SDNode *N)
>>> > EVT VT = N0.getValueType();
>>> > unsigned OpSizeInBits = VT.getScalarSizeInBits();
>>> >
>>> > + // If the target supports masking y in (sra, y),
>>> > + // fold (sra x, (and y, ((1 << numbits(x)) - 1)))
>>> -> (sra x, y)
>>> > + if (TLI.isOperationLegal(ISD::SRA, VT) &&
>>> > + TLI.supportsModuloShift(ISD::SRA, VT) &&
>>> N1->getOpcode() == ISD::AND) {
>>> > + if (ConstantSDNode *Mask =
>>> isConstOrConstSplat(N1->getOperand(1))) {
>>> > + if (Mask->getZExtValue() == OpSizeInBits - 1) {
>>> > + return DAG.getNode(ISD::SRA, SDLoc(N), VT,
>>> N0, N1->getOperand(0));
>>> > + }
>>> > + }
>>> > + }
>>> > +
>>> > // Arithmetic shifting an all-sign-bit value is a
>>> no-op.
>>> > if (DAG.ComputeNumSignBits(N0) == OpSizeInBits)
>>> > return N0;
>>> > @@ -5650,6 +5672,17 @@ SDValue
>>> DAGCombiner::visitSRL(SDNode *N)
>>> > EVT VT = N0.getValueType();
>>> > unsigned OpSizeInBits = VT.getScalarSizeInBits();
>>> >
>>> > + // If the target supports masking y in (srl, y),
>>> > + // fold (srl x, (and y, ((1 << numbits(x)) - 1)))
>>> -> (srl x, y)
>>> > + if (TLI.isOperationLegal(ISD::SRL, VT) &&
>>> > + TLI.supportsModuloShift(ISD::SRL, VT) &&
>>> N1->getOpcode() == ISD::AND) {
>>> > + if (ConstantSDNode *Mask =
>>> isConstOrConstSplat(N1->getOperand(1))) {
>>> > + if (Mask->getZExtValue() == OpSizeInBits - 1) {
>>> > + return DAG.getNode(ISD::SRL, SDLoc(N), VT,
>>> N0, N1->getOperand(0));
>>> > + }
>>> > + }
>>> > + }
>>>
>>> I see three problems here:
>>>
>>> 1. It needs to be documented.
>>>
>>>
>>> I wrote comments on the transformation and the TLI hook.
>>> What kind of documentation do you expect?
>>
>> A comment in ISDOpcodes.h.
>>
>>
>>> 2. This isn't consistent with other transforms done by
>>> DAGCombine, e.g.
>>> constant folding won't honor this.
>>>
>>>
>>> How so? Although it runs before constant folding (e.g. (shl
>>> c1, c2) -> c1<<c2), constant folding will pick it up during
>>> the next iteration, won't it?
>>
>> Suppose you have "SHL(1, AND(y, 31))". We fold it to "SHL(1,
>> y)" because supportsModuloShift is true. Then we figure out
>> that y is 32. Then we constant-fold "SHL(1, 32)" to zero
>> (because that's the default behavior for APInt).
>>
>> And that's only the one I thought of off the top of my head;
>> there are probably other transforms which assume the shift
>> amount is in range.
>>
>>
>> Good point. I think we should write patterns in .td files as you
>> suggested.
>>
>> The only problem I got is that, I don't know how to write a
>> working pattern for the following DAG:
>>
>> ; %rem = and <16 x i8> %b, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7,
>> i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>
>> ; %shl = shl <16 x i8> %a, %rem
>> Initial selection DAG: BB#0 'test010:'
>> SelectionDAG has 12 nodes:
>> t0: ch = EntryToken
>> t2: v16i8,ch = CopyFromReg t0, Register:v16i8 %vreg0
>> t4: v16i8,ch = CopyFromReg t0, Register:v16i8 %vreg1
>> t6: v16i8 = BUILD_VECTOR Constant:i8<7>, Constant:i8<7>,
>> Constant:i8<7>, Constant:i8<7>, Constant:i8<7>, Constant:i8<7>,
>> Constant:i8<7>, Constant:i8<7>, Constant:i8<7>, Constant:i8<7>,
>> Constant:i8<7>, Constant:i8<7>, Constant:i8<7>, Constant:i8<7>,
>> Constant:i8<7>, Constant:i8<7>
>> t7: v16i8 = and t4, t6
>> t8: v16i8 = shl t2, t7
>> t10: ch,glue = CopyToReg t0, Register:v16i8 %V2, t8
>> t11: ch = PPCISD::RET_FLAG t10, Register:v16i8 %V2, t10:1
>>
>> I tried the patterns:
>> def : Pat<(v16i8 (shl v16i8:$vA,
>> (and v16i8:$vB,
>> (v16i8 (build_vector (i8 7), (i8 7), (i8
>> 7), (i8 7),
>> (i8 7), (i8 7), (i8 7), (i8 7),
>> (i8 7), (i8 7), (i8 7), (i8 7),
>> (i8 7), (i8 7), (i8 7), (i8 7)))))),
>> (v16i8 (VSLB $vA, $vB))>;
>>
>> and
>> def : Pat<(v16i8 (shl v16i8:$vA, (and v16i8:$vB, (v16i8 7)))),
>> (v16i8 (VSLB $vA, $vB))>;
>>
>> but they don't match the code above.
>>
>> Do you see anywhere wrong in my pattern?
>
> Are you sure you're trying to match the right pattern? Try
> looking at the "Optimized legalized selection DAG". (It looks
> like legalization inserts some bitcasts.)
>
>
> I prefer not to pattern match legalized DAG, since as you noticed,
> there are bitcasts. Ideally we only want to match the high level
> representation.
>
> That said, I tried matching legalized v4i32 and v8i16, they all work.
> However, for v2i64 in particular, PPC legalization inserts a load from
> constant pool, which is very unpleasant to match:
>
> Legalized selection DAG: BB#0 'test010:'
> SelectionDAG has 19 nodes:
> t0: ch = EntryToken
> t2: v2i64,ch = CopyFromReg t0, Register:v2i64 %vreg0
> t4: v2i64,ch = CopyFromReg t0, Register:v2i64 %vreg1
> t12: v4i32 = bitcast t4
> t23: i64,ch = PPCISD::TOC_ENTRY<LD8[GOT]>
> TargetConstantPool:i64<<2 x i64> <i64 63, i64 63>> 0, Register:i64 %X2
> t19: v2f64,ch = load<LD16[ConstantPool]> t0, t23, undef:i64
> t20: v2i64 = bitcast t19
> t13: v4i32 = bitcast t20
> t14: v4i32 = and t12, t13
> t15: v2i64 = bitcast t14
> t8: v2i64 = shl t2, t15
> t10: ch,glue = CopyToReg t0, Register:v2i64 %V2, t8
> t11: ch = PPCISD::RET_FLAG t10, Register:v2i64 %V2, t10:1
>
> Is it possible to write a pattern for a pre-legalization DAG? If not,
> I think we can still do it in the combiner, but instead of combining
> `(shl a, (and b, 31))` to `(shl a, b)`, we can combine it to
> `(PPCISD::VEC_SHL a, b)`. What do you think?
Patterns only apply after legalization. You can write a
target-specific DAGCombine, like you've outlined.
-Eli
--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170510/8970c651/attachment.html>
More information about the llvm-commits
mailing list