<div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Tue, May 9, 2017 at 12:45 PM Friedman, Eli <<a href="mailto:efriedma@codeaurora.org">efriedma@codeaurora.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF">
<div class="m_-8130644380894799036moz-cite-prefix">On 5/9/2017 12:19 PM, Tim Shen wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Sorry for missing the email. :/<br>
<br>
<div class="gmail_quote">
<div dir="ltr">On Wed, May 3, 2017 at 11:43 AM Friedman, Eli
via llvm-commits <<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On
5/2/2017 5:07 PM, Tim Shen via llvm-commits wrote:<br>
> Author: timshen<br>
> Date: Tue May 2 19:07:02 2017<br>
> New Revision: 301990<br>
><br>
> URL: <a href="http://llvm.org/viewvc/llvm-project?rev=301990&view=rev" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project?rev=301990&view=rev</a><br>
> Log:<br>
> [PowerPC, DAGCombiner] Fold a << (b % (sizeof(a)
* 8)) back to a single instruction<br>
><br>
> Summary:<br>
> This is the corresponding llvm change to D28037 to
ensure no performance<br>
> regression.<br>
><br>
> Reviewers: bogner, kbarton, hfinkel, iteratee, echristo<br>
><br>
> Subscribers: nemanjai, llvm-commits<br>
><br>
> Differential Revision: <a href="https://reviews.llvm.org/D28329" rel="noreferrer" target="_blank">https://reviews.llvm.org/D28329</a><br>
><br>
> Modified:<br>
> llvm/trunk/include/llvm/Target/TargetLowering.h<br>
>
llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp<br>
> llvm/trunk/lib/Target/PowerPC/PPCISelLowering.h<br>
> llvm/trunk/test/CodeGen/PowerPC/shift_mask.ll<br>
><br>
> Modified:
llvm/trunk/include/llvm/Target/TargetLowering.h<br>
> URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Target/TargetLowering.h?rev=301990&r1=301989&r2=301990&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Target/TargetLowering.h?rev=301990&r1=301989&r2=301990&view=diff</a><br>
>
==============================================================================<br>
> --- llvm/trunk/include/llvm/Target/TargetLowering.h
(original)<br>
> +++ llvm/trunk/include/llvm/Target/TargetLowering.h Tue
May 2 19:07:02 2017<br>
> @@ -2061,6 +2061,14 @@ public:<br>
> return false;<br>
> }<br>
><br>
> + // Return true if the instruction that performs a
<< b actually performs<br>
> + // a << (b % (sizeof(a) * 8)).<br>
> + virtual bool supportsModuloShift(ISD::NodeType Inst,
EVT ReturnType) const {<br>
> + assert((Inst == ISD::SHL || Inst == ISD::SRA ||
Inst == ISD::SRL) &&<br>
> + "Expect a shift instruction");<br>
> + return false;<br>
> + }<br>
> +<br>
>
//===--------------------------------------------------------------------===//<br>
> // Runtime Library hooks<br>
> //<br>
><br>
> Modified:
llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp<br>
> URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp?rev=301990&r1=301989&r2=301990&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp?rev=301990&r1=301989&r2=301990&view=diff</a><br>
>
==============================================================================<br>
> --- llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
(original)<br>
> +++ llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Tue May 2 19:07:02 2017<br>
> @@ -5294,6 +5294,17 @@ SDValue
DAGCombiner::visitSHL(SDNode *N)<br>
> }<br>
> }<br>
><br>
> + // If the target supports masking y in (shl, y),<br>
> + // fold (shl x, (and y, ((1 << numbits(x)) -
1))) -> (shl x, y)<br>
> + if (TLI.isOperationLegal(ISD::SHL, VT) &&<br>
> + TLI.supportsModuloShift(ISD::SHL, VT) &&
N1->getOpcode() == ISD::AND) {<br>
> + if (ConstantSDNode *Mask =
isConstOrConstSplat(N1->getOperand(1))) {<br>
> + if (Mask->getZExtValue() == OpSizeInBits - 1)
{<br>
> + return DAG.getNode(ISD::SHL, SDLoc(N), VT, N0,
N1->getOperand(0));<br>
> + }<br>
> + }<br>
> + }<br>
> +<br>
> ConstantSDNode *N1C = isConstOrConstSplat(N1);<br>
><br>
> // fold (shl c1, c2) -> c1<<c2<br>
> @@ -5492,6 +5503,17 @@ SDValue
DAGCombiner::visitSRA(SDNode *N)<br>
> EVT VT = N0.getValueType();<br>
> unsigned OpSizeInBits = VT.getScalarSizeInBits();<br>
><br>
> + // If the target supports masking y in (sra, y),<br>
> + // fold (sra x, (and y, ((1 << numbits(x)) -
1))) -> (sra x, y)<br>
> + if (TLI.isOperationLegal(ISD::SRA, VT) &&<br>
> + TLI.supportsModuloShift(ISD::SRA, VT) &&
N1->getOpcode() == ISD::AND) {<br>
> + if (ConstantSDNode *Mask =
isConstOrConstSplat(N1->getOperand(1))) {<br>
> + if (Mask->getZExtValue() == OpSizeInBits - 1)
{<br>
> + return DAG.getNode(ISD::SRA, SDLoc(N), VT, N0,
N1->getOperand(0));<br>
> + }<br>
> + }<br>
> + }<br>
> +<br>
> // Arithmetic shifting an all-sign-bit value is a
no-op.<br>
> if (DAG.ComputeNumSignBits(N0) == OpSizeInBits)<br>
> return N0;<br>
> @@ -5650,6 +5672,17 @@ SDValue
DAGCombiner::visitSRL(SDNode *N)<br>
> EVT VT = N0.getValueType();<br>
> unsigned OpSizeInBits = VT.getScalarSizeInBits();<br>
><br>
> + // If the target supports masking y in (srl, y),<br>
> + // fold (srl x, (and y, ((1 << numbits(x)) -
1))) -> (srl x, y)<br>
> + if (TLI.isOperationLegal(ISD::SRL, VT) &&<br>
> + TLI.supportsModuloShift(ISD::SRL, VT) &&
N1->getOpcode() == ISD::AND) {<br>
> + if (ConstantSDNode *Mask =
isConstOrConstSplat(N1->getOperand(1))) {<br>
> + if (Mask->getZExtValue() == OpSizeInBits - 1)
{<br>
> + return DAG.getNode(ISD::SRL, SDLoc(N), VT, N0,
N1->getOperand(0));<br>
> + }<br>
> + }<br>
> + }<br>
<br>
I see three problems here:<br>
<br>
1. It needs to be documented.<br>
</blockquote>
<div><br>
</div>
<div>I wrote comments on the transformation and the TLI hook.
What kind of documentation do you expect?</div>
</div>
</div>
</blockquote>
<br></div><div text="#000000" bgcolor="#FFFFFF">
A comment in ISDOpcodes.h.</div><div text="#000000" bgcolor="#FFFFFF"><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
2. This isn't consistent with other transforms done by
DAGCombine, e.g.<br>
constant folding won't honor this.<br>
</blockquote>
<div><br>
</div>
<div>How so? Although it runs before constant folding
(e.g. (shl c1, c2) -> c1<<c2), constant folding
will pick it up during the next iteration, won't it?</div>
</div>
</div>
</blockquote>
<br></div><div text="#000000" bgcolor="#FFFFFF">
Suppose you have "SHL(1, AND(y, 31))". We fold it to "SHL(1, y)"
because supportsModuloShift is true. Then we figure out that y is
32. Then we constant-fold "SHL(1, 32)" to zero (because that's the
default behavior for APInt).<br>
<br>
And that's only the one I thought of off the top of my head; there
are probably other transforms which assume the shift amount is in
range.</div></blockquote><div><br></div><div>Good point. I think we should write patterns in .td files as you suggested.</div><div><br></div><div>The only problem I got is that, I don't know how to write a working pattern for the following DAG:</div><div><br></div><div>; %rem = and <16 x i8> %b, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7></div><div>; %shl = shl <16 x i8> %a, %rem</div><div><div>Initial selection DAG: BB#0 'test010:'</div><div>SelectionDAG has 12 nodes:</div><div> t0: ch = EntryToken</div><div> t2: v16i8,ch = CopyFromReg t0, Register:v16i8 %vreg0</div><div> t4: v16i8,ch = CopyFromReg t0, Register:v16i8 %vreg1</div><div> t6: v16i8 = BUILD_VECTOR Constant:i8<7>, Constant:i8<7>, Constant:i8<7>, Constant:i8<7>, Constant:i8<7>, Constant:i8<7>, Constant:i8<7>, Constant:i8<7>, Constant:i8<7>, Constant:i8<7>, Constant:i8<7>, Constant:i8<7>, Constant:i8<7>, Constant:i8<7>, Constant:i8<7>, Constant:i8<7></div><div> t7: v16i8 = and t4, t6</div><div> t8: v16i8 = shl t2, t7</div><div> t10: ch,glue = CopyToReg t0, Register:v16i8 %V2, t8</div><div> t11: ch = PPCISD::RET_FLAG t10, Register:v16i8 %V2, t10:1</div></div><div><br></div><div>I tried the patterns:</div><div><div><div>def : Pat<(v16i8 (shl v16i8:$vA,</div><div> (and v16i8:$vB,</div><div> (v16i8 (build_vector (i8 7), (i8 7), (i8 7), (i8 7),</div><div> (i8 7), (i8 7), (i8 7), (i8 7),</div><div> (i8 7), (i8 7), (i8 7), (i8 7),</div><div> (i8 7), (i8 7), (i8 7), (i8 7)))))),</div><div> (v16i8 (VSLB $vA, $vB))>;</div></div></div><div><br></div><div>and</div><div><div>def : Pat<(v16i8 (shl v16i8:$vA, (and v16i8:$vB, (v16i8 7)))),</div><div> (v16i8 (VSLB $vA, $vB))>;</div></div><div><br></div><div>but they don't match the code above.</div><div><br></div><div>Do you see anywhere wrong in my pattern?</div><div><br></div><div>Thanks!</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF"><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
3. I'm not sure we want to do this; are we actually getting
any benefit<br>
here over letting targets pattern-match the pattern?<br>
</blockquote>
<div><br>
</div>
<div>At least for both PowerPC and ARM, they have the modulo
behavior:</div>
<div><br>
</div>
<div>For PowerPC see the OpenPower ISA 3.0.</div>
<div><br>
</div>
<div>For ARM, I found this:</div>
<div><a href="https://developer.arm.com/docs/den0024/latest/6-the-a64-instruction-set/62-data-processing-instructions/623-shift-operations" target="_blank">https://developer.arm.com/docs/den0024/latest/6-the-a64-instruction-set/62-data-processing-instructions/623-shift-operations</a></div>
<div><br>
</div>
<div>Ideally someone can turn it on for ARM, but until now I
didn't know ARM's behavior.</div>
<div> </div>
</div>
</div>
</blockquote>
<br></div><div text="#000000" bgcolor="#FFFFFF">
That wasn't what I was trying to say. I mean, what benefit does
this have that you don't get if you just write a pattern like "def :
Pat<(shl x:v4i32, (and y:v4i32, (splat 31)), (VSLW x, y)>;".<br>
<br>
I'm also concerned that most people will assume ISD::SHL has the
same semantics as LLVM IR shl for out-of-range shifts, so future
changes are likely to break this.</div><div text="#000000" bgcolor="#FFFFFF"><br>
<br>
-Eli<br>
<pre class="m_-8130644380894799036moz-signature" cols="72">--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project</pre>
</div></blockquote></div></div>