<div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Tue, May 9, 2017 at 3:43 PM Friedman, Eli <<a href="mailto:efriedma@codeaurora.org">efriedma@codeaurora.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF">

    <div class="m_-6804385229949104446moz-cite-prefix">On 5/9/2017 2:09 PM, Tim Shen wrote:<br>

    </div>

    <blockquote type="cite">

      <div dir="ltr">

        <div class="gmail_quote">

          <div dir="ltr">On Tue, May 9, 2017 at 12:45 PM Friedman, Eli

            <<a href="mailto:efriedma@codeaurora.org" target="_blank">efriedma@codeaurora.org</a>>

            wrote:<br>

          </div>

          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

            <div text="#000000" bgcolor="#FFFFFF">

              <div class="m_-6804385229949104446m_-8130644380894799036moz-cite-prefix">On

                5/9/2017 12:19 PM, Tim Shen wrote:<br>

              </div>

              <blockquote type="cite">

                <div dir="ltr">Sorry for missing the email. :/<br>

                  <br>

                  <div class="gmail_quote">

                    <div dir="ltr">On Wed, May 3, 2017 at 11:43 AM

                      Friedman, Eli via llvm-commits <<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a>>

                      wrote:<br>

                    </div>

                    <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On

                      5/2/2017 5:07 PM, Tim Shen via llvm-commits wrote:<br>

                      > Author: timshen<br>

                      > Date: Tue May  2 19:07:02 2017<br>

                      > New Revision: 301990<br>

                      ><br>

                      > URL: <a href="http://llvm.org/viewvc/llvm-project?rev=301990&view=rev" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project?rev=301990&view=rev</a><br>

                      > Log:<br>

                      > [PowerPC, DAGCombiner] Fold a << (b %

                      (sizeof(a) * 8)) back to a single instruction<br>

                      ><br>

                      > Summary:<br>

                      > This is the corresponding llvm change to

                      D28037 to ensure no performance<br>

                      > regression.<br>

                      ><br>

                      > Reviewers: bogner, kbarton, hfinkel,

                      iteratee, echristo<br>

                      ><br>

                      > Subscribers: nemanjai, llvm-commits<br>

                      ><br>

                      > Differential Revision: <a href="https://reviews.llvm.org/D28329" rel="noreferrer" target="_blank">https://reviews.llvm.org/D28329</a><br>

                      ><br>

                      > Modified:<br>

                      >     

                      llvm/trunk/include/llvm/Target/TargetLowering.h<br>

                      >     

                      llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp<br>

                      >     

                      llvm/trunk/lib/Target/PowerPC/PPCISelLowering.h<br>

                      >     

                      llvm/trunk/test/CodeGen/PowerPC/shift_mask.ll<br>

                      ><br>

                      > Modified:

                      llvm/trunk/include/llvm/Target/TargetLowering.h<br>

                      > URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Target/TargetLowering.h?rev=301990&r1=301989&r2=301990&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Target/TargetLowering.h?rev=301990&r1=301989&r2=301990&view=diff</a><br>

                      >

==============================================================================<br>

                      > ---

                      llvm/trunk/include/llvm/Target/TargetLowering.h

                      (original)<br>

                      > +++

                      llvm/trunk/include/llvm/Target/TargetLowering.h

                      Tue May  2 19:07:02 2017<br>

                      > @@ -2061,6 +2061,14 @@ public:<br>

                      >       return false;<br>

                      >     }<br>

                      ><br>

                      > +  // Return true if the instruction that

                      performs a << b actually performs<br>

                      > +  // a << (b % (sizeof(a) * 8)).<br>

                      > +  virtual bool

                      supportsModuloShift(ISD::NodeType Inst, EVT

                      ReturnType) const {<br>

                      > +    assert((Inst == ISD::SHL || Inst ==

                      ISD::SRA || Inst == ISD::SRL) &&<br>

                      > +           "Expect a shift instruction");<br>

                      > +    return false;<br>

                      > +  }<br>

                      > +<br>

                      >   

 //===--------------------------------------------------------------------===//<br>

                      >     // Runtime Library hooks<br>

                      >     //<br>

                      ><br>

                      > Modified:

                      llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp<br>

                      > URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp?rev=301990&r1=301989&r2=301990&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp?rev=301990&r1=301989&r2=301990&view=diff</a><br>

                      >

==============================================================================<br>

                      > ---

                      llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

                      (original)<br>

                      > +++

                      llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

                      Tue May  2 19:07:02 2017<br>

                      > @@ -5294,6 +5294,17 @@ SDValue

                      DAGCombiner::visitSHL(SDNode *N)<br>

                      >       }<br>

                      >     }<br>

                      ><br>

                      > +  // If the target supports masking y in

                      (shl, y),<br>

                      > +  // fold (shl x, (and y, ((1 <<

                      numbits(x)) - 1))) -> (shl x, y)<br>

                      > +  if (TLI.isOperationLegal(ISD::SHL, VT)

                      &&<br>

                      > +      TLI.supportsModuloShift(ISD::SHL, VT)

                      && N1->getOpcode() == ISD::AND) {<br>

                      > +    if (ConstantSDNode *Mask =

                      isConstOrConstSplat(N1->getOperand(1))) {<br>

                      > +      if (Mask->getZExtValue() ==

                      OpSizeInBits - 1) {<br>

                      > +        return DAG.getNode(ISD::SHL,

                      SDLoc(N), VT, N0, N1->getOperand(0));<br>

                      > +      }<br>

                      > +    }<br>

                      > +  }<br>

                      > +<br>

                      >     ConstantSDNode *N1C =

                      isConstOrConstSplat(N1);<br>

                      ><br>

                      >     // fold (shl c1, c2) -> c1<<c2<br>

                      > @@ -5492,6 +5503,17 @@ SDValue

                      DAGCombiner::visitSRA(SDNode *N)<br>

                      >     EVT VT = N0.getValueType();<br>

                      >     unsigned OpSizeInBits =

                      VT.getScalarSizeInBits();<br>

                      ><br>

                      > +  // If the target supports masking y in

                      (sra, y),<br>

                      > +  // fold (sra x, (and y, ((1 <<

                      numbits(x)) - 1))) -> (sra x, y)<br>

                      > +  if (TLI.isOperationLegal(ISD::SRA, VT)

                      &&<br>

                      > +      TLI.supportsModuloShift(ISD::SRA, VT)

                      && N1->getOpcode() == ISD::AND) {<br>

                      > +    if (ConstantSDNode *Mask =

                      isConstOrConstSplat(N1->getOperand(1))) {<br>

                      > +      if (Mask->getZExtValue() ==

                      OpSizeInBits - 1) {<br>

                      > +        return DAG.getNode(ISD::SRA,

                      SDLoc(N), VT, N0, N1->getOperand(0));<br>

                      > +      }<br>

                      > +    }<br>

                      > +  }<br>

                      > +<br>

                      >     // Arithmetic shifting an all-sign-bit

                      value is a no-op.<br>

                      >     if (DAG.ComputeNumSignBits(N0) ==

                      OpSizeInBits)<br>

                      >       return N0;<br>

                      > @@ -5650,6 +5672,17 @@ SDValue

                      DAGCombiner::visitSRL(SDNode *N)<br>

                      >     EVT VT = N0.getValueType();<br>

                      >     unsigned OpSizeInBits =

                      VT.getScalarSizeInBits();<br>

                      ><br>

                      > +  // If the target supports masking y in

                      (srl, y),<br>

                      > +  // fold (srl x, (and y, ((1 <<

                      numbits(x)) - 1))) -> (srl x, y)<br>

                      > +  if (TLI.isOperationLegal(ISD::SRL, VT)

                      &&<br>

                      > +      TLI.supportsModuloShift(ISD::SRL, VT)

                      && N1->getOpcode() == ISD::AND) {<br>

                      > +    if (ConstantSDNode *Mask =

                      isConstOrConstSplat(N1->getOperand(1))) {<br>

                      > +      if (Mask->getZExtValue() ==

                      OpSizeInBits - 1) {<br>

                      > +        return DAG.getNode(ISD::SRL,

                      SDLoc(N), VT, N0, N1->getOperand(0));<br>

                      > +      }<br>

                      > +    }<br>

                      > +  }<br>

                      <br>

                      I see three problems here:<br>

                      <br>

                      1. It needs to be documented.<br>

                    </blockquote>

                    <div><br>

                    </div>

                    <div>I wrote comments on the transformation and the

                      TLI hook. What kind of documentation do you

                      expect?</div>

                  </div>

                </div>

              </blockquote>

              <br>

            </div>

            <div text="#000000" bgcolor="#FFFFFF"> A comment in

              ISDOpcodes.h.</div>

            <div text="#000000" bgcolor="#FFFFFF"><br>

              <br>

              <blockquote type="cite">

                <div dir="ltr">

                  <div class="gmail_quote">

                    <div> </div>

                    <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

                      2. This isn't consistent with other transforms

                      done by DAGCombine, e.g.<br>

                      constant folding won't honor this.<br>

                    </blockquote>

                    <div><br>

                    </div>

                    <div>How so? Although it runs before constant

                      folding (e.g. (shl c1, c2) -> c1<<c2),

                      constant folding will pick it up during the next

                      iteration, won't it?</div>

                  </div>

                </div>

              </blockquote>

              <br>

            </div>

            <div text="#000000" bgcolor="#FFFFFF"> Suppose you have

              "SHL(1, AND(y, 31))".  We fold it to "SHL(1, y)" because

              supportsModuloShift is true.  Then we figure out that y is

              32.  Then we constant-fold "SHL(1, 32)" to zero (because

              that's the default behavior for APInt).<br>

              <br>

              And that's only the one I thought of off the top of my

              head; there are probably other transforms which assume the

              shift amount is in range.</div>

          </blockquote>

          <div><br>

          </div>

          <div>Good point. I think we should write patterns in .td files

            as you suggested.</div>

          <div><br>

          </div>

          <div>The only problem I got is that, I don't know how to write

            a working pattern for the following DAG:</div>

          <div><br>

          </div>

          <div>; %rem = and <16 x i8> %b, <i8 7, i8 7, i8 7, i8

            7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8

            7, i8 7, i8 7></div>

          <div>; %shl = shl <16 x i8> %a, %rem</div>

          <div>

            <div>Initial selection DAG: BB#0 'test010:'</div>

            <div>SelectionDAG has 12 nodes:</div>

            <div>  t0: ch = EntryToken</div>

            <div>      t2: v16i8,ch = CopyFromReg t0, Register:v16i8

              %vreg0</div>

            <div>        t4: v16i8,ch = CopyFromReg t0, Register:v16i8

              %vreg1</div>

            <div>        t6: v16i8 = BUILD_VECTOR Constant:i8<7>,

              Constant:i8<7>, Constant:i8<7>,

              Constant:i8<7>, Constant:i8<7>,

              Constant:i8<7>, Constant:i8<7>,

              Constant:i8<7>, Constant:i8<7>,

              Constant:i8<7>, Constant:i8<7>,

              Constant:i8<7>, Constant:i8<7>,

              Constant:i8<7>, Constant:i8<7>,

              Constant:i8<7></div>

            <div>      t7: v16i8 = and t4, t6</div>

            <div>    t8: v16i8 = shl t2, t7</div>

            <div>  t10: ch,glue = CopyToReg t0, Register:v16i8 %V2, t8</div>

            <div>  t11: ch = PPCISD::RET_FLAG t10, Register:v16i8 %V2,

              t10:1</div>

          </div>

          <div><br>

          </div>

          <div>I tried the patterns:</div>

          <div>

            <div>

              <div>def : Pat<(v16i8 (shl v16i8:$vA,</div>

              <div>                 (and v16i8:$vB,</div>

              <div>                      (v16i8 (build_vector (i8 7),

                (i8 7), (i8 7), (i8 7),</div>

              <div>                                           (i8 7),

                (i8 7), (i8 7), (i8 7),</div>

              <div>                                           (i8 7),

                (i8 7), (i8 7), (i8 7),</div>

              <div>                                           (i8 7),

                (i8 7), (i8 7), (i8 7)))))),</div>

              <div>          (v16i8 (VSLB $vA, $vB))>;</div>

            </div>

          </div>

          <div><br>

          </div>

          <div>and</div>

          <div>

            <div>def : Pat<(v16i8 (shl v16i8:$vA, (and v16i8:$vB,

              (v16i8 7)))),</div>

            <div>          (v16i8 (VSLB $vA, $vB))>;</div>

          </div>

          <div><br>

          </div>

          <div>but they don't match the code above.</div>

          <div><br>

          </div>

          <div>Do you see anywhere wrong in my pattern?</div>

        </div>

      </div>

    </blockquote>

    <br></div><div text="#000000" bgcolor="#FFFFFF">

    Are you sure you're trying to match the right pattern?  Try looking

    at the "Optimized legalized selection DAG".  (It looks like

    legalization inserts some bitcasts.)</div></blockquote><div><br></div><div>I prefer not to pattern match legalized DAG, since as you noticed, there are bitcasts. Ideally we only want to match the high level representation.</div><div><br></div><div>That said, I tried matching legalized v4i32 and v8i16, they all work. However, for v2i64 in particular, PPC legalization inserts a load from constant pool, which is very unpleasant to match:</div><div><br></div><div><div>Legalized selection DAG: BB#0 'test010:'</div><div>SelectionDAG has 19 nodes:</div><div>  t0: ch = EntryToken</div><div>      t2: v2i64,ch = CopyFromReg t0, Register:v2i64 %vreg0</div><div>            t4: v2i64,ch = CopyFromReg t0, Register:v2i64 %vreg1</div><div>          t12: v4i32 = bitcast t4</div><div>                t23: i64,ch = PPCISD::TOC_ENTRY<LD8[GOT]> TargetConstantPool:i64<<2 x i64> <i64 63, i64 63>> 0, Register:i64 %X2</div><div>              t19: v2f64,ch = load<LD16[ConstantPool]> t0, t23, undef:i64</div><div>            t20: v2i64 = bitcast t19</div><div>          t13: v4i32 = bitcast t20</div><div>        t14: v4i32 = and t12, t13</div><div>      t15: v2i64 = bitcast t14</div><div>    t8: v2i64 = shl t2, t15</div><div>  t10: ch,glue = CopyToReg t0, Register:v2i64 %V2, t8</div><div>  t11: ch = PPCISD::RET_FLAG t10, Register:v2i64 %V2, t10:1</div></div><div><br></div><div>Is it possible to write a pattern for a pre-legalization DAG? If not, I think we can still do it in the combiner, but instead of combining `(shl a, (and b, 31))` to `(shl a, b)`, we can combine it to `(PPCISD::VEC_SHL a, b)`. What do you think?</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF"><br>

    <br>

    -Eli<br>

    <pre class="m_-6804385229949104446moz-signature" cols="72">-- 

Employee of Qualcomm Innovation Center, Inc.

Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project</pre>

  </div></blockquote></div></div>