<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p>Hi Sanjay,<br>
    </p>
    <blockquote type="cite"
cite="mid:CA+wODiuHOgDdXSHv=bobz9vV6=4yLsM4xaRkBjXEvKF+2j+Bvg@mail.gmail.com">
      <meta http-equiv="content-type" content="text/html; charset=UTF-8">
      <div dir="ltr">
        <div dir="ltr">
          <div>If I'm seeing it correctly, (part of?) the fold you're
            looking for is here:</div>
          <div><a href="https://reviews.llvm.org/rL350006"
              moz-do-not-send="true">https://reviews.llvm.org/rL350006</a></div>
          <div><br>
          </div>
          <div>...but it's restricted to pre-legalization.<br>
          </div>
          <div>I don't remember exactly what the problem was allowing
            that fold post-legalization, but maybe you can loosen that
            restriction? <br>
          </div>
        </div>
      </div>
      <br>
    </blockquote>
    <p>Thanks! I tried simply removing the !LegalOperations condition
      (DAGCombiner.cpp:10056), and that indeed solved my problem. Doing
      this on SystemZ (for all of the opcodes) had very little effect on
      SPEC. Opcode counts (trunk on the left):<br>
    </p>
    <pre>Opcode          Trunk     Patched    Diff
aghi            38759       38742     -17
ahi             34921       34936     +15
risbgn          37104       37092     -12
nill             2172        2183     +11
lr              29731       29735      +4
sr               6055        6059      +4
srk              3743        3741      -2
lhi             89566       89568      +2
risblg           6528        6529      +1
la             192375      192374      -1

Spill|Reload   189670      189670      +0</pre>
    <p>So, to me it seems this could be the default on SystemZ at least.</p>
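    <p>As a sanity check on the arithmetic only (not on the DAGCombiner
      code), here is a minimal Python sketch that models the i16 case
      from the DAG quoted below and compares the two separate adds
      against a single add of -48 before the truncate. The helper names
      (ctlz64, trunc32, unfolded, folded) are hypothetical, and ctlz64
      is only an assumed model of a 64-bit count-leading-zeros that
      returns 64 for a zero input.</p>
    <pre>
```python
def ctlz64(x):
    """Model of a 64-bit count-leading-zeros; returns 64 for x == 0."""
    return 64 - x.bit_length()


def trunc32(x):
    """Keep the low 32 bits, modelling an i64 to i32 truncate."""
    return x % 2**32


def unfolded(arg):
    """Mirror the quoted DAG: add -32 in i64, truncate, then add -16 in i32."""
    t10 = arg % 2**16              # and t2, 65535
    t17 = ctlz64(t10)              # zero_extend + ctlz
    t22 = (t17 - 32) % 2**64       # add t17, -32 (i64 wraparound)
    t20 = trunc32(t22)             # truncate to i32
    return (t20 - 16) % 2**32      # add t20, -16 (i32 wraparound)


def folded(arg):
    """Single add of the combined constant -48 before the truncate."""
    t17 = ctlz64(arg % 2**16)
    return trunc32((t17 - 48) % 2**64)


# Exhaustively compare both forms over every i16 input.
results = [(unfolded(a), folded(a)) for a in range(2**16)]
all_equal = all(u == f for u, f in results)

# Collect the values ctlz can take for a zero-extended i16.
ctlz_values = {ctlz64(a) for a in range(2**16)}

print(all_equal, min(ctlz_values), max(ctlz_values))
```
    </pre>
    <p>In this model the ctlz result always lies between 48 and 64, so
      neither add wraps and both forms give the same i32 value for
      every input. That only shows the fold is value-preserving here;
      it says nothing about why it was restricted to
      pre-legalization.</p>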
    <p>/Jonas<br>
      <tt></tt></p>
    <p><br>
    </p>
    <blockquote type="cite"
cite="mid:CA+wODiuHOgDdXSHv=bobz9vV6=4yLsM4xaRkBjXEvKF+2j+Bvg@mail.gmail.com">
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">On Fri, Feb 8, 2019 at 10:20
          AM Jonas Paulsson via llvm-dev &lt;<a
            href="mailto:llvm-dev@lists.llvm.org" moz-do-not-send="true">llvm-dev@lists.llvm.org</a>&gt;
          wrote:<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0px 0px 0px
          0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi,<br>
          <br>
          SystemZ supports @llvm.ctlz.i64() natively with a single
          instruction <br>
          (FLOGR), and lesser bitwidth versions of the intrinsic are
          promoted to i64.<br>
          <br>
          For some reason, this leads to unfolded additions of constants
          as shown <br>
          below:<br>
          <br>
          This function:<br>
          <br>
          define i16 @fun(i16 %arg) {<br>
             %1 = tail call i16 @llvm.ctlz.i16(i16 %arg, i1 false)<br>
             ret i16 %1<br>
          }<br>
          <br>
          gives this optimized DAG as input to instruction selection:<br>
          <br>
          SelectionDAG has 15 nodes:<br>
             t0: ch = EntryToken<br>
                           t2: i32,ch = CopyFromReg t0, Register:i32 %0<br>
                         t10: i32 = and t2, Constant:i32&lt;65535&gt;<br>
                       t16: i64 = zero_extend t10<br>
                     t17: i64 = ctlz t16<br>
                   t22: i64 = add t17, Constant:i64&lt;-32&gt;<br>
                 t20: i32 = truncate t22<br>
               t15: i32 = add t20, Constant:i32&lt;-16&gt;<br>
             t7: ch,glue = CopyToReg t0, Register:i32 $r2l, t15<br>
             t8: ch = SystemZISD::RET_FLAG t7, Register:i32 $r2l, t7:1<br>
          <br>
          It seems that SelectionDAG::computeKnownBits() has a case for
          ISD::CTLZ, <br>
          and it seems to figure out that the high bits of t17 are zero,
          as expected.<br>
          <br>
          t17 is guaranteed to have a value between 48 and 64, so there
          could not <br>
          be any overflow here, even though I am not sure if that's the
          problem or <br>
          not... Should DAGCombiner::visitADD() handle this, or perhaps
          <br>
          visitTRUNCATE()?<br>
          <br>
          Thanks for any help,<br>
          <br>
          Jonas<br>
          <br>
          <br>
        </blockquote>
      </div>
    </blockquote>
  </body>
</html>