<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <br>

    <br>

    <div class="moz-cite-prefix">On 01/19/2016 09:04 PM, Sean Silva via

      llvm-dev wrote:<br>

    </div>

    <blockquote

cite="mid:CAHnXoakwE443AzE4FoKe9o2j=xYtT4wppaOKnZLzYx-MFOVwGg@mail.gmail.com"

      type="cite">

      <div dir="ltr"><br>

        <div class="gmail_extra">AFAIK, the cost of a well-predicted,

          not-taken branch is the same as a nop on every x86 made in the

          last many years.

          See <a class="moz-txt-link-freetext" href="http://www.agner.org/optimize/instruction_tables.pdf">http://www.agner.org/optimize/instruction_tables.pdf</a>

          <div class="gmail_quote">

            <div>Generally speaking a correctly-predicted not-taken

              branch is basically identical to a nop, and a

              correctly-predicted taken branch has an extra overhead

              similar to an "add" or other extremely cheap operation. </div>

          </div>

        </div>

      </div>

    </blockquote>

    Specifically on this point only: while this is absolutely true for most

    micro-benchmarks, it is less true at large scale.  I have definitely

    seen the removal of a highly predictable branch (in many, many places,

    some of which are hot) improve performance by 5-10%. 

    For instance, removing highly predictable branches is the primary

    motivation for implicit null checking. 

    (<a class="moz-txt-link-freetext" href="http://llvm.org/docs/FaultMaps.html">http://llvm.org/docs/FaultMaps.html</a>).  Where exactly the

    performance improvement comes from is hard to say, but, empirically,

    it does matter.  <br>
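    <br>
    Roughly, the mechanism on the FaultMaps page works like this: instead of a compare-and-branch before each load, the compiler emits the load unguarded and records, in a side table, where execution should resume if that load faults on a null pointer. The runtime only consults the table on the rare fault path (for example, from a SIGSEGV handler), so the hot path carries no branch at all. The sketch below assumes a simplified in-memory table; <code>FaultMapEntry</code> and <code>FaultMap</code> are hypothetical names, and this is not LLVM's actual <code>__llvm_faultmaps</code> encoding.<br>

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical, simplified entry (NOT LLVM's on-disk __llvm_faultmaps format).
// With an explicit null check, the compiler emits something like
//   if (p == nullptr) goto null_handler;   // predictable, rarely taken
//   x = p->field;
// With an implicit null check the branch is dropped and the load itself is
// allowed to fault; this entry tells the runtime where to resume if it does.
struct FaultMapEntry {
  std::uint32_t faulting_pc_offset;  // offset of the unguarded load in its function
  std::uint32_t handler_pc_offset;   // offset of the null-handling block
};

// Side table consulted only on the (rare) fault path, so the common path
// executes no compare-and-branch for the null check.
class FaultMap {
 public:
  void add(const FaultMapEntry& e) {
    bool inserted =
        by_pc_.emplace(e.faulting_pc_offset, e.handler_pc_offset).second;
    assert(inserted && "each faulting PC maps to exactly one handler");
    (void)inserted;  // avoid an unused-variable warning when NDEBUG is set
  }

  // Returns the handler offset to resume at, or std::nullopt when the fault
  // did not come from a recorded implicit check (i.e. a genuine crash).
  std::optional<std::uint32_t> handler_for(std::uint32_t faulting_pc_offset) const {
    auto it = by_pc_.find(faulting_pc_offset);
    if (it == by_pc_.end()) return std::nullopt;
    return it->second;
  }

 private:
  std::map<std::uint32_t, std::uint32_t> by_pc_;  // faulting PC -> handler PC
};
```

    The trade-off this illustrates: the cost moves from every execution of the check to the (expensive, but rare) fault path.<br>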

    <br>

    (Caveat to the above: I have not run an experiment that replaced the

    branches with the same number of bytes of nops.  It is possible the entire

    benefit I mentioned is code-size related, but I doubt it given how many

    ticks a sampling profiler shows on those branches.)<br>

    <br>

    P.S. Sean mentions down-thread that most of the slowdown from checks

    comes from their effect on the optimizer, not from the direct cost of the

    instructions emitted.  This is absolutely our experience as well.  I

    don't intend for anything I said above to imply otherwise.  <br>
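    <br>
    One common way a check hurts more through the optimizer than through its own instructions: a check inside a loop adds a side exit on every iteration, which can block transformations such as vectorization unless the compiler manages to hoist or remove it. The hypothetical functions below (not from this thread) show an in-loop bounds check and an equivalent hoisted form; they illustrate the shape of the code only, not any measured result.<br>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Check inside the loop: each iteration has a well-predicted, never-taken
// branch to abort(). The branch itself is cheap, but the side exit forces the
// optimizer to preserve the exact iteration at which abort() would happen,
// which can prevent it from, e.g., vectorizing the loop.
long sum_checked_in_loop(const std::vector<int>& a, std::size_t lo, std::size_t hi) {
  long sum = 0;
  for (std::size_t i = lo; i < hi; ++i) {
    if (i >= a.size()) std::abort();  // never taken for in-range inputs
    sum += a[i];
  }
  return sum;
}

// Same observable behavior: aborts exactly when some i in [lo, hi) is out of
// bounds (the partial sum is never observable), but the check runs once.
long sum_checked_once(const std::vector<int>& a, std::size_t lo, std::size_t hi) {
  if (lo < hi && hi > a.size()) std::abort();
  // From here on, every a[i] with lo <= i < hi is in bounds.
  assert(lo >= hi || hi <= a.size());
  long sum = 0;
  for (std::size_t i = lo; i < hi; ++i) sum += a[i];  // check-free loop body
  return sum;
}
```

    Whether a given compiler performs this hoisting on its own varies; the point is only that the loop body becomes simpler for later passes.<br>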

    <br>

    Philip<br>

    <br>

  </body>

</html>