<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Feb 9, 2016 at 8:07 AM, Philip Reames <span dir="ltr">&lt;<a href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  <div text="#000000" bgcolor="#FFFFFF"><span class="">

    <br>

    <br>

    <div>On 02/09/2016 06:57 AM, Jonas Wagner

      wrote:<br>

    </div>

    <blockquote type="cite">

      <div dir="ltr">Hi,

        <div><br>

        </div>

        <div>I'm coming back to this old thread with data about the

          performance of NOPs. To recap, I was considering

          transforming NOP instructions into branches and back, in order

          to dynamically enable code. One use case for this was

          enabling/disabling individual sanitizer checks (ASan, UBSan)

          on demand.</div>

        <div><br>

        </div>

        <div>I wrote a pass which takes an ASan-instrumented program,

          and replaces each ASan check with an

          llvm.experimental.patchpoint intrinsic. This intrinsic inserts

          a NOP of configurable size. It otherwise has no effect on

          program semantics. It does prevent some optimizations,

          presumably because instructions cannot be moved across the

          patchpoint.</div>

        <div><br>

        </div>

        <div>Some results:</div>

        <div>- On SPEC, patchpoints introduce an overhead of ~25%

          compared to a version where ASan checks are removed.</div>

        <div>- This is almost half of the cost of the checks themselves.</div>

        <div>- The results are similar for NOPs of size 1 and 5 bytes.</div>

        <div>- Interestingly, the results are similar for NOPs of 0

          bytes, too. These are patchpoints that don't insert any code

          and only inhibit optimizations. I've only tested this on one

          benchmark, though.</div>

        <div><br>

        </div>

        <div>To summarize, only part of the cost of NOPs is due to

          executing them. Their effect on optimizations is significant,

          too. I guess this would hold for branches and sanitizer checks

          as well.</div>

      </div>

    </blockquote></span>

    I don't think you can really draw strong conclusions from the

    experiments you described.  What you've ended up measuring is essentially

    the impact of not optimizing over patchpoints at the check

    locations.  This doesn't really tell you much about what a check

    (which is likely to inhibit optimization much less) costs over a nop

    at the same position.  <br>

    <br>

    One bit of data you could extract from the experiment as constructed

    would be the relative cost of extra nops.  You do mention that the

    results are similar for sizes 1 and 5 bytes, but "similar" is very vague

    in this context.  Are the results statistically indistinguishable? 

    Or is there a noticeable but small slowdown that results?  (Numbers

    would be great here.)</div></blockquote><div><br></div><div>In this same vein, try inserting 1, 2, 3, 4, 5, 6, ... nops and measuring the performance impact (the total size of the nops is also interesting, but is more difficult to measure reliably). I've used this kind of technique successfully in the past, e.g. for measuring the cost of "stat" syscalls on Windows. I call the technique "stuffing". Basically, make a plot of the performance degradation as you insert more and more redundant stuff (e.g. 1 nop, 2 nops, 3 nops, etc.). If the result is a strong linear trend, then you can pretty confidently extrapolate backward to the "0 nop" case to see the overhead of inserting 1 nop.</div><div><br></div><div>-- Sean Silva</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF"><span class=""><br>

    <br>

    <blockquote type="cite">

      <div dir="ltr">

        <div><br>

        </div>

        <div>Best,</div>

        <div>Jonas</div>

        <div><br>

        </div>

        <div><br>

        </div>

        <div>

          <div class="gmail_quote">

            <div dir="ltr">On Thu, Jan 21, 2016 at 11:52 PM Jonas Wagner

              &lt;<a href="mailto:jonas.wagner@epfl.ch" target="_blank">jonas.wagner@epfl.ch</a>&gt;

              wrote:<br>

            </div>

            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div dir="ltr">

                <div>

                  <p style="margin:1.2em 0px!important">Hello,</p>

                </div>

              </div>

              <div dir="ltr">

                <div>

                  <blockquote style="margin:1.2em 0px;border-left-width:4px;border-left-style:solid;border-left-color:rgb(221,221,221);padding:0px 1em;color:rgb(119,119,119);quotes:none">

                    <blockquote style="margin:1.2em 0px;border-left-width:4px;border-left-style:solid;border-left-color:rgb(221,221,221);padding:0px 1em;color:rgb(119,119,119);quotes:none">

                      <p style="margin:1.2em 0px!important">There is

                        some data on this, e.g., in <a href="http://dslab.epfl.ch/proj/asap/#publications" target="_blank">“High System-Code Security

                          with Low Overhead”</a>. In this work we found

                        that, for ASan as well as other instrumentation

                        tools, most overhead comes from the checks.

                        Especially for CPU-intensive applications, the

                        cost of maintaining shadow memory is small.</p>

                    </blockquote>

                    <p style="margin:1.2em 0px!important">How did you

                      measure this? If it was measured by removing the

                      checks before optimization happens, then what you

                      may have been measuring is not the execution

                      overhead of the branches (which is what would be

                      eliminated by nop’ing them out) but the effect on

                      the optimizer.</p>

                  </blockquote>

                </div>

              </div>

              <div dir="ltr">

                <div>

                  <p style="margin:1.2em 0px!important">Interesting.

                    Indeed this was measured by removing some checks and

                    then re-optimizing the program.</p>

                  <p style="margin:1.2em 0px!important">I’m aware that

                    checks may have some impact on optimization. For

                    example, I’ve seen cases where much less inlining

                    happens because functions with checks are larger. Do

                    you know other concrete examples? This is definitely

                    something I’ll have to be careful about. Philip

                    Reames confirms this, too.</p>

                  <p style="margin:1.2em 0px!important">On the other

                    hand, we’ve also found that the benefit from

                    removing a check is roughly proportional to the

                    number of cycles spent executing that check’s

                    instructions. Our model of this is not very precise,

                    but it shows that the cost of executing the check’s

                    instructions matters.</p>

                  <p style="margin:1.2em 0px!important">I'll try to

                    measure this, and will come back when I have data.</p>

                  <p style="margin:1.2em 0px!important">Best,<br>

                    Jonas</p>


                </div>

              </div>

            </blockquote>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

  </span></div>

</blockquote></div><br></div></div>
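<div>The "stuffing" extrapolation described in the reply above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not anyone's actual measurement script: the helper <code>fit_line</code> is a hypothetical name, and the runtimes are made-up placeholders rather than data from this thread. It fits a least-squares line through (nop count, runtime) points. The slope estimates the cost of each extra nop, the intercept is the extrapolated "0 nop" runtime, and R^2 indicates how strong the linear trend is.</div>
<pre>
```python
# Hypothetical sketch of the "stuffing" measurement: fit a line to
# (number of inserted nops, measured runtime) and read off slope/intercept.


def fit_line(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x; also returns R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 of a simple linear fit; treat perfectly flat data as a perfect fit.
    r_squared = 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Made-up numbers for illustration only (NOT measurements from this thread):
# runtime in seconds with 1..6 nops inserted at each check location.
nops = [1, 2, 3, 4, 5, 6]
runtime_s = [10.5, 11.0, 11.5, 12.0, 12.5, 13.0]

slope, baseline, r2 = fit_line(nops, runtime_s)
print(f"cost per extra nop: {slope:.3f} s")
print(f"extrapolated 0-nop runtime: {baseline:.3f} s")
print(f"R^2: {r2:.3f}")
```
</pre>
<div>If R^2 is far from 1, the trend is not linear and the extrapolation back to the "0 nop" case should not be trusted.</div>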