<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Feb 9, 2016 at 8:07 AM, Philip Reames <span dir="ltr"><<a href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div text="#000000" bgcolor="#FFFFFF"><span class="">
    <br>
    <br>
    <div>On 02/09/2016 06:57 AM, Jonas Wagner
      wrote:<br>
    </div>
    <blockquote type="cite">
      <div dir="ltr">Hi,
        <div><br>
        </div>
        <div>I'm coming back to this old thread with data about the
          performance of NOPs. Recalling that I was considering
          transforming NOP instructions into branches and back, in order
          to dynamically enable code. One use case for this was
          enabling/disabling individual sanitizer checks (ASan, UBSan)
          on demand.</div>
        <div><br>
        </div>
        <div>I wrote a pass which takes an ASan-instrumented program,
          and replaces each ASan check with an
          llvm.experimental.patchpoint intrinsic. This intrinsic inserts
          a NOP of configurable size. It has otherwise no effect on the
          program semantics. It does prevent some optimizations,
          presumably because instructions cannot be moved across the
          patchpoint.</div>
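For concreteness, such a patchpoint call looks roughly like the following
in IR; the id, the reserved byte count, and the null call target below are
illustrative placeholders, not necessarily what Jonas's pass emits:

    ; reserve a 5-byte NOP region at this point; no call target, no args
    call void (i64, i32, i8*, i32, ...)
        @llvm.experimental.patchpoint.void(i64 0, i32 5, i8* null, i32 0)

Since the call target is null, this lowers to a NOP sled of the requested
size, which can later be patched at run time.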
>>
>> Some results:
>>
>> - On SPEC, patchpoints introduce an overhead of ~25% compared to a
>>   version where the ASan checks are removed.
>> - This is almost half of the cost of the checks themselves.
>> - The results are similar for NOPs of size 1 and 5 bytes.
>> - Interestingly, the results are similar for NOPs of 0 bytes, too. These
>>   are patchpoints that don't insert any code and only inhibit
>>   optimizations. I've only tested this on one benchmark, though.
>>
>> To summarize, only part of the cost of NOPs is due to executing them.
>> Their effect on optimizations is significant, too. I guess this would
>> hold for branches and sanitizer checks as well.
>
> I don't think you can really draw strong conclusions from the experiments
> you describe. What you've ended up measuring is mostly the impact of not
> being able to optimize across the patchpoints at the check locations.
> This doesn't tell you much about what a check (which is likely to inhibit
> optimization much less) costs relative to a NOP at the same position.
>
> One bit of data you could extract from the experiment as constructed is
> the relative cost of the extra NOPs. You mention that the results are
> similar for sizes of 1 and 5 bytes, but "similar" is very vague in this
> context. Are the results statistically indistinguishable, or is there a
> small but noticeable slowdown? (Numbers would be great here.)

In this same vein, try inserting 1, 2, 3, 4, 5, 6, ... NOPs and measure
the performance impact (the total size of the NOPs is also interesting,
but is more difficult to measure reliably). I've used this kind of
technique successfully in the past, e.g. for measuring the cost of "stat"
syscalls on Windows. I call the technique "stuffing". Basically, make a
plot of the performance degradation as you insert more and more redundant
stuff (e.g. 1 NOP, 2 NOPs, 3 NOPs, etc.). If the result is a strong linear
trend, then you can pretty confidently extrapolate backward to the "0 NOP"
case to see the overhead of inserting 1 NOP.

-- Sean Silva
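Spelling out the extrapolation step (the notation here is mine): if T(n)
is the measured runtime with n redundant NOPs stuffed in, and the points
(n, T(n)) lie close to a line, fit

    T(n) ~= T0 + c*n

by least squares over the measured n = 1, 2, 3, ...:

    c  = sum_i (n_i - mean(n)) * (T_i - mean(T)) / sum_i (n_i - mean(n))^2
    T0 = mean(T) - c * mean(n)

The intercept T0 estimates the "0 NOP" baseline, and the slope c estimates
the marginal cost of each additional NOP, i.e. the overhead of inserting a
single NOP.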
>>
>> Best,
>> Jonas
>>
>> On Thu, Jan 21, 2016 at 11:52 PM Jonas Wagner <jonas.wagner@epfl.ch> wrote:
            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div dir="ltr">
                <div>
                  <p style="margin:1.2em 0px!important">Hello,</p>
                </div>
              </div>
              <div dir="ltr">
                <div>
                  <blockquote style="margin:1.2em 0px;border-left-width:4px;border-left-style:solid;border-left-color:rgb(221,221,221);padding:0px 1em;color:rgb(119,119,119);quotes:none">
                    <blockquote style="margin:1.2em 0px;border-left-width:4px;border-left-style:solid;border-left-color:rgb(221,221,221);padding:0px 1em;color:rgb(119,119,119);quotes:none">
                      <p style="margin:1.2em 0px!important">There is
                        some data on this, e.g, in <a href="http://dslab.epfl.ch/proj/asap/#publications" target="_blank">“High System-Code Security
                          with Low Overhead”</a>. In this work we found
                        that, for ASan as well as other instrumentation
                        tools, most overhead comes from the checks.
                        Especially for CPU-intensive applications, the
                        cost of maintaining shadow memory is small.</p>
                    </blockquote>
                    <p style="margin:1.2em 0px!important">How did you
                      measure this? If it was measured by removing the
                      checks before optimization happens, then what you
                      may have been measuring is not the execution
                      overhead of the branches (which is what would be
                      eliminated by nop’ing them out) but the effect on
                      the optimizer.</p>
                  </blockquote>
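The "checks" and "branches" being discussed here are ASan's inline
fast-path tests. As a rough sketch, a check guarding an 8-byte load looks
something like the following in IR; the 0x7fff8000 shadow offset is the
default for x86-64 Linux and is an assumption on my part:

    %a   = ptrtoint i64* %p to i64
    %s   = lshr i64 %a, 3                 ; shadow addr = (addr >> 3) + offset
    %sa  = add i64 %s, 2147450880         ; 0x7fff8000
    %sp  = inttoptr i64 %sa to i8*
    %sb  = load i8, i8* %sp               ; load the shadow byte
    %bad = icmp ne i8 %sb, 0
    br i1 %bad, label %report, label %ok  ; %report calls __asan_report_load8

It is a sequence of this kind that the patchpoint experiment above
replaces with a NOP region.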
>>>
>>> Interesting. Indeed, this was measured by removing some checks and then
>>> re-optimizing the program.
>>>
>>> I'm aware of some impact that checks may have on optimization. For
>>> example, I've seen cases where much less inlining happens because
>>> functions with checks are larger. Do you know of other concrete
>>> examples? This is definitely something I'll have to be careful about.
>>> Philip Reames confirms this, too.
>>>
>>> On the other hand, we've also found that the benefit from removing a
>>> check is roughly proportional to the number of cycles spent executing
>>> that check's instructions. Our model of this is not very precise, but
>>> it shows that the cost of executing the check's instructions matters.
>>>
>>> I'll try to measure this, and will come back when I have data.
>>>
>>> Best,
>>> Jonas
                  <div title="MDH:SGVsbG8sPGJyPjxicj4mZ3Q7Jmd0OyBUaGVyZSBpcyBzb21lIGRhdGEgb24gdGhpcywgZS5nLCBpbiA8YSBocmVmPSJodHRwOi8vZHNsYWIuZXBmbC5jaC9wcm9qL2FzYXAvI3B1YmxpY2F0aW9ucyI+
IkhpZ2ggU3lzdGVtLUNvZGUgU2VjdXJpdHkgd2l0aCBMb3cgT3ZlcmhlYWQiPC9hPi4gSW4gdGhp
cyB3b3JrIHdlIGZvdW5kIHRoYXQsIGZvciBBU2FuIGFzIHdlbGwgYXMgb3RoZXIgaW5zdHJ1bWVu
dGF0aW9uIHRvb2xzLCBtb3N0IG92ZXJoZWFkIGNvbWVzIGZyb20gdGhlIGNoZWNrcy4gRXNwZWNp
YWxseSBmb3IgQ1BVLWludGVuc2l2ZSBhcHBsaWNhdGlvbnMsIHRoZSBjb3N0IG9mIG1haW50YWlu
aW5nIHNoYWRvdyBtZW1vcnkgaXMgc21hbGwuPGRpdj48YnI+Jmd0OyZuYnNwO0hvdyBkaWQgeW91
IG1lYXN1cmUgdGhpcz8gSWYgaXQgd2FzIG1lYXN1cmVkIGJ5IHJlbW92aW5nIHRoZSBjaGVja3Mg
YmVmb3JlIG9wdGltaXphdGlvbiBoYXBwZW5zLCB0aGVuIHdoYXQgeW91IG1heSBoYXZlIGJlZW4g
bWVhc3VyaW5nIGlzIG5vdCB0aGUgZXhlY3V0aW9uIG92ZXJoZWFkIG9mIHRoZSBicmFuY2hlcyAo
d2hpY2ggaXMgd2hhdCB3b3VsZCBiZSBlbGltaW5hdGVkIGJ5IG5vcCdpbmcgdGhlbSBvdXQpIGJ1
dCB0aGUgZWZmZWN0IG9uIHRoZSBvcHRpbWl6ZXIuPGJyPjxkaXY+PGJyPjwvZGl2PjxkaXY+SW50
ZXJlc3RpbmcuIEluZGVlZCB0aGlzIHdhcyBtZWFzdXJlZCBieSByZW1vdmluZyBzb21lIGNoZWNr
cyBhbmQgdGhlbiByZS1vcHRpbWl6aW5nIHRoZSBwcm9ncmFtLjwvZGl2PjxkaXY+PGJyPjwvZGl2
PjxkaXY+SSdtIGF3YXJlIG9mIHNvbWUgaW1wYWN0IGNoZWNrcyBtYXkgaGF2ZSBvbiBvcHRpbWl6
YXRpb24uIEZvciBleGFtcGxlLCBJJ3ZlIHNlZW4gY2FzZXMgd2hlcmUgbXVjaCBsZXNzIGlubGlu
aW5nIGhhcHBlbnMgYmVjYXVzZSBmdW5jdGlvbnMgd2l0aCBjaGVja3MgYXJlIGxhcmdlci4gRG8g
eW91IGtub3cgb3RoZXIgY29uY3JldGUgZXhhbXBsZXM/IFRoaXMgaXMgZGVmaW5pdGVseSBzb21l
dGhpbmcgSSdsbCBoYXZlIHRvIGJlIGNhcmVmdWwgYWJvdXQuIFBoaWxpcCBSZWFtZXMgY29uZmly
bXMgdGhpcywgdG9vLjwvZGl2PjxkaXY+PGJyPjwvZGl2PjxkaXY+T24gdGhlIG90aGVyIGhhbmQs
IHdlJ3ZlIGFsc28gZm91bmQgdGhhdCB0aGUgYmVuZWZpdCBmcm9tIHJlbW92aW5nIGEgY2hlY2sg
aXMgcm91Z2hseSBwcm9wb3J0aW9uYWwgdG8gdGhlIG51bWJlciBvZiBjeWNsZXMgc3BlbnQgZXhl
Y3V0aW5nIHRoYXQgY2hlY2sncyBpbnN0cnVjdGlvbnMuIE91ciBtb2RlbCBvZiB0aGlzIGlzIG5v
dCB2ZXJ5IHByZWNpc2UsIGJ1dCBpdCBzaG93cyB0aGF0IHRoZSBjb3N0IG9mIGV4ZWN1dGluZyB0
aGUgY2hlY2sncyBpbnN0cnVjdGlvbnMgbWF0dGVycy48L2Rpdj48ZGl2Pjxicj48L2Rpdj48ZGl2
                    PkJlc3QsPC9kaXY+PGRpdj5Kb25hczwvZGl2PjwvZGl2Pg==" style="min-height:0;width:0;max-height:0;max-width:0;overflow:hidden;font-size:0em;padding:0;margin:0">​</div>