<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Jan 14, 2016 at 4:27 PM, Philip Reames <span dir="ltr"><<a href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  

    

  

  <div text="#000000" bgcolor="#FFFFFF">It's not clear to me this is true if the seq_cst fence is expected

    to fence non-temporal stores.  I think in practice, you'd be very

    unlikely to notice a difference, but I can't point to anything in

    the Intel docs which justifies a lock prefixed instruction as

    sufficient to fence any non-temporal access.  <br></div></blockquote><div><br></div><div>Agreed.  I think it's not guaranteed.  And the most rational explanation for the fact that LOCK; X is faster than MFENCE seems to be that LOCK only deals with normal write-back cacheable accesses, and hence may not work for cases like this.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF"><span class="">

    <br>

    <blockquote type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div>If you have to dirty a cache line, (%esp) seems like

              a relatively safe one.  <br>

            </div>

          </div>

        </div>

      </div>

    </blockquote></span>

    Agreed.  As we discussed previously, it is possible to get false sharing

    in C++, but this would require one thread to be accessing

    information stored in the last frame of another running thread's

    stack.  That seems sufficiently unlikely to be ignored.  <br></div></blockquote><div><br></div><div>I disagree with the reasoning, but not really with the conclusion.  Starting a thread with a lambda that captures locals by reference is likely to do this, and is a common C++ idiom, especially in textbook examples.  This is aggravated by the fact that I don't understand the hardware prefetcher, and that it sometimes seems to fetch an adjacent line.  (Note that C, unlike C++, allows implementations to make thread stacks inaccessible to other threads.  Some of us consider that a bug and would refuse to use a general purpose implementation that actually did this.  I suspect there are enough of us that it doesn't matter.)</div><div><br></div><div>I think a stronger argument is that the compiler is always allowed to push temporaries on the stack.  So this looks exactly as though a sequentially consistent fence required a stack temporary.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF"><span class=""><br>

    <blockquote type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div>It's only the idea of writing to a memory location when

              MFENCE is available, and could be used instead, that seems

              questionable.</div>

          </div>

        </div>

      </div>

    </blockquote></span>

    While in principle I agree, it appears in practice that this

    tradeoff is worthwhile.  The hardware doesn't seem to optimize for

    the MFENCE case whereas lock prefix instructions appear to be

    handled much better.<br></div></blockquote><div>The concern is that it is actually fairly easy to get contention as a result in C++.  And programmers might think they know that certain fences shouldn't use temporaries and the rest of their code should run in registers.  But I agree this is not a completely clear call.  I wish x86 provided a plain fence instruction that handled the common case efficiently, so we could avoid these trade-offs.  (A "sequentially consistent store" instruction might be even better, in that it should largely eliminate fences and allow other optimizations.)</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF"><span class="">

    </span></div></blockquote></div><br></div><div class="gmail_extra">Hans</div></div>