<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <br>
    <br>
    <div class="moz-cite-prefix">On 01/14/2016 04:05 PM, Hans Boehm via
      llvm-dev wrote:<br>
    </div>
    <blockquote
cite="mid:CAMOCf+iBRjBjyayv2-SaWSvOMTbWvx6+miBaHaOA0VqgphZgkQ@mail.gmail.com"
      type="cite">
      <div dir="ltr"><br>
        <div class="gmail_extra"><br>
          <div class="gmail_quote">On Thu, Jan 14, 2016 at 1:37 PM, JF
            Bastien <span dir="ltr">&lt;<a moz-do-not-send="true"
                href="mailto:jfb@google.com" target="_blank">jfb@google.com</a>&gt;</span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div dir="ltr">
                <div class="gmail_extra">
                  <div class="gmail_quote"><span class="">On Thu, Jan
                      14, 2016 at 1:35 PM, David Majnemer <span
                        dir="ltr">&lt;<a moz-do-not-send="true"
                          href="mailto:david.majnemer@gmail.com"
                          target="_blank">david.majnemer@gmail.com</a>&gt;</span>
                      wrote:<br>
                      <blockquote class="gmail_quote" style="margin:0 0
                        0 .8ex;border-left:1px #ccc
                        solid;padding-left:1ex">
                        <div dir="ltr"><br>
                          <div class="gmail_extra"><br>
                            <div class="gmail_quote"><span>On Thu, Jan
                                14, 2016 at 1:13 PM, JF Bastien <span
                                  dir="ltr">&lt;<a
                                    moz-do-not-send="true"
                                    href="mailto:jfb@google.com"
                                    target="_blank">jfb@google.com</a>&gt;</span>
                                wrote:<br>
                                <blockquote class="gmail_quote"
                                  style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                                  <div dir="ltr">
                                    <div class="gmail_extra">
                                      <div class="gmail_quote"><span>On
                                          Thu, Jan 14, 2016 at 1:10 PM,
                                          David Majnemer via llvm-dev <span
                                            dir="ltr">&lt;<a
                                              moz-do-not-send="true"
                                              href="mailto:llvm-dev@lists.llvm.org"
                                              target="_blank">llvm-dev@lists.llvm.org</a>&gt;</span>
                                          wrote:<br>
                                          <blockquote
                                            class="gmail_quote"
                                            style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                                            <div dir="ltr"><br>
                                              <div class="gmail_extra"><br>
                                                <div class="gmail_quote"><span>On
                                                    Wed, Jan 13, 2016 at
                                                    7:00 PM, Hans Boehm
                                                    via llvm-dev <span
                                                      dir="ltr">&lt;<a
                                                        moz-do-not-send="true"
href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>&gt;</span>
                                                    wrote:<br>
                                                    <blockquote
                                                      class="gmail_quote"
                                                      style="margin:0px
                                                      0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                                                      <div dir="ltr">I
                                                        agree with Tim's
                                                        assessment for
                                                        ARM.  That's
                                                        interesting; I
                                                        wasn't
                                                        previously aware
                                                        of that
                                                        instruction.
                                                        <div><br>
                                                        </div>
                                                        <div>My
                                                          understanding
                                                          is that Alpha
                                                          would have the
                                                          same problem
                                                          for normal
                                                          loads.
                                                          <div><br>
                                                          </div>
                                                          <div>I'm all
                                                          in favor of
                                                          more
                                                          systematic
                                                          handling of
                                                          the fences
                                                          associated
                                                          with x86
                                                          non-temporal
                                                          accesses.</div>
                                                          <div><br>
                                                          </div>
                                                          <div>AFAICT,
                                                          nontemporal
                                                          loads and
                                                          stores seem to
                                                          have different
                                                          fencing rules
                                                          on x86, none
                                                          of them very
                                                          clear. 
                                                          Nontemporal
                                                          stores should
                                                          probably
                                                          ideally use an
                                                          SFENCE. 
                                                          Locked
                                                          instructions
                                                          seem to be
                                                          documented to
                                                          work with
                                                          MOVNTDQA.  In
                                                          both cases,
                                                          there seems to
                                                          be only
                                                          empirical
                                                          evidence as to
                                                          which side(s)
                                                          of the
                                                          nontemporal
                                                          operations
                                                          they should go
                                                          on?</div>
                                                          <div><br>
                                                          </div>
                                                          <div>I finally
                                                          decided that I
                                                          was OK with
                                                          using a LOCKed
                                                          top-of-stack
                                                          update as a
                                                          fence in Java
                                                          on x86.  I'm
                                                          significantly
                                                          less
                                                          enthusiastic
                                                          for C++.  I
                                                          also think
                                                          that risks
                                                          unexpected
                                                          coherence miss
                                                          problems,
                                                          though they
                                                          would probably
                                                          be very rare. 
                                                          But they would
                                                          be very
                                                          surprising if
                                                          they did
                                                          occur.</div>
                                                        </div>
                                                      </div>
                                                    </blockquote>
                                                    <div><br>
                                                    </div>
                                                  </span>
                                                  <div>Today's LLVM
                                                    already emits 'lock
                                                    or %eax, (%esp)' for
                                                    'fence
                                                    seq_cst'/__sync_synchronize/__atomic_thread_fence(__ATOMIC_SEQ_CST)
                                                    when targeting
                                                    32-bit x86 machines
                                                    which do not support
                                                    mfence.  What
                                                    instruction sequence
                                                    should we be using
                                                    instead?</div>
                                                </div>
                                              </div>
                                            </div>
                                          </blockquote>
                                          <div><br>
                                          </div>
                                        </span>
                                        <div>Do they have non-temporal
                                          accesses in the ISA?</div>
                                      </div>
                                    </div>
                                  </div>
                                </blockquote>
                                <div><br>
                                </div>
                              </span>
                              <div>I thought not but there appear to be
                                instructions like movntps.  mfence was
                                introduced in SSE2 while movntps and
                                sfence were introduced in SSE.</div>
                            </div>
                          </div>
                        </div>
                      </blockquote>
                      <div><br>
                      </div>
                    </span>
                    <div>So the new builtin could be sfence? I think the
                      codegen you point out for SEQ_CST is fine if we
                      fix the memory model as suggested.</div>
                  </div>
                </div>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>I agree that it's fine to use a locked instruction as a
              seq_cst fence if MFENCE is not available.  </div>
          </div>
        </div>
      </div>
    </blockquote>
    It's not clear to me this is true if the seq_cst fence is expected
    to fence non-temporal stores.  I think in practice, you'd be very
    unlikely to notice a difference, but I can't point to anything in
    the Intel docs that justifies a lock-prefixed instruction as
    sufficient to fence any non-temporal access.  <br>
    <br>
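    For ordinary (temporal) accesses, the property a seq_cst fence is
    expected to provide is the classic store-buffering guarantee: if two
    threads each store to one location, fence, and then load the other
    location, at least one of them must observe the other's store.  A
    minimal sketch of that litmus test follows.  It is written in Rust,
    whose <code>fence(Ordering::SeqCst)</code> is, as far as I understand,
    handed to LLVM as <code>fence seq_cst</code>; the names
    <code>X</code>, <code>Y</code> and <code>store_buffering_trial</code>
    are hypothetical and exist only for illustration.  The sketch says
    nothing about non-temporal stores, which is exactly the case left
    open above.<br>
    <br>

```rust
use std::sync::atomic::{fence, AtomicU32, Ordering};
use std::thread;

// Two shared locations, hypothetical names used only for this sketch.
static X: AtomicU32 = AtomicU32::new(0);
static Y: AtomicU32 = AtomicU32::new(0);

/// Runs one store-buffering trial and returns (r1, r2), the values the
/// two threads loaded from Y and X respectively.
fn store_buffering_trial() -> (u32, u32) {
    // Reset both locations before the threads are spawned.
    X.store(0, Ordering::SeqCst);
    Y.store(0, Ordering::SeqCst);

    let t1 = thread::spawn(|| {
        X.store(1, Ordering::Relaxed);
        // Full fence between the store and the following load.
        fence(Ordering::SeqCst);
        Y.load(Ordering::Relaxed)
    });
    let t2 = thread::spawn(|| {
        Y.store(1, Ordering::Relaxed);
        fence(Ordering::SeqCst);
        X.load(Ordering::Relaxed)
    });

    (t1.join().unwrap(), t2.join().unwrap())
}
```

    With both fences in place, the language memory model rules out the
    outcome r1 == 0 and r2 == 0, so any instruction sequence chosen for
    the fence (MFENCE or a locked read-modify-write) has to preserve
    that for normal loads and stores.<br>
    <br>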
    <blockquote
cite="mid:CAMOCf+iBRjBjyayv2-SaWSvOMTbWvx6+miBaHaOA0VqgphZgkQ@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div>If you have to dirty a cache line, (%esp) seems like
              relatively safe one.  <br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    Agreed.  As we discussed previously, it is possible to get false sharing
    in C++, but this would require one thread to be accessing
    information stored in the last frame of another running thread's
    stack.  That seems sufficiently unlikely to be ignored.  <br>
    <br>
    <blockquote
cite="mid:CAMOCf+iBRjBjyayv2-SaWSvOMTbWvx6+miBaHaOA0VqgphZgkQ@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div>(I'm assuming that CPUID is appreciably slower and out
              of the running?  I haven't tried.  But it also probably
              clobbers too many registers.)  <br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    This is my belief.  I haven't actually tried this experiment, but
    I've seen no reports that CPUID is a good choice here.<br>
    <br>
    <blockquote
cite="mid:CAMOCf+iBRjBjyayv2-SaWSvOMTbWvx6+miBaHaOA0VqgphZgkQ@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div>It's only the idea of writing to a memory location when
              MFENCE is available, and could be used instead, that seems
              questionable.</div>
          </div>
        </div>
      </div>
    </blockquote>
    While in principle I agree, it appears in practice that this
    tradeoff is worthwhile.  The hardware doesn't seem to optimize for
    the MFENCE case whereas lock-prefixed instructions appear to be
    handled much better.  <br>
    <blockquote
cite="mid:CAMOCf+iBRjBjyayv2-SaWSvOMTbWvx6+miBaHaOA0VqgphZgkQ@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <div>What exactly would the non-temporal fences be?  It
              seems that on x86, the load and store case may differ.  In
              theory, there's also a before vs. after question.  In
              practice code using MOVNTA seems to assume that you only
              need an SFENCE afterwards.  I can't back that up with spec
              verbiage.  I don't know about MOVNTDQA.  What about ARM?<br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    I'll leave this to JF to answer.  I'm not knowledgeable enough about
    non-temporals to answer without substantial research first.  <br>
    <blockquote
cite="mid:CAMOCf+iBRjBjyayv2-SaWSvOMTbWvx6+miBaHaOA0VqgphZgkQ@mail.gmail.com"
      type="cite">
      <br>
      <fieldset class="mimeAttachmentHeader"></fieldset>
      <br>
      <pre wrap="">_______________________________________________
LLVM Developers mailing list
<a class="moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>
<a class="moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a>
</pre>
    </blockquote>
    <br>
  </body>
</html>