<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <br>

    <br>

    <div class="moz-cite-prefix">On 01/14/2016 04:05 PM, Hans Boehm via

      llvm-dev wrote:<br>

    </div>

    <blockquote

cite="mid:CAMOCf+iBRjBjyayv2-SaWSvOMTbWvx6+miBaHaOA0VqgphZgkQ@mail.gmail.com"

      type="cite">

      <div dir="ltr"><br>

        <div class="gmail_extra"><br>

          <div class="gmail_quote">On Thu, Jan 14, 2016 at 1:37 PM, JF

            Bastien <span dir="ltr">&lt;<a moz-do-not-send="true"
                href="mailto:jfb@google.com" target="_blank">jfb@google.com</a>&gt;</span>

            wrote:<br>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div dir="ltr">

                <div class="gmail_extra">

                  <div class="gmail_quote"><span class="">On Thu, Jan

                      14, 2016 at 1:35 PM, David Majnemer <span

                        dir="ltr">&lt;<a moz-do-not-send="true"
                          href="mailto:david.majnemer@gmail.com"
                          target="_blank">david.majnemer@gmail.com</a>&gt;</span>

                      wrote:<br>

                      <blockquote class="gmail_quote" style="margin:0 0

                        0 .8ex;border-left:1px #ccc

                        solid;padding-left:1ex">

                        <div dir="ltr"><br>

                          <div class="gmail_extra"><br>

                            <div class="gmail_quote"><span>On Thu, Jan

                                14, 2016 at 1:13 PM, JF Bastien <span

                                  dir="ltr">&lt;<a
                                    moz-do-not-send="true"
                                    href="mailto:jfb@google.com"
                                    target="_blank">jfb@google.com</a>&gt;</span>

                                wrote:<br>

                                <blockquote class="gmail_quote"

                                  style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

                                  <div dir="ltr">

                                    <div class="gmail_extra">

                                      <div class="gmail_quote"><span>On

                                          Thu, Jan 14, 2016 at 1:10 PM,

                                          David Majnemer via llvm-dev <span

                                            dir="ltr">&lt;<a
                                              moz-do-not-send="true"
                                              href="mailto:llvm-dev@lists.llvm.org"
                                              target="_blank">llvm-dev@lists.llvm.org</a>&gt;</span>

                                          wrote:<br>

                                          <blockquote

                                            class="gmail_quote"

                                            style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

                                            <div dir="ltr"><br>

                                              <div class="gmail_extra"><br>

                                                <div class="gmail_quote"><span>On

                                                    Wed, Jan 13, 2016 at

                                                    7:00 PM, Hans Boehm

                                                    via llvm-dev <span

                                                      dir="ltr">&lt;<a
                                                        moz-do-not-send="true"
href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>&gt;</span>

                                                    wrote:<br>

                                                    <blockquote

                                                      class="gmail_quote"

                                                      style="margin:0px

                                                      0px 0px

0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

                                                      <div dir="ltr">I

                                                        agree with Tim's

                                                        assessment for

                                                        ARM.  That's

                                                        interesting; I

                                                        wasn't

                                                        previously aware

                                                        of that

                                                        instruction.

                                                        <div><br>

                                                        </div>

                                                        <div>My

                                                          understanding

                                                          is that Alpha

                                                          would have the

                                                          same problem

                                                          for normal

                                                          loads.

                                                          <div><br>

                                                          </div>

                                                          <div>I'm all

                                                          in favor of

                                                          more

                                                          systematic

                                                          handling of

                                                          the fences

                                                          associated

                                                          with x86

                                                          non-temporal

                                                          accesses.</div>

                                                          <div><br>

                                                          </div>

                                                          <div>AFAICT,

                                                          nontemporal

                                                          loads and

                                                          stores seem to

                                                          have different

                                                          fencing rules

                                                          on x86, none

                                                          of them very

                                                          clear. 

                                                          Nontemporal

                                                          stores should

                                                          probably

                                                          ideally use an

                                                          SFENCE. 

                                                          Locked

                                                          instructions

                                                          seem to be

                                                          documented to

                                                          work with

                                                          MOVNTDQA.  In

                                                          both cases,

                                                          there seems to

                                                          be only

                                                          empirical

                                                          evidence as to

                                                          which side(s)

                                                          of the

                                                          nontemporal

                                                          operations

                                                          they should go

                                                          on?</div>

                                                          <div><br>

                                                          </div>

                                                          <div>I finally

                                                          decided that I

                                                          was OK with

                                                          using a LOCKed

                                                          top-of-stack

                                                          update as a

                                                          fence in Java

                                                          on x86.  I'm

                                                          significantly

                                                          less

                                                          enthusiastic

                                                          for C++.  I

                                                          also think

                                                          that risks

                                                          unexpected

                                                          coherence miss

                                                          problems,

                                                          though they

                                                          would probably

                                                          be very rare. 

                                                          But they would

                                                          be very

                                                          surprising if

                                                          they did

                                                          occur.</div>

                                                        </div>

                                                      </div>

                                                    </blockquote>

                                                    <div><br>

                                                    </div>

                                                  </span>

                                                  <div>Today's LLVM

                                                    already emits 'lock

                                                    or %eax, (%esp)' for

                                                    'fence

                                                    seq_cst'/__sync_synchronize/__atomic_thread_fence(__ATOMIC_SEQ_CST)

                                                    when targeting

                                                    32-bit x86 machines

                                                    which do not support

                                                    mfence.  What

                                                    instruction sequence

                                                    should we be using

                                                    instead?</div>

                                                </div>

                                              </div>

                                            </div>

                                          </blockquote>

                                          <div><br>

                                          </div>

                                        </span>

                                        <div>Do they have non-temporal

                                          accesses in the ISA?</div>

                                      </div>

                                    </div>

                                  </div>

                                </blockquote>

                                <div><br>

                                </div>

                              </span>

                              <div>I thought not but there appear to be

                                instructions like movntps.  mfence was

                                introduced in SSE2 while movntps and

                                sfence were introduced in SSE.</div>

                            </div>

                          </div>

                        </div>

                      </blockquote>

                      <div><br>

                      </div>

                    </span>

                    <div>So the new builtin could be sfence? I think the

                      codegen you point out for SEQ_CST is fine if we

                      fix the memory model as suggested.</div>

                  </div>

                </div>

              </div>

            </blockquote>

            <div><br>

            </div>

            <div>I agree that it's fine to use a locked instruction as a

              seq_cst fence if MFENCE is not available.  </div>

          </div>

        </div>

      </div>

    </blockquote>

    It's not clear to me that this is true if the seq_cst fence is also
    expected to fence non-temporal stores.  In practice you'd be very
    unlikely to notice a difference, but I can't point to anything in
    the Intel docs that justifies treating a lock-prefixed instruction as
    sufficient to fence any non-temporal access.  <br>

    <br>

    <blockquote

cite="mid:CAMOCf+iBRjBjyayv2-SaWSvOMTbWvx6+miBaHaOA0VqgphZgkQ@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div>If you have to dirty a cache line, (%esp) seems like
              a relatively safe one.  <br>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    Agreed.  As we discussed previously, it is possible to get false sharing
    in C++, but this would require one thread to be accessing
    information stored in the last frame of another running thread's
    stack.  That seems sufficiently unlikely to be ignored.  <br>

    <br>

    <blockquote

cite="mid:CAMOCf+iBRjBjyayv2-SaWSvOMTbWvx6+miBaHaOA0VqgphZgkQ@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div>(I'm assuming that CPUID is appreciably slower and out

              of the running?  I haven't tried.  But it also probably

              clobbers too many registers.)  <br>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    This is my belief.  I haven't actually tried this experiment, but

    I've seen no reports that CPUID is a good choice here.<br>

    <br>

    <blockquote

cite="mid:CAMOCf+iBRjBjyayv2-SaWSvOMTbWvx6+miBaHaOA0VqgphZgkQ@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div>It's only the idea of writing to a memory location when

              MFENCE is available, and could be used instead, that seems

              questionable.</div>

          </div>

        </div>

      </div>

    </blockquote>

    While in principle I agree, it appears in practice that this
    tradeoff is worthwhile.  The hardware doesn't seem to optimize for
    the MFENCE case, whereas lock-prefixed instructions appear to be
    handled much better.  <br>
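    <br>
    To make the two candidate sequences concrete, here is a minimal
    sketch, assuming GCC/Clang inline assembly on x86.  The function
    names are made up for illustration, and I'm using an OR of an
    immediate zero so that the stack slot is left unchanged; this is not
    meant to be what LLVM emits verbatim.  On other targets it just falls
    back to the standard fence.  <br>

```cpp
#include <atomic>
#include <cassert>

// Full fence via a LOCKed no-op read-modify-write on the top of the stack.
// ORing with 0 leaves the stack contents unchanged.
inline void fence_locked_stack() {
#if defined(__x86_64__)
    __asm__ __volatile__("lock; orl $0, (%%rsp)" ::: "memory", "cc");
#elif defined(__i386__)
    __asm__ __volatile__("lock; orl $0, (%%esp)" ::: "memory", "cc");
#else
    std::atomic_thread_fence(std::memory_order_seq_cst);  // non-x86 fallback
#endif
}

// Full fence via MFENCE (needs SSE2 on 32-bit x86).
inline void fence_mfence() {
#if defined(__x86_64__) || (defined(__i386__) && defined(__SSE2__))
    __asm__ __volatile__("mfence" ::: "memory");
#else
    std::atomic_thread_fence(std::memory_order_seq_cst);  // fallback
#endif
}

// Hypothetical usage: Dekker-style "set my flag, then check theirs".
// The fence sits between the store and the load so that the load cannot
// be satisfied before the store becomes visible.
inline bool try_enter(std::atomic<int>& mine, const std::atomic<int>& theirs) {
    mine.store(1, std::memory_order_relaxed);
    fence_locked_stack();
    return theirs.load(std::memory_order_relaxed) == 0;
}
```

    Either function could be dropped into try_enter for ordinary loads
    and stores; which one is cheaper would have to be measured on the
    hardware in question.  <br>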

    <blockquote

cite="mid:CAMOCf+iBRjBjyayv2-SaWSvOMTbWvx6+miBaHaOA0VqgphZgkQ@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div><br>

            </div>

            <div>What exactly would the non-temporal fences be?  It

              seems that on x86, the load and store case may differ.  In

              theory, there's also a before vs. after question.  In

              practice code using MOVNTA seems to assume that you only

              need an SFENCE afterwards.  I can't back that up with spec

              verbiage.  I don't know about MOVNTDQA.  What about ARM?<br>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    I'll leave this to JF to answer.  I'm not knowledgeable enough about

    non-temporals to answer without substantial research first.  <br>
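    <br>
    For reference, the pattern Hans describes (non-temporal stores
    followed by a single SFENCE) would look roughly like the sketch
    below, written with SSE2 intrinsics.  The function name is
    hypothetical, and whether a trailing SFENCE alone is sufficient is
    exactly the open question, so read this as an illustration of what
    existing code seems to assume rather than as something the spec
    guarantees.  Without SSE2 it falls back to ordinary stores.  <br>

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>  // _mm_stream_si32 (MOVNTI) and _mm_sfence
#endif

// Fill dst[0..n) with `value`, then publish `ready` to other threads.
inline void stream_fill_and_publish(int* dst, std::size_t n, int value,
                                    std::atomic<bool>& ready) {
#if defined(__SSE2__)
    for (std::size_t i = 0; i != n; ++i)
        _mm_stream_si32(dst + i, value);  // weakly ordered non-temporal store
    _mm_sfence();                         // SFENCE after the streaming stores
#else
    for (std::size_t i = 0; i != n; ++i)
        dst[i] = value;                   // fallback: ordinary stores
#endif
    // Assumption: the SFENCE above orders the streaming stores before
    // this flag store becomes visible.
    ready.store(true, std::memory_order_release);
}
```

    A reader thread would spin on ready with an acquire load before
    touching dst.  <br>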


    <br>

  </body>

</html>