<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<br>
<br>
<div class="moz-cite-prefix">On 01/14/2016 04:05 PM, Hans Boehm via
llvm-dev wrote:<br>
</div>
<blockquote
cite="mid:CAMOCf+iBRjBjyayv2-SaWSvOMTbWvx6+miBaHaOA0VqgphZgkQ@mail.gmail.com"
type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Jan 14, 2016 at 1:37 PM, JF
Bastien <span dir="ltr">&lt;<a moz-do-not-send="true"
href="mailto:jfb@google.com" target="_blank">jfb@google.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote"><span class="">On Thu, Jan
14, 2016 at 1:35 PM, David Majnemer <span
dir="ltr">&lt;<a moz-do-not-send="true"
href="mailto:david.majnemer@gmail.com"
target="_blank">david.majnemer@gmail.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0
0 .8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote"><span>On Thu, Jan
14, 2016 at 1:13 PM, JF Bastien <span
dir="ltr">&lt;<a
moz-do-not-send="true"
href="mailto:jfb@google.com"
target="_blank">jfb@google.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote"><span>On
Thu, Jan 14, 2016 at 1:10 PM,
David Majnemer via llvm-dev <span
dir="ltr">&lt;<a
moz-do-not-send="true"
href="mailto:llvm-dev@lists.llvm.org"
target="_blank">llvm-dev@lists.llvm.org</a>&gt;</span>
wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote"><span>On
Wed, Jan 13, 2016 at
7:00 PM, Hans Boehm
via llvm-dev <span
dir="ltr">&lt;<a
moz-do-not-send="true"
href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>&gt;</span>
wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0px
0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div dir="ltr">I
agree with Tim's
assessment for
ARM. That's
interesting; I
wasn't
previously aware
of that
instruction.
<div><br>
</div>
<div>My
understanding
is that Alpha
would have the
same problem
for normal
loads.
<div><br>
</div>
<div>I'm all
in favor of
more
systematic
handling of
the fences
associated
with x86
non-temporal
accesses.</div>
<div><br>
</div>
<div>AFAICT,
nontemporal
loads and
stores seem to
have different
fencing rules
on x86, none
of them very
clear.
Nontemporal
stores should
probably
ideally use an
SFENCE.
Locked
instructions
seem to be
documented to
work with
MOVNTDQA. In
both cases,
there seems to
be only
empirical
evidence as to
which side(s)
of the
nontemporal
operations
they should go
on?</div>
<div><br>
</div>
<div>I finally
decided that I
was OK with
using a LOCKed
top-of-stack
update as a
fence in Java
on x86. I'm
significantly
less
enthusiastic
for C++. I
also think
that risks
unexpected
coherence miss
problems,
though they
would probably
be very rare.
But they would
be very
surprising if
they did
occur.</div>
</div>
</div>
</blockquote>
<div><br>
</div>
</span>
<div>Today's LLVM
already emits 'lock
or %eax, (%esp)' for
'fence
seq_cst'/__sync_synchronize/__atomic_thread_fence(__ATOMIC_SEQ_CST)
when targeting
32-bit x86 machines
which do not support
mfence. What
instruction sequence
should we be using
instead?</div>
</div>
</div>
</div>
</blockquote>
<div><br>
</div>
</span>
<div>Do they have non-temporal
accesses in the ISA?</div>
</div>
</div>
</div>
</blockquote>
<div><br>
</div>
</span>
<div>I thought not but there appear to be
instructions like movntps. mfence was
introduced in SSE2 while movntps and
sfence were introduced in SSE.</div>
</div>
</div>
</div>
</blockquote>
<div><br>
</div>
</span>
<div>So the new builtin could be sfence? I think the
codegen you point out for SEQ_CST is fine if we
fix the memory model as suggested.</div>
</div>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>I agree that it's fine to use a locked instruction as a
seq_cst fence if MFENCE is not available. </div>
</div>
</div>
</div>
</blockquote>
It's not clear to me that this is true if the seq_cst fence is expected
to fence non-temporal stores. I think in practice, you'd be very
unlikely to notice a difference, but I can't point to anything in
the Intel docs which justifies a lock-prefixed instruction as
sufficient to fence any non-temporal access. <br>
<br>
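To make the concern concrete, here is a minimal sketch (not what LLVM emits today) of the conservative pattern: non-temporal stores, then an explicit SFENCE, then the seq_cst fence, then a flag store. It assumes GCC or Clang and uses their builtins directly; on targets without SSE2 it falls back to plain stores. The names nt_buffer, nt_ready, nt_publish and nt_sum_if_ready are made up for illustration. <br>
<pre wrap="">
```c
/* Sketch only. Assumes GCC or Clang; uses compiler builtins, so no headers
   are needed. All names here are hypothetical. */
static int nt_buffer[4];
static int nt_ready[1];

/* Fill the buffer with non-temporal stores, then publish a ready flag. */
void nt_publish(int value) {
    int i;
    for (i = 0; i != 4; i++) {
#if defined(__SSE2__)
        /* MOVNTI: a non-temporal store of one int. */
        __builtin_ia32_movnti(nt_buffer + i, value + i);
#else
        /* Fallback for targets without SSE2: an ordinary store. */
        nt_buffer[i] = value + i;
#endif
    }
#if defined(__SSE2__)
    /* SFENCE orders the non-temporal stores before later stores. */
    __builtin_ia32_sfence();
#endif
    /* The fence whose strength is in question in this thread. */
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
    __atomic_store_n(nt_ready, 1, __ATOMIC_RELAXED);
}

/* Return the sum of the buffer if the flag is set, or -1 otherwise. */
int nt_sum_if_ready(void) {
    int i, sum = 0;
    if (__atomic_load_n(nt_ready, __ATOMIC_RELAXED) == 0) return -1;
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
    for (i = 0; i != 4; i++) sum += nt_buffer[i];
    return sum;
}
```
</pre>
With the explicit SFENCE in place, the code no longer depends on whether the seq_cst fence by itself orders the non-temporal stores, which is exactly the part I can't find documented. <br>
<br>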
<blockquote
cite="mid:CAMOCf+iBRjBjyayv2-SaWSvOMTbWvx6+miBaHaOA0VqgphZgkQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div>If you have to dirty a cache line, (%esp) seems like
a relatively safe one. <br>
</div>
</div>
</div>
</div>
</blockquote>
Agreed. As we discussed previously, it is possible to get false sharing
in C++, but this would require one thread to be accessing
information stored in the last frame of another running thread's
stack. That seems sufficiently unlikely to be ignored. <br>
<br>
<blockquote
cite="mid:CAMOCf+iBRjBjyayv2-SaWSvOMTbWvx6+miBaHaOA0VqgphZgkQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div>(I'm assuming that CPUID is appreciably slower and out
of the running? I haven't tried. But it also probably
clobbers too many registers.) <br>
</div>
</div>
</div>
</div>
</blockquote>
This is my belief. I haven't actually tried this experiment, but
I've seen no reports that CPUID is a good choice here.<br>
<br>
<blockquote
cite="mid:CAMOCf+iBRjBjyayv2-SaWSvOMTbWvx6+miBaHaOA0VqgphZgkQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div>It's only the idea of writing to a memory location when
MFENCE is available, and could be used instead, that seems
questionable.</div>
</div>
</div>
</div>
</blockquote>
While in principle I agree, it appears in practice that this
tradeoff is worthwhile. The hardware doesn't seem to optimize for
the MFENCE case, whereas lock-prefixed instructions appear to be
handled much better. <br>
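For reference, the two fence flavors being compared can be sketched roughly as below. This is an illustration under my own assumptions (GCC or Clang inline asm, x86-64 only, with a fallback to the compiler's own fence elsewhere); fence_mfence, fence_locked and store_fence_load are hypothetical names. It says nothing about relative cost, which would need measurement on real hardware. <br>
<pre wrap="">
```c
/* Sketch only. Assumes GCC or Clang; all names here are hypothetical. */
static int shared_value[1];

/* Full fence using MFENCE where available. */
void fence_mfence(void) {
#if defined(__x86_64__)
    __asm__ __volatile__("mfence" : : : "memory");
#else
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}

/* Full fence using a LOCKed read-modify-write on the top of the stack. */
void fence_locked(void) {
#if defined(__x86_64__)
    /* OR-ing zero leaves the stack slot unchanged; the LOCK prefix
       provides the ordering. */
    __asm__ __volatile__("lock orl $0, (%%rsp)" : : : "memory", "cc");
#else
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}

/* Store, fence with the chosen flavor, then load the value back. */
int store_fence_load(int value, int use_locked) {
    __atomic_store_n(shared_value, value, __ATOMIC_RELAXED);
    if (use_locked) {
        fence_locked();
    } else {
        fence_mfence();
    }
    return __atomic_load_n(shared_value, __ATOMIC_RELAXED);
}
```
</pre>
Both variants are meant to behave the same for ordinary loads and stores; the open question earlier in this mail is only whether the locked form also covers non-temporal accesses. <br>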
<blockquote
cite="mid:CAMOCf+iBRjBjyayv2-SaWSvOMTbWvx6+miBaHaOA0VqgphZgkQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>What exactly would the non-temporal fences be? It
seems that on x86, the load and store case may differ. In
theory, there's also a before vs. after question. In
practice code using MOVNTA seems to assume that you only
need an SFENCE afterwards. I can't back that up with spec
verbiage. I don't know about MOVNTDQA. What about ARM?<br>
</div>
</div>
</div>
</div>
</blockquote>
I'll leave this to JF to answer. I'm not knowledgeable enough about
non-temporals to answer without substantial research first. <br>
<blockquote
cite="mid:CAMOCf+iBRjBjyayv2-SaWSvOMTbWvx6+miBaHaOA0VqgphZgkQ@mail.gmail.com"
type="cite">
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
LLVM Developers mailing list
<a class="moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>
<a class="moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a>
</pre>
</blockquote>
<br>
</body>
</html>