[llvm-dev] RFC: non-temporal fencing in LLVM IR
Philip Reames via llvm-dev
llvm-dev at lists.llvm.org
Thu Jan 14 16:27:09 PST 2016
On 01/14/2016 04:05 PM, Hans Boehm via llvm-dev wrote:
>
>
> On Thu, Jan 14, 2016 at 1:37 PM, JF Bastien <jfb at google.com> wrote:
>
> On Thu, Jan 14, 2016 at 1:35 PM, David Majnemer <david.majnemer at gmail.com> wrote:
>
> On Thu, Jan 14, 2016 at 1:13 PM, JF Bastien <jfb at google.com> wrote:
>
> On Thu, Jan 14, 2016 at 1:10 PM, David Majnemer via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> On Wed, Jan 13, 2016 at 7:00 PM, Hans Boehm via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> I agree with Tim's assessment for ARM. That's
> interesting; I wasn't previously aware of that
> instruction.
>
> My understanding is that Alpha would have the same
> problem for normal loads.
>
> I'm all in favor of more systematic handling of
> the fences associated with x86 non-temporal accesses.
>
> AFAICT, nontemporal loads and stores seem to have
> different fencing rules on x86, none of them very
> clear. Nontemporal stores should probably ideally
> use an SFENCE. Locked instructions seem to be
> documented to work with MOVNTDQA. In both cases,
> there seems to be only empirical evidence as to
> which side(s) of the nontemporal operations they
> should go on?
>
> I finally decided that I was OK with using a
> LOCKed top-of-stack update as a fence in Java on
> x86. I'm significantly less enthusiastic for
> C++. I also think that risks unexpected coherence
> miss problems, though they would probably be very
> rare. But they would be very surprising if they
> did occur.
>
>
> Today's LLVM already emits 'lock or %eax, (%esp)' for
> 'fence
> seq_cst'/__sync_synchronize/__atomic_thread_fence(__ATOMIC_SEQ_CST)
> when targeting 32-bit x86 machines which do not
> support mfence. What instruction sequence should we
> be using instead?
>
>
> Do they have non-temporal accesses in the ISA?
>
>
> I thought not but there appear to be instructions
> like movntps. mfence was introduced in SSE2 while movntps and
> sfence were introduced in SSE.
>
>
> So the new builtin could be sfence? I think the codegen you point
> out for SEQ_CST is fine if we fix the memory model as suggested.
>
>
> I agree that it's fine to use a locked instruction as a seq_cst fence
> if MFENCE is not available.
It's not clear to me that this is true if the seq_cst fence is also expected
to order non-temporal stores. In practice I think you'd be very unlikely
to notice a difference, but I can't point to anything in the Intel docs
which says a lock-prefixed instruction is sufficient to fence any
non-temporal access.
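Under that cautious reading, a fence that must also cover non-temporal stores could pair an explicit SFENCE with the ordinary seq_cst fence. The sketch below assumes GCC/Clang `__atomic` builtins and the SSE2 intrinsics from `<emmintrin.h>` (`_mm_stream_si32` emits MOVNTI); the helper names `nt_aware_seq_cst_fence` and `nt_fill` are hypothetical and not part of LLVM. If the seq_cst fence is already lowered to MFENCE, the extra SFENCE is probably redundant, but it avoids relying on undocumented behavior of lock-prefixed instructions.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_stream_si32 (MOVNTI), _mm_sfence (SFENCE) */
#endif

/* Hypothetical helper: a seq_cst fence that also orders earlier non-temporal
   stores. SFENCE is the instruction documented to order NT stores, so it is
   issued first; the seq_cst fence then provides the usual full barrier. */
static inline void nt_aware_seq_cst_fence(void) {
#if defined(__SSE2__)
    _mm_sfence();
#endif
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
}

/* Hypothetical helper: fill buf with value using non-temporal stores when
   SSE2 is available (plain stores otherwise), then fence. */
static void nt_fill(int *buf, size_t n, int value) {
    for (size_t i = 0; i < n; i++) {
#if defined(__SSE2__)
        _mm_stream_si32(&buf[i], value); /* weakly ordered, bypasses the cache */
#else
        buf[i] = value;
#endif
    }
    nt_aware_seq_cst_fence();
}
```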
> If you have to dirty a cache line, (%esp) seems like relatively safe one.
Agreed. As we discussed previously, it is possible to get false sharing
in C++, but it would require one thread to be accessing data stored in
the last frame of another running thread's stack. That seems
sufficiently unlikely to be ignored.
> (I'm assuming that CPUID is appreciably slower and out of the
> running? I haven't tried. But it also probably clobbers too many
> registers.)
This is my belief. I haven't actually tried this experiment, but I've
seen no reports that CPUID is a good choice here.
> It's only the idea of writing to a memory location when MFENCE is
> available, and could be used instead, that seems questionable.
While in principle I agree, in practice this tradeoff appears to be
worthwhile. The hardware doesn't seem to optimize for the MFENCE case,
whereas lock-prefixed instructions appear to be handled much better.
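To make the comparison concrete, here is a sketch of the two x86 sequences as GCC/Clang extended inline assembly, together with a store-buffering litmus test that checks the property a seq_cst fence must provide for ordinary accesses. The function names are hypothetical. The locked variant ORs zero into the top-of-stack slot, so the stored value never changes. This test only exercises normal write-back stores; it says nothing about non-temporal stores, which is the open question in this thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical: locked no-op RMW on the top of the stack used as a full fence. */
static inline void fence_locked_stack(void) {
#if defined(__x86_64__)
    __asm__ __volatile__("lock; orl $0, (%%rsp)" ::: "memory", "cc");
#elif defined(__i386__)
    __asm__ __volatile__("lock; orl $0, (%%esp)" ::: "memory", "cc");
#else
    __atomic_thread_fence(__ATOMIC_SEQ_CST); /* non-x86 fallback */
#endif
}

/* Hypothetical: MFENCE when SSE2 is available, else the locked sequence. */
static inline void fence_mfence(void) {
#if defined(__SSE2__)
    __asm__ __volatile__("mfence" ::: "memory");
#else
    fence_locked_stack();
#endif
}

static void (*g_fence)(void); /* fence under test, set before threads start */
static int x, y, r1, r2;      /* threads touch x and y only via __atomic builtins */

static void *thread0(void *arg) {
    (void)arg;
    __atomic_store_n(&x, 1, __ATOMIC_RELAXED);
    g_fence();
    r1 = __atomic_load_n(&y, __ATOMIC_RELAXED);
    return NULL;
}

static void *thread1(void *arg) {
    (void)arg;
    __atomic_store_n(&y, 1, __ATOMIC_RELAXED);
    g_fence();
    r2 = __atomic_load_n(&x, __ATOMIC_RELAXED);
    return NULL;
}

/* Store-buffering litmus test: with a full fence between each thread's store
   and load, r1 == 0 && r2 == 0 must never be observed. Returns the number of
   runs (out of iters) that observed it. */
static int count_sb_violations(void (*fence)(void), int iters) {
    int bad = 0;
    g_fence = fence;
    for (int i = 0; i < iters; i++) {
        pthread_t a, b;
        x = y = 0;
        pthread_create(&a, NULL, thread0, NULL);
        pthread_create(&b, NULL, thread1, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        if (r1 == 0 && r2 == 0)
            bad++;
    }
    return bad;
}
```

Build with `-pthread`. Both fences should report zero violations; a zero count here demonstrates nothing about how either sequence orders MOVNT* stores.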
>
> What exactly would the non-temporal fences be? It seems that on x86,
> the load and store case may differ. In theory, there's also a before
> vs. after question. In practice code using MOVNTA seems to assume
> that you only need an SFENCE afterwards. I can't back that up with
> spec verbiage. I don't know about MOVNTDQA. What about ARM?
I'll leave this to JF to answer. I'm not knowledgeable enough about
non-temporals to answer without substantial research first.
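For reference, the common practice described in the quoted message (non-temporal stores followed by a single SFENCE before the data is published) might be sketched as follows, assuming SSE2 intrinsics and GCC/Clang `__atomic` builtins; `producer`, `consume_sum`, `run_publish_demo`, and `PAYLOAD_LEN` are illustrative names. As noted above, this reflects what existing code appears to assume rather than anything the Intel documentation spells out, and it does not address MOVNTDQA loads or ARM.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_stream_si32, _mm_sfence */
#endif

#define PAYLOAD_LEN 1024
static int payload[PAYLOAD_LEN];
static int ready; /* publication flag, accessed only via __atomic builtins */

static void *producer(void *arg) {
    (void)arg;
    for (int i = 0; i < PAYLOAD_LEN; i++) {
#if defined(__SSE2__)
        _mm_stream_si32(&payload[i], i); /* MOVNTI: weakly ordered store */
#else
        payload[i] = i;
#endif
    }
#if defined(__SSE2__)
    _mm_sfence(); /* make the NT stores globally visible before the flag */
#endif
    __atomic_store_n(&ready, 1, __ATOMIC_RELEASE);
    return NULL;
}

/* Spin until the producer publishes, then read the payload. */
static long consume_sum(void) {
    while (!__atomic_load_n(&ready, __ATOMIC_ACQUIRE))
        ; /* spin */
    long sum = 0;
    for (int i = 0; i < PAYLOAD_LEN; i++)
        sum += payload[i];
    return sum;
}

static long run_publish_demo(void) {
    pthread_t t;
    __atomic_store_n(&ready, 0, __ATOMIC_RELAXED);
    pthread_create(&t, NULL, producer, NULL);
    long s = consume_sum();
    pthread_join(t, NULL);
    return s;
}
```

The release store keeps the compiler from sinking the streaming stores past the flag; the SFENCE is what is relied on for the hardware side.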