[llvm-dev] RFC: non-temporal fencing in LLVM IR

Thu Jan 14 13:10:05 PST 2016

On Wed, Jan 13, 2016 at 7:00 PM, Hans Boehm via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> I agree with Tim's assessment for ARM.  That's interesting; I wasn't
> previously aware of that instruction.
>
> My understanding is that Alpha would have the same problem for normal
> loads.
>
> I'm all in favor of more systematic handling of the fences associated with
> x86 non-temporal accesses.
>
> AFAICT, nontemporal loads and stores seem to have different fencing rules
> on x86, none of them very clear.  Nontemporal stores should ideally use an
> SFENCE.  Locked instructions seem to be documented to work with MOVNTDQA.
> In both cases, there seems to be only empirical evidence as to which
> side(s) of the nontemporal operations the fences should go on.
>
> I finally decided that I was OK with using a LOCKed top-of-stack update as
> a fence in Java on x86.  I'm significantly less enthusiastic for C++.  I
> also think that risks unexpected coherence miss problems, though they would
> probably be very rare.  But they would be very surprising if they did occur.
>

Today's LLVM already emits 'lock or $0, (%esp)' for 'fence
seq_cst'/__sync_synchronize/__atomic_thread_fence(__ATOMIC_SEQ_CST) when
targeting 32-bit x86 machines that do not support mfence.  What
instruction sequence should we be using instead?
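
For concreteness, here is a minimal C++ sketch of the source-level
construct in question. It is only an illustration: the names full_fence,
both_saw_zero_once and count_violations are hypothetical, and the sketch
says nothing about which instruction a backend should pick. It runs the
classic store-buffering litmus test, where the seq_cst fence is what
forbids both threads from reading 0. As far as I understand, on a 32-bit
x86 target without mfence the fence call is the thing that gets lowered
to the locked stack-top OR discussed above.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: the source-level fence under discussion.
inline void full_fence() {
    std::atomic_thread_fence(std::memory_order_seq_cst);
}

// One round of the store-buffering litmus test.
// Returns true if both threads observed 0, which the C++ memory model
// forbids when a seq_cst fence separates each store from the later load.
bool both_saw_zero_once() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread a([&] {
        x.store(1, std::memory_order_relaxed);
        full_fence();
        r1 = y.load(std::memory_order_relaxed);
    });
    std::thread b([&] {
        y.store(1, std::memory_order_relaxed);
        full_fence();
        r2 = x.load(std::memory_order_relaxed);
    });
    a.join();
    b.join();
    return r1 == 0 && r2 == 0;
}

// Repeats the litmus test and counts forbidden outcomes.
int count_violations(int iterations) {
    int violations = 0;
    for (int i = 0; i < iterations; ++i) {
        if (both_saw_zero_once()) {
            ++violations;
        }
    }
    return violations;
}
```

Whatever sequence replaces the locked OR has to keep count_violations at
zero; removing the two full_fence calls makes the forbidden outcome legal,
and it can show up on real x86 hardware.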

>
> On Wed, Jan 13, 2016 at 10:59 AM, Tim Northover <t.p.northover at gmail.com>
> wrote:
>
>> > I haven't touched ARMv8 in a few years so I'm rusty on the non-temporal
>> > details for that ISA. I lifted this example from here:
>> >
>> > http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/CJACGJJF.html
>> >
>> > Which is correct?
>>
>> FWIW, I agree with John here. The example I'd give for the unexpected
>> behaviour allowed in the spec is:
>>
>> .Lwait_for_data:
>>     ldr x0, [x3]
>>     cbz x0, .Lwait_for_data
>>     ldnp x2, x1, [x0]
>>
>> where another thread first writes to a buffer then tells us where that
>> buffer is. For a normal ldp, the address dependency rule means we
>> don't need a barrier or acquiring load to ensure we see the real data
>> in the buffer. For ldnp, we would need a barrier to prevent stale
>> data.
>>
>> I suspect this is actually even closer to the x86 situation than what
>> the guide implies (which looks like a straight-up exposed pipeline to
>> me, beyond even what Alpha would have done).
>>
>> Cheers.
>>
>> Tim.
>>
>
>
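
Tim's wait-for-data loop quoted above is the usual pointer-publication
pattern. A rough C++ rendering, as a sketch only, might look like the
following. Buffer and run_once are hypothetical names, and the sketch
uses an acquire load rather than relying on the address dependency,
since that is the portable way to ask for the ordering in C++. Whether a
given backend's lowering of that acquire is enough when the dependent
load is non-temporal is exactly the question in this thread.

```cpp
#include <atomic>
#include <thread>

// Hypothetical payload; the two fields stand in for the pair read by ldnp/ldp.
struct Buffer {
    long first;
    long second;
};

// Runs one writer and one reader and returns the sum the reader observed.
long run_once() {
    Buffer buf{0, 0};
    std::atomic<Buffer*> published{nullptr};  // plays the role of [x3]
    long sum = 0;

    std::thread reader([&] {
        Buffer* p = nullptr;
        // Corresponds to the ldr/cbz loop: wait until a pointer is published.
        while ((p = published.load(std::memory_order_acquire)) == nullptr) {
            std::this_thread::yield();
        }
        // Corresponds to the dependent pair load from [x0].
        sum = p->first + p->second;
    });

    std::thread writer([&] {
        // First fill the buffer, then tell the reader where it is.
        buf.first = 1;
        buf.second = 2;
        published.store(&buf, std::memory_order_release);
    });

    writer.join();
    reader.join();
    return sum;
}
```

With the release/acquire pair the reader is guaranteed to see the filled
buffer, so run_once returns 3.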