[LLVMdev] Predicated Vector Operations
Dan Gohman
dan433584 at gmail.com
Thu May 9 08:36:56 PDT 2013
On Thu, May 9, 2013 at 7:47 AM, <dag at cray.com> wrote:
> Chandler Carruth <chandlerc at google.com> writes:
>
> > What are the desired memory model semantics for a masked store?
> > Specifically, let me suppose a simplified vector model of <2 x i64> on
> > an i64-word-size platform.
> >
> > masked_store(<42, 42>, Ptr, <true, false>)
> >
> > Does this write to the entire <2 x i64> object stored at Ptr or not?
>
> No. It writes one element.
>
> > Put another way, consider:
> >
> > thread A:
> > ...
> > masked_store(<42, 42>, Ptr, <true, false>)
> > ...
> >
> > thread B:
> > ...
> > masked_store(<42, 42>, Ptr, <false, true>)
> > ...
> >
> > Assuming there is no specific synchronization relevant to Ptr between
> > these two threads and their masked stores, does this form a data race
> > or not?
>
> It entirely depends on the hardware implementation. In most cases I
> would say yes due to cache coherence issues. From a purely theoretical
> machine that doesn't have false sharing, there would be no data race.
>
> Of course this assumes that thread B won't access the element stored by
> thread A and vice versa.
>
> > From a memory model perspective, if this does *not* form a data race,
> > that makes this tremendously more complex to implement, analyze, and
> > optimize... I'm somewhat hopeful that the desired semantics are for
> > this to form a datarace (and thus require synchronization when
> > occurring in different threads like this).
>
> Most of the time the compiler will not know the mask value and will have
> to be conservative. As Nadav has pointed out, what constitutes
> "conservative" is entirely context-dependent.
>
> But I don't understand why defining this as not being a data race would
> complicate things. I'm assuming the mask values are statically known.
> Can you explain a bit more?
>
It's an interesting question for autovectorization, for example.

Thread A:

    for (i=0;i<n;++i)
      if (i&1)
        X[i] = 0;

Thread B:

    for (i=0;i<n;++i)
      if (!(i&1))
        X[i] = 1;

The threads run concurrently without synchronization. As written, there is
no race. Can you vectorize either of these loops? If masked-out elements of
a predicated store are "in play" for racing, then vectorizing would
introduce a race. And, it'd be hard for an optimizer to prove that this
doesn't happen.
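
To make the concern concrete, here is a rough single-threaded sketch in C of two ways a target might lower a predicated store for Thread A's loop. Everything in it is an assumption made for illustration: the lane count, the helper names (masked_store_blend, masked_store_exact, thread_a_vectorized), and the write counter are hypothetical and not part of any proposal in this thread. It does not run two threads or demonstrate a race; it only counts which elements each lowering writes.

```c
#include <assert.h>
#include <stddef.h>

enum { LANES = 4 };              /* assumed vector width, for illustration only */

static size_t writes_issued;     /* number of element writes performed */

/* Hypothetical lowering 1: load, blend, store the whole vector.
 * Masked-out lanes are read and then written back unchanged. */
static void masked_store_blend(int *ptr, const int *val, const int *mask)
{
    int tmp[LANES];
    for (size_t l = 0; l < LANES; ++l)
        tmp[l] = mask[l] ? val[l] : ptr[l];
    for (size_t l = 0; l < LANES; ++l) {
        ptr[l] = tmp[l];         /* touches every lane, active or not */
        ++writes_issued;
    }
}

/* Hypothetical lowering 2: write only the lanes whose mask bit is set. */
static void masked_store_exact(int *ptr, const int *val, const int *mask)
{
    for (size_t l = 0; l < LANES; ++l) {
        if (mask[l]) {
            ptr[l] = val[l];
            ++writes_issued;
        }
    }
}

/* Thread A's loop, vectorized by hand: X[i] = 0 for odd i.
 * Returns how many elements of X were written. */
static size_t thread_a_vectorized(int *X, size_t n, int use_blend)
{
    assert(n % LANES == 0);      /* keep the sketch free of a scalar tail */
    const int zeros[LANES] = {0};
    writes_issued = 0;
    for (size_t i = 0; i < n; i += LANES) {
        int mask[LANES];
        for (size_t l = 0; l < LANES; ++l)
            mask[l] = (int)((i + l) & 1);
        if (use_blend)
            masked_store_blend(X + i, zeros, mask);
        else
            masked_store_exact(X + i, zeros, mask);
    }
    return writes_issued;
}
```

With the blend lowering, Thread A writes the even elements as well, and those are exactly the elements Thread B stores to. If Thread B's store landed between the blend's load and its store, the old value written back would overwrite it. The exact lowering writes only the odd elements, which matches the scalar loop.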
Dan
p.s. Yes, you could also vectorize these with a strided store or a scatter,
but then it raises a different question, of the memory semantics for
strided or scatter stores.