[LLVMdev] Predicated Vector Operations

Thu May 9 07:47:55 PDT 2013

Chandler Carruth <chandlerc at google.com> writes:

> What are the desired memory model semantics for a masked store?
> Specifically, let me suppose a simplified vector model of <2 x i64> on
> an i64-word-size platform.
>
> masked_store(<42, 42>, Ptr, <true, false>)
>
> Does this write to the entire <2 x i64> object stored at Ptr or not?

No.  It writes one element.
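A minimal sketch of the element-wise semantics described above, assuming memory is modeled as a flat list of i64 slots. `masked_store` here is a hypothetical helper that mirrors the pseudo-call in the question. It is not an LLVM intrinsic or API.

```python
def masked_store(values, memory, ptr, mask):
    """Hypothetical masked store: lane i of `values` is written to
    memory[ptr + i] only when mask[i] is true. Disabled lanes are left
    completely untouched (no read-modify-write of the whole vector)."""
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same lane count")
    for lane, (value, enabled) in enumerate(zip(values, mask)):
        if enabled:
            memory[ptr + lane] = value


# masked_store(<42, 42>, Ptr, <true, false>) on a <2 x i64> at Ptr == 0
memory = [0, 0]
masked_store([42, 42], memory, 0, [True, False])
print(memory)  # [42, 0]: only the first element is written
```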

> Put another way, consider:
>
> thread A:
> ...
> masked_store(<42, 42>, Ptr, <true, false>)
> ...
>
> thread B:
> ...
> masked_store(<42, 42>, Ptr, <false, true>)
> ...
>
> Assuming there is no specific synchronization relevant to Ptr between
> these two threads and their masked stores, does this form a data race
> or not?

It depends entirely on the hardware implementation.  In most cases I
would say yes, due to cache coherence issues.  On a purely theoretical
machine that doesn't have false sharing, there would be no data race.

Of course this assumes that thread B won't access the element stored by
thread A and vice versa.
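To make the distinction concrete, here is a small sketch that compares the two stores' footprints at byte granularity and at cache-line granularity. It assumes 8-byte i64 lanes, a 64-byte cache line (hardware-dependent), and an illustrative address of 0x1000 for Ptr. The helper names are hypothetical.

```python
ELEMENT_BYTES = 8      # i64 lanes
CACHE_LINE_BYTES = 64  # assumed line size; real hardware varies


def written_bytes(base, mask, element_bytes=ELEMENT_BYTES):
    """Set of byte addresses a masked store actually writes."""
    return {
        base + lane * element_bytes + offset
        for lane, enabled in enumerate(mask) if enabled
        for offset in range(element_bytes)
    }


def touched_lines(byte_addrs, line_bytes=CACHE_LINE_BYTES):
    """Cache lines containing any of the given byte addresses."""
    return {addr // line_bytes for addr in byte_addrs}


ptr = 0x1000
a = written_bytes(ptr, [True, False])  # thread A
b = written_bytes(ptr, [False, True])  # thread B

print(bool(a & b))                                # False: no byte is written by both
print(bool(touched_lines(a) & touched_lines(b)))  # True: same line, i.e. false sharing
```

The byte-level footprints are disjoint, so there is no conflict between the elements themselves. The only overlap is at cache-line granularity, which is the false-sharing case.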

> From a memory model perspective, if this does *not* form a data race,
> that makes this tremendously more complex to implement, analyze, and
> optimize... I'm somewhat hopeful that the desired semantics are for
> this to form a data race (and thus require synchronization when
> occurring in different threads like this).

Most of the time the compiler will not know the mask value and will have
to be conservative.  As Nadav has pointed out, what constitutes
"conservative" is entirely context-dependent.
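One way to picture "conservative" in the unknown-mask case is a may-write analysis. The sketch below assumes a `None` mask means "not known at compile time", in which case every lane must be treated as possibly written. `may_write_lanes` and `may_conflict` are hypothetical helpers, not part of any LLVM analysis.

```python
def may_write_lanes(mask, lanes):
    """Lanes a masked store may write. mask=None means the mask is not
    statically known, so every lane must be assumed written."""
    if mask is None:
        return set(range(lanes))
    return {lane for lane, enabled in enumerate(mask) if enabled}


def may_conflict(mask_a, mask_b, lanes=2):
    """Two masked stores to the same base may conflict if their
    may-write lane sets overlap."""
    return bool(may_write_lanes(mask_a, lanes) & may_write_lanes(mask_b, lanes))


print(may_conflict([True, False], [False, True]))  # False: known, disjoint masks
print(may_conflict(None, [False, True]))           # True: unknown mask covers all lanes
```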

But I don't understand why defining this as not being a data race would
complicate things.  I'm assuming the mask values are statically known.
Can you explain a bit more?

                                 -David