<div dir="ltr"><br><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, May 9, 2013 at 7:47 AM,  <span dir="ltr"><<a href="mailto:dag@cray.com" target="_blank">dag@cray.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="im">Chandler Carruth <<a href="mailto:chandlerc@google.com">chandlerc@google.com</a>> writes:<br>

<br>

> What are the desired memory model semantics for a masked store?<br>

> Specifically, let me suppose a simplified vector model of <2 x i64> on<br>

> an i64-word-size platform.<br>

><br>

> masked_store(<42, 42>, Ptr, <true, false>)<br>

><br>

> Does this write to the entire <2 x i64> object stored at Ptr or not?<br>

<br>

</div>No.  It writes one element.<br>

<div class="im"><br>

> Put another way, consider:<br>

><br>

> thread A:<br>

> ...<br>

> masked_store(<42, 42>, Ptr, <true, false>)<br>

> ...<br>

><br>

> thread B:<br>

> ...<br>

> masked_store(<42, 42>, Ptr, <false, true>)<br>

> ...<br>

><br>

> Assuming there is no specific synchronization relevant to Ptr between<br>

> these two threads and their masked stores, does this form a data race<br>

> or not?<br>

<br>

</div>It entirely depends on the hardware implementation.  In most cases I<br>

would say yes due to cache coherence issues.  From a purely theoretical<br>

machine that doesn't have false sharing, there would be no data race.<br>

<br>

Of course this assumes that thread B won't access the element stored by<br>

thread A and vice versa.<br>

<div class="im"><br>

> From a memory model perspective, if this does *not* form a data race,<br>

> that makes this tremendously more complex to implement, analyze, and<br>

> optimize... I'm somewhat hopeful that the desired semantics are for<br>

> this to form a data race (and thus require synchronization when<br>

> occurring in different threads like this).<br>

<br>

</div>Most of the time the compiler will not know the mask value and will have<br>

to be conservative.  As Nadav has pointed out, what constitutes<br>

"conservative" is entirely context-dependent.<br>

<br>

But I don't understand why defining this as not being a data race would<br>

complicate things.  I'm assuming the mask values are statically known.<br>

Can you explain a bit more?<br></blockquote><div><br></div><div style>It's an interesting question for autovectorization, for example.</div><div style><br></div><div style>Thread A:</div><div style>   for (i=0;i<n;++i)</div>

<div style>      if (i&1)</div><div style>        X[i] = 0;</div><div style><br></div><div style>Thread B:</div><div style>   for (i=0;i<n;++i)</div><div style>      if (!(i&1))</div><div style>        X[i] = 1;</div>

<div style><br></div><div style>The threads run concurrently without synchronization. As written, there is no race. Can you vectorize either of these loops? If masked-out elements of a predicated store are "in play" for racing, then vectorizing would introduce a race. And it would be hard for an optimizer to prove that this doesn't happen.</div>
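<div style><br></div><div style>To make the concern concrete, here is a minimal sketch in C. It assumes one possible lowering of a masked store, namely load, blend, then a full-width store; I am not claiming any particular target does this. The names VF, vec, vec_load, vec_store, blend and the two interleaving functions are made up for illustration. Instead of real threads, it replays one unlucky interleaving on a single thread so the outcome is deterministic.</div>

```c
#include <assert.h>

#define VF 4  /* hypothetical vector width, in elements */

typedef struct { int lane[VF]; } vec;

/* Full-width load: reads every lane, including masked-out ones. */
static vec vec_load(const int *p) {
    vec v;
    for (int k = 0; k < VF; ++k) v.lane[k] = p[k];
    return v;
}

/* Full-width store: writes every lane, including masked-out ones. */
static void vec_store(int *p, vec v) {
    for (int k = 0; k < VF; ++k) p[k] = v.lane[k];
}

/* Replace the lanes selected by mask with value; keep the others. */
static vec blend(vec old, int value, unsigned mask) {
    assert(mask < (1u << VF));  /* mask must fit in VF lanes */
    for (int k = 0; k < VF; ++k)
        if (mask & (1u << k)) old.lane[k] = value;
    return old;
}

/* Thread B's loop, left scalar: writes 1 to the even elements only. */
static void thread_b_scalar(int *X) {
    for (int i = 0; i < VF; ++i)
        if (!(i & 1)) X[i] = 1;
}

/* Both loops scalar: A and B touch disjoint elements, so any
   interleaving gives the same result. */
void scalar_interleaving(int X[VF]) {
    for (int i = 0; i < VF; ++i) X[i] = 7;   /* arbitrary initial data */
    for (int i = 0; i < VF; ++i)
        if (i & 1) X[i] = 0;                 /* thread A */
    thread_b_scalar(X);                      /* thread B */
}

/* Thread A vectorized as load + blend + store, with thread B's
   stores landing between A's load and A's store. */
void vectorized_interleaving(int X[VF]) {
    for (int i = 0; i < VF; ++i) X[i] = 7;   /* arbitrary initial data */
    vec a = vec_load(X);                     /* A: reads all four lanes */
    thread_b_scalar(X);                      /* B: writes X[0] and X[2] */
    vec_store(X, blend(a, 0, 0xAu));         /* A: odd lanes get 0, even
                                                lanes get the stale 7 */
}
```

<div style>With this lowering the scalar version ends with {1, 0, 1, 0}, while the replayed vectorized interleaving ends with {7, 0, 7, 0}: thread B's stores to the even elements are overwritten by the stale values thread A wrote back into its masked-out lanes.</div>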

<div style><br></div><div style>Dan</div><div style><br></div><div style>p.s. Yes, you could also vectorize these with a strided store or a scatter, but then it raises a different question, of the memory semantics for strided or scatter stores.</div>

</div></div></div>