<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Sun, May 8, 2016 at 12:43 PM, Matt Arsenault via cfe-commits <span dir="ltr"><<a href="mailto:cfe-commits@lists.llvm.org" target="_blank">cfe-commits@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div><blockquote type="cite"><div><div class="h5"><div>On May 6, 2016, at 18:12, Richard Smith via cfe-commits <<a href="mailto:cfe-commits@lists.llvm.org" target="_blank">cfe-commits@lists.llvm.org</a>> wrote:</div><br></div></div><div><div><div class="h5"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Fri, May 6, 2016 at 4:20 PM, Matt Arsenault via cfe-commits <span dir="ltr"><<a href="mailto:cfe-commits@lists.llvm.org" target="_blank">cfe-commits@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span>On 05/06/2016 02:42 PM, David Majnemer via cfe-commits wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
This example looks wrong to me. It doesn't seem meaningful for a function to be both readonly and convergent, because convergent means the call has some side-effect visible to other threads and readonly means the call has no side-effects visible outside the function.<br>
</blockquote></span>
This is not correct. It is valid for convergent operations to be readonly/readnone. Barriers are a common case which do have side effects, but there are also classes of GPU instructions which do not access memory and still need the convergent semantics.<br></blockquote><div><br></div><div>Can you give an example? It's not clear to me how a function could be both convergent and satisfy the readnone requirement that it not "access[...] any mutable state (e.g. memory, control registers, etc) visible to caller functions". Synchronizing with other threads seems like it would cause such a state change in an abstract sense. Is the critical distinction here that the state mutation is visible to the code that spawned the gang of threads, but not to other threads within the gang? (This seems like a bug in the definition of readonly if so, because it means that a readonly call whose result is unused cannot be deleted.)</div><div><br></div><div>I care about this because Clang maps __attribute__((pure)) to LLVM readonly, and -- irrespective of the LLVM semantics -- a call to a function marked pure is permitted to be deleted if the return value is unused, or to have multiple calls CSE'd. As a result, inside Clang, we use that attribute to determine whether an expression has side effects, and Clang's reasoning about these things may also lead to miscompiles if a call to a function marked __attribute__((pure, convergent)) actually can have a side effect.</div></div></div></div></div></div><span class="">
_______________________________________________<br>cfe-commits mailing list<br><a href="mailto:cfe-commits@lists.llvm.org" target="_blank">cfe-commits@lists.llvm.org</a><br><a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits</a><br></span></div></blockquote></div><br><div>These are communication operations between lanes that do not require synchronization within the wavefront. These are mostly cross lane communication instructions. An example would be the amdgcn.mov.dpp instruction, which reads a register from a neighboring lane, or the CUDA warp vote functions.</div></div></blockquote><div><br></div><div>Those both appear to technically fail to satisfy the requirements of an __attribute__((pure)) function. If I understand correctly, the DPP function effectively stores a value into some state that is shared with another lane (from Clang and LLVM's perspectives, state that is visible to a function evaluation outside the current one), and then reads a value from another such shared storage location. The CUDA warp vote functions effectively store a value into some state that is shared with all other threads in the warp and then read some summary information about the values stored by all the threads. In both cases, the function mutates state that is visible to other functions running on other threads, and so is not __attribute__((pure)) / readonly, as far as I can see.</div><div><br></div><div>It seems to me that this change weakens the definition of these attributes when combined with the convergent attribute to mean that the function *is* still allowed to store to mutable state that's shared with other lanes / other threads in the same warp, but only via convergent combined store/load primitives. 
That makes some sense, given that the behavior of the *execution* model does not (necessarily) treat each notional lane as a separate thread, and from that perspective the instruction can be viewed as operating on a vector and communicating only with itself, but it doesn't match the current definitions of the semantics of these attributes (which are specified in terms of the *source* model, in which each notional lane is a separate invocation of the function). So I'd like at least for some documentation to be added for our "pure" and "const" attributes, saying something like "if this is combined with the "convergent" attribute, the function may still communicate with other lanes through convergent operations, even though such communication notionally involves modification of mutable state visible to the other lanes". I'd suggest a similar change also be made to LLVM's LangRef.</div><div><br></div><div><br></div><div>I've checked through how clang is using the "pure" attribute, and it seems like it should mostly do the right thing in this case. There are a few places where (using your amdgcn.mov.dpp example) we would cause a dpp instruction to be emitted where the source code called the relevant operation from within an operand that we do not notionally evaluate (for instance, the operand of a __assume or __builtin_object_size). Contrived example:</div><div><br></div><div> int arr[N];</div><div> int dpp(int n) __attribute__((convergent, pure));</div><div> void f(int id) {</div><div> int x = 0;</div><div> if (id % 2) x = __builtin_object_size(&arr[dpp(id)], 0);</div><div> }</div><div><br></div><div>We'll emit code to call the dpp function here, because we believe it has no side-effects. 
However, in both cases where we do this, we require the relevant expression to have defined behaviour (even though we say we won't perform any side-effects contained within it) so it wouldn't be valid to call a convergent function except from a convergent point in the surrounding function. So I think the worst effect of this would be that we would emit extra convergent operations; the resulting code should still be correct.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div> There is no synchronization required, and there is no other way for the same item to access that information private to the other workitem. There’s no observable global state from the perspective of a single lane. The individual registers changed aren’t visible to the spawning host program (perhaps with the exception of some debug hardware inspecting all of the individual registers). Deleting these would be perfectly acceptable if the result is unused.</div><div><br></div><div>-Matt</div></div><br>
<br></blockquote></div><br></div></div>