[Clang] Convergent Attribute

Mon May 9 14:43:27 PDT 2016

On Sun, May 8, 2016 at 12:43 PM, Matt Arsenault via cfe-commits <
cfe-commits at lists.llvm.org> wrote:

> On May 6, 2016, at 18:12, Richard Smith via cfe-commits <
> cfe-commits at lists.llvm.org> wrote:
>
> On Fri, May 6, 2016 at 4:20 PM, Matt Arsenault via cfe-commits <
> cfe-commits at lists.llvm.org> wrote:
>
>> On 05/06/2016 02:42 PM, David Majnemer via cfe-commits wrote:
>>
>>> This example looks wrong to me. It doesn't seem meaningful for a
>>> function to be both readonly and convergent, because convergent means the
>>> call has some side-effect visible to other threads and readonly means the
>>> call has no side-effects visible outside the function.
>>>
>> This s not correct. It is valid for convergent operations to be
>> readonly/readnone. Barriers are a common case which do have side effects,
>> but there are also classes of GPU instructions which do not access memory
>> and still need the convergent semantics.
>>
>
> Can you give an example? It's not clear to me how a function could be both
> convergent and satisfy the readnone requirement that it not "access[...]
> any mutable state (e.g. memory, control registers, etc) visible to caller
> functions". Synchronizing with other threads seems like it would cause such
> a state change in an abstract sense. Is the critical distinction here that
> the state mutation is visible to the code that spawned the gang of threads,
> but not to other threads within the gang? (This seems like a bug in the
> definition of readonly if so, because it means that a readonly call whose
> result is unused cannot be deleted.)
>
> I care about this because Clang maps __attribute__((pure)) to LLVM
> readonly, and -- irrespective of the LLVM semantics -- a call to a function
> marked pure is permitted to be deleted if the return value is unused, or to
> have multiple calls CSE'd. As a result, inside Clang, we use that attribute
> to determine whether an expression has side effects, and Clang's reasoning
> about these things may also lead to miscompiles if a call to a function
> marked __attribute__((pure, convergent)) actually can have a side effect.
> _______________________________________________
> cfe-commits mailing list
> cfe-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
>
>
> These are communication operations between lanes that do not require
> synchronization within the wavefront. These are mostly cross lane
> communication instructions. An example would be the amdgcn.mov.dpp
> instruction, which reads a register from a neighboring lane, or the CUDA
> warp vote functions.
>

Those both appear to technically fail to satisfy the requirements of an
__attribute__((pure)) function. If I understand correctly, the DPP function
effectively stores a value into some state that is shared with another lane
(from Clang and LLVM's perspectives, state that is visible to a function
evaluation outside the current one), and then reads a value from another
such shared storage location. The CUDA warp vote functions effectively
store a value into some state that is shared with all other threads in the
warp and then read some summary information about the values stored by all
the threads. In both cases, the function mutates state that is visible to
other functions running on other threads, and so is not
__attribute__((pure)) / readonly, as far as I can see.

It seems to me that this change weakens the definition of these attributes
when combined with the convergent attribute to mean that the function *is*
still allowed to store to mutable state that's shared with other lanes /
other threads in the same warp, but only via convergent combined store/load
primitives. That makes some sense, given that the behavior of the
*execution* model does not (necessarily) treat each notional lane as a
separate thread, and from that perspective the instruction can be viewed as
operating on a vector and communicating only with itself, but it doesn't
match the current definitions of the semantics of these attributes (which
are specified in terms of the *source* model, in which each notional lane
is a separate invocation of the function). So I'd like at least for some
documentation to be added for our "pure" and "const" attributes, saying
something like "if this is combined with the "convergent" attribute, the
function may still communicate with other lanes through convergent
operations, even though such communication notionally involves modification
of mutable state visible to the other lanes". I'd suggest a similar change
also be made to LLVM's LangRef.

I've checked through how clang is using the "pure" attribute, and it seems
like it should mostly do the right thing in this case. There are a few
places where (using your amdgcn.mov.dpp example) we would cause a dpp
instruction to be emitted where the source code called the relevant
operation from within an operand that we do not notionally evaluate (for
instance, the operand of a __assume or __builtin_object_size). Contrived
example:

  int arr[N];
  int dpp(int n) __attribute__((convergent, pure));
  void f(int id) {
    int x = 0;
    if (id % 2) x = __builtin_object_size(&arr[dpp(id)], 0);
  }

We'll emit code to call the dpp function here, because we believe it has no
side-effects. However, in both cases where we do this, we require the
relevant expression to have defined behaviour (even though we say we won't
perform any side-effects contained within it) so it wouldn't be valid to
call a convergent function except from a convergent point in the
surrounding function. So I think the worst effect of this would be that we
would emit extra convergent operations; the resulting code should still be
correct.

There is no synchronization required, and there is no other way for the
> same item to access that information private to the other workitem. There’s
> no observable global state from the perspective of a single lane. The
> individual registers changed aren’t visible to the spawning host program
> (perhaps with the exception of some debug hardware inspecting all of the
> individual registers). Deleting these would be perfectly acceptable if the
> result is unused.
>
> -Matt
>
> _______________________________________________
> cfe-commits mailing list
> cfe-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20160509/4edb8f50/attachment-0001.html>