[Clang] Convergent Attribute

Mon May 9 16:41:43 PDT 2016

----- Original Message -----

> From: "Richard Smith via cfe-commits" <cfe-commits at lists.llvm.org>
> To: "Matt Arsenault" <arsenm2 at gmail.com>
> Cc: "Clang Commits" <cfe-commits at lists.llvm.org>
> Sent: Monday, May 9, 2016 4:45:04 PM
> Subject: Re: [Clang] Convergent Attribute

> On Mon, May 9, 2016 at 2:43 PM, Richard Smith <richard at metafoo.co.uk>
> wrote:

> > On Sun, May 8, 2016 at 12:43 PM, Matt Arsenault via cfe-commits
> > <cfe-commits at lists.llvm.org> wrote:

> > > > On May 6, 2016, at 18:12, Richard Smith via cfe-commits
> > > > <cfe-commits at lists.llvm.org> wrote:

> > > > On Fri, May 6, 2016 at 4:20 PM, Matt Arsenault via cfe-commits
> > > > <cfe-commits at lists.llvm.org> wrote:

> > > > > On 05/06/2016 02:42 PM, David Majnemer via cfe-commits wrote:

> > > > > > This example looks wrong to me. It doesn't seem meaningful for a
> > > > > > function to be both readonly and convergent, because convergent
> > > > > > means the call has some side-effect visible to other threads and
> > > > > > readonly means the call has no side-effects visible outside the
> > > > > > function.

> > > > > This is not correct. It is valid for convergent operations to be
> > > > > readonly/readnone. Barriers are a common case which do have side
> > > > > effects, but there are also classes of GPU instructions which do not
> > > > > access memory and still need the convergent semantics.

> > > > Can you give an example? It's not clear to me how a function could be
> > > > both convergent and satisfy the readnone requirement that it not
> > > > "access[...] any mutable state (e.g. memory, control registers, etc)
> > > > visible to caller functions". Synchronizing with other threads seems
> > > > like it would cause such a state change in an abstract sense. Is the
> > > > critical distinction here that the state mutation is visible to the
> > > > code that spawned the gang of threads, but not to other threads within
> > > > the gang? (This seems like a bug in the definition of readonly if so,
> > > > because it means that a readonly call whose result is unused cannot be
> > > > deleted.)

> > > > I care about this because Clang maps __attribute__((pure)) to LLVM
> > > > readonly, and -- irrespective of the LLVM semantics -- a call to a
> > > > function marked pure is permitted to be deleted if the return value is
> > > > unused, or to have multiple calls CSE'd. As a result, inside Clang, we
> > > > use that attribute to determine whether an expression has side effects,
> > > > and Clang's reasoning about these things may also lead to miscompiles
> > > > if a call to a function marked __attribute__((pure, convergent))
> > > > actually can have a side effect.
> > > > _______________________________________________
> > > > cfe-commits mailing list
> > > > cfe-commits at lists.llvm.org
> > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
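For reference, a minimal C sketch of what __attribute__((pure)) lets the compiler do, assuming GCC/Clang attribute syntax. The names sum_prefix and use_twice are hypothetical and not from the thread. Because the callee is declared pure, the two identical calls may be merged into one (CSE) and the call whose result is discarded may be deleted, so correct code must not depend on how many times the body actually runs.

```c
#include <stddef.h>

/* Hypothetical pure function: the result depends only on the argument and
   on memory it reads, and it has no observable side effects. */
__attribute__((pure)) int sum_prefix(const int *v, size_t n) {
    int s = 0;
    for (size_t i = 0; i < n; ++i)
        s += v[i];
    return s;
}

/* Because sum_prefix is pure, the optimizer may evaluate it once and reuse
   the value for both a and b, and may drop the third call entirely since
   its result is unused. */
int use_twice(const int *v, size_t n) {
    int a = sum_prefix(v, n);
    int b = sum_prefix(v, n);
    (void)sum_prefix(v, n); /* result discarded: may be removed */
    return a + b;
}
```

This is exactly the freedom that becomes unsafe if a pure function can still have an effect visible to other lanes.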

> > > These are communication operations between lanes that do not require
> > > synchronization within the wavefront. These are mostly cross-lane
> > > communication instructions. An example would be the amdgcn.mov.dpp
> > > instruction, which reads a register from a neighboring lane, or the
> > > CUDA warp vote functions.
> >
> > Those both appear to technically fail to satisfy the requirements of an
> > __attribute__((pure)) function. If I understand correctly, the DPP
> > function effectively stores a value into some state that is shared with
> > another lane (from Clang and LLVM's perspectives, state that is visible
> > to a function evaluation outside the current one), and then reads a
> > value from another such shared storage location. The CUDA warp vote
> > functions effectively store a value into some state that is shared with
> > all other threads in the warp and then read some summary information
> > about the values stored by all the threads. In both cases, the function
> > mutates state that is visible to other functions running on other
> > threads, and so is not __attribute__((pure)) / readonly, as far as I can
> > see.
> 
> (And just to be clear, the fact that no actual storage is used for this
> is irrelevant to the notional semantics of the operation. Note that the
> definition of the pure attribute also covers "control registers, etc".)
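A hypothetical C model of the notional semantics described above. It is not real GPU code and not an API from the thread. A wavefront of LANES lanes is simulated sequentially, and each cross-lane operation is modelled as every lane storing into a shared array and then reading back from it. ballot stands in for a warp-vote style operation and read_lane_above for a DPP-style neighbour read; the names, the 32-lane width and the wrap-around at the end of the row are assumptions chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define LANES 32 /* assumed wavefront/warp width */

/* Notional shared state that every lane writes during a cross-lane op. */
static int shared_slot[LANES];

/* Model of a warp-vote "ballot": every lane publishes its predicate (store
   phase), then every lane receives the combined bitmask (load phase). */
void ballot(const bool pred[LANES], uint32_t out[LANES]) {
    for (int lane = 0; lane < LANES; ++lane)
        shared_slot[lane] = pred[lane];
    uint32_t mask = 0;
    for (int j = 0; j < LANES; ++j)
        if (shared_slot[j])
            mask |= (uint32_t)1 << j;
    for (int lane = 0; lane < LANES; ++lane)
        out[lane] = mask;
}

/* Model of a DPP-style neighbour read: every lane publishes a value, then
   reads the value published by the next lane (wrapping at the end). */
void read_lane_above(const int in[LANES], int out[LANES]) {
    for (int lane = 0; lane < LANES; ++lane)
        shared_slot[lane] = in[lane];
    for (int lane = 0; lane < LANES; ++lane)
        out[lane] = shared_slot[(lane + 1) % LANES];
}
```

The model only gives the right answer because every lane finishes its store before any lane loads, which is the property convergent protects: if a transformation made the call conditional on a subset of lanes, the others would read slots that were never written in that round. From each lane's point of view, the shared state is observable only through the result it gets back.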

> > It seems to me that this change weakens the definition of these
> > attributes when combined with the convergent attribute to mean that the
> > function *is* still allowed to store to mutable state that's shared with
> > other lanes / other threads in the same warp, but only via convergent
> > combined store/load primitives. That makes some sense, given that the
> > behavior of the *execution* model does not (necessarily) treat each
> > notional lane as a separate thread, and from that perspective the
> > instruction can be viewed as operating on a vector and communicating
> > only with itself, but it doesn't match the current definitions of the
> > semantics of these attributes (which are specified in terms of the
> > *source* model, in which each notional lane is a separate invocation of
> > the function). So I'd like at least for some documentation to be added
> > for our "pure" and "const" attributes, saying something like "if this is
> > combined with the "convergent" attribute, the function may still
> > communicate with other lanes through convergent operations, even though
> > such communication notionally involves modification of mutable state
> > visible to the other lanes". I'd suggest a similar change also be made
> > to LLVM's LangRef.
> 
+1 

It seems like we need to be explicit, however, that the modified state is only accessible via the return value of the function. 

-Hal 

> > I've checked through how clang is using the "pure" attribute, and it
> > seems like it should mostly do the right thing in this case. There are a
> > few places where (using your amdgcn.mov.dpp example) we would cause a dpp
> > instruction to be emitted where the source code called the relevant
> > operation from within an operand that we do not notionally evaluate (for
> > instance, the operand of a __assume or __builtin_object_size). Contrived
> > example:
> 

> > #define N 16 /* hypothetical size; N is left unspecified in the example */
> > int arr[N];
> > int dpp(int n) __attribute__((convergent, pure));
> >
> > void f(int id) {
> >   int x = 0;
> >   if (id % 2) x = __builtin_object_size(&arr[dpp(id)], 0);
> > }
> 

> > We'll emit code to call the dpp function here, because we believe it
> > has no side-effects. However, in both cases where we do this, we require
> > the relevant expression to have defined behaviour (even though we say we
> > won't perform any side-effects contained within it), so it wouldn't be
> > valid to call a convergent function except from a convergent point in
> > the surrounding function. So I think the worst effect of this would be
> > that we would emit extra convergent operations; the resulting code
> > should still be correct.
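A small C sketch of the "operand we do not notionally evaluate" behaviour, assuming GCC or Clang; GCC documents that __builtin_object_size never evaluates its argument for side effects and returns (size_t)-1 for type 0 when the argument has any. The function bump is hypothetical and deliberately not pure, so its increment counts as a side effect and the call is never emitted. For a function declared pure, Clang believes there is nothing to suppress and may emit the call, as described above.

```c
#include <stddef.h>

#define N 16 /* hypothetical array size for illustration */

int arr[N];
static int calls = 0;

/* Hypothetical non-pure function: incrementing a global is a visible
   side effect. */
int bump(void) {
    ++calls;
    return 0;
}

/* The operand of __builtin_object_size is not evaluated for side effects,
   so bump() is not called; with a side-effecting operand, type 0 reports
   the "unknown" size (size_t)-1. */
size_t probe(void) {
    return __builtin_object_size(&arr[bump()], 0);
}

int call_count(void) { return calls; }
```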
> 

> > > There is no synchronization required, and there is no other way for
> > > the same item to access that information private to the other
> > > work-item. There’s no observable global state from the perspective of
> > > a single lane. The individual registers changed aren’t visible to the
> > > spawning host program (perhaps with the exception of some debug
> > > hardware inspecting all of the individual registers). Deleting these
> > > would be perfectly acceptable if the result is unused.

> > > -Matt

-- 

Hal Finkel 
Assistant Computational Scientist 
Leadership Computing Facility 
Argonne National Laboratory 