[LLVMdev] memory scopes in atomic instructions

Tue Nov 18 14:35:22 PST 2014

On Fri, Nov 14, 2014 at 1:09 PM, Sahasrabuddhe, Sameer <
sameer.sahasrabuddhe at amd.com> wrote:

> 1. Update the synchronization scope field in atomic instructions from a
>    single bit to a wider field, say 32-bit unsigned integer.
>

I think this should be an arbitrary-bit-width integer. Baking any fixed
size into this is a mistake unless that size is "1".

> 2. Retain the current default of zero as "system scope", replacing the
>    current "cross thread" scope.
>

I would suggest address-space scope.

> 3. All other values are target-defined.
>

You need to define single-thread scope.

> 4. The use of "single thread scope" is not clear.
>

Consider a signal handler that reads memory written by the thread the
signal is delivered to. Essentially, there may be a need to write code
which we know will execute in a single hardware thread, but where the
compiler optimizations that atomics preclude must still be precluded,
because control flow within the hardware thread may arbitrarily move from
one sequence of instructions to another.
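
As a rough illustration of the signal-handler case, here is a minimal C++
sketch. All names in it (payload, ready, observed, on_signal, publish,
run_demo) are hypothetical and not from the proposal. It relies on
std::atomic_signal_fence, which constrains compiler reordering only and is
the source-level counterpart of what LLVM IR calls single-thread scope. The
demo uses std::raise, so the handler runs synchronously on the same thread.

```cpp
#include <atomic>
#include <cassert>
#include <csignal>

// Hypothetical names, for illustration only.
static int payload = 0;               // plain data written by normal control flow
static std::atomic<int> ready{0};     // flag telling the handler the payload is valid
static std::atomic<int> observed{-1}; // value the handler saw, or -1 if none

// Runs on the same thread that is interrupted by the signal.
extern "C" void on_signal(int) {
    if (ready.load(std::memory_order_relaxed) != 0) {
        // Compiler-only fence: pairs with the release fence in publish().
        std::atomic_signal_fence(std::memory_order_acquire);
        observed.store(payload, std::memory_order_relaxed);
    }
}

// Writes the payload, then sets the flag. The signal fence stops the
// compiler from moving the payload store below the flag store; it is not
// intended to order anything against other threads.
void publish(int value) {
    payload = value;
    std::atomic_signal_fence(std::memory_order_release);
    ready.store(1, std::memory_order_relaxed);
}

// Installs the handler, publishes a value, and raises the signal on the
// current thread. Returns what the handler observed.
int run_demo(int value) {
    std::signal(SIGINT, on_signal);
    publish(value);
    std::raise(SIGINT);  // the handler executes before raise() returns
    std::signal(SIGINT, SIG_DFL);
    return observed.load(std::memory_order_relaxed);
}
```

Without the fences, the compiler would be free to reorder the two stores in
publish(), and a signal arriving between them could see the flag set with a
stale payload.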

> If it is required in
>    target-independent transforms,
>

Yes, it is. Consider sig_atomic_t.

> then it could be encoded as just "1",
>    or as "all ones" in the wider field. The latter option is a bit
>    weird, because most targets will have very few scopes. But it is
>    useful in case the next point is included in LLVM IR.
>

If we go with your proposed constraint below, I think it is essential to
model single-thread scope as the maximum integer. It should be a strict
subset of all inter-thread scopes.

> 5. Possibly add the following constraint on memory scopes: "The scope
>    represented by a larger value is nested inside (is a proper subset
>    of) the scope represented by a smaller value." This would also imply
>    that the value used for single-thread scope must be the largest
>    value used by the target.
>    This constraint on "nesting" is easily satisfied by HSAIL (and also
>    OpenCL), where synchronization scopes increase from a single
>    work-item to the entire system. But it is conceivable that other
>    targets do not have this constraint. For example, a platform may
>    define synchronization scopes in terms of overlapping sets instead
>    of proper subsets.
>

I think this is the important thing to settle on in the design. I'd really
like to hear from a diverse set of vendors and folks operating in the GPU
space to understand whether having this constraint is critically important
or problematic for any reasons.

I think (unfortunately) it would be hard to add this later...
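
To make the trade-off concrete, here is a small C++ sketch of the two models
under discussion. Everything in it is an assumption made for illustration:
the 32-bit width comes from the proposal rather than from my preference
above, and the names (Scope, kSystemScope, kSingleThreadScope,
isNestedInOrEqual, commonScope, ScopeSet, isSubset) are hypothetical. With
the nesting constraint, scope inclusion is an integer comparison; with
overlapping sets, two scopes may be incomparable.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Model 1: nested scopes. Zero is the widest scope and a larger value is
// nested inside every smaller value (hypothetical encoding).
using Scope = std::uint32_t;
constexpr Scope kSystemScope = 0;
constexpr Scope kSingleThreadScope = std::numeric_limits<Scope>::max();

// True if `inner` is the same scope as `outer` or nested inside it.
constexpr bool isNestedInOrEqual(Scope inner, Scope outer) {
    return inner >= outer;
}

// The narrowest scope that contains both `a` and `b`; under nesting this
// is simply the smaller value.
constexpr Scope commonScope(Scope a, Scope b) {
    return a < b ? a : b;
}

// Model 2: overlapping sets. Bit i is set when agent i participates in the
// scope (hypothetical encoding). Inclusion is only a partial order here.
using ScopeSet = std::uint32_t;

// True if every agent in `inner` also participates in `outer`.
constexpr bool isSubset(ScopeSet inner, ScopeSet outer) {
    return (inner & ~outer) == 0;
}

static_assert(isNestedInOrEqual(kSingleThreadScope, kSystemScope),
              "single-thread scope nests inside system scope");
```

In the second model, the sets 0b011 and 0b110 overlap but neither contains
the other, so a transform that needs to compare two scopes cannot fall back
on a single integer comparison.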

-Chandler