[LLVMdev] memory scopes in atomic instructions
Sahasrabuddhe, Sameer
sameer.sahasrabuddhe at amd.com
Wed Nov 19 09:54:23 PST 2014
On 11/19/2014 4:05 AM, Chandler Carruth wrote:
>
> On Fri, Nov 14, 2014 at 1:09 PM, Sahasrabuddhe, Sameer
> <sameer.sahasrabuddhe at amd.com>
> wrote:
>
> 1. Update the synchronization scope field in atomic instructions from a
> single bit to a wider field, say a 32-bit unsigned integer.
>
>
> I think this should be an arbitrary bit width integer. I think baking
> any size into this is a mistake unless that size is "1".
I noticed that the LRM never specifies a width for address spaces, but
the implementation uses "unsigned" everywhere, which is clearly not an
arbitrary width integer. Is this how memory scopes should also be
implemented?
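For comparison, here is how the two look in textual IR today, along with
a purely hypothetical spelling for a wider scope operand (the
synchscope(N) syntax below is only an illustration of the proposal, not
existing syntax):

  ; Address spaces already take an arbitrary unsigned integer in the IR
  ; text, even though the C++ implementation stores them as 'unsigned':
  @flag = addrspace(1) global i32 0

  define void @example(i32* %p) {
    ; Existing single-bit scope: either the default cross-thread scope...
    store atomic i32 1, i32* %p seq_cst, align 4
    ; ...or the 'singlethread' scope.
    store atomic i32 1, i32* %p singlethread seq_cst, align 4
    ; Hypothetical wider field (illustration only):
    ;   store atomic i32 1, i32* %p synchscope(2) seq_cst, align 4
    ret void
  }
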
> 4. The use of "single thread scope" is not clear.
>
>
> Consider trying to read memory written by a thread from a signal
> handler delivered to that thread. Essentially, there may be a need to
> write code which we know will execute in a single hardware thread, but
> where the compiler optimizations that atomics normally preclude still
> need to be precluded, because control flow within the hardware thread
> may arbitrarily move from one sequence of instructions to another.
>
> If it is required in
> target-independent transforms,
>
>
> Yes, it is. sig_atomic_t.
Thanks! This also explains why SingleThread is baked into tsan. I
couldn't find a way to work around __tsan_atomic_signal_fence if I
removed SingleThread as a well-known memory scope.
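For reference, this is exactly the case the existing 'singlethread' scope
captures: a C11 atomic_signal_fence only has to order operations against
a handler running on the same thread, and Clang lowers it to a
single-thread fence. A minimal IR sketch:

  ; Roughly what atomic_signal_fence(memory_order_seq_cst) becomes: it
  ; orders memory operations only within the current thread (e.g. against
  ; an asynchronous signal handler), so it acts as a compiler barrier and
  ; usually needs no hardware fence instruction.
  define void @signal_fence() {
    fence singlethread seq_cst
    ret void
  }

  ; A cross-thread fence, by contrast, omits the scope and must also
  ; order operations with respect to other hardware threads:
  define void @thread_fence() {
    fence seq_cst
    ret void
  }
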
>
> 5. Possibly add the following constraint on memory scopes: "The scope
> represented by a larger value is nested inside (is a proper subset of)
> the scope represented by a smaller value." This would also imply that
> the value used for single-thread scope must be the largest value used
> by the target.
> This constraint on "nesting" is easily satisfied by HSAIL (and also
> OpenCL), where synchronization scopes increase from a single
> work-item to the entire system. But it is conceivable that other
> targets do not have this constraint. For example, a platform may
> define synchronization scopes in terms of overlapping sets instead
> of proper subsets.
>
>
> I think this is the important thing to settle on in the design. I'd
> really like to hear from a diverse set of vendors and folks operating
> in the GPU space to understand whether having this constraint is
> critically important or problematic for any reasons.
I think "heterogenous systems" (in general, and not just HSA) might be a
better term since it covers more than just GPU devices.
Also, I don't see why this constraint in the general LLVM IR could be
critically important to any target, but I can see why it could be
problematic for one! If I understand correctly, the main issue is that if
we do not build nested scopes into the IR, then we can never have
target-independent optimizations that work with multiple memory scopes.
Is that correct? And is it really so important? What happens when we do
get a target whose memory scopes are not nested? Won't it then be harder
to remove this assumption from the target-independent optimizations?
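To make the nesting question concrete, here is a purely illustrative
numbering for an HSAIL/OpenCL-like target, written against the proposed
integer scope field (neither the synchscope(N) spelling nor the specific
values exist today):

  ; Hypothetical scope numbering where each larger value denotes a scope
  ; properly nested inside the previous one:
  ;   0 = system     (all-encompassing)
  ;   1 = agent      (one device)
  ;   2 = work-group
  ;   3 = wavefront
  ;   4 = work-item  (single thread -- necessarily the largest value)
  ;
  ; Under the proposed constraint, a work-group-scoped release such as
  ;   store atomic i32 1, i32 addrspace(1)* %p synchscope(2) release, align 4
  ; only has to be ordered with respect to work-items in the same
  ; work-group, and is strictly weaker than the same store at synchscope(1)
  ; or synchscope(0). A target whose scopes overlap instead of nesting
  ; admits no such total order, which is the case I am worried about.
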
> I think (unfortunately) it would be hard to add this later...
I am not sure I understand this part. The only effect I see is that
targets might use enumerations that do not follow a strict order in
their list of memory scopes. We can always encourage a forward-looking
convention of listing the memory scopes in nesting order. And in the
worst case, the enumerations can be reordered when the need arises, right?
Sameer.