[LLVMdev] memory scopes in atomic instructions

Sun Nov 16 22:13:41 PST 2014

On 11/17/2014 10:51 AM, Owen Anderson wrote:

> It is already the case that address spaces can (potentially) alias. 
>  As such, the combination of address spaces and memory scopes can 
> represent any combination where the sharing properties of memory are 
> statically known, simply by having (potentially aliasing) address 
> spaces to represent memory pools that are only shared with a specific 
> combination of agents.  One can imagine a GPU that worked like this, 
> and GPU programming models do generally differentiate between various 
> sharing pools statically.

I am trying to understand this with a concrete example. OpenCL 2.0 
allows atomic instructions in the global address space, which is encoded 
as "1" in the SPIR target. The possible memory scopes are work_item, 
work_group, device and all_svm_devices. We could split the global 
address space into four statically known "synchronization pools", say 
"global_work_item", "global_work_group", etc. They would all alias with 
the real global address space, and could be encoded as new address 
spaces, is that correct? Then we wouldn't even need the memory scope 
argument on the atomic instruction, right?

Note that "global_work_item" isn't even a real address space, i.e., it 
is not a well-defined sequence of addresses that is located somewhere in 
the global address space. It's actually the set of all global locations 
that can potentially be accessed by atomic instructions using 
"work_item" memory scope in a given program. It is not required to be 
contiguous, and can alias with the entire global address space in the 
worst case.
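
To make the "pool" idea concrete, here is a minimal sketch in Python. It is 
only a model of the argument above, not anything from LLVM or SPIR: the 
function name, the pool names and the addresses are all hypothetical. It 
groups the locations touched by global atomics by their memory scope.

```python
from collections import defaultdict

# OpenCL 2.0 memory scopes named in the text.
SCOPES = ("work_item", "work_group", "device", "all_svm_devices")

GLOBAL_ADDRSPACE = 1  # SPIR encoding of the global address space, per the text


def synchronization_pools(atomic_ops):
    """Group the addresses touched by global atomics by memory scope.

    atomic_ops is an iterable of (address, scope) pairs. The addresses are
    hypothetical offsets into the global address space.
    """
    pools = defaultdict(set)
    for address, scope in atomic_ops:
        if scope not in SCOPES:
            raise ValueError(f"unknown memory scope: {scope}")
        pools["global_" + scope].add(address)
    return dict(pools)


# Hypothetical atomic operations performed by one program.
ops = [
    (0x1000, "work_item"),
    (0x8000, "work_item"),   # far from 0x1000: the pool is not contiguous
    (0x1000, "device"),      # same location, other scope: the pools overlap
    (0x2000, "work_group"),
]
pools = synchronization_pools(ops)
```

In this model "global_work_item" is just whatever set of locations the 
program happens to touch with that scope, which is why it need not be 
contiguous and can overlap with the other pools.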

So this is what it looks like to me: The proposal is to encode memory 
scopes as a new field that is orthogonal to address spaces. Address 
spaces are defined on locations, while memory scopes are defined on 
operations. Every combination of an address space and a memory scope 
represents a set of instructions synchronizing with a set of agents 
through a set of locations in that address space. The first two sets are 
statically known (not considering the effect of control flow on the 
instructions). But the set of locations is dynamic, and could span the 
whole address space in the absence of aliasing information.
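
A rough Python sketch of that distinction follows. It assumes a made-up 
AtomicOp record and made-up traces, none of which is proposed IR syntax. 
The grouping of instructions by (address space, scope) is computed from the 
program text alone, while the set of locations only shows up once a 
particular execution is observed.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomicOp:
    name: str        # hypothetical instruction label
    addrspace: int   # property of the location being accessed
    scope: str       # property of the operation itself


def group_by_addrspace_and_scope(ops):
    """Statically partition instructions by (address space, scope)."""
    groups = defaultdict(list)
    for op in ops:
        groups[(op.addrspace, op.scope)].append(op.name)
    return dict(groups)


def locations_touched(trace):
    """Collect the addresses seen per (address space, scope) in one run.

    trace is an iterable of (AtomicOp, address) pairs from a single
    hypothetical execution.
    """
    touched = defaultdict(set)
    for op, address in trace:
        touched[(op.addrspace, op.scope)].add(address)
    return dict(touched)


a = AtomicOp("a", 1, "work_group")
b = AtomicOp("b", 1, "work_group")
c = AtomicOp("c", 1, "device")

groups = group_by_addrspace_and_scope([a, b, c])

# Two hypothetical executions of the same instructions.
trace_1 = [(a, 0x10), (b, 0x10)]
trace_2 = [(a, 0x10), (b, 0x90)]
```

The groups are identical for both traces, but the locations recorded for 
(1, "work_group") differ between them.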

> The case that this doesn’t handle is when the sharing properties are 
> not known statically.  However, I question the utility of designing 
> this, since there are no known systems that require it.  We should 
> design the representation to cover all reasonably anticipated systems, 
> not ones that don’t, and have no prospect of, existing.

Sure. But we could just leave this undefined for now, without losing the 
ability to express what we need. The idea is to not specify any 
semantics on non-zero memory scopes (such as assuming that they have a 
nesting order).
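
As a small illustration of what "no semantics" could mean for 
target-independent code, here is a Python sketch. It assumes that scope 
zero is the one scope with defined meaning; the helper names are 
hypothetical and not part of any proposal.

```python
DEFAULT_SCOPE = 0  # assumption: zero is the only scope with defined semantics


def scopes_known_equivalent(a, b):
    """Return True only when generic code may treat two scopes alike.

    Non-zero scopes are opaque integers here. No ordering or nesting is
    assumed, so 1 < 2 says nothing about which scope is wider.
    """
    return a == b


def can_reason_generically(scope):
    """Only the default scope carries meaning without target knowledge."""
    return scope == DEFAULT_SCOPE
```

Anything beyond equality, such as whether one scope includes another, 
would have to come from the target.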

Sameer.