[llvm-dev] Memory scope proposal

Mon Apr 18 09:12:17 PDT 2016

Here is the initial proposal with some formatting fixed:

Currently, LLVM IR uses a binary value (SingleThread/CrossThread) to
represent the synchronization scope on atomic instructions. We would like to
enhance the representation of memory scopes in LLVM IR to allow more values
than just the current two. The intention of this email is to invite
comments on our proposal. There was some earlier discussion, which can be
found here:
https://groups.google.com/forum/#!searchin/llvm-dev/hsail/llvm-dev/46eEpS5h0E4/i3T9xw-DNVYJ

Here is our new proposal:

=================================================================
We still let the bitcode store memory scopes as "unsigned integers", since
that is the easiest way to maintain compatibility. The values 0 and 1 are
special. All other values are meaningful only within that bc file. In
addition, "a global metadata in the file" will provide a map from unsigned
integers to string symbols which should be used to interpret all the
non-standard integers. If the global metadata is empty or non-existent,
then all non-zero values will be mapped to "system", which is the current
behavior.
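
The lookup rule above could be sketched roughly as follows in C++17. This is
only an illustration, not part of the proposed implementation. It assumes the
table is keyed by the bitcode value, as the paragraph describes. The names
ScopeName, resolveScopeName, and kExampleTable are hypothetical, the "Device"
entry is invented for illustration, and returning an empty name for a value
that is missing from a non-empty table is an assumption, since the proposal
does not define that case.

```cpp
#include <cstddef>
#include <string_view>

// One entry of a hypothetical per-module table: bitcode scope value -> name.
struct ScopeName {
  unsigned value;
  std::string_view name;
};

// Resolve a bitcode scope value to a symbolic name.
// 0 and 1 are the two special values; everything else goes through the table.
constexpr std::string_view resolveScopeName(unsigned bitcodeValue,
                                            const ScopeName *table,
                                            std::size_t tableSize) {
  if (bitcodeValue == 0) return "singlethread";
  if (bitcodeValue == 1) return "system";
  // Empty or missing metadata: every non-zero value means "system".
  if (table == nullptr || tableSize == 0) return "system";
  for (std::size_t i = 0; i < tableSize; ++i)
    if (table[i].value == bitcodeValue) return table[i].name;
  // Assumption: a value absent from a non-empty table is left unresolved.
  return {};
}

// Example table; "WorkGroup" follows the example in this mail,
// "Device" is made up for illustration.
constexpr ScopeName kExampleTable[] = {{2, "WorkGroup"}, {3, "Device"}};
```

With this sketch, resolveScopeName(2, kExampleTable, 2) yields "WorkGroup",
while any non-zero value resolves to "system" when no table is present.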

The proposed syntax for synchronization scope is as follows:
* Synchronization scopes are of arbitrary width, but are implemented as
  unsigned in the bitcode, just like address spaces.
* Cross-thread is the default.
* The keyword "singlethread" is unchanged.
* New syntax "synchscope(n)" is used for other target-specific scopes.
* There is no keyword for cross-thread, but it can be specified as
  "synchscope(0)".

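As a rough illustration of the syntax rules listed above, the scope token of an
atomic instruction could be mapped to an integer as in the C++17 sketch below.
This is not the actual LLVM assembly parser. The names parseScopeToken,
kSingleThreadScope, and kCrossThreadScope are hypothetical. The sketch assumes
that singlethread is represented as ~0U, accepts only decimal digits inside
"synchscope(...)", and does not check for overflow.

```cpp
#include <optional>
#include <string_view>

constexpr unsigned kSingleThreadScope = ~0U;  // assumed value for singlethread
constexpr unsigned kCrossThreadScope = 0U;    // default (cross-thread) scope

// Map the textual scope token to an integer scope value.
// Returns std::nullopt for tokens this sketch does not understand.
constexpr std::optional<unsigned> parseScopeToken(std::string_view tok) {
  if (tok.empty()) return kCrossThreadScope;           // no keyword: default
  if (tok == "singlethread") return kSingleThreadScope;
  constexpr std::string_view prefix = "synchscope(";
  // Need the prefix, at least one digit, and the closing parenthesis.
  if (tok.size() < prefix.size() + 2 ||
      tok.substr(0, prefix.size()) != prefix || tok.back() != ')')
    return std::nullopt;
  std::string_view digits =
      tok.substr(prefix.size(), tok.size() - prefix.size() - 1);
  unsigned value = 0;
  for (char c : digits) {
    if (c < '0' || c > '9') return std::nullopt;
    value = value * 10U + static_cast<unsigned>(c - '0');
  }
  return value;
}
```

Under these assumptions, an empty token and "synchscope(0)" both give the
cross-thread scope, and "synchscope(2)" gives the target-specific scope 2.
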
The proposed integer encoding of the expanded synchronization scopes is as
follows:

***********************************************************************
| Format       | Single Thread   | System (renamed) | Intermediate    |
|--------------|-----------------|------------------|-----------------|
| Bitcode      | zero            | one              | unsigned n      |
| Assembly     | singlethread,   | empty (default), | synchscope(n-1) |
|              | synchscope(~0U) | synchscope(0)    |                 |
| In-memory    | ~0U             | zero             | unsigned n-1    |
| SelectionDAG | ~0U             | zero             | unsigned n-1    |
***********************************************************************

The choice of “~0U” for singlethread makes it easy to maintain backward
compatibility in the bitcode. The values 0 and 1 remain unchanged in the
bitcode, and the reader simply decrements every scope value by one to
compute the correct value in the in-memory data structure. Under unsigned
arithmetic, the bitcode value 0 wraps around to ~0U.
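
The conversion between the bitcode value and the in-memory value can be
sketched in a few lines of C++. This is a minimal illustration under the
encoding shown in the table, not the actual bitcode reader or writer. The
function names decodeScope and encodeScope are hypothetical.

```cpp
// In-memory values from the table above.
constexpr unsigned kSingleThread = ~0U;
constexpr unsigned kSystem = 0U;

// Bitcode -> in-memory: subtract one. Unsigned arithmetic is modular,
// so the bitcode value 0 wraps around to ~0U (singlethread).
constexpr unsigned decodeScope(unsigned bitcodeValue) {
  return bitcodeValue - 1U;
}

// In-memory -> bitcode: add one. ~0U wraps around to 0.
constexpr unsigned encodeScope(unsigned inMemoryValue) {
  return inMemoryValue + 1U;
}
```

Because both directions are plain modular arithmetic, encodeScope and
decodeScope are exact inverses for every unsigned value.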

Name Mapping

Now we come to the name mapping from integers to strings. If a Clang front end
wants to map a language that has memory scopes (e.g. OpenCL) to LLVM IR,
how does it determine which syncscopes to use? Without any rules, each
target can define its own meaning for the scopes, can give them any name,
and can map them to the LLVM IR unsigned integer values in any way. In this
case, I think each target has to provide a mapping function that maps a
specific language’s name for a scope to that target’s name for a scope that
has conservatively the same semantics. In other words, supporting a new
language that has memory scopes requires every target that supports that
language to be updated accordingly.

Therefore, to let front-end writers share memory scope definitions when they
match, and to avoid the effort of updating all targets for each language,
it is better to define standard memory scope names. A target is free to
implement them or not, but if a target does implement them, they must have
the defined relational semantics (e.g., hierarchical nesting). A target that
implements them will be able to support any language that uses them,
including languages not yet invented. A new memory scope name can be added
if the existing ones are insufficient.

As a first step, we can define the standard scopes according to what a
common language with memory scopes needs; e.g., OpenCL uses system, device,
workgroup, and workitem. This is the same approach LLVM has taken for debug
information: there are standard debug entities (those that a common
language, C, needs), and each new language uses those standard entities
where there is a match and defines only the delta.
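
To illustrate what "defined relational semantics" could mean for the four
scope names above, here is a small C++17 sketch of a hierarchically nested
set of standard scopes. It is only one possible reading. The names StdScope,
scopeIncludes, and lookupStdScope are hypothetical, and the enumerator values
are nesting ranks chosen for this sketch, not the synchscope integers used in
the IR.

```cpp
#include <optional>
#include <string_view>

// Standard scopes ordered from narrowest to widest (ranks are illustrative).
enum class StdScope : unsigned {
  WorkItem = 0,
  WorkGroup = 1,
  Device = 2,
  System = 3
};

// With strict hierarchical nesting, a wider scope includes every narrower one.
constexpr bool scopeIncludes(StdScope outer, StdScope inner) {
  return static_cast<unsigned>(outer) >= static_cast<unsigned>(inner);
}

// Look up a standard scope by name; unknown names are left to the target.
constexpr std::optional<StdScope> lookupStdScope(std::string_view name) {
  if (name == "workitem") return StdScope::WorkItem;
  if (name == "workgroup") return StdScope::WorkGroup;
  if (name == "device") return StdScope::Device;
  if (name == "system") return StdScope::System;
  return std::nullopt;
}
```

A front end that only uses names found by lookupStdScope would then work
unchanged on any target that implements the standard scopes with this
nesting.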

A bitcode example with the proposal
*****************************************************************
define void @test(i32* %addr) {
; forward compatibility
  cmpxchg i32* %addr, i32 42, i32 0 singlethread monotonic monotonic

; new synchscope that will be defined by each backend
  cmpxchg i32* %addr, i32 42, i32 0 synchscope(2) monotonic monotonic, 2
  cmpxchg i32* %addr, i32 42, i32 0 synchscope(3) monotonic monotonic, 3

  ret void
}

!synchscope = metadata !{{i32 0, !"SingleThread"}, {i32 2, !"WorkGroup"},
...}
*****************************************************************

=================================================================