[llvm-dev] Memory scope proposal

Tue Mar 29 10:03:32 PDT 2016

> On Mar 28, 2016, at 7:17 PM, Philip Reames via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Ke,
> 
> I'll be the bearer of bad news here.  The radio silence this proposal has gotten probably means there is not enough interest in the community in this proposal to see it land.

FWIW, I’m very interested in seeing it go in, but haven’t had a lot of time to write a response.

> One concern I have with the current proposal is that the optimization value of these scopes is not clear to me.  Is it only the backend which is expected to support optimizations over these scopes?  Or are you expecting the middle end optimizer to understand them?  If so, I'd suspect we'd need a refined definition which allows us to discuss relative strengths of memory scopes.  

I don’t know about Ke’s use cases, but I at least am not very concerned with having any portion of LLVM optimize them.  Right now LLVM has no way to represent the information encoded here at all.

> More fundamentally, it's not clear to me that "scope" is even the right model for this.  I could see a case where we'd want something along the lines of "acquire semantics on memory space 1, release semantics on memory space 2, seq_cst semantics on address space 3".

Scopes are orthogonal to ordering constraints.  Scopes are about memory operation visibility, primarily in the context of a machine with non-coherent caches.  Imagine an accelerator with:

- Per HW thread load/store buffers
- Per core L1
- Accelerator-wide L2
- Whole-system DRAM

… and at any level of the hierarchy, the caching for one thread/core/accelerator may not be coherent with caches for other threads/cores/accelerators.

Scopes allow the program author to express the requisite visibility for a memory operation; an operation that needs to be visible to other cores within the accelerator may need to bypass or flush the per-core L1.  Communication to the host CPU or other accelerators may similarly need to bypass the L2.
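
As a rough illustration of the hierarchy above, here is a minimal Python sketch (not part of the proposal) of how a requested visibility scope might translate into the set of cache levels an operation has to bypass or flush. The scope names and the levels_to_bypass helper are hypothetical, and the model assumes the levels nest strictly, which real hardware need not do.

```python
from enum import IntEnum


class Scope(IntEnum):
    """Hypothetical visibility scopes, ordered from narrowest to widest."""
    THREAD = 0
    CORE = 1
    ACCELERATOR = 2
    SYSTEM = 3


# Each storage level paired with the widest scope that shares it coherently.
# The names mirror the example hierarchy; the pairing is an assumption.
LEVELS = [
    ("load/store buffer", Scope.THREAD),
    ("L1", Scope.CORE),
    ("L2", Scope.ACCELERATOR),
    ("DRAM", Scope.SYSTEM),
]


def levels_to_bypass(scope):
    """Return the levels that are private to something narrower than `scope`.

    An operation that must be visible at `scope` cannot be satisfied from a
    level shared by fewer agents, so those levels are bypassed or flushed.
    """
    return [name for name, shared_by in LEVELS if shared_by < scope]
```

Under this model, levels_to_bypass(Scope.ACCELERATOR) yields the per-thread buffers and the L1, and Scope.SYSTEM additionally yields the L2, matching the two cases described above.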

—Owen

> Also, unless I'm misreading on my skim of your proposal, the current definition of scope is slightly off from what you've specified.  A "seq_cst singlethread" fence is a much weaker fence than a "seq_cst crossthread".  It's probably easiest to reason about the current scheme as having the cross product of {singlethread, crossthread} x {orderings...} distinct orderings rather than a set of orderings with two overlapping scopes.  
> 
> Philip
> 
> On 03/22/2016 01:42 PM, Ke Bai via llvm-dev wrote:
>>   Dear all,
>> 
>> Here is the plain text version of the proposal:
>> 
>> Currently, the LLVM IR uses a binary value (SingleThread/CrossThread) to represent synchronization scope on atomic instructions. We would like to enhance the representation of memory scopes in LLVM IR to allow more values than just the current two. The intention of this email is to invite comments on our proposal. There has been some earlier discussion, which can be found here:
>> https://groups.google.com/forum/#!searchin/llvm-dev/hsail/llvm-dev/46eEpS5h0E4/i3T9xw-DNVYJ
>> 
>> Here is our new proposal:
>> 
>> =================================================================
>> We still let the bitcode store memory scopes as "unsigned integers", since that is the easiest way to maintain compatibility. The values 0 and 1 are special. All other values are meaningful only within that bc file. In addition, "a global metadata in the file" will provide a map from unsigned integers to string symbols which should be used to interpret all the non-standard integers. If the global metadata is empty or non-existent, then all non-zero values will be mapped to "system", which is the current behavior.
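
To make the fallback rule in the quoted paragraph concrete, here is a minimal Python sketch of the name lookup. It is only one reading of the proposal: the scope_name function is hypothetical, the map is assumed to be keyed by the integers stored in the bitcode, and the behavior for a value missing from a non-empty map (an error here) is an assumption, since the proposal does not specify it.

```python
def scope_name(value, scope_names=None):
    """Resolve a bitcode scope integer to a symbolic scope name.

    `scope_names` stands in for the global metadata map; None or an empty
    dict models metadata that is empty or non-existent.
    """
    if value == 0:
        # Special value: single thread.
        return "singlethread"
    if value == 1 or not scope_names:
        # 1 is always "system"; without metadata, so is every non-zero value.
        return "system"
    # Assumption: a non-empty map covers every non-standard value in the file,
    # so a missing key raises KeyError rather than silently widening the scope.
    return scope_names[value]
```

For example, scope_name(2, {2: "WorkGroup"}) gives "WorkGroup", while scope_name(2) with no metadata falls back to "system".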
>> 
>> The proposed syntax for synchronization scope is as follows:
>> * Synchronization scopes are of arbitrary width, but implemented as unsigned in the bitcode, just like address spaces.
>> * Cross-thread is default.
>> * Keyword "singlethread" is unchanged
>> * New syntax "synchscope(n)" for other target-specific scopes. 
>> * There is no keyword for cross-thread, but it can be specified as "synchscope(0)".
>> 
>> The proposed integer encodings for the expanded synchronization scopes are as follows:
>> ****************************************************************
>> Format        Single Thread     System (renamed)   Intermediate
>> Bitcode       zero              one                unsigned n
>> Assembly      singlethread,     empty (default),   synchscope(n-1)
>>               synchscope(~0U)   synchscope(0)
>> In-memory     ~0U               zero               unsigned n-1
>> SelectionDAG  ~0U               zero               unsigned n-1
>> ****************************************************************
>> 
>> The choice of “~0U” for singlethread makes it easy to maintain backward compatibility in the bitcode. The values 0 and 1 remain unchanged in the bitcode, and the reader simply decrements them by one to compute the correct value in the in-memory data-structure.
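
The decrement described in the quoted paragraph relies on unsigned wraparound: bitcode 0 minus one becomes ~0U. A minimal Python sketch of the two directions, assuming a 32-bit unsigned representation (the function names and the 32-bit width are assumptions, not part of the proposal):

```python
MASK32 = 0xFFFFFFFF          # models a 32-bit unsigned integer (assumed width)

SINGLE_THREAD = MASK32       # in-memory ~0U
SYSTEM = 0                   # in-memory zero


def bitcode_to_memory(n):
    """Reader side: decrement by one, wrapping like unsigned arithmetic."""
    return (n - 1) & MASK32


def memory_to_bitcode(scope):
    """Writer side: the inverse, increment by one with the same wraparound."""
    return (scope + 1) & MASK32
```

With this mapping the existing bitcode values keep their meaning (0 is single thread, 1 is system), and an intermediate bitcode value n appears in memory as n-1, which is also the number written in synchscope(n-1).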
>> 
>> Name Mapping
>> 
>> Now we come to the name mapping from integers to strings. If a Clang front end wants to map a language that has memory scopes (e.g. OpenCL) to LLVM IR, how does it determine what synchscopes to use? Without any rules, each target can define its own meaning for the scopes, can give them any name, and can map them to the LLVM IR unsigned integer values in any way. In this case, I think each target has to provide a mapping function that maps a specific language’s name for a scope into that target’s name for a scope that has conservatively the same semantics. Namely, supporting a new language that has memory scopes requires every target that is to support that language to be updated accordingly.
>> 
>> Therefore, to let front-end writers share memory scope definitions where they match, and to avoid the effort of updating all targets for each language, it is better to define standard memory scope names. A target is free to implement them or not, but if a target does implement them, they must have the defined relational semantics (e.g., hierarchical nesting). A target that implements them will be able to support any language that uses them, including languages not yet invented. A new memory scope name can be added if the existing ones are insufficient.
>> 
>> As a first step, we can define the standard scopes based on what a common language with memory scopes needs; e.g., OpenCL uses system, device, workgroup, and workitem. This is the same approach LLVM has taken for debug information: there are standard debug entities (those that a common language such as C needs), and each new language uses those standard entities where there is a match and defines only the delta.
>> 
>> A bitcode example with the proposal
>> *****************************************************************
>> define void @test(i32* %addr) {
>> ; forward compatibility
>>   cmpxchg i32* %addr, i32 42, i32 0 singlethread monotonic monotonic
>> 
>> ; new synchscope that will be defined by each backend
>>   cmpxchg i32* %addr, i32 42, i32 0 synchscope(2) monotonic monotonic, 2
>>   cmpxchg i32* %addr, i32 42, i32 0 synchscope(3) monotonic monotonic, 3
>> 
>>   ret void
>> }
>> 
>> !synchscope = metadata !{{i32 0, !"SingleThread"}, {i32 2, !"WorkGroup"}, ...}
>> *****************************************************************
>> 
>> =================================================================
>> 
>> On Thu, Jan 28, 2016 at 12:27 PM, Ke Bai <kebai613 at gmail.com> wrote:
>> Hi all,
>> 
>> Currently, the LLVM IR uses a binary value (SingleThread/CrossThread) to represent synchronization scope on atomic instructions. We would like to enhance the representation of memory scopes in LLVM IR to allow more values than just the current two. The intention of this email is to invite comments on our proposal. There has been some earlier discussion, which can be found here:
>> https://groups.google.com/forum/#!searchin/llvm-dev/hsail/llvm-dev/46eEpS5h0E4/i3T9xw-DNVYJ
>> Here is our new proposal:
>> =================================================================
>> We still let the bitcode store memory scopes as unsigned integers, since that is the easiest way to maintain compatibility. The values 0 and 1 are special. All other values are meaningful only within that bc file. In addition, a global metadata in the file will provide a map from unsigned integers to string symbols which should be used to interpret all the non-standard integers. If the global metadata is empty or non-existent, then all non-zero values will be mapped to "system", which is the current behavior.
>> The proposed syntax for synchronization scope is as follows:
>> * Synchronization scopes are of arbitrary width, but implemented as unsigned in the bitcode, just like address spaces.
>> * Cross-thread is default.
>> * Keyword "singlethread" is unchanged.
>> * New syntax "synchscope(n)" for other target-specific scopes.
>> * There is no keyword for cross-thread, but it can be specified as "synchscope(0)".
>> The proposed integer encodings for the expanded synchronization scopes are as follows:
>> Format        Single Thread     System (renamed)   Intermediate
>> Bitcode       zero              one                unsigned n
>> Assembly      singlethread,     empty (default),   synchscope(n-1)
>>               synchscope(~0U)   synchscope(0)
>> In-memory     ~0U               zero               unsigned n-1
>> SelectionDAG  ~0U               zero               unsigned n-1
>> The choice of “~0U” for singlethread makes it easy to maintain backward compatibility in the bitcode. The values 0 and 1 remain unchanged in the bitcode, and the reader simply decrements them by one to compute the correct value in the in-memory data-structure.
>> Name Mapping
>> 
>> Now we come to the name mapping from integers to strings. If a Clang front end wants to map a language that has memory scopes (e.g. OpenCL) to LLVM IR, how does it determine what synchscopes to use? Without any rules, each target can define its own meaning for the scopes, can give them any name, and can map them to the LLVM IR unsigned integer values in any way. In this case, I think each target has to provide a mapping function that maps a specific language’s name for a scope into that target’s name for a scope that has conservatively the same semantics. Namely, supporting a new language that has memory scopes requires every target that is to support that language to be updated accordingly.
>> 
>> Therefore, to let front-end writers share memory scope definitions where they match, and to avoid the effort of updating all targets for each language, it is better to define standard memory scope names. A target is free to implement them or not, but if a target does implement them, they must have the defined relational semantics (e.g., hierarchical nesting). A target that implements them will be able to support any language that uses them, including languages not yet invented. A new memory scope name can be added if the existing ones are insufficient.
>> 
>> As a first step, we can define the standard scopes based on what a common language with memory scopes needs; e.g., OpenCL uses system, device, workgroup, and workitem. This is the same approach LLVM has taken for debug information: there are standard debug entities (those that a common language such as C needs), and each new language uses those standard entities where there is a match and defines only the delta.
>> 
>> A bitcode example with the proposal
>> define void @test(i32* %addr) {
>> ; forward compatibility
>>   cmpxchg i32* %addr, i32 42, i32 0 singlethread monotonic monotonic
>> 
>> ; new synchscope that will be defined by each backend
>>   cmpxchg i32* %addr, i32 42, i32 0 synchscope(2) monotonic monotonic, 2
>>   cmpxchg i32* %addr, i32 42, i32 0 synchscope(3) monotonic monotonic, 3
>> 
>>   ret void
>> }
>> 
>> !synchscope = metadata !{{i32 0, !"SingleThread"}, {i32 2, !"WorkGroup"}, ...}
>> =================================================================
>> 
>> Thank you!
>> 
>> ---
>> Best regards,
>> Ke
>> 
>> 
>> 
>> -- 
>> Best Regard,
>> Ke Bai, Ph.D.
>> 
>> 
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
> 
