[PATCH] D21723: [RFC] Enhance synchscope representation

Konstantin Zhuravlyov via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue May 23 15:51:23 PDT 2017


kzhuravl added a reviewer: reames.
kzhuravl marked an inline comment as done.
kzhuravl added a comment.

@sanjoy is already a review. I have added @reames as a reviewer.



================
Comment at: docs/LangRef.rst:2173
+example, OpenCL supports separate memory scopes for device, work-group and
+sub-group).
+
----------------
mehdi_amini wrote:
> t-tye wrote:
> > mehdi_amini wrote:
> > > t-tye wrote:
> > > > kzhuravl wrote:
> > > > > mehdi_amini wrote:
> > > > > > I'm not convinced by the wording `it is target defined how ``<n>`` affects which other operations`, it can be read that even with the same scope it is still target define, which is not the intent right?
> > > > > > 
> > > > > > Straw man alternative:
> > > > > > 
> > > > > > ```
> > > > > > If an atomic operation is marked ``syncscope(<n>)``, then it *synchronizes with*,
> > > > > > and participates in the seq\_cst total orderings of, other operations
> > > > > > running in the same scope. It is target defined how it interacts with operations
> > > > > > in a different scope.
> > > > > > ```
> > > > > > 
> > > > > Yes, you are right. I have updated the wording.
> > > > In OpenCL, memory scopes define scope "levels" which correspond to how threads are grouped for execution. For example, a set of threads that belong to the same work-group, and specify the "workgroup" scope, only synchronize with threads in the same work-group instance. Threads in a different work-group instance do not synchronize even if they specify the "workgroup" scope.
> > > > 
> > > > So in general, if the scopes are target specific, it is also target specific how the the threads are grouped, and how scopes are related (for example inclusive).
> > > > 
> > > > I was also curious what the definition of singlethread is. My understanding was that synchronization only happens between atomic operations. So does both the signal handler and the non-signal handler code have to have atomic operations using singlethread? If used in a fence, what atomic operations pair with the fence to create the synchronizes-with relation? Generally non-atomic operations would not pair. The wording seems to imply a singlethread atomic is synchronizing with any operations in the same thread, and is presumably using thread to mean OS thread, rather than memory model notion of thread of execution which would have the signal handler and non signal handler code execution in different threads.
> > > > 
> > > > Since this text is fairly informal, how correct does it need to be?
> > > So labelling an atomic operation with a syncscope named "workgroup" would not be correct according to the definition above:
> > > 
> > > `````If an atomic operation is marked ``syncscope("<a>")``, where ``<a>`` is target specific synchronization scope, then it *synchronizes with*, and participates in the seq\_cst total orderings of, other atomic operations marked ``syncscope("<a>")`````
> > > 
> > > That seems like an issue to me.
> > > 
> > A suggested improvement would be:
> > 
> > ```
> > If an atomic operation is marked ``syncscope("<a>")``, where ``<a>`` is target
> > specific synchronization scope, then it *synchronizes with*, and participates in
> > the seq\_cst total orderings of, other atomic operations marked
> > ``syncscope("<a>")`` that a members of the same instance of scope ``<a>``.
> > ```
> > 
> > The thing that is missing from the text is the notion of scope instances. When a thread is created, it is implicitly defined which scope instances it is a member. For example, when an OpenCL dispatch is executed, it specifies a device that it will be executed on. All threads for the dispatch are considered members of the agent scope instance for that device. Other dispatches executing on a different device will be members of the agent scope instance for that other device. Atomic operations marked ``syncscope("agent")`` will only *synchronizes with*, and participates in the seq\_cst total orderings of, other atomic operations that are marked ``syncscope("agent")`` and are members of the same agent scope instance.
> > 
> > Furthermore, the threads of a dispatch are partitioned into separate work-group instances. Atomic operations marked ``syncscope("workgroup")`` will only *synchronizes with*, and participates in the seq\_cst total orderings of, other atomic operations that are marked ``syncscope("workgroup")`` and are members of the same work-group scope instance.
> > 
> > Furthermore, the threads of a work-group are partitioned into separate subgroup instances. Atomic operations marked ``syncscope("wave")`` will only *synchronizes with*, and participates in the seq\_cst total orderings of, other atomic operations that are marked ``syncscope("wave")`` and are members of the same subgroup scope instance.
> > 
> > The OpenCL memory model currently requires the scope specified by atomic memory operations must exactly match for them to be considered compatible scope instances. However, other memory models exist that have scopes, and allow scope inclusion where threads are considered in compatible scope instances if they are members of the same scope instance specified in the operation, or some smaller scope in the scope hierarchy.
> > 
> > Since scopes are target specific one would need to ask the target if two atomic operations "may" synchronize.
> > The thing that is missing from the text is the notion of scope instances.
> 
> Yes and it seems to me that it is a significant issue with the design of syncscope here: we can't just have a yes/no answer when considering the synchronization of two operations, but we need consider the "may" as well (so the query answer would be {must, may, no}).
> It also means that a generic analysis without target support can't infer anything based on the syncscopes, they are purely opaque.
> 
> I think at least @sanjoy or @reames should re-review this (and the text needs to be fixed to reflect the "may").
> 
Text has been updated to include scope instances.

Once syncscopes are available, the next step could be to introduce tti queries that optimizations may need such as if two atomic operations "may" synchronize. Are there any current optimizations that need such queries now? If not, that could be done as follow up work when we know what is needed.


https://reviews.llvm.org/D21723





More information about the llvm-commits mailing list