[LLVMdev] [RFC][PATCH][OPENCL] synchronization scopes redux

Mon Jan 5 23:31:53 PST 2015

On Mon, Jan 5, 2015 at 10:51 PM, Owen Anderson <resistor at mac.com> wrote:

> Hi Sameer,
>
> > On Jan 5, 2015, at 4:51 AM, Sahasrabuddhe, Sameer <
> Sameer.Sahasrabuddhe at amd.com> wrote:
> >
> > Right. The second version of my patches fixes the bitcode encoding. But
> now I see another potential problem with future bitcode if we require an
> ordering on the scopes. What happens when a backend later introduces a new
> scope that goes into the middle of the order? If they renumber the scopes
> to accommodate this, then existing bitcode for that backend will no longer
> work. The bitcode reader/writer cannot compensate for this since the values
> are backend-specific. If we agree that this problem is real, then we cannot
> force an ordering on the scope numbers.
>
> That’s an interesting consideration, and something I hadn’t thought of.
> I’m unsure offhand of how much it matters in practice.  The alternative, I
> suppose, is having something like string-named scopes, but then we can’t do
> much with them at the IR level.
>

This has me somewhat nonplussed as well.
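
To make the renumbering hazard concrete, here is a minimal sketch. It assumes
scopes are stored in bitcode as plain integers whose numeric order is the
visibility order. The enum names, the scope names other than "singlethread",
and the helper functions are all hypothetical, not taken from the patch.

```cpp
#include <cstdint>
#include <string>

// Hypothetical numbering a backend ships first (larger = more visible).
enum class ScopeV1 : uint32_t { SingleThread = 0, WorkGroup = 1, Device = 2 };

// Hypothetical numbering after a new SubGroup scope is inserted into the
// middle of the order, shifting every scope above it.
enum class ScopeV2 : uint32_t {
  SingleThread = 0,
  SubGroup = 1,
  WorkGroup = 2,
  Device = 3
};

// What an old bitcode writer would have emitted for a scope.
uint32_t encodeV1(ScopeV1 S) { return static_cast<uint32_t>(S); }

// How a reader built against the renumbered backend interprets that value.
std::string nameV2(uint32_t Encoded) {
  switch (static_cast<ScopeV2>(Encoded)) {
  case ScopeV2::SingleThread: return "singlethread";
  case ScopeV2::SubGroup:     return "subgroup";
  case ScopeV2::WorkGroup:    return "workgroup";
  case ScopeV2::Device:       return "device";
  }
  return "unknown";
}
```

Under these assumptions, nameV2(encodeV1(ScopeV1::WorkGroup)) yields
"subgroup": old bitcode silently changes meaning, and a generic reader has no
way to detect it because the values are backend-specific.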

>
> > So far, I have refrained from proposing a keyword for cross-thread scope
> in the text format, because (a) there never was one and (b) it is not
> strictly needed since it is the default anyway. I am fine either way, but
> we will first have to decide what the new keyword should be. I find
> "allthreads" to be a decent counterpart for "singlethread" ...
> "crossthread" is not good enough since intermediate scopes have multiple
> threads too.
>
> This actually raises another question.  In principle, the “most visible”
> scope ought to be something like “system” or “device”, meaning a completely
> uncached memory access that is visible to all peripherals in a
> heterogeneous system.  However, this is almost certainly not what we want
> to have for typical memory accesses.
>
> To summarize, a prototypical scope nest, from most to least visible (aka
> least to most cacheable) might look like:
>
> System  —>  AllThreads  —>  Various target-specific local scopes —>
> SingleThread
>
> If we wanted to go really gonzo, there could be a Network scope at the
> beginning for large-scale HPC systems, but I’m not sure how important that
> is to anyone.
>

I probably *should* be in a position to be very interested in such a
concept... but honestly, I'm not. If I ever wanted to do something like
this, I would just define the large-scale HPC system as the "system" and a
single machine/node as some "local" scope.

>
> As a related question, do we actually need the local scopes to be target
> specific?  Are there systems, real or planned, that *aren’t* captured by:
>
> [Network —> ] System  —>  AllThreads  —>  ThreadGroup —> SingleThread ?
>

Sadly, I don't think this will work. In particular, there are real-world
accelerators with multiple tiers of thread groups that are visible in the
cache hierarchy subsystem.

I'm starting to think we might actually need to let the target define
acceptable strings for memory scopes and a strict weak ordering (SWO) over
them... That's really complex and heavyweight, but I'm not really
confident that we're safe committing to something more limited. The good
side is that we can add the SWO-stuff lazily as needed...
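
A rough sketch of what target-defined scope names plus an ordering could look
like, purely as an illustration. The ScopeOrder class, its methods, the rank
values, and the "wavefront", "workgroup", and "cluster" names are assumptions
made up for this example; only "singlethread", "allthreads", and "system"
appear in the discussion above.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical per-target table: scope names are the stable identity (what
// bitcode would store), ranks exist only to answer ordering queries.
class ScopeOrder {
  std::map<std::string, unsigned> Rank;

public:
  // Register a scope name, or change the rank of an existing one.
  void define(std::string Name, unsigned R) { Rank[std::move(Name)] = R; }

  bool isKnown(const std::string &Name) const { return Rank.count(Name) != 0; }

  // True if A is strictly less visible than B. Comparing ranks gives a
  // strict weak ordering over the known names; two names with equal rank
  // are simply unordered. Returns false if either name is unknown, so
  // callers should check isKnown() first.
  bool lessVisible(const std::string &A, const std::string &B) const {
    auto IA = Rank.find(A);
    auto IB = Rank.find(B);
    if (IA == Rank.end() || IB == Rank.end())
      return false;
    return IA->second < IB->second;
  }
};

// Example table for an imaginary accelerator with two thread-group tiers.
ScopeOrder makeExampleTargetOrder() {
  ScopeOrder O;
  O.define("singlethread", 0);
  O.define("wavefront", 10);
  O.define("workgroup", 20);
  O.define("allthreads", 30);
  O.define("system", 40);
  return O;
}
```

With this shape, a target that later adds a tier (say define("cluster", 25))
changes only its rank table. Existing bitcode keeps referring to scopes by
name, so nothing needs renumbering.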

Dunno, thoughts?