[cfe-dev] OpenCL support - using metadata

Fri Mar 4 06:51:51 PST 2011

2011/3/4 Pekka Jääskeläinen <pekka.jaaskelainen at tut.fi>:
> Hi,
>
> On 03/03/2011 10:11 PM, David Neto wrote:
>>
>> Last December I objected to the technique of transforming a __local
>> variable into a global-scope static variable. [1] [2].   In a private
>> email, Krister later explained how it all works out ok, as long as the
>
> Of course I do not know the explanation of Krister in the private email
> but doesn't this approach introduce problems in multithreaded
> execution of work groups in a single address space machine?
>
> That is, in case locals are converted to global-scope static variables,
> one cannot execute multiple work groups in parallel in the same process
> due to the shared storage locations for locals?
>
> I understand it works nicely if you have per core local address spaces
> in the machine and can execute the WGs in different cores (like in
> NVIDIA GPUs I've understood), but what about the execution in a GPP
> SMP multicore execution with threads?
>
> Is this known and accepted limitation or did I just misunderstand
> something (which is very likely the case)?
>
> --
> Pekka
>

No, it's not obvious how it works.  :-)

The front end converts __local variables into global-scope static
variables, but they keep their distinct address space.  The back end
recognizes such variables as special and collects them into a
relocatable section.  Accesses are generated as offsets from a base
pointer.  (You can discard the address space number at this point!)
When multiple work groups run in parallel, each work group is given a
different value for the base pointer.  That is what keeps the work
groups from stomping on each other's data.
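
To make the base-pointer idea concrete, here is a rough sketch in plain
C of what the lowered code could look like.  It is only an
illustration: the names (kernel_body, run_two_groups, the *_OFFSET
constants) and the section layout are made up for this example and are
not what any actual back end emits.  The two work groups are run one
after the other here; with threads, each thread would simply be handed
its own base pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout of the relocatable "local" section.  Each former
   __local variable is reduced to a fixed offset from the section base. */
enum {
    SCRATCH_OFFSET     = 0,                /* __local int scratch[4]; */
    TOTAL_OFFSET       = 4 * sizeof(int),  /* __local int total;      */
    LOCAL_SECTION_SIZE = 5 * sizeof(int)
};

/* Kernel body as a back end might lower it (an assumption for this
   sketch): every access to a local variable goes through the base
   pointer that belongs to the current work group. */
static void kernel_body(unsigned char *local_base, int group_id)
{
    assert(local_base != NULL);
    int *scratch = (int *)(local_base + SCRATCH_OFFSET);
    int *total   = (int *)(local_base + TOTAL_OFFSET);

    *total = 0;
    for (int i = 0; i < 4; ++i) {
        scratch[i] = group_id * 10 + i;
        *total += scratch[i];
    }
}

/* Run two work groups, each with its own copy of the local section,
   and report the value each group left in its own `total`. */
static void run_two_groups(int totals[2])
{
    /* int-typed storage keeps each section suitably aligned */
    static int storage[2][LOCAL_SECTION_SIZE / sizeof(int)];

    for (int g = 0; g < 2; ++g)
        kernel_body((unsigned char *)storage[g], g);

    /* Read back only after both groups have run: group 0's data is
       still intact because group 1 used a different base pointer. */
    for (int g = 0; g < 2; ++g)
        totals[g] = storage[g][TOTAL_OFFSET / sizeof(int)];
}
```

If both groups were handed the same base pointer (which is what a
plain global-scope static amounts to), the second group would
overwrite the first group's scratch and total.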

This works out even on a CPU with a single address space.  Come to
think of it, it's like using the old 8086 segment registers.

The whole system works provided the convention is applied consistently
all the way from the front end to the code generator.  That need for
coordination means the Clang+LLVM target has to inform the Clang front
end that it can compile __local variables in this way.

thanks,
david