[cfe-dev] Fwd: [LLVMdev] OpenCL support
David Neto
dneto.llvm at gmail.com
Wed Dec 8 08:53:23 PST 2010
Thanks for the real compilation output.
But it does not support an environment in which two work groups are
being executed at the same time. The work groups should be isolated
from each other, and so should have different storage for each of
those variables.
For example if we have two work groups with 4 work items each and
everything is run in parallel, then we should have two storage
locations for "vint". The first copy will be shared between the 4
items/threads in work group 0, and the second copy will be shared
between the 4 items/threads in group 1.
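To make that concrete, here is a rough sketch in plain C (not OpenCL) that models the storage I have in mind. The names group_locals and run_kernel are made up for illustration, and the loops only stand in for work items that would really run in parallel. The point is just that "vint" is indexed by work group, so there are two copies rather than one.

```c
#include <assert.h>

enum { NUM_GROUPS = 2, ITEMS_PER_GROUP = 4 };

/* Hypothetical model of the kernel's local variables:
   one instance of this struct exists per work group. */
typedef struct {
    int vint;
} group_locals;

/* Sequential simulation of 2 work groups x 4 work items.
   A has NUM_GROUPS entries; out has NUM_GROUPS * ITEMS_PER_GROUP entries. */
void run_kernel(const int *A, int *out) {
    group_locals locals[NUM_GROUPS]; /* two storage locations for "vint" */

    assert(A && out);

    /* Phase 1: one work item in each group writes its group's copy. */
    for (int g = 0; g < NUM_GROUPS; ++g)
        locals[g].vint = A[g];

    /* A work-group barrier would sit here in a real kernel. */

    /* Phase 2: every work item reads the copy that belongs to its own group. */
    for (int g = 0; g < NUM_GROUPS; ++g)
        for (int i = 0; i < ITEMS_PER_GROUP; ++i)
            out[g * ITEMS_PER_GROUP + i] = locals[g].vint + i;
}
```

If "locals" were a single shared slot instead of an array, the write from group 1 could clobber the value that the items in group 0 expect to read.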
Now, it's fine for a particular implementation to decide it only wants
to ever run one work group at a time. So this is an ok choice inside
CodeGen if you know what target you're compiling for.
My original point was that making such a lowering decision in the AST
is overly restrictive.
(I hope I'm not being unnecessarily picky.)
I see that Peter's proposed patch has not made it into SVN, so I won't
file a bug. Instead I'll wait and monitor ARM's patches. (No rush,
honest!)
cheers,
david
On Wed, Dec 8, 2010 at 1:46 AM, Krister Wombell <kuwerty at gmail.com> wrote:
> I would reconsider Micah's suggestion. The simple solution is to tag the
> variable with an address space and turn it into a global. You can do that
> with a simple change in CodeGenFunction::CreateStaticBlockVarDecl. It would
> give all the benefits you describe in that the target decides how to lower
> the code but do it using concepts that LLVM and some targets may already
> support. Kernels that call kernels with locals will also work.
> Perhaps an example is useful? Our OpenCL implementation, given the code
> above, generates this bitcode (after optimizations that have eliminated the
> dead vars):
> target datalayout = "e-p:32:32:32-f64:64:64-i64:64:64"
> target triple = "zms-ziilabs-opencl10"
> @foo.auto.vint = internal addrspace(2) global i32 0, align 4
> @foo.auto.vvint = internal addrspace(2) global i32 0, align 4
> define void @foo(i32 addrspace(1)* %A) nounwind {
> entry:
> %tmp1 = load i32 addrspace(1)* %A, align 4
> store i32 %tmp1, i32 addrspace(2)* @foo.auto.vint, align 4
> volatile store i32 %tmp1, i32 addrspace(2)* @foo.auto.vvint, align 4
> %tmp7 = volatile load i32 addrspace(2)* @foo.auto.vvint, align 4
> tail call void @llvm.memory.barrier(i1 true, i1 true, i1 true, i1 true, i1 false)
> %add = add nsw i32 %tmp7, %tmp1
> store i32 %add, i32 addrspace(1)* %A, align 4
> ret void
> }
> Krister
>