[cfe-dev] OpenCL/CUDA Interop with PTX Back-End
justin.holewinski at gmail.com
Tue Oct 4 13:26:19 PDT 2011
On Tue, Oct 4, 2011 at 3:53 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
> On Tue, Oct 04, 2011 at 07:28:26PM +0100, Peter Collingbourne wrote:
> > > and the OpenCL frontend seems to respect the address
> > > mapping but does not emit complete array definitions for
> > > __local arrays. Does the front-end currently not support __local
> > > embedded in the code? It seems to work if the __local arrays are
> > > passed as pointers to the kernel.
> > Clang should support __local arrays, and this looks like a genuine
> > bug in the IR generator. I will investigate.
> This actually seems to be an optimisation. Since only the first
> element of the array is accessed, LLVM will only allocate storage for
> that element. If you compile your example with -O0 (OpenCL compiles
> with optimisations turned on by default), you will see that the 64
> element array is created.
I'm not really convinced this is a legal optimization. What if you
purposely allocate arrays with extra padding to prevent bank conflicts in
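
To make the padding concern concrete, here is a minimal sketch in plain C (not OpenCL, and not taken from the example discussed in this thread). It models local memory as NUM_BANKS interleaved one-word banks and counts how many work-items land on the same bank when each one reads the first element of its own row. The bank count of 32, the tile size, and the function name max_bank_conflict are all assumptions chosen for illustration; real hardware may differ.

```c
#include <assert.h>

#define NUM_BANKS 32 /* assumed bank count; hardware-dependent */
#define TILE 32      /* assumed number of work-items reading one column */

/*
 * Work-item t reads word index t * row_stride (element [t][0] of a tile
 * whose rows are row_stride words apart). Returns the largest number of
 * work-items that map to the same bank, i.e. the conflict degree.
 */
static int max_bank_conflict(int row_stride)
{
    int hits[NUM_BANKS] = {0};
    int worst = 0;

    assert(row_stride > 0);

    for (int t = 0; t < TILE; ++t) {
        int bank = (t * row_stride) % NUM_BANKS;
        hits[bank] += 1;
        if (hits[bank] > worst)
            worst = hits[bank];
    }
    return worst;
}
```

Under these assumptions, max_bank_conflict(TILE) reports that every work-item hits the same bank, while max_bank_conflict(TILE + 1) reports one work-item per bank. The extra word per row is never read or written, yet the layout depends on it being allocated.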