[llvm-commits] [cfe-commits] Patch: Add get_global_id builtin/intrinsic

Tue Feb 7 12:53:29 PST 2012

On Tue, Feb 07, 2012 at 08:16:23PM +0000, Peter Collingbourne wrote:
> On Fri, Feb 03, 2012 at 05:24:38PM -0500, Tom Stellard wrote:
> > On Fri, Feb 03, 2012 at 10:07:03PM +0000, Rotem, Nadav wrote:
> > > Tom, 
> > > 
> > > Our OpenCL implementation of get_global_id is not target specific and we don't resolve it in the backend. I think that get_global_id should be implemented as a simple library call.  However, this is something that needs to be discussed with Tanya Lattner, Peter Collingbourne, Anton Lokhmotov, etc. 
> > > 
> > > Nadav
> > 
> > Nadav,
> > 
> > Sorry, I guess I should have been clearer.  When I say target-specific I'm
> > talking about GPU targets.  The get_global_id() implementation
> > on the GPUs we've written a backend for (Evergreen, Northern Islands)
> > requires reading values from special registers that are preloaded by the
> > hardware.  I'm guessing other GPUs do something similar, so I do think
> > it is something that would need to be resolved in the backend.
>

Hi Peter,

> Hi Tom,
> 
> I disagree.  There is no need to resolve these functions in the
> backend, as the register reads can simply be made part of the
> get_global_id implementation.  The libclc OpenCL C standard library
> already targets NVIDIA GPUs, and it implements get_global_id and the
> other work-item functions in exactly this way:
> 
> http://git.pcc.me.uk/?p=~peter/libclc.git;a=blob;f=ptx-nvidiacl/include/clc/workitem/get_global_id.h 
> 

I looked at this before, and it seems like a good approach, but there was
one thing I couldn't figure out: if I have Clang embedded in my OpenCL
implementation and it is using an out-of-tree backend, how do I get
Clang to recognize my target's builtins as valid and then map them to
the appropriate target-specific intrinsics?

Thanks,
Tom

> With your approach, we would leak language-specific details into the
> backends, details that can be dealt with by the frontend and standard
> library, so that the backends can be kept language-independent.
> 
> Also, there is no one way of implementing functions like get_global_id
> on (say) PTX.  One thing that is missing from the get_global_id
> implementation in libclc is global offsets.  The precise details
> of how those offsets are passed to the kernel have varied over
> time (for example, at one point a global array was used, and now
> special registers are used).  If we encode these sorts of details
> in the backend we reduce overall flexibility and deny optimisation
> opportunities to the optimisers.  For example, say the kernel computes:
> 
> get_global_id(0) - get_global_offset(0)
> 
> On PTX the optimisers should be able to eliminate the global offset
> access altogether, but they might not be able to if get_global_id
> and get_global_offset are opaque intrinsic calls.
> 
> Thanks,
> -- 
> Peter
>