[llvm-commits] [cfe-commits] Patch: Add get_global_id builtin/intrinsic

Tue Feb 7 12:16:23 PST 2012

On Fri, Feb 03, 2012 at 05:24:38PM -0500, Tom Stellard wrote:
> On Fri, Feb 03, 2012 at 10:07:03PM +0000, Rotem, Nadav wrote:
> > Tom, 
> > 
> > Our OpenCL implementation of get_global_id is not target specific and we don't resolve it in the backend. I think that get_global_id should be implemented as a simple library call.  However, this is something that needs to be discussed with Tanya Lattner, Peter Collingbourne, Anton Lokhmotov, etc. 
> > 
> > Nadav
> 
> Nadav,
> 
> Sorry, I guess I should be more clear.  When I say target specific I'm
> talking about GPU targets.  The get_global_id() implementation
> on the GPUs we've written a backend for (Evergreen, Northern Islands)
> requires reading values from special registers that are preloaded by the
> hardware.  I'm guessing other GPUs do something similar, so I do think
> it is something that would need to be resolved in the backend.

Hi Tom,

I disagree.  There is no need to resolve these functions in the
backend, as the register reads can simply be made part of the
get_global_id implementation.  The libclc OpenCL C standard library
already targets NVIDIA GPUs, and it implements get_global_id and the
other work-item functions in exactly this way:

http://git.pcc.me.uk/?p=~peter/libclc.git;a=blob;f=ptx-nvidiacl/include/clc/workitem/get_global_id.h 

Your way, we leak language-specific details into the backends, details
which can be dealt with by the frontend and standard library, so that
the backends can be kept language independent.

Also, there is no one way of implementing functions like get_global_id
on (say) PTX.  One thing that is missing from the get_global_id
implementation in libclc is global offsets.  The precise details
of how those offsets are passed to the kernel have varied over
time (for example, at one point a global array was used, and now
special registers are used).  If we encode these sorts of details
in the backend we reduce overall flexibility and deny optimisation
opportunities to the optimisers.  For example, say the kernel computes:

get_global_id(0) - get_global_offset(0)

On PTX the optimisers should be able to eliminate the global offset
access altogether, but they might not be able to if get_global_id
and get_global_offset are opaque intrinsic calls.

Thanks,
-- 
Peter