[LLVMdev] [cfe-dev] RFC: Representation of OpenCL Memory Spaces

Fri Oct 14 09:55:13 PDT 2011

On Thu, Oct 13, 2011 at 04:21:23PM -0400, Justin Holewinski wrote:
> On Thu, Oct 13, 2011 at 4:16 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
> 
> > On Thu, Oct 13, 2011 at 06:59:47PM +0000, Villmow, Micah wrote:
> > > Justin,
> > >  Out of these options, I would take the metadata approach for AA support.
> > >
> > > This doesn't solve the problem of different frontend/backends choosing
> > > different address space representations for the same language, but is
> > > the correct approach for providing extra information to the
> > > optimizations.
> > >
> > > The issue about memory spaces in general is a little different. For
> > > example, based on the code you posted below, address space 0 (default)
> > > is global in CUDA, but in OpenCL, the default address space is private.
> > > So, how does the ptx backend handle the differences? I think this is
> > > problematic as address spaces are language constructs and hardcoded at
> > > the frontend, but the backend needs to be able to interpret them
> > > differently based on the source language.
> > >
> > > One way this could be done is to have the backends have options, but then
> > > each backend would need to implement this. I think a better approach is
> > > to have some way to represent address spaces generically in the module.
> >
> > Address space 0 (i.e. the default address space) should always be the
> > address space on which the stack resides.  This is a requirement for
> > alloca to work correctly.  So for PTX, I think that address space 0
> > should be the local state space (but I noticed that at the moment it
> > is the global state space, which seems wrong IMHO).
> >
> 
> This is a bit hacky in the back-end at the moment.  When I started working
> with the back-end, address space 0 was already defined as global, and I have
> not broken that convention yet.
> 
> Then again, the issue is not really that big of a deal, since we need to
> specially handle all "stack" accesses anyway.  It doesn't really matter much
> what address space is used.

What kind of special handling would be required?  And how can you
always tell whether or not an access through address space 0 would
be a stack access?  For example, consider the attached .ll file,
which compiles to a global store here.
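
For illustration, the situation can be sketched in C. This is a rough
reconstruction from the IR, not necessarily the source it was generated
from; the names simply mirror the IR, and the comments about state spaces
are assumptions about how a PTX backend would place each object.

```c
/* Hypothetical C source roughly corresponding to the attached IR. */

int g; /* global variable: would live in the global state space */

void foo(int pred) {
    int p;                     /* stack slot: the alloca in the IR */
    int *g_p = pred ? &g : &p; /* the select: target known only at run time */
    *g_p = 1;                  /* one store through an address space 0 pointer */
}
```

Both &g and &p have the same pointer type here, so the single store may hit
either the global or the stack slot depending on pred.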

Thanks,
-- 
Peter
-------------- next part --------------
target datalayout = "e-p:32:32-i64:64:64-f64:64:64-n1:8:16:32:64"
target triple = "ptx32--"

@g = common global i32 0, align 4

define ptx_kernel void @foo(i32 %pred) nounwind noinline {
entry:
  %p = alloca i32, align 4
  %tobool = icmp ne i32 %pred, 0
  %g.p = select i1 %tobool, i32* @g, i32* %p
  store i32 1, i32* %g.p, align 4, !tbaa !1
  ret void
}

!0 = metadata !{metadata !"int", metadata !1}
!1 = metadata !{metadata !"omnipotent char", metadata !2}
!2 = metadata !{metadata !"Simple C/C++ TBAA", null}