<div class="gmail_quote">On Thu, Oct 13, 2011 at 7:56 PM, Mon P Wang <span dir="ltr"><<a href="mailto:monping@apple.com">monping@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div style="word-wrap:break-word">Hi,<div><br></div><div>Tanya and I also prefer the extended TBAA solution, as it fits naturally with LLVM.  From my understanding of TBAA, it seems to provide the power to describe the relationship between address spaces for alias analysis, i.e., it can describe whether two address spaces are disjoint or one may nest within another.  For OpenCL, it is most useful to indicate that address spaces are disjoint from the point of view of alias analysis, even though the underlying memory may be the same, as on x86.  The question is: is there something missing in TBAA that prevents it from properly describing the semantics we want for an address space?</div>

</div></blockquote><div><br></div><div>From what I can tell, extending TBAA is perfectly fine for the alias problem.  I really just want to make sure we're providing enough hooks in the front-end and IR so that any back-end can be used for OpenCL code gen.</div>
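<div><br></div><div>To make the alias argument concrete, here is a minimal sketch of the ancestor test that scalar TBAA applies to the metadata tree from the 2A example quoted below. It is written in Python purely for illustration and is not LLVM's actual implementation; the names TBAA_PARENT, ancestors and may_alias are hypothetical. It assumes the rule that two accesses may alias only if one type node is equal to, or an ancestor of, the other.</div>
<pre>
```python
# Hypothetical sketch of a scalar TBAA ancestor test (not LLVM's real code).
# Each type node maps to its parent; the root has parent None.
# The node names mirror the metadata !1 .. !4 in the quoted 2A example.
TBAA_PARENT = {
    "Simple C/C++ TBAA": None,
    "omnipotent char": "Simple C/C++ TBAA",
    "float$__global": "omnipotent char",
    "float$__local": "omnipotent char",
}


def ancestors(node):
    """Return the node itself followed by every ancestor up to the root."""
    chain = []
    while node is not None:
        chain.append(node)
        node = TBAA_PARENT[node]
    return chain


def may_alias(a, b):
    """Two accesses may alias only if one tag is an ancestor of the other."""
    return a in ancestors(b) or b in ancestors(a)


if __name__ == "__main__":
    # Sibling nodes: the __global load and the __local store.
    print(may_alias("float$__global", "float$__local"))
    # A node and its parent.
    print(may_alias("float$__global", "omnipotent char"))
```
</pre>
<div>Because "float$__global" and "float$__local" are siblings under "omnipotent char", neither is an ancestor of the other, so the load and the store are treated as non-aliasing, while an access tagged "omnipotent char" may still alias both.</div>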

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div style="word-wrap:break-word"><div><br></div><div>  -- Mon Ping</div><div><br></div><div><br></div><div>

<br></div><div><div><div><div></div><div class="h5"><div>On Oct 13, 2011, at 1:14 PM, Justin Holewinski wrote:</div><br></div></div><blockquote type="cite"><div><div></div><div class="h5"><br><br><div class="gmail_quote">

On Thu, Oct 13, 2011 at 11:57 AM, Peter Collingbourne <span dir="ltr"><<a href="mailto:peter@pcc.me.uk" target="_blank">peter@pcc.me.uk</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


Hi Justin,<br>

<br>

Thanks for bringing this up; I think it's important to discuss<br>

these issues here.<br>

<div><br>

On Thu, Oct 13, 2011 at 09:46:28AM -0400, Justin Holewinski wrote:<br>

> It is becoming increasingly clear to me that LLVM address spaces are not the<br>

> general solution to OpenCL/CUDA memory spaces. They are a convenient hack to<br>

> get things working in the short term, but I think a more long-term approach<br>

> should be discussed and decided upon now before the OpenCL and CUDA<br>

> implementations in Clang/LLVM get too mature. To be clear, I am not<br>

</div>> advocating that *targets* change to a different method for representing<br>

<div>> device memory spaces. The current use of address spaces to represent<br>

> different types of device memory is perfectly valid, IMHO. However, this<br>

> knowledge should not be encoded in front-ends and pre-SelectionDAG<br>

> optimization passes.<br>

<br>

</div>I disagree.  The targets should expose all the address spaces they<br>

provide, and the frontend should know about the various address spaces<br>

it needs to know about.  It is incumbent on the frontend to deliver<br>

a valid IR for a particular language implementation, and part of<br>

that involves knowing about the ABI requirements for the language<br>

implementation (which may involve using specific address spaces)<br>

and the capabilities of each target (including the capabilities of<br>

the target's address spaces), together with the language semantics.<br>

It is not the job of the optimisers or backend to know the semantics<br>

for a specific language, a specific implementation of that language<br>

or a specific ABI.<br></blockquote><div><br></div><div>But this assumes a valid one-to-one mapping between OpenCL memory spaces and a target's back-end address spaces.  What happens for a target such as x86?  Do we introduce pseudo address spaces into the back-end just to satisfy the front-end OpenCL requirements?</div>


<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>

><br>

><br>

> *2. Solutions*<br>

<div>><br>

> A couple of solutions to this problem are presented here, with the hope that<br>

> the Clang/LLVM community will offer a constructive discussion on how best to<br>

> proceed with OpenCL/CUDA support in Clang/LLVM. The following list is in no<br>

> way meant to be exhaustive; it merely serves as a starting basis for<br>

> discussion.<br>

><br>

><br>

</div>> *2A. Extend TBAA*<br>

<div>><br>

> In theory, the type-based alias analysis pass could be extended to<br>

> (properly) support aliasing queries for pointers in OpenCL kernels.<br>

>  Currently, it has no way of knowing if two pointers in different address<br>

> spaces can alias, and in fact cannot know if this is the case given the<br>

> definition of LLVM address spaces.  Instead of programming it with<br>

> target-specific knowledge, it can be extended with language-specific<br>

> knowledge.  Instead of considering address spaces, the Clang portion of TBAA<br>

> can be programmed to use OpenCL attributes to extend its pointer metadata.<br>

>  Specifically, pointers to different memory spaces are in essence different<br>

> types and cannot alias.  For the kernel shown above, the resulting LLVM IR<br>

> could be:<br>

><br>

> ; ModuleID = 'test1.cl'<br>

> target datalayout = "e-p:32:32-i64:64:64-f64:64:64-n1:8:16:32:64"<br>

> target triple = "ptx32--"<br>

><br>

> define ptx_kernel void @foo(float* nocapture %a, float addrspace(4)*<br>

> nocapture %b) nounwind noinline {<br>

> entry:<br>

>   %0 = load float* %a, align 4, !tbaa !1<br>

</div>>   store float %0, float addrspace(4)* %b, align 4, !tbaa *!2*<br>

<div>>   ret void<br>

> }<br>

><br>

> !opencl.kernels = !{!0}<br>

><br>

> !0 = metadata !{void (float*, float addrspace(4)*)* @foo}<br>

</div>> *!1 = metadata !{metadata !"float$__global", metadata !3}*<br>

> *!2 = metadata !{metadata !"float$__local", metadata !3}*<br>

<div>> !3 = metadata !{metadata !"omnipotent char", metadata !4}<br>

> !4 = metadata !{metadata !"Simple C/C++ TBAA", null}<br>

><br>

> Differences are bolded.  Here, the TBAA pass would be able to identify that<br>

> the load and the store do not alias.  Of course, when compiling in<br>

> non-OpenCL/CUDA mode, TBAA would work just as before.<br>

<br>

</div>I have to say that I much prefer the TBAA solution, as it encodes the<br>

language semantics using the existing metadata for language semantics.<br></blockquote><div><br></div><div>It's certainly the easiest to implement and would have the least impact (practically zero) on existing passes.</div>


<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

> *Pros:*<br>

><br>

> Relatively easy to implement<br>

><br>

> *Cons:*<br>

<div>><br>

> Does not solve the full problem, such as how to represent OpenCL memory<br>

> spaces in other backends, such as X86 which uses LLVM address spaces for<br>

> different purposes.<br>

<br>

</div>This presupposes that we need a way of representing OpenCL address<br>

spaces in IR targeting X86 (and targets which lack GPU-like address<br>

spaces).  As far as I can tell, the only real representations of<br>

OpenCL address spaces on such targets that we need are a way of<br>

distinguishing the different address spaces for alias analysis<br>

and a representation for __local variables allocated on the stack.<br>

TBAA metadata would solve the first problem, and we already have<br>

mechanisms in the frontend that could be used to solve the second.<br></blockquote><div><br></div><div>Which mechanisms could be used to differentiate between thread-private and __local data?</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div><br>

> I see this solution as more of a short-term hack to solve the pointer<br>

> aliasing issue without actually addressing the larger issues.<br>

<br>

</div>I remain to be persuaded that there are any "larger issues" to solve.<br>

<br>

> *2B. Emit OpenCL/CUDA-specific Metadata or Attributes*<br>

<div>><br>

> Instead of using LLVM address spaces to represent OpenCL/CUDA memory spaces,<br>

> language-specific annotations can be provided on types.  This can take the<br>

> form of metadata, or additional LLVM IR attributes on types and parameters,<br>

> such as:<br>

><br>

> ; ModuleID = 'test1.cl'<br>

> target datalayout = "e-p:32:32-i64:64:64-f64:64:64-n1:8:16:32:64"<br>

> target triple = "ptx32--"<br>

><br>

</div>> define *ocl_kernel* void @foo(float* nocapture *ocl_global* %a, float*<br>

> nocapture *ocl_local* %b) nounwind noinline {<br>

<div>> entry:<br>

>   %0 = load float* %a, align 4<br>

>   store float %0, float* %b, align 4<br>

>   ret void<br>

> }<br>

><br>

> Instead of extending the LLVM IR language, this information could also be<br>

> encoded as metadata by either (1) emitting some global metadata that binds<br>

> useful properties to globals and parameters, or (2) extending LLVM IR to<br>

> allow attributes on parameters and globals.<br>

><br>

> Optimization passes can make use of these additional attributes to derive<br>

> useful properties, such as %a cannot alias %b. Then, back-ends can use these<br>

> attributes to emit proper code sequences based on the pointer attributes.<br>

><br>

</div>> *Pros:*<br>

<div>><br>

> If done right, would solve the general problem<br>

><br>

</div>> *Cons:*<br>

<div>><br>

> Large implementation commitment; could potentially touch many parts of LLVM.<br>

<br>

</div>You are being vague about what is required here.  A complete solution<br>

following 2B would involve allowing these attributes on all pointer<br>

types.  It would be very expensive to allow custom attributes or<br>

metadata on pointer types, since they are used frequently in the IR,<br>

and the common case is not to have attributes or metadata.  Also,<br>

depending on how this is implemented, this would encode far too much<br>

language specific information in the IR.<br></blockquote><div><br></div><div>I agree that this would be expensive, and I'm not necessarily advocating it. If the consensus is that TBAA extensions are sufficient for all cases, then I'm fine with that.  It's much less work. :)</div>


<div><br></div><div>I just want to make sure we're covering all of our bases before we proceed too far with this.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<br>

Thanks,<br>

<font color="#888888">--<br>

Peter<br>

</font></blockquote></div><br><br clear="all"><div><br></div>-- <br><br><div>Thanks,</div><div><br></div><div>Justin Holewinski</div><br></div></div><div class="im">

_______________________________________________<br>LLVM Developers mailing list<br><a href="mailto:LLVMdev@cs.uiuc.edu" target="_blank">LLVMdev@cs.uiuc.edu</a>         <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>

<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br></div></blockquote></div><br></div></div></blockquote></div><br><br clear="all"><div><br>

</div>-- <br><br><div>Thanks,</div><div><br></div><div>Justin Holewinski</div><br>