On Tue, Oct 4, 2011 at 2:28 PM, Peter Collingbourne <peter@pcc.me.uk> wrote:
> On Mon, Oct 03, 2011 at 04:45:28PM +0000, Justin Holewinski wrote:
> > I'm looking into extending the PTX Clang target to support code generation
> > from OpenCL/CUDA code, so I'm wondering about the current state of these two
> > Clang implementations. As a test, I've implemented the AddrSpaceMap in the
> > PTX target in lib/Basic/Targets.cpp, but I'm not sure what other hooks are
> > required. From what I can tell, basic functionality is working quite well!
> > I hope to commit a small patch soon to support the AddrSpaceMap for PTX.
>
> The AddrSpaceMap you added for PTX seems to be correct. The main
> other hook for OpenCL is the CGOpenCLRuntime class, but PTX should
> be able to use the default implementation.
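
For anyone else following along: the hook is essentially a per-target
table in lib/Basic/Targets.cpp mapping the OpenCL address-space
qualifiers to target address-space numbers, roughly like the sketch
below. The numbers shown are placeholders only; the real values have to
be whatever the PTX back-end assigns to global/shared/constant memory.

    // Illustrative sketch only: the address-space numbers below are
    // placeholders, not the values actually used by the PTX back-end.
    static const unsigned PTXAddrSpaceMap[] = {
      1,    // opencl_global
      4,    // opencl_local
      2     // opencl_constant
    };

    // ...and the PTXTargetInfo constructor points TargetInfo at the table:
    //   AddrSpaceMap = &PTXAddrSpaceMap;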

> > I'm currently investigating the following issues/concerns:
> >
</div>> 1. What is the plan for language-specific functions and other constructs,<br>
<div class="im">> such as __syncthreads/barrier, get_local_id/threadIdx, etc.? Is it up to<br>
> the back-end to define compatible definitions of these, or is there a plan<br>
> to introduce generic LLVM intrinsics for these? Since OpenCL has<br>
> pre-defined functions that do not require header files, it may be awkward to<br>
> require OpenCL to include a back-end specific header file when compiling<br>
> with Clang.<br>
<br>
> For OpenCL, the implementation should provide definitions of
> the built-in functions described in section 6.11 of the OpenCL
> specification. For at least some of those functions, the definitions
> would be the same for any OpenCL implementation. (FWIW, I have
> developed a set of generic implementations of section 6.11 built-ins
> as part of an OpenCL implementation I have been working on, which I
> will be open sourcing soon.)
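
Presumably the target-independent ones are things like the common
functions, which can be written once in plain OpenCL C. For example, a
sketch (not taken from your implementation):

    /* clamp() is defined by the spec as fmin(fmax(x, minval), maxval),
       so a generic definition needs no target-specific knowledge. */
    float __attribute__((overloadable))
    clamp(float x, float minval, float maxval) {
      return fmin(fmax(x, minval), maxval);
    }

(Clang's overloadable attribute is what lets ordinary C definitions like
this provide the overloaded gentype variants the spec requires.)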

> For the rest (e.g. work-item functions), the implementation would
> need to be specific to the OpenCL implementation. For example, on
> a CPU, the exact implementation details of work-item functions would
> be highly dependent on how the implementation stores work-item IDs,
> so it would not be appropriate to use a generic intrinsic.

Right. I'm wondering what the implementation plan for this is with Clang.
Are you going to expose the OpenCL functions as LLVM intrinsics, and let
back-ends provide appropriate implementations? Right now, I'm defining
these functions in terms of PTX builtin functions, but this is obviously
not optimal because you need to include an additional header in OpenCL
code.
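
Concretely, the header boils down to definitions along these lines (the
builtin names below are from memory and may not match what Clang
actually exposes, so treat this purely as a sketch):

    /* get_local_id() in terms of the PTX thread-index builtins.
       Builtin names are assumed here, not verified against Clang. */
    size_t get_local_id(uint dim) {
      switch (dim) {
      case 0:  return __builtin_ptx_read_tid_x();
      case 1:  return __builtin_ptx_read_tid_y();
      case 2:  return __builtin_ptx_read_tid_z();
      default: return 0;   /* the spec says out-of-range dims return 0 */
      }
    }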
</div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<br>
For CUDA, the NVIDIA header files provide appropriate declarations,<br>
but as far as I can tell, variables such as threadIdx are handled<br>
specially by nvcc, and functions such as __syncthreads are treated<br>
as builtins. Clang does not currently implement the special handling<br>
for these variables or functions.<br></blockquote><div><br>Are there any plans to implement any of these?<br> </div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">

> > 2. What is the status of the address space mapping? The CUDA frontend
> > does not seem to respect the mapping (I get address-space-less allocas
> > for __shared__ arrays).
>
> Clang does not currently implement CUDA address spaces correctly.
> The major challenge is that OpenCL (and LLVM) address spaces are, in
> C/C++ language terms, type qualifiers, while CUDA address spaces are
> declaration specifiers (despite being referred to as "type qualifiers"
> in the documentation). This means that in CUDA, pointers lack correct
> type information. nvcc performs type inference to determine whether
> a pointer is to __shared__ or __device__ memory, which Clang does
> not currently implement.

Fair enough. OpenCL is my main interest at the moment. Ideally, I would
like to create a semi-functional workflow from OpenCL -> PTX with Clang
before the LLVM 3.0 branch.
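
Just to make sure I have the distinction right: in OpenCL the address
space is part of the pointee type, so the compiler always knows what a
pointer points to, while in CUDA a plain pointer carries no address
space at all and nvcc has to infer it. Something like:

    /* OpenCL: __local is a type qualifier, so the pointer type itself
       records the address space. */
    __kernel void f(__global int *g) {
      __local int tmp[64];
      __local int *p = tmp;    /* pointer-to-__local, known statically */
      g[0] = p[0];
    }

    /* CUDA: __shared__ qualifies the declaration, not the type, so a
       plain "int *" may point at __shared__ or __device__ memory and
       nvcc must work out which. */
    __global__ void k(int *out) {
      __shared__ int tmp[64];
      int *p = tmp;            /* no address space in the pointer type */
      out[0] = p[0];
    }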

> As an alternative to implementing type inference, we could require
> CUDA users to use a board with a unified address space, but this
> is sub-optimal, and still depends on some support from Clang.
>
> > and the OpenCL frontend seems to respect the address
> > mapping but does not emit complete array definitions for locally-defined
> > __local arrays. Does the front-end currently not support __local arrays
> > embedded in the code? It seems to work if the __local arrays are passed as
> > pointers to the kernel.
>
> Clang should support __local arrays, and this looks like a genuine
> bug in the IR generator. I will investigate.
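
Thanks! For reference, the pattern that fails is the first kernel below;
the second form, with the buffer passed in as a __local pointer
argument, works (kernel names here are just for illustration):

    /* Locally-defined __local array: the case where the IR contains an
       incomplete array definition. */
    __kernel void with_local_array(__global float *out) {
      __local float scratch[64];
      scratch[get_local_id(0)] = out[get_global_id(0)];
      barrier(CLK_LOCAL_MEM_FENCE);
      out[get_global_id(0)] = scratch[0];
    }

    /* Same buffer passed as a __local pointer argument: this form works. */
    __kernel void with_local_param(__global float *out,
                                   __local float *scratch) {
      scratch[get_local_id(0)] = out[get_global_id(0)];
      barrier(CLK_LOCAL_MEM_FENCE);
      out[get_global_id(0)] = scratch[0];
    }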

> Thanks,
> --
> Peter

--

Thanks,

Justin Holewinski