<div class="gmail_quote">On Wed, Oct 5, 2011 at 2:39 PM, Guoping Long <span dir="ltr"><<a href="mailto:longguoping@gmail.com">longguoping@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

Even if this works for Nvidia hardware, what about ATI series? By saying this, I do not quite understand the underlying motivation to implement another (workable, optimized) GPU backend. This requires non-trivial work, and there is already very good commercial support. In addition, since such backend optimizations inherently rely on specific hardware details, it's hard for me to be optimistic that an open source version can work comparably well with the vendor provided version.</blockquote>

<div><br></div><div>The idea is to harness the LLVM optimization and analysis passes to generate optimized GPU kernels.  On one hand, this has nothing to do with OpenCL/CUDA; it provides a way for front-ends to directly target NVidia GPU devices without having to first convert to OpenCL or CUDA.  Such a conversion puts you at the mercy of the vendor front-ends.  On the other hand, it provides a way to try to go beyond what nvcc can do, in terms of optimizations.  There is definitely good commercial support, but that support is in terms of black boxes that we ultimately have no control over.</div>

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div>

<br></div><div>I heartedly agree that providing support for OpenCL and CUDA on Clang is important and useful. Working on source level optimizations may be more interesting. I remember somebody proposed such an interesting idea in this community: transform CUDA to OpenCL code and do optimizations. I would love such kind of ideas, is it unfeasible or too trivial to implement? </div>

</blockquote><div><br></div><div>Converting CUDA to OpenCL would definitely be non-trivial, especially when you start considering the CUDA C++ support.  It's probably feasible, though not really in the scope of Clang.</div>

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div><br></div><div>Please let me know if I miss something important. Thanks.</div><div><br></div><div>----</div><div>Guoping</div><div><div></div><div class="h5"><div><br><div><div class="gmail_quote">2011/10/5 Justin Holewinski <span dir="ltr"><<a href="mailto:justin.holewinski@gmail.com" target="_blank">justin.holewinski@gmail.com</a>></span><br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="gmail_quote"><div>On Wed, Oct 5, 2011 at 2:02 PM, Guoping Long <span dir="ltr"><<a href="mailto:longguoping@gmail.com" target="_blank">longguoping@gmail.com</a>></span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

At least for me, I believe this is a very interesting project. I may consider contribute later on when it actually moves on.<div>To really optimizing OpenCL codes at backend is too challenging, because this relies on very much hardware specific information, which may not be disclosed.</div>


<div>So I fully agree to transform the code back to source when it is optimized. But why at the IR level, not the AST? Isn't transforming on AST much easier and reasonable?</div></blockquote><div><br></div></div><div>


If you're interested in NVidia hardware, the OpenCL->PTX workflow is mostly implemented in LLVM ToT (will be part of LLVM 3.0).  The main missing part at the moment is OpenCL work-item function calls, which have to be implemented in terms of PTX intrinsics.  It's not difficult, but I haven't written a library to do that yet.  Once you have PTX, you can use the CUDA Driver API to load and execute the kernel.</div>


<div><br></div><div>You can play around with it by using the following Clang options:</div><div><br></div><div>$ clang -ccc-host-triple ptx32 -S <kernel>.cl</div><div><br></div><div>You can define your work-item functions as something like the following:</div>


<div><br></div><div><div><font face="'courier new', monospace">__attribute__((always_inline))</font></div><div><font face="'courier new', monospace">int get_group_id(int dim) {</font></div>

<div><font face="'courier new', monospace">  if (dim == 0)</font></div><div><font face="'courier new', monospace">    return __builtin_ptx_read_ctaid_x();</font></div>

<div><font face="'courier new', monospace">  else if (dim == 1)</font></div><div><font face="'courier new', monospace">    return __builtin_ptx_read_ctaid_y();</font></div>

<div><font face="'courier new', monospace">  else</font></div><div><font face="'courier new', monospace">    return __builtin_ptx_read_ctaid_z();</font></div>

<div><font face="'courier new', monospace">}</font></div><div><font face="'courier new', monospace"><br></font></div><div><font face="'courier new', monospace">__attribute__((always_inline))</font></div>


<div><font face="'courier new', monospace">int get_global_id(int dim) {</font></div><div><font face="'courier new', monospace">  if (dim == 0)</font></div>

<div><font face="'courier new', monospace">    return __builtin_ptx_read_ctaid_x()*__builtin_ptx_read_ntid_x()+__builtin_ptx_read_tid_x();</font></div><div><font face="'courier new', monospace">  else if (dim == 1)</font></div>


<div><font face="'courier new', monospace">    return __builtin_ptx_read_ctaid_y()*__builtin_ptx_read_ntid_y()+__builtin_ptx_read_tid_y();</font></div><div><font face="'courier new', monospace">  else</font></div>


<div><font face="'courier new', monospace">    return __builtin_ptx_read_ctaid_z()*__builtin_ptx_read_ntid_z()+__builtin_ptx_read_tid_z();</font></div><div><font face="'courier new', monospace">}</font></div>


<div><font face="'courier new', monospace"><br></font></div><div><font face="'courier new', monospace">__attribute__((always_inline))</font></div><div><font face="'courier new', monospace">int get_local_id(int dim) {</font></div>


<div><font face="'courier new', monospace">  if (dim == 0)</font></div><div><font face="'courier new', monospace">    return __builtin_ptx_read_tid_x();</font></div>

<div><font face="'courier new', monospace">  else if (dim == 1)</font></div><div><font face="'courier new', monospace">    return __builtin_ptx_read_tid_y();</font></div>

<div><font face="'courier new', monospace">  else</font></div><div><font face="'courier new', monospace">    return __builtin_ptx_read_tid_z();</font></div>

<div><font face="'courier new', monospace">}</font></div><div><font face="'courier new', monospace"><br></font></div><div><font face="'courier new', monospace">__attribute__((always_inline))</font></div>


<div><font face="'courier new', monospace">int get_global_size(int dim) {</font></div><div><font face="'courier new', monospace">  if (dim == 0)</font></div>

<div><font face="'courier new', monospace">    return __builtin_ptx_read_nctaid_x()*__builtin_ptx_read_ntid_x();</font></div><div><font face="'courier new', monospace">  else if (dim == 1)</font></div>

<div><font face="'courier new', monospace">    return __builtin_ptx_read_nctaid_y()*__builtin_ptx_read_ntid_y();</font></div><div><font face="'courier new', monospace">  else</font></div>

<div><font face="'courier new', monospace">    return __builtin_ptx_read_nctaid_z()*__builtin_ptx_read_ntid_z();</font></div><div><font face="'courier new', monospace">}</font></div>

<div><font face="'courier new', monospace"><br></font></div><div><font face="'courier new', monospace">#define barrier(kind) __builtin_ptx_bar_sync(kind)</font></div>

<div><font face="'courier new', monospace"><br></font></div><div><font face="'courier new', monospace">#define CLK_LOCAL_MEM_FENCE 0</font></div></div><div><div></div><div><div>

<br></div><div><br></div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><br></div><div>------</div><div>Guoping<br>

<br><div class="gmail_quote">2011/10/5 Alberto Magni <span dir="ltr"><<a href="mailto:alberto.magni86@gmail.com" target="_blank">alberto.magni86@gmail.com</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div><div></div><div>

Hi everybody,<br><br>for a research project I would like to use LLVM to optimize <br>OpenCL programs for GPUs.<br><br>Due to the lack of open-source back-ends and runtimes for<br>GPUs my idea is the following:<br>1) compile OpenCL C into LLVM-IR (for what I read on the ML<br>


full support is close, at least foreseeable),<br>2) apply LLVM transformations to the bitcode,<br>3) generate the OpenCL C code from the optimized bitcode,<br>4) use the official (Nvidia, AMD, Intel, ....) OpenCL compilers <br>


and runtimes for the actual execution of the optimized code<br><br>I know that the C backend is buggy and it is no more <br>supported but it still works with simple C programs.<br>Remeber that OpenCL programs are usually quite simple<br>


(no function pointers, etc...)<br><br>The main features to be added to the backend are:<br>1) the "__kernel" keyword,<br>2) the four address spaces keywords<br>3) vector data types<br>4) the half keyword<br><br>


My idea is to extensively verify the functionality the C-backend for <br>C programs (similar to OpenCL-C ones) and possibly add the listed features.<br><br>What do you think of this ? Is it feasible ?<br><br>Thank you,<br>


<font color="#888888">

<br>Alberto<br>

</font><br></div></div>_______________________________________________<br>

cfe-dev mailing list<br>

<a href="mailto:cfe-dev@cs.uiuc.edu" target="_blank">cfe-dev@cs.uiuc.edu</a><br>

<a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev</a><br>

<br></blockquote></div><br></div>

<br>_______________________________________________<br>

cfe-dev mailing list<br>

<a href="mailto:cfe-dev@cs.uiuc.edu" target="_blank">cfe-dev@cs.uiuc.edu</a><br>

<a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev</a><br>

<br></blockquote></div></div></div><font color="#888888"><br><br clear="all"><div><br></div>-- <br><br><div>Thanks,</div><div><br></div><div>Justin Holewinski</div><br>

</font></blockquote></div><br></div></div>

</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><br><div>Thanks,</div><div><br></div><div>Justin Holewinski</div><br>