<div class="gmail_quote">2011/12/14 Pekka Jääskeläinen <span dir="ltr">&lt;<a href="mailto:pekka.jaaskelainen@tut.fi">pekka.jaaskelainen@tut.fi</a>&gt;</span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi all,<br>
<div class="im"><br>
On 12/13/2011 10:50 PM, Justin Holewinski wrote:<br>
> You mean having no calling convention for device functions, and a new, common<br>
> calling convention for kernels?<br>
<br>
</div>I think this might make sense.<br></blockquote><div><br></div><div>To be clear, I do like the idea of using the default calling convention for device functions. My hesitation comes from the LLVM specification, which says the default calling convention is the C calling convention, and that convention supports varargs. If the spec is changed so that the supported features of the C calling convention depend on the target, then I'm fine with this.</div>
<div><br></div><div>Any core LLVM devs have any issues with this?</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
One major issue with OpenCL C (and I suppose CUDA) kernels that some<br>
fail to see is that the functions are "directly callable"<br>
(just by choosing the correct calling convention) in general only on<br>
SIMT/SPMD-style machines (like NVIDIA's and I suppose AMD's GPUs).<br>
<br>
For MIMD CPU architectures (possibly with SIMD/vector extensions)<br>
you need to transform the kernel function into a "work group function"<br>
so it retains its parallel work item semantics whenever the kernel is<br>
to be called with more than one parallel work item.<br>
<br>
The transformation is not completely trivial due to the work<br>
group (WG) barrier semantics. You can have barriers inside for-loops,<br>
conditional blocks, etc., which make it a more difficult compilation<br>
problem than "just adding a loop around the whole WI kernel function".<br>
Converting the "single WI kernel semantics" to work group<br>
functions statically, while avoiding threads for WI execution,<br>
is the main source of complexity the pocl project [1] has to deal<br>
with.<br>
<br>
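The simplest case of the transformation described above, a single barrier at<br>
the top level of the kernel, can be sketched in plain C. This is only an<br>
illustration under stated assumptions, not pocl's actual implementation: the<br>
names <code>wg_naive</code>, <code>wg_split</code> and <code>MAX_WG_SIZE</code> are hypothetical, and the<br>
"kernel" is a made-up two-phase computation with a barrier between the phases.<br>

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WG_SIZE 64 /* hypothetical work-group size limit for this sketch */

/*
 * Hypothetical work-item body, in OpenCL C terms:
 *     tmp[lid] = in[lid] * 2;
 *     barrier(CLK_LOCAL_MEM_FENCE);
 *     out[lid] = tmp[lid] + tmp[(lid + 1) % n];
 */

/* Naive version: one loop around the whole work-item body.
 * The barrier is simply dropped, so most work items read a neighbour's
 * tmp slot before that neighbour has written it. */
void wg_naive(const int *in, int *out, size_t n)
{
    int tmp[MAX_WG_SIZE] = {0};
    assert(n > 0 && n <= MAX_WG_SIZE);
    for (size_t lid = 0; lid < n; ++lid) {
        tmp[lid] = in[lid] * 2;
        out[lid] = tmp[lid] + tmp[(lid + 1) % n]; /* may read a stale 0 */
    }
}

/* Work-group function: the work-item loop is split at the barrier, so
 * every work item finishes phase 1 before any work item starts phase 2. */
void wg_split(const int *in, int *out, size_t n)
{
    int tmp[MAX_WG_SIZE] = {0};
    assert(n > 0 && n <= MAX_WG_SIZE);
    for (size_t lid = 0; lid < n; ++lid)   /* region before the barrier */
        tmp[lid] = in[lid] * 2;
    for (size_t lid = 0; lid < n; ++lid)   /* region after the barrier */
        out[lid] = tmp[lid] + tmp[(lid + 1) % n];
}
```

Splitting the loop like this only works directly when the barrier sits at the<br>
top level of the kernel. A barrier inside a for-loop or a conditional block<br>
needs more than this simple loop split, which is the harder case mentioned above.<br>
<br>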
For OpenCL compilation I think it's common to inline everything into<br>
the kernel functions so the "device functions" usually just disappear.<br>
This makes sense for SIMT and also when you do vectorization across<br>
WIs of a WG, or in general want to improve the DLP/ILP of the kernel.<br>
That said, you might not want to fully inline on all targets<br>
(e.g. on a CPU with SIMD + OoOE you might want to reduce the icache<br>
footprint and not inline).<br>
<br>
Therefore, the kernel functions in this sense are different from the<br>
device functions and at least the metadata that marks the kernels is<br>
still needed. In pocl the OpenCL compilation is now enabled for all<br>
(CPU) targets supported by LLVM, relying solely on the kernel metadata.<br>
In case *only* the kernel functions are marked with this calling<br>
convention, the kernel metadata might not be needed. But you still<br>
might need the calling convention for the device functions if you<br>
cannot assume that they always get inlined.<br></blockquote><div><br></div><div>We absolutely cannot rely on inlining. An OpenCL front-end is only one possible consumer of the PTX back-end, and general PTX supports recursion, which cannot always be inlined.</div>
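<div><br></div><div>As a rough illustration of the recursion point, here is a small sketch in plain C rather than PTX. The names <code>count_nodes</code> and <code>kernel_entry</code> are hypothetical stand-ins for a device function and a kernel. The recursion depth depends on the input data, so no fixed amount of inlining removes the call, and the callee has to survive as a real function with some calling convention.</div>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical binary tree node; only the shape matters for this sketch. */
struct node {
    const struct node *left;
    const struct node *right;
};

/* Stand-in for a device function: the recursion depth is data dependent,
 * so the call cannot be fully inlined into its caller. */
size_t count_nodes(const struct node *n)
{
    if (n == NULL)
        return 0;
    return 1 + count_nodes(n->left) + count_nodes(n->right);
}

/* Stand-in for a kernel: the externally launched entry point, which
 * calls the device function and writes the result through a pointer. */
void kernel_entry(const struct node *root, size_t *out)
{
    assert(out != NULL);
    *out = count_nodes(root);
}
```

<div>In this sketch the two functions would be the ones that differ in how they are emitted: the entry point is launched from the host, while the recursive helper is only ever called from device code.</div>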
<div><br></div><div>I would favor calling conventions over metadata for the simple reason that this maps more cleanly to the device model. Device and kernel functions are represented differently in PTX, including (sometimes) the way parameters are passed.</div>
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
[1] <a href="https://launchpad.net/pocl" target="_blank">https://launchpad.net/pocl</a><br>
<br>
Best regards,<br>
<span class="HOEnZb"><font color="#888888">--<br>
--Pekka<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><br><div>Thanks,</div><div><br></div><div>Justin Holewinski</div><br>