[LLVMdev] Changes to the PTX calling conventions

Wed Dec 14 04:41:34 PST 2011

2011/12/14 Pekka Jääskeläinen <pekka.jaaskelainen at tut.fi>

> Hi all,
>
> On 12/13/2011 10:50 PM, Justin Holewinski wrote:
> > You mean having no calling convention for device functions, and a new,
> > common calling convention for kernels?
>
> I think this might make sense.
>

To be clear, I do like the idea of using the default calling convention for
device functions.  My hesitation comes from the LLVM specification, which
says the default calling convention is the C calling convention, and that
convention supports varargs.  If the spec is changed to make the supported
features of the C calling convention dependent on the target, then I'm fine
with this.

Any core LLVM devs have any issues with this?
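
For reference, a minimal C sketch of what "supports varargs" means at the
source level. The helper name sum_ints is hypothetical and is not part of
LLVM or the PTX back-end; it only illustrates the kind of call the C calling
convention is expected to handle.

```c
#include <stdarg.h>

/* Hypothetical helper: sums `count` int arguments passed variadically.
 * The callee discovers its arguments at run time through va_arg, which
 * is the feature a target must support to implement the C convention
 * in full. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    va_start(args, count);
    for (int i = 0; i < count; ++i)
        total += va_arg(args, int);
    va_end(args);

    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) passes a different number of arguments
than sum_ints(1, 5), which is why the convention itself has to define how
the extra arguments are laid out.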

>
> One major issue with OpenCL C (and I suppose CUDA) kernels that some
> fail to see is that the functions are "directly callable"
> (just by choosing the correct calling convention) in general only on
> SIMT/SPMD-style machines (like NVIDIA's and I suppose AMD's GPUs).
>
> For MIMD CPU architectures (possibly with SIMD/vector extensions)
> you need to transform the kernel function into a "work group function"
> so that it retains its parallel work item semantics whenever the kernel
> is called with more than one work item.
>
> The transformation is not completely trivial due to the work
> group (WG) barrier semantics. You can have barriers inside for-loops,
> conditional blocks, etc. which makes it a more difficult compilation
> problem than "just adding a loop around the whole WI kernel function".
> Converting the "single WI kernel semantics" to work group
> functions statically, while avoiding threads for WI execution,
> is the main source of complexity the pocl project [1] has to deal
> with.
>
> For OpenCL compilation I think it's common to inline everything into
> the kernel functions so the "device functions" usually just disappear.
> This makes sense for SIMT and also when you do vectorization across
> WIs of a WG, or in general want to improve the DLP/ILP of the kernel.
> That said, you might not want to fully inline on all targets
> (e.g. on a CPU with SIMD + OoOE you might want to reduce the icache
> footprint and not inline).
>
> Therefore, the kernel functions are in this sense different from the
> device functions, and at least the metadata that marks the kernels is
> still needed.  In pocl, OpenCL compilation is now enabled for all
> (CPU) targets supported by LLVM, relying solely on the kernel metadata.
> If *only* the kernel functions are marked with this calling
> convention, the kernel metadata might not be needed. But you might
> still need the calling convention for the device functions if you
> cannot assume they always get inlined.
>
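
To make the barrier problem described in the quoted text concrete, here is a
minimal C sketch of a work group function for a straight-line kernel with one
barrier. It is an illustration under stated assumptions, not pocl's actual
implementation: the kernel, the names reverse_wg and reverse_wg_naive, and
the fixed WG_SIZE are all hypothetical.

```c
#include <stddef.h>

#define WG_SIZE 4 /* hypothetical fixed work-group size */

/* Assumed OpenCL C kernel being transformed (shown as a comment):
 *
 *   __kernel void reverse(__global int *data, __local int *tmp) {
 *       size_t i = get_local_id(0);
 *       tmp[i] = data[i];
 *       barrier(CLK_LOCAL_MEM_FENCE);
 *       data[i] = tmp[get_local_size(0) - 1 - i];
 *   }
 */

/* Naive version: one loop around the whole work-item body, with the
 * barrier dropped. Work item 0 reads tmp[WG_SIZE - 1] before work item
 * WG_SIZE - 1 has written it, so the result is wrong. */
void reverse_wg_naive(int *data, int *tmp)
{
    for (size_t i = 0; i < WG_SIZE; ++i) {
        tmp[i] = data[i];
        data[i] = tmp[WG_SIZE - 1 - i];
    }
}

/* Barrier-aware version: the body is split at the barrier and each
 * region gets its own work-item loop, so every work item finishes
 * region 1 before any work item starts region 2. */
void reverse_wg(int *data, int *tmp)
{
    for (size_t i = 0; i < WG_SIZE; ++i)   /* region before the barrier */
        tmp[i] = data[i];

    for (size_t i = 0; i < WG_SIZE; ++i)   /* region after the barrier */
        data[i] = tmp[WG_SIZE - 1 - i];
}
```

This only covers the easy case. A barrier inside a for-loop or a conditional
block cannot be handled by splitting the body at one point, which is the
harder problem the quoted text refers to.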

We absolutely cannot rely on inlining.  An OpenCL front-end is only one
possible consumer of the PTX back-end, and general PTX supports recursion,
which cannot always be inlined.
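
As a small illustration of the recursion point, here is a C sketch of a
function whose call depth depends on run-time data. The struct node type and
tree_sum are hypothetical names chosen for the example; nothing here is taken
from the PTX back-end.

```c
#include <stddef.h>

/* Hypothetical binary tree node. */
struct node {
    int value;
    const struct node *left;
    const struct node *right;
};

/* Sums all values in the tree. The recursion depth equals the height of
 * the tree, which is only known at run time, so the compiler cannot
 * replace every call with an inlined copy of the body. A real call,
 * and therefore a calling convention, is still required. */
int tree_sum(const struct node *n)
{
    if (n == NULL)
        return 0;
    return n->value + tree_sum(n->left) + tree_sum(n->right);
}
```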

I would favor calling conventions over metadata for the simple reason that
this maps more cleanly to the device model.  Device and kernel functions
are represented differently in PTX, including (sometimes) the way
parameters are passed.

>
> [1] https://launchpad.net/pocl
>
> Best regards,
> --
> --Pekka
>

-- 

Thanks,

Justin Holewinski