[LLVMdev] Changes to the PTX calling conventions
pekka.jaaskelainen at tut.fi
Tue Dec 13 23:47:27 PST 2011
On 12/13/2011 10:50 PM, Justin Holewinski wrote:
> You mean having no calling convention for device functions, and a new, common
> calling convention for kernels?
I think this might make sense.
One major issue with OpenCL C (and, I suppose, CUDA) kernels that some
fail to see is that the kernel functions are "directly callable"
(just by choosing the correct calling convention), in general, only on
SIMT/SPMD-style machines (like NVIDIA's and, I suppose, AMD's GPUs).
On MIMD CPU architectures (possibly with SIMD/vector extensions)
you need to transform the kernel function into a "work-group function"
so that it retains its parallel work-item semantics whenever the kernel
is called with more than one parallel work item.
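For a kernel without barriers, the shape of such a work-group function can be sketched in C. This is a minimal illustration, not pocl's code: a compiler would perform the transformation on IR, and the names `vadd_wi`, `vadd_wg` and the fixed `LOCAL_SIZE` are hypothetical. The work-item id is passed explicitly in place of `get_local_id(0)`.

```c
#include <assert.h>

#define LOCAL_SIZE 4  /* hypothetical work-group size, fixed at compile time */

/* Single-work-item (WI) semantics: each invocation handles one element,
   identified by lid (standing in for get_local_id(0)). */
void vadd_wi(const float *a, const float *b, float *c, int lid) {
    assert(lid >= 0 && lid < LOCAL_SIZE);
    c[lid] = a[lid] + b[lid];
}

/* Work-group (WG) function for a MIMD CPU: one call executes all WIs of
   the group serially. A plain loop is valid here only because vadd_wi
   contains no barrier. */
void vadd_wg(const float *a, const float *b, float *c) {
    for (int lid = 0; lid < LOCAL_SIZE; ++lid)
        vadd_wi(a, b, c, lid);
}
```

Calling `vadd_wg` once with `a = {1, 2, 3, 4}` and `b = {10, 20, 30, 40}` fills `c` with `{11, 22, 33, 44}`, the same result as launching four work items on a SIMT device.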
The transformation is not completely trivial due to the work-group (WG)
barrier semantics. You can have barriers inside for-loops,
conditional blocks, etc., which makes it a more difficult compilation
problem than "just adding a loop around the whole work-item (WI) kernel function".
Converting the "single WI kernel semantics" into work-group
functions statically, while avoiding threads for WI execution,
is the main source of complexity the pocl project has to go through.
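Why a single loop around the kernel breaks with a barrier can be shown with a small hand-written C sketch. The WI kernel (in OpenCL-C terms) computes `x = in[lid] * 2; tmp[lid] = x; barrier(...); out[lid] = tmp[(lid + 1) % N] + x;`. The correct version below uses one common technique: split the body at the barrier into separate WI loops, and expand the private variable `x`, which is live across the barrier, into a per-WI array. It only handles a barrier at the top level of the body (not inside loops or conditionals), it is not claimed to be pocl's exact algorithm, and all names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define LOCAL_SIZE 4  /* hypothetical fixed work-group size */

/* Naive WG function: one loop around the whole WI body. Wrong, because
   WI 0 reads tmp[1] before WI 1 has written it (barrier violated). */
void rotate_wg_naive(const float *in, float *out) {
    float tmp[LOCAL_SIZE];  /* stands in for __local memory */
    memset(tmp, 0, sizeof tmp);
    for (int lid = 0; lid < LOCAL_SIZE; ++lid) {
        float x = in[lid] * 2.0f;
        tmp[lid] = x;
        /* barrier(CLK_LOCAL_MEM_FENCE) would be here */
        out[lid] = tmp[(lid + 1) % LOCAL_SIZE] + x;
    }
}

/* Barrier-aware WG function: the body is split at the barrier into two
   regions, each with its own WI loop. The private x is live across the
   barrier, so it is kept per WI in an array ("context" storage). */
void rotate_wg(const float *in, float *out) {
    assert(in != NULL && out != NULL);
    float tmp[LOCAL_SIZE];
    float x[LOCAL_SIZE];
    for (int lid = 0; lid < LOCAL_SIZE; ++lid) {  /* region before barrier */
        x[lid] = in[lid] * 2.0f;
        tmp[lid] = x[lid];
    }
    /* barrier: every WI has finished the first region at this point */
    for (int lid = 0; lid < LOCAL_SIZE; ++lid)    /* region after barrier */
        out[lid] = tmp[(lid + 1) % LOCAL_SIZE] + x[lid];
}
```

With `in = {1, 2, 3, 4}`, `rotate_wg` yields `{6, 10, 14, 10}`, while the naive version yields `{2, 4, 6, 10}` because the first three WIs read neighbours that have not run yet. Barriers nested in loops or branches need more than this straight split, which is where the real complexity lies.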
For OpenCL compilation I think it's common to inline everything into
the kernel functions, so the "device functions" usually just disappear.
This makes sense for SIMT, and also when you vectorize across the
WIs of a WG or, in general, want to improve the DLP/ILP of the kernel.
That said, you might not want to fully inline on all targets
(e.g. on a CPU with SIMD and out-of-order execution (OoOE) you might
prefer not to inline, to reduce the icache footprint).
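The inlining trade-off can be illustrated with the GCC/Clang `always_inline` and `noinline` function attributes on a hypothetical device function (`saxpy_elem`) called from a work-group loop. This is only a sketch of the two code shapes: whether the compiler actually vectorizes the inlined loop across WIs depends on the compiler, flags and aliasing analysis.

```c
#include <assert.h>
#include <stddef.h>

#define LOCAL_SIZE 8  /* hypothetical work-group size */

/* Device function forced inline: its body is pasted into the WI loop,
   leaving a straight-line loop body the compiler may vectorize across WIs. */
static inline __attribute__((always_inline))
float saxpy_elem(float a, float x, float y) {
    return a * x + y;
}

/* Same computation kept out of line: one shared copy of the code
   (smaller icache footprint), at the cost of a call per work item. */
__attribute__((noinline))
float saxpy_elem_call(float a, float x, float y) {
    return a * x + y;
}

/* WG function with the device function inlined. */
void saxpy_wg_inlined(float a, const float *x, float *y) {
    assert(x != NULL && y != NULL);
    for (int lid = 0; lid < LOCAL_SIZE; ++lid)
        y[lid] = saxpy_elem(a, x[lid], y[lid]);
}

/* WG function that keeps the device function as a real call. */
void saxpy_wg_call(float a, const float *x, float *y) {
    assert(x != NULL && y != NULL);
    for (int lid = 0; lid < LOCAL_SIZE; ++lid)
        y[lid] = saxpy_elem_call(a, x[lid], y[lid]);
}
```

Both variants compute the same values; only the generated code differs, which is why the choice can reasonably be made per target. Note that keeping `saxpy_elem_call` out of line is exactly the case where a calling convention for device functions still matters.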
Therefore, kernel functions in this sense are different from
device functions, and at least the metadata that marks the kernels is
still needed. In pocl, OpenCL compilation is now enabled for all
(CPU) targets supported by LLVM, relying solely on the kernel metadata.
If *only* the kernel functions are marked with this calling
convention, the kernel metadata might not be needed. But you might
still need a calling convention for the device functions if you do
not assume they always get inlined.