[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm
Tobias Grosser
tobias at grosser.es
Wed Apr 4 07:35:43 PDT 2012
On 04/04/2012 04:17 PM, Justin Holewinski wrote:
>
>
> On Wed, Apr 4, 2012 at 4:49 AM, Tobias Grosser <tobias at grosser.es
> <mailto:tobias at grosser.es>> wrote:
>
> On 04/03/2012 03:13 PM, Hongbin Zheng wrote:
> > Hi Yabin,
> >
> > Instead of compiling the LLVM IR to a PTX asm string in a ScopPass, you
> > could also improve llc/lli or create new tools to support code
> > generation for heterogeneous platforms[1], i.e. generate code for more
> > than one target architecture at the same time. Something like this is
> > not very complicated and has already been implemented[2,3] by some
> > people, but it is not available in mainline LLVM. Implementing this
> > could make your GPU project more complete.
>
> I agree with ether that we should ensure as much work as possible is
> done in generic rather than Polly-specific code.
>
>
> Right, this has the potential to impact more people than the users of
> Polly. By moving as much as possible to generic LLVM, that
> infrastructure can be leveraged by people doing work outside of the
> polyhedral model.
To make stuff generic it is often helpful to know the other possible use
cases. I consequently encourage everybody to point out such use cases or
to state which exact functionality they might want to reuse. Otherwise,
it may happen that we focus a little too much on the needs of Polly.
> In terms of heterogeneous code generation the approach Yabin proposed
> seems to work, but we should discuss other approaches. For the moment,
> I believe his proposal is very similar to the model of OpenCL and CUDA. He
> splits the code into host and kernel code. The host code is directly
> compiled to machine code by the existing tools (clang/llc). The kernel
> code is stored as a string and is only compiled to platform-specific
> code at execution time.
>
>
> Depending on your target, that may be the only way. If your target is
> OpenCL-compatible accelerators, then your only portable option is to save
> the kernel code as OpenCL text and let the driver JIT compile it at
> run-time. Any other approach is not guaranteed to be compatible across
> platforms or even driver versions.
> In this case, the target is the CUDA Driver API, so you're free to pass
> along any valid PTX assembly. You still pass the PTX code as a string to
> the driver, which JIT compiles it to actual GPU device code at run-time.
I would like to highlight that with the word 'string' I was not
referring to 'OpenCL C code'. I don't think it is a practical approach
to recover OpenCL C code, especially as the LLVM-IR C backend was
recently removed.
I meant to describe that the kernel code is stored as a global variable
in the host binary (in some intermediate representation such as LLVM-IR,
PTX or a vendor-specific OpenCL binary) and is loaded at execution time
into the OpenCL or CUDA runtime, where it is compiled down to
hardware-specific machine code.
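For the CUDA case, a minimal sketch of what this looks like on the host
side is shown below. It assumes the PTX text has been embedded under a
hypothetical symbol __polly_kernel_ptx and that the kernel is simply
called "kernel"; both names and the launch configuration are made up for
illustration, and error checking is omitted.

#include <cuda.h>

// Hypothetical symbol: the PTX text embedded into the host binary as a
// global variable by the host-side code generation.
extern const char __polly_kernel_ptx[];

void launchEmbeddedKernel(CUdeviceptr Data, int N) {
  CUdevice Dev;
  CUcontext Ctx;
  CUmodule Mod;
  CUfunction Fn;

  cuInit(0);
  cuDeviceGet(&Dev, 0);
  cuCtxCreate(&Ctx, 0, Dev);

  // The driver JIT-compiles the embedded PTX for the GPU that is present.
  cuModuleLoadData(&Mod, __polly_kernel_ptx);
  cuModuleGetFunction(&Fn, Mod, "kernel"); // kernel name is an assumption

  void *Args[] = { &Data, &N };
  cuLaunchKernel(Fn, /*gridDim*/ 1, 1, 1, /*blockDim*/ 256, 1, 1,
                 /*sharedMemBytes*/ 0, /*stream*/ 0, Args, 0);
  cuCtxSynchronize();
}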
> Are there any other approaches that could be taken? What specific
> heterogeneous platform support would be needed? At the moment, it seems
> to me we actually do not need too much additional support.
>
>
> I could see this working without any additional support, if needed. It
> seems like this proposal is dealing with LLVM IR -> LLVM IR code
> generation, so the only thing that is really needed is a way to split
> the IR into multiple separate IRs (one for host, and one for each
> accelerator target). This does not really need any supporting
> infrastructure, as you could imagine an opt pass processing the input IR
> and transforming it to the host IR, and emitting the device IR as a
> separate module.
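For illustration, a rough sketch of such a splitting pass is included
below. It is written against a current LLVM C++ API (the details have
changed since 2012), and the "polly.kernel" attribute it checks for is a
purely hypothetical marker, not something Polly actually defines.

#include "llvm/IR/Function.h"
#include "llvm/IR/Module.h"
#include "llvm/Pass.h"
#include "llvm/Transforms/Utils/Cloning.h"
using namespace llvm;

namespace {
// Hypothetical marker: kernel functions carry a "polly.kernel" attribute.
static bool isKernelFunction(const Function &F) {
  return F.hasFnAttribute("polly.kernel");
}

struct SplitKernels : public ModulePass {
  static char ID;
  SplitKernels() : ModulePass(ID) {}

  bool runOnModule(Module &M) override {
    // The device module starts as a clone of the input module; everything
    // that is not a kernel is reduced to a declaration.
    std::unique_ptr<Module> DeviceM = CloneModule(M);
    for (Function &F : *DeviceM)
      if (!isKernelFunction(F))
        F.deleteBody();

    // On the host side, kernels become declarations; calls to them would
    // be rewritten into runtime calls that launch the device code.
    for (Function &F : M)
      if (isKernelFunction(F))
        F.deleteBody();

    // DeviceM would now be lowered to PTX (or kept as bitcode) and either
    // written to a separate file or embedded into the host module.
    return true;
  }
};
}

char SplitKernels::ID = 0;
static RegisterPass<SplitKernels> X("split-kernels",
                                    "Split kernels into a device module");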
Yes. And instead of saving the two modules in separate files, we can
store the kernel module as a 'string' in the host module and add the
necessary library calls to load it at run time. This will give a smooth
user experience and requires almost no additional infrastructure.
(At the moment this will only work with NVidia, but I am confident there
will be OpenCL vendor extensions that allow loading LLVM-IR kernels. AMD's
OpenCL implementation can, for example, already load LLVM-IR, even though
this is not officially supported.)
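A minimal sketch of the embedding step, again against a current LLVM C++
API, is shown below. The runtime entry point polly_load_kernel and the
symbol __polly_kernel are made-up names for illustration; KernelStr stands
for the already-serialized device module (PTX or bitcode).

#include "llvm/IR/Constants.h"
#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Module.h"
using namespace llvm;

// Embed the serialized kernel into the host module and insert a call to a
// hypothetical runtime helper that hands it to the CUDA/OpenCL runtime.
static void embedKernel(Module &HostM, StringRef KernelStr) {
  LLVMContext &Ctx = HostM.getContext();

  // @__polly_kernel = private constant [N x i8] c"..." -- the kernel 'string'.
  Constant *Init = ConstantDataArray::getString(Ctx, KernelStr);
  auto *GV = new GlobalVariable(HostM, Init->getType(), /*isConstant=*/true,
                                GlobalValue::PrivateLinkage, Init,
                                "__polly_kernel");

  // declare void @polly_load_kernel(i8*) -- hypothetical runtime library call.
  Type *I8Ptr = PointerType::getUnqual(Type::getInt8Ty(Ctx));
  FunctionCallee Load = HostM.getOrInsertFunction(
      "polly_load_kernel", Type::getVoidTy(Ctx), I8Ptr);

  // For illustration, load the kernel at the top of main(); a real
  // implementation would more likely use a global constructor.
  if (Function *Main = HostM.getFunction("main")) {
    IRBuilder<> B(&*Main->getEntryBlock().getFirstInsertionPt());
    B.CreateCall(Load, B.CreatePointerCast(GV, I8Ptr));
  }
}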
> Now if you're talking about source-level support for heterogeneous
> platforms (e.g. C++ AMP), then you would need to adapt Clang to support
> emission of multiple IR modules. Basically, the AST would need to be
> split into host and device portions, and codegen'd appropriately. I
> feel that is far beyond the scope of this proposal, though.
Yes. No source-level transformations, and no targets other than
PTX, AMDIL or LLVM-IR.
Cheers
Tobi