[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm

Justin Holewinski justin.holewinski at gmail.com
Wed Apr 4 07:17:50 PDT 2012


On Wed, Apr 4, 2012 at 4:49 AM, Tobias Grosser <tobias at grosser.es> wrote:

> On 04/03/2012 03:13 PM, Hongbin Zheng wrote:
> > Hi Yabin,
> >
> > Instead of compiling the LLVM IR to a PTX asm string in a ScopPass, you
> > could also improve llc/lli or create new tools to support code
> > generation for heterogeneous platforms[1], i.e. generate code for more
> > than one target architecture at the same time. Something like this is
> > not very complicated and has been implemented[2,3] by some people, but
> > it is not available in LLVM mainstream. Implementing this could make
> > your GPU project more complete.
>
> I agree with ether that we should ensure as much work as possible is
> done within generic, not Polly specific code.
>

Right, this has the potential to impact more people than just the users of
Polly. By moving as much as possible into generic LLVM, that infrastructure
can be leveraged by people doing work outside of the polyhedral model.


>
> In terms of heterogeneous code generation the approach Yabin proposed
> seems to work, but we should discuss other approaches. For the moment,
> I believe his proposal is very similar to the model of OpenCL and CUDA.
> He splits the code into host and kernel code. The host code is directly
> compiled to machine code by the existing tools (clang/llc). The kernel
> code is stored as a string and is only compiled to platform-specific
> code at execution time.
>

Depending on your target, that may be the only way.  If your target is
OpenCL-compatible accelerators, then your only portable option is to save
the kernel code as OpenCL C text and let the driver JIT compile it at
run-time.  Any other approach is not guaranteed to be compatible across
platforms or even driver versions.
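
To make that concrete, the host side of the OpenCL path looks roughly like
the following (an untested sketch with no error checking; the kernel source
and the "vec_add" name are just stand-ins for whatever the compiler would
embed):

// Hedged sketch: hand OpenCL C source to the driver for run-time JIT
// compilation. Error checking is omitted for brevity.
#include <CL/cl.h>
#include <cstdio>

int main() {
  const char *Source =
      "__kernel void vec_add(__global float *A, __global const float *B) {\n"
      "  size_t I = get_global_id(0);\n"
      "  A[I] += B[I];\n"
      "}\n";

  cl_platform_id Platform;
  clGetPlatformIDs(1, &Platform, NULL);
  cl_device_id Device;
  clGetDeviceIDs(Platform, CL_DEVICE_TYPE_DEFAULT, 1, &Device, NULL);
  cl_int Err;
  cl_context Ctx = clCreateContext(NULL, 1, &Device, NULL, NULL, &Err);

  // The vendor driver compiles the source for whatever device is present.
  cl_program Prog = clCreateProgramWithSource(Ctx, 1, &Source, NULL, &Err);
  clBuildProgram(Prog, 1, &Device, NULL, NULL, NULL);
  cl_kernel Kernel = clCreateKernel(Prog, "vec_add", &Err);

  printf("kernel built: %p\n", (void *)Kernel);
  return 0;
}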

Here, the target is the CUDA Driver API, so you're free to pass along any
valid PTX assembly.  You still pass the PTX code as a string to the driver,
which JIT-compiles it to actual GPU device code at run-time.
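
The Driver API side of that boils down to something like this (again an
untested sketch with no error checking; the embedded PTX is a trivial stub
standing in for whatever PTX the compiler actually emits, and "kernel" is a
placeholder entry name):

// Hedged sketch: load a PTX string through the CUDA Driver API and let
// the driver JIT it for the installed GPU.
#include <cuda.h>
#include <cstdio>

int main() {
  const char *PTX =
      ".version 3.0\n"
      ".target sm_20\n"
      ".address_size 64\n"
      ".visible .entry kernel() { ret; }\n";

  cuInit(0);
  CUdevice Dev;
  cuDeviceGet(&Dev, 0);
  CUcontext Ctx;
  cuCtxCreate(&Ctx, 0, Dev);

  // The driver JIT-compiles the PTX for the actual device in the system.
  CUmodule Mod;
  cuModuleLoadData(&Mod, PTX);
  CUfunction Fn;
  cuModuleGetFunction(&Fn, Mod, "kernel");

  // Fn can now be launched with cuLaunchKernel(...).
  printf("PTX module loaded\n");
  return 0;
}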


>
> Are there any other approaches that could be taken? What specific
> heterogeneous platform support would be needed. At the moment, it seems
> to me we actually do not need too much additional support.
>

I could see this working without any additional support, if needed.  It
seems like this proposal is dealing with LLVM IR -> LLVM IR code
generation, so the only thing that is really needed is a way to split the
IR into multiple separate IRs (one for the host, and one for each
accelerator target).  This does not really need any supporting
infrastructure, as you could imagine an opt pass processing the input IR,
transforming it into the host IR, and emitting the device IR as a separate
module.
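
As a rough sketch of what such a pass could do (not actual code from the
proposal; exact header paths and the CloneModule signature depend on the
LLVM version, and the "polly.gpu.kernel" name prefix is purely a
hypothetical marker for kernel functions):

// Hedged sketch: split one module into host and device halves by cloning
// and deleting the bodies that belong to the other side.
#include "llvm/Module.h"
#include "llvm/Function.h"
#include "llvm/Transforms/Utils/Cloning.h"

using namespace llvm;

static bool isKernel(const Function &F) {
  // Hypothetical convention for tagging kernels; a real pass would use
  // whatever annotation the host/kernel split introduces.
  return F.getName().startswith("polly.gpu.kernel");
}

// Returns a new device module; the original module is reduced to host code.
Module *splitOutDeviceCode(Module &M) {
  Module *DeviceM = CloneModule(&M);

  for (Module::iterator F = DeviceM->begin(), E = DeviceM->end(); F != E; ++F)
    if (!F->isDeclaration() && !isKernel(*F))
      F->deleteBody();                  // device module keeps only kernels

  for (Module::iterator F = M.begin(), E = M.end(); F != E; ++F)
    if (!F->isDeclaration() && isKernel(*F))
      F->deleteBody();                  // host module keeps only host code

  return DeviceM;
}

Cloning and then pruning keeps global and declaration references consistent
in both halves; a real pass would of course also have to rewrite the host
side to embed the device IR (or PTX) and insert the launch calls.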

Now if you're talking about source-level support for heterogeneous
platforms (e.g. C++ AMP), then you would need to adapt Clang to support
emission of multiple IR modules.  Basically, the AST would need to be split
into host and device portions, and codegen'd appropriately.  I feel that is
far beyond the scope of this proposal, though.


>
> Cheers
> Tobi
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>



-- 

Thanks,

Justin Holewinski