[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm

Yabin Hu yabin.hwu at gmail.com
Tue Apr 3 16:02:31 PDT 2012


Hi Justin,

2012/4/3 Justin Holewinski <justin.holewinski at gmail.com>

>> *Motivation*
>> With the broad proliferation of GPU computing, it has become very
>> important to provide an easy and automatic tool that lets ordinary
>> developers write or port applications for the GPU, especially domain
>> experts who want to harness its huge computing power. Polly already
>> implements many transformations, such as tiling, auto-vectorization,
>> and OpenMP code generation. With the help of LLVM's PTX backend, I
>> plan to extend Polly with GPGPU code generation.
>>
>
> Very interesting!  I'm quite familiar with Muthu's work, and putting that
> into LLVM would be great.  If done right, it could apply to any
> heterogeneous systems, including AMD GPUs.
>
> As the maintainer and primary developer on the PTX back-end, please feel
> free to contact me with any issues/suggestions you have regarding the PTX
> back-end!


Thanks for your interest and help.

> I'm a bit confused by the wording here.  What do you mean by 'LLVM
> sub-function?'  I'm assuming you mean extracting the relevant code into a
> separate function, but I would just use the word 'function'.


Yes, it is indeed a function. I used that word following the
method-naming style of Polly's OpenMP code generation. I will fix this.

> And what do you mean by a run-time library to generate the executable
> program?


The runtime library I have in mind is just a thin wrapper around the CUDA
Driver API, but it lets us add our own debug information and keeps changes
in the CUDA APIs transparent to users.
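
A minimal sketch of what I mean, where the cu* entry points are the real
Driver API but the wrapper itself (the names check and launchPTX, and the
abort-on-error policy) is hypothetical:

// Hypothetical wrapper sketch; the cu* calls below are the real CUDA
// Driver API, everything else is made up for illustration.
#include <cuda.h>
#include <cstdio>
#include <cstdlib>

// Abort with context on any driver-API failure, so errors are visible.
static void check(CUresult R, const char *What) {
  if (R != CUDA_SUCCESS) {
    std::fprintf(stderr, "CUDA driver call %s failed (error %d)\n", What,
                 static_cast<int>(R));
    std::exit(1);
  }
}

// JIT a PTX string through the driver and launch the named kernel with
// a 1-D grid; Args follows the cuLaunchKernel kernelParams convention.
void launchPTX(const char *PTX, const char *KernelName, void **Args,
               unsigned Blocks, unsigned Threads) {
  CUdevice Dev;
  CUcontext Ctx;
  CUmodule Mod;
  CUfunction Fn;

  check(cuInit(0), "cuInit");
  check(cuDeviceGet(&Dev, 0), "cuDeviceGet");
  check(cuCtxCreate(&Ctx, 0, Dev), "cuCtxCreate");
  check(cuModuleLoadData(&Mod, PTX), "cuModuleLoadData");
  check(cuModuleGetFunction(&Fn, Mod, KernelName), "cuModuleGetFunction");
  check(cuLaunchKernel(Fn, Blocks, 1, 1, Threads, 1, 1,
                       /*sharedMemBytes=*/0, /*stream=*/nullptr, Args,
                       /*extra=*/nullptr),
        "cuLaunchKernel");
  check(cuCtxSynchronize(), "cuCtxSynchronize");
}

Centralizing every driver call behind one checked entry point is what
makes API failures, and future API changes, a wrapper-local concern.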


> Are you proposing to side-step the LLVM code generator LLC?  It seems like
> a reasonable approach would be to write an LLVM pass (or set of passes)
> that takes as input a single IR file, and produces two: (1) the GPU
> kernel/device code, and (2) the non-translatable IR with GPU code replaced
> by appropriate CUDA Driver API calls.  Then, both of these can pass through
> the opt/llc tools with the appropriate selection for optimization passes
> and target back-end.
>
> This way, you could fairly easily create a GPGPU compiler by writing a
> simple wrapper around Clang (or better yet, improve Clang to support
> multiple targets simultaneously!)
>
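
For concreteness, the split you describe might look roughly like the
following hypothetical pass skeleton. It is written against today's LLVM
pass API (the 2012 API differed), and both the "polly-gpu-kernel"
attribute and the omitted launch-site rewriting are placeholders:

// Hypothetical skeleton: split one IR module into host and device halves.
#include "llvm/IR/Module.h"
#include "llvm/IR/PassManager.h"
#include "llvm/Transforms/Utils/Cloning.h"

using namespace llvm;

// Placeholder predicate: assume extracted kernels carry an attribute
// (real code might consult !nvvm.annotations metadata instead).
static bool isKernel(const Function &F) {
  return F.hasFnAttribute("polly-gpu-kernel");
}

struct GPUSplitPass : PassInfoMixin<GPUSplitPass> {
  std::unique_ptr<Module> &DeviceM; // receives the device half

  explicit GPUSplitPass(std::unique_ptr<Module> &DM) : DeviceM(DM) {}

  PreservedAnalyses run(Module &M, ModuleAnalysisManager &) {
    // Clone the whole module, then prune each copy down to one half.
    DeviceM = CloneModule(M);
    for (Function &F : *DeviceM)
      if (!F.isDeclaration() && !isKernel(F))
        F.deleteBody(); // device module keeps only kernel bodies
    for (Function &F : M)
      if (isKernel(F))
        F.deleteBody(); // host module keeps everything else
    // A real pass would now rewrite every kernel call site in M into
    // CUDA Driver API calls (module load, argument setup, launch).
    return PreservedAnalyses::none();
  }
};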

Ether gave a similar suggestion on this point. I copy my reply to him here
to explain why I chose to embed the transformation pass in my
implementation.

The original motivation for this work is to provide a JIT compiler for our
language frontend (a subset of MATLAB/Octave). I have extended lli into a
JIT compiler (named GVM) that uses Polly dynamically. However, preliminary
results show that the overhead is heavy, so I chose to offload the dynamic
optimization from the JITting process. Putting the LLVM-to-PTX
assembly-string pass into Polly also provides a kind of one-touch
experience for users. Imagine the following scenario: when a user opens a
MATLAB source file or a folder containing source files, we can start
compiling the sources statically, using Polly and opt to produce optimized
LLVM IR. Then, when the user clicks run or presses the enter key, we only
need to JIT that LLVM IR as usual, minimizing the dynamic overhead.
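
A minimal sketch of that LLVM-to-PTX-string step, assuming today's NVPTX
backend (in 2012 the backend was still named "ptx", and the exact header
paths, enum names, and signatures have shifted across LLVM versions):

// Hypothetical sketch of emitting PTX to an in-memory string.
#include "llvm/ADT/SmallString.h"
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/IR/Module.h"
#include "llvm/MC/TargetRegistry.h"
#include "llvm/Support/TargetSelect.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetMachine.h"

using namespace llvm;

// Compile Mod for NVPTX and return the textual PTX, or "" on failure.
std::string emitPTX(Module &Mod) {
  InitializeAllTargets();
  InitializeAllTargetMCs();
  InitializeAllAsmPrinters();

  std::string Err;
  const char *TripleStr = "nvptx64-nvidia-cuda";
  const Target *T = TargetRegistry::lookupTarget(TripleStr, Err);
  if (!T)
    return "";

  std::unique_ptr<TargetMachine> TM(T->createTargetMachine(
      TripleStr, /*CPU=*/"sm_20", /*Features=*/"", TargetOptions(),
      /*RelocModel=*/std::nullopt));
  Mod.setTargetTriple(TripleStr);          // keep module and TM in sync
  Mod.setDataLayout(TM->createDataLayout());

  SmallString<0> PTX;
  raw_svector_ostream OS(PTX);
  legacy::PassManager PM;
  // For NVPTX, "assembly" output is the PTX text itself.
  if (TM->addPassesToEmitFile(PM, OS, /*DwoOut=*/nullptr,
                              CodeGenFileType::AssemblyFile))
    return "";
  PM.run(Mod);
  return PTX.str().str();
}

The returned string can then be handed straight to a Driver API wrapper
like the launchPTX sketch above, which is what makes the one-touch flow
possible without writing intermediate files.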


Thanks again!

best regards,
Yabin