[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm
Hongbin Zheng
etherzhhb at gmail.com
Tue Apr 3 06:13:28 PDT 2012
Hi Yabin,
Instead of compiling the LLVM IR to a PTX asm string in a ScopPass, you
could also improve llc/lli or create new tools to support code
generation for heterogeneous platforms[1], i.e. generating code for more
than one target architecture at the same time. Something like this is
not very complicated and has been implemented[2,3] by others, but it is
not available in mainline LLVM. Implementing this could make your GPU
project more complete.
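As a minimal sketch of the split such a tool would have to make, assume (hypothetically) that the driver routes each function by its calling convention: PTX_Kernel and PTX_Device functions go to the PTX backend, everything else to the host target. `Function`, `CallConv` and `partitionByTarget` are illustrative stand-ins, not LLVM's real classes or API.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for LLVM calling conventions (not the real LLVM enum).
enum class CallConv { C, PTXKernel, PTXDevice };

// Hypothetical, simplified view of a function in a module.
struct Function {
    std::string name;
    CallConv cc;
};

// The functions assigned to each target.
struct Partition {
    std::vector<Function> host;    // to be compiled by the host backend
    std::vector<Function> device;  // to be compiled by the PTX backend
};

// Route every function to the target implied by its calling convention.
Partition partitionByTarget(const std::vector<Function>& module) {
    Partition p;
    for (const Function& f : module) {
        if (f.cc == CallConv::PTXKernel || f.cc == CallConv::PTXDevice)
            p.device.push_back(f);
        else
            p.host.push_back(f);
    }
    return p;
}
```

A real tool would then emit each partition with its own target machine; this sketch only shows the routing decision.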
best regards
ether
[1]http://en.wikipedia.org/wiki/Heterogeneous_computing
[2]http://llvm.org/devmtg/2010-11/Villmow-OpenCL.pdf
[3]http://llvm.org/devmtg/2008-08/Sander_HW-SW-CoDesignflowWithLLVM.pdf
On Mon, Apr 2, 2012 at 10:16 PM, Yabin Hu <yabin.hwu at gmail.com> wrote:
> Hi all,
>
> I am a PhD student from Huazhong University of Science and Technology,
> China. The following is my GSoC 2012 proposal.
> Comments are welcome!
>
> Title: Automatic GPGPU Code Generation for LLVM
>
> Abstract
> Very often, manually developing a GPGPU application is a time-consuming,
> complex, error-prone and iterative process. In this project, I propose to
> build an automatic GPGPU code generation framework for LLVM, based on two
> successful LLVM (sub-)projects: Polly and the PTX backend. This can
> greatly ease the long learning curve of the various GPU programming
> models.
>
> Motivation
> With the broad proliferation of GPU computing, it is important to provide
> an easy, automatic tool that lets ordinary developers develop or port
> applications to the GPU, especially domain experts who want to harness
> the GPU's huge computing power. Polly already implements many
> transformations, such as tiling, auto-vectorization and OpenMP code
> generation. With the help of LLVM's PTX backend, I plan to extend Polly
> with GPGPU code generation.
>
>
> Project Detail
> In this project, we target various parallel loops that can be described by
> Polly's polyhedral model. We first transform the selected SCoPs (Static
> Control Parts) into loop nests of depth four using Polly's schedule
> optimization. Then we extract the loop body (or inner non-parallel loops)
> into an LLVM sub-function, tagged with the PTX_Kernel or PTX_Device
> calling convention. After that, we use the PTX backend to translate the
> sub-functions into a string of the corresponding PTX code. Finally, we
> provide a runtime library to generate the executable program.
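The four-deep loop nest can be sketched as a sequential CPU emulation of the 2-D parallel loop `c[i][j] = a[i][j] + b[i][j]`. It assumes the common mapping in which the two outer tile loops become the grid and the two inner point loops become the thread block; the 16x16 tile size and the names `kernelBody` and `launchEmulated` are hypothetical and do not reflect Polly's actual generated code.

```cpp
#include <vector>

// Hypothetical tile sizes, i.e. the thread-block shape (an assumption).
constexpr int kTileI = 16;
constexpr int kTileJ = 16;

// The extracted loop body: the part that would become the PTX_Kernel function.
// Each (block, thread) pair computes exactly one (i, j) point.
void kernelBody(int bi, int bj, int ti, int tj, int n, int m,
                const std::vector<float>& a, const std::vector<float>& b,
                std::vector<float>& c) {
    int i = bi * kTileI + ti;
    int j = bj * kTileJ + tj;
    if (i < n && j < m)  // guard for partial tiles at the edges
        c[i * m + j] = a[i * m + j] + b[i * m + j];
}

// Original: for (i < n) for (j < m) c[i][j] = a[i][j] + b[i][j].
// After tiling it is a 4-deep nest; on a GPU the outer two loops would be the
// grid and the inner two the thread block. Here they simply run sequentially.
void launchEmulated(int n, int m, const std::vector<float>& a,
                    const std::vector<float>& b, std::vector<float>& c) {
    int gridI = (n + kTileI - 1) / kTileI;
    int gridJ = (m + kTileJ - 1) / kTileJ;
    for (int bi = 0; bi < gridI; ++bi)               // would be blockIdx.y
        for (int bj = 0; bj < gridJ; ++bj)           // would be blockIdx.x
            for (int ti = 0; ti < kTileI; ++ti)      // would be threadIdx.y
                for (int tj = 0; tj < kTileJ; ++tj)  // would be threadIdx.x
                    kernelBody(bi, bj, ti, tj, n, m, a, b, c);
}
```

The bounds guard in `kernelBody` is what allows the grid to round up to whole tiles when the problem size is not a multiple of the tile size.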
>
> There are three key challenges in this project.
> 1. How to find the optimal execution configuration for the GPU code.
> The execution configuration is essential to the performance of the GPU
> code. It is constrained by many factors, including the hardware, the
> source code, register usage, local store (device) usage, the original
> memory access patterns and so on. We must take all of these factors into
> consideration.
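One common starting point, offered here as an assumption rather than the method the project will use, is to pick the block size that maximizes the number of threads resident on one multiprocessor (occupancy), given per-thread register and local-store usage. The struct names and the hardware limits used to exercise it are illustrative, and real GPUs also allocate registers and local store in coarser granules, which this sketch ignores.

```cpp
#include <algorithm>

// Per-multiprocessor hardware limits (values are supplied by the caller).
struct HwLimits {
    int regsPerSM;          // 32-bit registers per multiprocessor
    int localStorePerSM;    // bytes of on-chip local (shared) store per SM
    int maxThreadsPerSM;
    int maxBlocksPerSM;
    int maxThreadsPerBlock;
    int warpSize;
};

// Resource usage of one kernel, e.g. as reported after code generation.
struct KernelUsage {
    int regsPerThread;        // assumed > 0
    int localStorePerThread;  // bytes; assumed to scale linearly with block size
};

// Threads that can be resident on one SM at a given block size.
int residentThreads(const HwLimits& hw, const KernelUsage& k, int blockSize) {
    int byThreads = hw.maxThreadsPerSM / blockSize;
    int byRegs = hw.regsPerSM / (k.regsPerThread * blockSize);
    int byStore = k.localStorePerThread == 0
                      ? hw.maxBlocksPerSM
                      : hw.localStorePerSM / (k.localStorePerThread * blockSize);
    int blocks = std::min({hw.maxBlocksPerSM, byThreads, byRegs, byStore});
    return blocks * blockSize;
}

// Try every multiple of the warp size and keep the smallest block size
// that reaches the highest occupancy.
int chooseBlockSize(const HwLimits& hw, const KernelUsage& k) {
    int best = hw.warpSize;
    int bestResident = -1;
    for (int bs = hw.warpSize; bs <= hw.maxThreadsPerBlock; bs += hw.warpSize) {
        int resident = residentThreads(hw, k, bs);
        if (resident > bestResident) {
            best = bs;
            bestResident = resident;
        }
    }
    return best;
}
```

An auto-tuner would typically treat this as a first guess and refine it empirically, since memory access patterns also affect performance.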
>
> 2. How to automatically insert synchronization code.
> This is essential for preserving the original semantics: we must
> correctly detect where synchronization is needed.
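A minimal, brute-force sketch of the detection problem, assuming every access is an affine function `coef * tid + offset` of the thread id within one block: a barrier (e.g. PTX `bar.sync`) is needed between a write and a later read if some thread reads an element written by a different thread. A real implementation would derive this from Polly's polyhedral dependence analysis rather than enumerating thread pairs; `AffineAccess` and `needsBarrier` are hypothetical names.

```cpp
// Affine index expression coef * tid + offset over the thread id in a block.
struct AffineAccess {
    int coef;
    int offset;
    int eval(int tid) const { return coef * tid + offset; }
};

// Returns true if some thread reads an element written by a *different*
// thread of the same block, i.e. a cross-thread dependence that requires a
// barrier between the write and the read.
bool needsBarrier(const AffineAccess& write, const AffineAccess& read,
                  int blockSize) {
    for (int w = 0; w < blockSize; ++w)
        for (int r = 0; r < blockSize; ++r)
            if (w != r && write.eval(w) == read.eval(r))
                return true;
    return false;
}
```

Note that a block-level barrier cannot order accesses made by threads in different blocks; such dependences would instead have to be handled by splitting the computation into separate kernel launches.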
>
> 3. How to automatically generate the memory copy operations between host
> and device.
> We must transfer the input data to the GPU and copy the results back.
> Fortunately, Polly already provides a very expressive way to describe
> memory accesses.
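To illustrate how access descriptions could drive copy generation, here is a simplified sketch assuming a rectangular 2-D iteration domain and affine accesses: it computes the bounding box of the array elements an access touches, which would be copied to the device for reads and back to the host for writes. Polly describes accesses as exact integer sets, so copying a bounding box is a simplifying assumption, and `Domain2D`, `Access2D` and `footprint` are hypothetical names.

```cpp
#include <algorithm>

// Rectangular 2-D iteration domain: lo[d] <= iv[d] <= hi[d] (inclusive).
struct Domain2D {
    int lo[2];
    int hi[2];
};

// Affine access to a 2-D array:
// index[k] = c[k][0] * i + c[k][1] * j + c[k][2].
struct Access2D {
    int c[2][3];
};

// Inclusive bounding box of the array elements an access touches.
struct Box2D {
    int lo[2];
    int hi[2];
};

// An affine function over a box attains its minimum and maximum at the box's
// corners, so evaluating the four corners yields the bounding box of the
// footprint (an over-approximation if the access is strided).
Box2D footprint(const Domain2D& d, const Access2D& a) {
    const int is[2] = {d.lo[0], d.hi[0]};
    const int js[2] = {d.lo[1], d.hi[1]};
    Box2D box = {};
    for (int k = 0; k < 2; ++k) {
        int mn = a.c[k][0] * is[0] + a.c[k][1] * js[0] + a.c[k][2];
        int mx = mn;
        for (int i : is)
            for (int j : js) {
                int v = a.c[k][0] * i + a.c[k][1] * j + a.c[k][2];
                mn = std::min(mn, v);
                mx = std::max(mx, v);
            }
        box.lo[k] = mn;
        box.hi[k] = mx;
    }
    return box;
}
```

For example, a read of `B[i][j + 1]` over `0 <= i <= 99, 0 <= j <= 49` gives rows 0..99 and columns 1..50, so only that region of `B` would need to be copied to the device.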
>
> Timeline
> May 21 ~ June 3: preliminary code generation for 1-D and 2-D parallel loops.
> June 4 ~ June 11: code generation for parallel loops with non-parallel
> inner loops.
> June 11 ~ June 24: automatic memory copy insertion.
> June 25 ~ July 8: auto-tuning of the GPU execution configuration.
> July 9 ~ July 15: midterm evaluation and documentation.
> July 16 ~ July 22: automatic synchronization insertion.
> July 23 ~ August 3: testing on the PolyBench benchmarks.
> August 4 ~ August 12: wrap-up and final documentation.
>
>
> Project experience
> I have participated in several projects related to binary translation
> (optimization) and run-time systems. I also implemented a frontend for
> numerical computing languages such as Octave/MATLAB, following the style
> of Clang. Recently, I have been working closely with the Polly team,
> contributing patches and investigating many details of polyhedral
> transformations.
>
>
> References
> 1. Tobias Grosser and Ragesh A. Polly - First Successful Optimizations -
> How to proceed? LLVM Developers' Meeting, 2011.
> 2. Muthu Manikandan Baskaran, J. Ramanujam and P. Sadayappan. Automatic
> C-to-CUDA Code Generation for Affine Programs. International Conference
> on Compiler Construction (CC), 2010.
> 3. Soufiane Baghdadi, Armin Größlinger and Albert Cohen. Putting Automatic
> Polyhedral Compilation for GPGPU to Work. Compilers for Parallel
> Computers (CPC), 2010.