<div>Hi All,</div><div><br></div><div>I plan to submit a GSoC proposal for LLVM this year, and I would like to post it here first to get constructive feedback before the April 8 deadline. This is my first GSoC proposal, so please be brutal with the feedback. :)</div>
<div><br></div><div>Additionally, Che-Liang Chiou (the code owner of the PTX back-end) has agreed to be my mentor if this is accepted. What does he need to do to become an official mentor?</div><div><br></div><div><br></div>
<div><br></div><div><br></div><div>========</div><div>Overview</div><div>========</div><div><br></div><div>The NVidia Parallel Thread eXecution (PTX) language is an assembly-like language used as the intermediate format for GPU compute programs (such as CUDA and OpenCL kernels) that execute on NVidia hardware. Because it resembles many other three-address assembly formats, it is a natural target for the LLVM code generation framework. A supported PTX code generator back-end in LLVM would allow LLVM users to generate GPU code directly from LLVM IR, using PTX-specific intrinsics where needed for features such as thread/block ID queries, texture sampling, and prefetching.</div>
<div><br></div><div><br></div><div>======</div><div>Status</div><div>======</div><div><br></div><div>Over the past month, I have been working with Che-Liang Chiou (the code owner of the PTX back-end) to implement basic support for PTX code generation in the LLVM source tree. The back-end can currently handle a small subset of LLVM IR, including integer and floating-point arithmetic, loads/stores, and basic branching. This is enough to support basic computational kernels, but much work remains before arbitrary LLVM IR can be supported.</div>
<div><br></div><div><br></div><div>==============</div><div>Qualifications</div><div>==============</div><div><br></div><div>Because I have already contributed significant portions of the current PTX back-end, the learning curve for this project would be minimal. I am comfortable working with the core LLVM libraries as well as the code generation and SelectionDAG libraries. I have also been programming in C/C++ for over 15 years.</div>
<div><br></div><div>I am currently a PhD student in Computer Science and Engineering at the Ohio State University. My research focuses on high-performance code generation for multi-core and many-core architectures, particularly current GPU architectures, and on the compiler technology that drives it. My interest in the PTX back-end began with research on generating high-performance GPU code for stencil computations. Although the PTX back-end itself is not my research focus, it is an important part of the infrastructure for a research compiler I am planning. I also have a personal interest in GPU code generation for graphics applications.</div>
<div><br></div><div><br></div><div>========</div><div>Proposal</div><div>========</div><div><br></div><div>For the 2011 Google Summer of Code program, I propose to implement the parts of the PTX back-end that are currently missing and to fix those that are error-prone. This work includes, but is not limited to:</div>
<div><br></div><div> * Implementing efficient instruction selection for floating-point IR instructions</div><div> - e.g., selecting the most efficient instructions for each target GPU generation</div><div> * Implementing the full range of integer and floating-point comparison instructions</div>
<div> * Implementing function calls</div><div> * Implementing jump tables</div><div> * Implementing the full range of LLVM intrinsics needed for "special" PTX instructions</div><div> - e.g., texture sampling and prefetching</div>
<div> * Implementing support for v4f32 and similar vector types</div><div><br></div><div>Beyond these basic milestones, the driving goal is to enable the PTX back-end to generate correct and efficient code for LLVM IR versions of the samples in the NVidia GPU Computing SDK. In other words, I want to take the CUDA code from the SDK samples, compile it to LLVM IR with Clang (after appropriate source-level syntactic modifications), and generate PTX code whose performance is close to that of code produced by the NVidia nvcc compiler. In my limited testing so far, code generated by the current PTX back-end comes within 10% of the performance of identical code compiled with nvcc, and in some cases marginally outperforms it.</div>
<div><br></div><div>To accomplish this goal, I propose a two-phase implementation. In the first phase, I will implement as much of the PTX ISA as can be represented in LLVM IR, and define LLVM intrinsics for the rest. The goal of this phase is to generate correct PTX code for arbitrary LLVM IR input, with some necessary exceptions; for example, exception handling cannot currently be implemented in PTX. Once the code generator produces correct code for a large set of complex LLVM IR inputs (including real-world computational kernels originally written in CUDA), I will begin phase two, which focuses on optimizing the PTX back-end to generate efficient code. This will involve tuning the instruction scheduler to exploit the GPU's instruction pipeline, and possibly work on the register allocator.</div>
<div><br></div><div><br></div><div>==================</div><div>Advantage for LLVM</div><div>==================</div><div><br></div><div>This project would give the LLVM community a functionally complete, maintained code generator for NVidia GPU hardware that can eventually be tied to the OpenCL and CUDA front-ends for Clang. It would be the first LLVM code generator for GPU architectures to be part of upstream LLVM, extending LLVM's out-of-the-box reach to GPU architectures. Additionally, the work in this proposal should be completed within the LLVM 3.0 timeline.</div>
<div><br></div><div><br></div><div>===========</div><div>Future Work</div><div>===========</div><div><br></div><div>In the future, the PTX back-end can be tied to the emerging CUDA and OpenCL front-ends in Clang. This would provide a completely open-source implementation of both OpenCL and CUDA for NVidia hardware, with the NVidia CUDA SDK as the only dependency. While this integration work is outside the scope of this proposal, it is a good future use case for the PTX back-end. However, because I do not know the implementation timelines for these two front-ends, I cannot make any guarantees about this integration as part of this GSoC proposal.</div>
<div><br></div><div><br></div><div>======</div><div>Mentor</div><div>======</div><div><br></div><div>The code owner of the PTX back-end, Che-Liang Chiou, has agreed to mentor me for this project if it is accepted this year. I would also welcome feedback from others working on the back-end code generators in LLVM.</div>
<div><br></div><br>-- <br><br><div>Thanks,</div><div><br></div><div>Justin Holewinski</div><br>