[LLVMdev] GSoC: PTX Back-End for LLVM

Mon Mar 28 06:12:42 PDT 2011

Hi All,

I am going to submit a GSoC proposal for LLVM this year, and I would like to
post it here first to get constructive feedback ahead of the April 8
deadline.  This is the first time I have submitted a GSoC proposal, so
please be brutal with the feedback. :)

Additionally, Che-Liang Chiou (the code owner of the PTX back-end) has
agreed to be my mentor if this is accepted.  What does he need to do to
become an official mentor?

========
Overview
========

The NVidia Parallel Thread eXecution (PTX) language is an assembly-like
language that is used as an intermediate format for all GPU programs that
execute on NVidia hardware.  It is similar to many other three-address
assembly formats, and hence is a great target for the LLVM code generation
framework.  Having a supported PTX code generator back-end in LLVM would
allow users of LLVM to generate GPU code directly from LLVM IR, with
appropriate use of PTX-specific intrinsics to support features such as
thread/block id queries, texture sampling, and prefetching.
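
To make the thread/block id queries concrete, here is a minimal C++ sketch
that prints the kind of three-address PTX a back-end might emit to compute a
global thread index.  The helper name emitGlobalThreadIndex and the register
numbering are hypothetical and are not taken from the existing back-end.  The
special registers %tid, %ntid, and %ctaid and the mov/mad.lo instructions are
standard PTX; reading them directly as 32-bit values assumes PTX ISA 2.0 or
later.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of the LLVM PTX back-end): returns PTX text
// that computes ctaid.<dim> * ntid.<dim> + tid.<dim> into %r4.
// Assumes %r1..%r4 are already declared as .u32 registers.
std::string emitGlobalThreadIndex(char dim) {
  assert(dim == 'x' || dim == 'y' || dim == 'z');
  const std::string d(1, dim);
  std::string ptx;
  ptx += "mov.u32 %r1, %tid." + d + ";\n";    // thread id within the block
  ptx += "mov.u32 %r2, %ntid." + d + ";\n";   // number of threads per block
  ptx += "mov.u32 %r3, %ctaid." + d + ";\n";  // block id within the grid
  ptx += "mad.lo.u32 %r4, %r3, %r2, %r1;\n";  // %r4 = %r3 * %r2 + %r1
  return ptx;
}
```

In a real back-end the three mov instructions would come from lowering
PTX-specific intrinsics rather than from string concatenation; the sketch only
shows the shape of the output.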

======
Status
======

For the last month, I have been working with Che-Liang Chiou (the code owner
of the PTX back-end) to implement basic support for PTX code generation
within the LLVM source tree.  Currently, the back-end is capable of handling
a small subset of LLVM IR, including integer and floating-point arithmetic,
loads/stores, and basic branching.  While this is enough to support basic
computational kernels, there is still much to be done to support arbitrary
LLVM IR.

==============
Qualifications
==============

As I have already contributed significant portions of code to the current
PTX back-end, the learning curve for this project would be minimal.  I am
already comfortable working with the core LLVM libraries, as well as the
LLVM code generation and selection DAG libraries.  I have also been working
with C/C++ for over 15 years.

I am currently a PhD student at the Ohio State University, pursuing a degree
in Computer Science and Engineering.  My research focus is high-performance
code generation for multi-core and many-core architectures, specifically
current GPU architectures.  I am primarily interested in the compiler
technology to drive this.  My interest in the PTX back-end started with a
research interest in generating high-performance GPU code for stencil
computations.  While the PTX back-end is not my research focus, it is an
important part of the infrastructure needed for a planned research compiler.
I also have a personal interest in GPU code generation for graphics
applications.

========
Proposal
========

For the 2011 Google Summer of Code program, I propose to implement the
pieces of the PTX back-end that are currently missing or error-prone.  This
includes, but is not limited to,

    * Implementing efficient instruction selection for floating-point IR
      instructions
      - e.g., selecting the most efficient instructions for different
        hardware
    * Implementing the full range of integer and floating-point comparison
      instructions
    * Implementing function calls
    * Implementing jump tables
    * Implementing the full range of LLVM intrinsics needed for "special"
      PTX instructions
      - e.g., texture mapping, prefetching
    * Implementing support for v4f32 and similar vector types
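
As an illustration of the comparison item in the list above, the following
C++ sketch maps LLVM integer comparison predicates onto PTX setp opcodes.
The function name setpForICmp and the table layout are hypothetical; they
are not how the back-end is (or will necessarily be) written.  The predicate
names follow LLVM's icmp instruction, and the operator names follow the PTX
convention of lt/le/gt/ge for signed and lo/ls/hi/hs for unsigned
comparisons.  Floating-point predicates are left out of the sketch.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the PTX setp opcode for an LLVM icmp
// predicate ("eq", "ne", "slt", "ult", ...) at the given bit width.
std::string setpForICmp(const std::string &pred, unsigned bits) {
  assert(bits == 16 || bits == 32 || bits == 64);
  struct Cmp { const char *op; char sign; };
  static const std::map<std::string, Cmp> table = {
    {"eq",  {"eq", 'u'}}, {"ne",  {"ne", 'u'}},  // sign does not matter
    {"slt", {"lt", 's'}}, {"sle", {"le", 's'}},  // signed comparisons
    {"sgt", {"gt", 's'}}, {"sge", {"ge", 's'}},
    {"ult", {"lo", 'u'}}, {"ule", {"ls", 'u'}},  // unsigned comparisons
    {"ugt", {"hi", 'u'}}, {"uge", {"hs", 'u'}},
  };
  auto it = table.find(pred);
  if (it == table.end())
    throw std::invalid_argument("unknown icmp predicate: " + pred);
  return std::string("setp.") + it->second.op + "." + it->second.sign +
         std::to_string(bits);
}
```

For example, setpForICmp("ult", 32) returns "setp.lo.u32", which would then
be followed by a predicate register and two operands in the emitted PTX.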

In addition to these basic milestones, the driving goal would be to allow
the PTX back-end to generate correct and efficient code for LLVM IR versions
of the samples contained in the NVidia GPU Computing SDK.  In other words, I
want to be able to take the CUDA code from the SDK samples, generate LLVM IR
with Clang (with appropriate source-level syntactic modifications), and be
able to generate efficient PTX code that is close in performance to that
generated by the NVidia nvcc compiler.  My limited testing so far has shown
that code generated by the PTX back-end in its current form comes within
10% of the performance of identical code compiled with nvcc, and in some
cases even marginally beats nvcc.

To accomplish this goal, I propose a two-phase implementation.  In the first
phase, I will implement as much of the PTX ISA as is representable in LLVM
IR, and produce LLVM IR intrinsics for the rest.  The goal of the first
phase will be to generate correct PTX code for arbitrary LLVM IR input.
However, some exceptions will be necessary; for example, it is currently
not feasible to implement exception handling within PTX.  After the code
generator is able to generate correct code for a large set of complex LLVM
IR input (including real-world computational kernels originally written in
CUDA), I will begin phase two.  In phase two, I would like to begin
optimizing the PTX back-end to generate efficient code.  This will involve
work on the instruction scheduler to take advantage of the instruction
pipeline on the GPU hardware, as well as potentially involving the register
allocator.

==================
Advantage for LLVM
==================

The advantage of this project for the LLVM community would be the creation
and maintenance of a functionally complete code generator for NVidia GPU
hardware that can eventually be tied to the OpenCL and CUDA front-ends for
Clang.  It would be the first LLVM code generator for GPU architectures that
would be a part of upstream LLVM.  This would expand the range of influence
of LLVM to include GPU architectures out of the box.  Additionally, the
work in this proposal should be complete within the LLVM 3.0 timeline.

===========
Future Work
===========

In the future, the PTX back-end can be tied to the up-and-coming CUDA and
OpenCL front-ends within Clang.  This would provide a completely open-source
implementation of both OpenCL and CUDA for NVidia hardware, with the only
dependency being the NVidia CUDA SDK.  While this integration work is
outside of the scope of this proposal, it is a good future use-case for the
PTX back-end.  However, I do not know the timelines for the implementation
of these two front-ends, so I cannot make any guarantees about such
integration as part of this GSoC proposal.

======
Mentor
======

The code owner of the PTX back-end, Che-Liang Chiou, has agreed to mentor me
for this project if it is accepted this year.  However, I would love
feedback from others working on the back-end code generators within LLVM.

-- 

Thanks,

Justin Holewinski