[LLVMdev] Upstream PTX backend that uses target independent code generator if possible

Thu Aug 26 05:37:00 PDT 2010

Thanks, David, for the comments.
Sorry for the late reply.

On Mon, Aug 23, 2010 at 11:52 PM, David A. Greene <greened at obbligato.org> wrote:
> Che-Liang Chiou <clchiou at gmail.com> writes:
>
>> Hi there,
>>
>> Thanks to Nick for kindly reviewing the patch.  Here is the link to
>> the source code of the PTX backend; it should help Nick review the patch.
>> http://lime.csie.ntu.edu.tw/~clchiou/llvm-ptx-backend.tar.gz
>
> Great!
>
>> I decided to take the code generator approach (referred to as the
>> codegen approach) rather than the C backend approach (referred to as
>> the cbe approach) for the following reasons (in fact, my first
>> prototype used the cbe approach, but I later abandoned it and rewrote
>> it with the codegen approach).  This should partly answer earlier
>> questions about how the two approaches compare.
>
> I think the codegen approach is the right one long-term but I don't
> necessarily agree with all of your reasons.  :)
>
>> * LLVM should not rely on nVidia's design of its CUDA toolchain.  To
>> my knowledge, nVidia does not make any commitment on how much
>> optimization would be implemented in its graphics driver compiler.  A
>> backend with little optimization support would be in trouble if nVidia
>> decides to move most of the optimizer from its graphics driver
>> compiler to its CUDA compiler.
>
> This is true.
>
>> * nVidia's CUDA compiler has a non-trivial optimizer; this should
>> suggest that late optimization alone is not sufficient.  If LLVM's PTX
>> backend is trying to provide a comparable alternative to nVidia's CUDA
>> compiler, the backend should have a good code optimizer.  In my
>> experiment, the prototype PTX backend generates better optimized code
>> than nVidia's CUDA compiler in some cases.
>
> LLVM will never completely replace the cuda compiler because PTX is not
> the final ISA.  We'll always need some piece of the cuda compiler to
> translate to the metal ISA.
>
>> * PTX is a virtual instruction set that is not designed for an
>> optimizer; for one, it is not even in SSA form.  So the graphics driver
>> compiler's optimizer might not do its job very well, and I would
>> suggest we not rely on its optimization.
>
> Not being in SSA form is no problem.  Converting to SSA is a well-known
> transformation.  LLVM IR doesn't start out in SSA either.
>
>> * The codegen approach is actually simpler than the cbe approach.  PTX
>> is mostly RISC-like, so the codegen approach can reuse much of the
>> *.td infrastructure and the implementations of existing mature RISC
>> backends such as ARM, PowerPC, and Sparc.  Besides, I guess most
>> developers would be more familiar with *.td than with the C backend.
>> In fact, it took me only two weeks to write a working prototype from
>> scratch, and I had no prior experience with LLVM's codegen.
>
> I believe that.  PTX is a really simple instruction set and quite
> orthogonal.
>
>> * So far my backend is less complete than other backends based on the
>> cbe approach, but considering the simplicity of the codegen approach,
>> a backend based on it should catch up with them in a short time.
>
> The one thing we'll have to add is mask support.
>
I'm a little bit confused here.  Is a "masked operation" the same as a
"predicated operation"?
>> * Masked operations, as well as branch folding and the like, are much
>> easier to implement in the codegen approach.  I am not sure how much
>> performance improvement could be achieved from these optimizations,
>> but it is worth trying.
>
> I'm not sure why these would be easier with one model over another.
> It's a lot of hand-lowering and manual optimization either way.  Can you
> explain?
>
The codegen is smart enough to translate a simple if-else block like
  if (pred) return A; else return B;
into one instruction
  selp A, B, pred
Codegen also has branch-folding support, so that should be easier too
(this is my guess; I have not started on it yet).
I have not tried many examples, but I am convinced that it should be
easier.
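
To make the if-else example concrete, here is a minimal sketch in plain C
(not PTX).  The function names pick_branch, pick_select, and
check_equivalence are hypothetical and exist only for this illustration.
The two pick_* functions compute the same value; the second is written in
the branch-free "select" form that a single select instruction such as
selp expresses.  Whether a given compiler actually emits a select for
either form is not guaranteed.

```c
#include <assert.h>

/* Branchy form: two return paths chosen by a predicate. */
int pick_branch(int pred, int a, int b) {
    if (pred)
        return a;
    else
        return b;
}

/* Select form: one expression, no control flow in the source.
 * This is the shape that maps onto a single select operation. */
int pick_select(int pred, int a, int b) {
    return pred ? a : b;
}

/* Sanity check that both forms agree for true and false predicates. */
void check_equivalence(void) {
    assert(pick_branch(1, 10, 20) == pick_select(1, 10, 20));
    assert(pick_branch(0, 10, 20) == pick_select(0, 10, 20));
}
```

Because the two forms are equivalent for every input, replacing the
branch with a select is always a legal rewrite here; the open question is
only whether it is profitable on the target.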
>> All in all, having implemented both, I would propose a PTX backend
>> that uses the codegen approach.
>
> The fact that PTX is a moving target seals the deal for me.  It's really
> easy to generate variants of PTX using TableGen's predicate approach.
>
>                            -Dave
>

By the way, what should I do to upstream this backend?  I submitted a
small patch to the llvm-commits mailing list.  On average, how long
should I expect to wait for code review?  Thanks.

Regards,
Che-Liang