[LLVMdev] GSoC proposal: TGSI compiler back-end.

Wed Apr 24 07:53:33 PDT 2013

On Tue, Apr 23, 2013 at 11:57:44PM +0200, Francisco Jerez wrote:
> Tom Stellard <tom at stellard.net> writes:
> >[...]
> > Hi Francisco,
> >
> 
> Hi Tom,
> 
> > I would be happy to be a mentor for this project if it is accepted.  I
> > have a few comments about your proposal:
> >
> Great.
> 
> >> I'm attaching a preliminary version of my proposal -- would be happy to
> >> get some feedback about it.
> >> 
> >
> >> GSoC proposal: TGSI compiler back-end.
> >> 
> >> - Proposal
> >> 
> >> TGSI is the intermediate representation that all open-source GPU
> >> drivers using the Gallium3D architecture understand.  Until now it's
> >> mainly been used for graphics (vertex, fragment shaders, etc.), but
> >> doing general-purpose computing with it is possible in principle
> >> (actually, necessary for GL4), and it's been the object of a number of
> >> extensions and improvements to make it more suitable for that purpose.
> >> 
> >> The TGSI IR has some peculiarities that are unusual in a typical CPU
> >> instruction architecture (and slightly annoying to deal with) -- It's
> >> a vector-centric architecture with a variable set of typeless
> >> registers, no stack and no proper support for irreducible control
> >> flow.
> >> 
> >> The objective of this project would be to write an LLVM compiler
> >> back-end with the TGSI IR as target.
> >> 
> >> - Benefits
> >> 
> >> This back-end is the last piece missing for a working and fully
> >> open-source implementation of OpenCL running on the nVidia nv50 and
> >> nve4 architectures -- though there's nothing nVidia-specific in the
> >> TGSI language, and code generated by this back-end will be expected to
> >> be usable by any other driver implementing the compute API of
> >> Gallium3D.
> >> 
> >> - Biographical background
> >> 
> >> I'm currently a master's student in the field of theoretical physics.
> >> 
> >> I've already (successfully) participated in the GSoC program with a
> >> device driver development project (which had to do with
> >> reverse-engineering nVidia's TV encoders) mentored by the X.Org
> >> Foundation in 2009.  Since then I've remained a frequent contributor
> >> to the Nouveau and Mesa projects.
> >> 
> >> Last year I wrote most of an OpenCL implementation running on nVidia
> >> hardware as part of the X.Org Foundation's EVoC program [1] -- the only
> >> piece missing being the compiler.
> >> 
> >> I've gained some experience with LLVM by writing a proof-of-concept
> >> TGSI back-end which is minimally working [2] -- the goal of this
> >> project would be to bring it to a useful state.
> >> 
> >> - Timeline
> >> 
> >> Summary of the work that would be done:
> >
> > I'm not sure what the current status of your TGSI backend is, but I would
> > recommend getting assembly generation working first, since this will
> > enable you to write lit tests.
> >
> 
> That already sort of works...  The only thing is that the assembly files
> that it produces are somewhat non-standard because they include section
> annotations and other unusual syntax that wouldn't be recognized by the
> normal TGSI parser...  It might be worth looking into it at some point
> but I don't think it's very high-priority; what I have seems to be
> enough to make lit happy.
> 
> >> 
> >>   * Get object file generation working.
> >>     (approx. June 17 - July 8)
> >> 
> >>     The output format will be the one expected by Mesa. The
> >>     implementation will take advantage of the existing MC assembler
> >>     API as much as possible.
> >>
> >
> > Can you elaborate a little more on the output format you will be using?
> > For example, will you be generating ELF binaries with special metadata
> > sections (This is what R600 currently does) or will you be creating your
> > own object format?
> 
> I'd be fine with using ELF, but it would definitely need special
> metadata sections as you say (for kernel prototypes and so on), and
> clover would have to be fixed to deal with it correctly -- OTOH the
> minimalistic format implemented in 'clover/core/module.cpp' seems to do
> everything we need, so another option would be to stick to it.
>

Generating ELF binaries with the LLVM API is really easy to do and you
can use R600 as a minimalistic example.  Also, keep in mind that OpenCL
1.2 has API calls for linking kernel objects so that is something that
will need to be supported.  I'm not sure if it will be easier to do
linking with ELF or with a custom format, but that's something you
should look into.

If you want to use a custom format, I would recommend doing a little
research ahead of time so you know exactly what needs to be done.
Probably the best place to start would be to look at the MCPureStreamer
class and figure out what additional features you might need.
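
To make the trade-off a bit more concrete, here is a rough sketch of the
minimum a custom container would have to carry: the kernel code plus
per-kernel argument metadata.  The layout, the magic value and the names
`pack_module` / `unpack_module` are all hypothetical and chosen only for
illustration; this is neither the format in 'clover/core/module.cpp' nor ELF.

```python
import struct

# Hypothetical magic value for illustration; not used by Mesa or LLVM.
MAGIC = b"TGSM"


def pack_module(kernels):
    """Serialize a list of (name, arg_sizes, code_bytes) tuples.

    Assumed layout (little-endian u32 lengths):
      magic, kernel count, then per kernel:
      name length, name, arg count, arg sizes, code length, code.
    """
    out = [MAGIC, struct.pack("<I", len(kernels))]
    for name, arg_sizes, code in kernels:
        encoded = name.encode("utf-8")
        out.append(struct.pack("<I", len(encoded)))
        out.append(encoded)
        out.append(struct.pack("<I", len(arg_sizes)))
        out.append(struct.pack("<%dI" % len(arg_sizes), *arg_sizes))
        out.append(struct.pack("<I", len(code)))
        out.append(code)
    return b"".join(out)


def unpack_module(blob):
    """Inverse of pack_module; raises ValueError on malformed input."""
    if blob[:4] != MAGIC:
        raise ValueError("bad magic")
    offset = 4

    def read_u32():
        nonlocal offset
        if offset + 4 > len(blob):
            raise ValueError("truncated module")
        (value,) = struct.unpack_from("<I", blob, offset)
        offset += 4
        return value

    def read_bytes(count):
        nonlocal offset
        data = blob[offset:offset + count]
        if len(data) != count:
            raise ValueError("truncated module")
        offset += count
        return data

    kernels = []
    for _ in range(read_u32()):
        name = read_bytes(read_u32()).decode("utf-8")
        arg_sizes = [read_u32() for _ in range(read_u32())]
        code = read_bytes(read_u32())
        kernels.append((name, arg_sizes, code))
    return kernels
```

Anything beyond this (symbols, relocations, linking several objects
together) would have to be designed by hand, whereas ELF already has a
place for it.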

> >
> >>   * Fix handling of the multiple OpenCL address spaces.
> >>     (approx. July 8 - July 22)
> >> 
> >>     Operations on __global, __local, and __private memory will be
> >>     dealt with using the resource access opcodes, __kernel function
> >>     parameters will be accessed through a special resource meant for
> >>     parameter passing, __constant memory will be mapped to constant
> >>     buffers.
> >> 
> >>   * Get function calls working reliably.
> >>     (approx. July 22 - August 5)
> >> 
> >>     This will involve fixing the passing of aggregate types and
> >>     anything that doesn't fit in a 32-bit register, fixing stack
> >>     allocations (i.e. the "alloca" instruction), and fixing calls to
> >>     functions that use the "kernel" calling convention from non-kernel
> >>     functions.
> >> 
> >>   * Get control flow working reliably.
> >>     (approx. August 5 - August 19)
> >> 
> >>     This will involve writing a control flow structurizing pass -- It
> >>     might be possible to promote the R600 one to a common analysis
> >>     pass and reuse it.
> >>
> >
> > I have a feeling this task may take longer than two weeks.  When you
> > write the final version of your proposal, I think you should have a
> > definitive plan for how you will implement the structurization.  Whether
> > it's reusing existing R600 code (this is my recommendation) or writing
> > something from scratch.
> >
> Reusing the R600 code would be possible for sure with just a few
> changes, but I think it would be nice to split the algorithm in an
> "analysis" and a "transformation" pass to leave the target the choice on
> how irreducible edges in the CFG should be handled -- Depending on the
> hardware and the specific case it might be better to remove irreducible
> edges by duplicating basic blocks, by introducing temporary "control
> flow" variables (as the SI structurizer does), or by not doing anything
> at all (e.g. on nVidia hardware arbitrary branches are actually
> supported, they're just somewhat inefficient).
> 
> It would also be nice if inter-pass dependencies were handled correctly
> and we didn't have to disable any other optimization passes that
> decanonicalize the control flow as R600 does.
> 
> I agree that two weeks might be too little for what I have in mind, but
> I guess if we drop the standard library point below (or we make it
> optional) it should be plenty of time to do it right.
> 
> > Also, I would really prefer if your structurization solution was target
> > independent and could live outside of the backend in the common code,
> > because a good structurization solution would be a great benefit to the
> > LLVM project.
> 
> Yeah, that was my idea too.
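
For reference, the "analysis" half of such a split could start from
something as small as a reducibility check.  The sketch below uses the
textbook T1/T2 reduction (drop self-loops, merge a block into its unique
predecessor) on a CFG given as a plain dict; it is not the algorithm used
by the R600 or SI structurizers, and the name `is_reducible` and the dict
representation are assumptions made for illustration.

```python
def is_reducible(cfg, entry):
    """Return True if the CFG reachable from `entry` is reducible.

    cfg maps each block name to an iterable of successor block names.
    Uses T1/T2 reduction: the graph is reducible exactly when repeated
    application collapses it to a single node.
    """
    # Only blocks reachable from the entry take part in the analysis.
    reachable = set()
    stack = [entry]
    while stack:
        block = stack.pop()
        if block in reachable:
            continue
        reachable.add(block)
        stack.extend(cfg.get(block, ()))

    succs = {block: set(cfg.get(block, ())) for block in reachable}

    changed = True
    while changed and len(succs) > 1:
        changed = False
        # T1: remove self-loops.
        for block in succs:
            succs[block].discard(block)
        # T2: merge a non-entry block into its unique predecessor.
        for block in list(succs):
            if block == entry:
                continue
            preds = [p for p in succs if block in succs[p]]
            if len(preds) == 1:
                parent = preds[0]
                succs[parent].discard(block)
                succs[parent] |= succs.pop(block)
                changed = True
                break
    return len(succs) == 1
```

A target-specific "transformation" pass could then decide what to do when
this returns False: duplicate blocks, introduce a control flow variable, or
leave the CFG alone.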
> 
> >
> >>   * Get the missing arithmetic and data conversion instructions
> >>     working.
> >>     (approx. August 19 - August 26)
> >> 
> >>     Most of the floating point, integer and vector operations required
> >>     by the OpenCL spec will be functional by the end of this period.
> >> 
> >>   * Work on the standard library and intrinsics.
> >>     (approx. August 26 - September 16)
> >> 
> >>     This will involve getting a reasonable subset of the OpenCL
> >>     standard library working, including math functions, thread
> >>     synchronisation functions, atomic functions, memory barriers and
> >>     surface sampling/write-back functions.
> >
> > I'm assuming you are planning to use libclc (http://libclc.llvm.org) for
> > this.
> >
> Yes.
> 
> > While implementing standard library builtins is important, I think this
> > task may be a little bit outside the scope of this project.  I would
> > recommend dropping this from the schedule and adding it as a task to
> > work on if you finish everything else early.  This way you can give
> > yourself more time to work on the actual backend.
> 
> OK, I'll make this one optional.
> 
> >> 
> >>   * Documentation and remaining clean-up work.
> >>     (approx. September 16 - September 23)
> >>
> >
> > I think your proposal should also include a plan for getting the backend
> > into mainline LLVM, because this is really the ultimate goal of the
> > project.  Your plan should include where the code repository will be
> > stored and how you will engage with the community to help you review
> > the code.  I think this is really important not just for you, but also
> > for the LLVM community to know what they need to do as far as helping
> > get the backend into the main tree.
> >
> I'm a little lost on this point...  My plan is just to keep working on
> it until it's good enough to be considered suitable for mainline,
> meanwhile it could live in a separate repository in freedesktop or
> github.  Not sure what else would be expected from me -- of course, I'm
> willing to keep fixing bugs, API breakages and reviewing related patches
> once it's merged to mainline.
> 

Basically what I'm trying to say is that I think the goal should be to
merge the backend by the end of the summer.  It is difficult to get
new backends accepted into the main tree mostly due to the fact that
the core developers don't usually have much spare time to do reviews.
I think if you put off trying to merge the code until after the summer,
the backend may fall off the radar of the LLVM developers, and it will
be even harder to get someone to look at it.

-Tom

> >  
> >> By the end of each period all the relevant OpenCL language tests from
> >> the piglit suite [3] and opencl-example [4] will be expected to pass.
> >> New tests will be written for implemented features that don't have
> >> sufficient coverage from the existing test suites.
> >>
> >
> > I know you'll be using the nouveau drivers to test this backend on real
> > hardware, and I think that's OK, but I do think you need to be careful
> > about not spending too much time fixing bugs in the nouveau driver.  I
> > think passing piglit is a good goal, but I would also like to see OpenCL
> > or LLVM IR based lit tests added as a goal, because TGSI code gen
> > is the main focus of this project.
> >
> Yes, good point, I agree that for now it would make more sense to focus
> on having extensive coverage in the form of lit tests.
> 
> > Thank you for submitting an early draft of your proposal; I think it is
> > really good to get developer feedback early.  I would encourage you to
> > continue to submit drafts up until the deadline to maximize the input
> > you get from LLVM developers.
> >
> 
> OK, I will do that.  Thank you for taking the time to read and comment
> on my proposal.
> 
> > -Tom
> >