GSoC proposal: TGSI compiler back-end.

- Proposal

TGSI is the intermediate representation that all open-source GPU drivers using the Gallium3D architecture understand. Until now it's mainly been used for graphics (vertex and fragment shaders, etc.), but doing general-purpose computing with it is possible in principle (actually, necessary for GL4), and it's been the object of a number of extensions and improvements to make it more suitable for that purpose.

The TGSI IR has some peculiarities that are unusual compared with a typical CPU instruction set architecture (and slightly annoying to deal with): it's a vector-centric architecture with a variable set of typeless registers, no stack and no proper support for irreducible control flow.

The objective of this project would be to write an LLVM compiler back-end with the TGSI IR as target, aiming to bring it to a mature enough state to be considered suitable for inclusion in mainline, ideally by the end of this summer.

- Benefits

This back-end is the last piece missing for a working and fully open-source implementation of OpenCL running on the nVidia nv50 and nve4 architectures -- though there's nothing nVidia-specific in the TGSI language, and code generated by this back-end is expected to be usable by any other driver implementing the compute API of Gallium3D.

- Biographical background

I'm currently a master's student in the field of theoretical physics. I successfully participated in the GSoC program in 2009 with a device driver development project (reverse-engineering nVidia's TV encoders) mentored by the X.Org Foundation, and I remained a frequent contributor to the Nouveau and Mesa projects over the next few years.

Last year I wrote most of an OpenCL implementation running on nVidia hardware as part of the X.Org Foundation's EVoC program [1] -- the only piece missing being the compiler.
I've gained some experience with LLVM by writing a proof-of-concept TGSI back-end which is minimally working [2] -- the goal of this project would be to bring it to a useful state.

- Timeline

Summary of the work that would be done:

* Get object file generation working.
  (approx. June 17 - July 8)
  The output format will be ELF with a special section used for storing kernel metadata. The implementation will take advantage of the existing MC assembler API as much as possible.

* Fix handling of the multiple OpenCL address spaces.
  (approx. July 8 - July 22)
  Operations on __global, __local, and __private memory will be dealt with using the resource access opcodes; __kernel function parameters will be accessed through a special resource meant for parameter passing; __constant memory will be mapped to constant buffers.

* Get function calls working reliably.
  (approx. July 22 - August 5)
  This will involve fixing the passing of aggregate types and anything that doesn't fit in a 32-bit register, fixing stack allocations (i.e. the "alloca" instruction), and fixing calls to functions that use the "kernel" calling convention from non-kernel functions.

* Get the missing arithmetic and data conversion instructions working.
  (approx. August 5 - August 19)
  Most of the floating point, integer and vector operations required by the OpenCL spec will be functional by the end of this period.

* Get control flow working reliably.
  (approx. August 19 - September 16)
  This will require a working control flow structurization pass. The R600 structurization pass will be taken as a basis and split into an analysis pass and a transformation pass, to improve the handling of inter-pass dependencies and to accommodate different strategies for the removal of irreducible control flow edges. Both passes will be target independent and part of the common analysis code.

* Documentation and remaining clean-up work.
  (approx. September 16 - September 23)

* Work on the standard library and intrinsics.
  (out of the official GSoC period: September 23 - ...)
  This will involve implementing the standard library required by the OpenCL specification, including math functions, thread synchronization functions, atomic functions, memory barriers and surface sampling/write-back functions. Most of the work will be done in the libclc project [3].

By the end of each period any implemented features will be expected to have extensive coverage from a series of lit-based tests. All the relevant OpenCL language tests from the piglit suite [4] and opencl-example [5] will be expected to pass too, barring driver bugs.

- Contact information

Francisco Jerez

[1] http://www.x.org/wiki/XorgEVoC/GalliumCompute
[2] https://github.com/curro/llvm
[3] http://libclc.llvm.org
[4] http://cgit.freedesktop.org/piglit
[5] http://cgit.freedesktop.org/~tstellar/opencl-example
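
- Illustration: recognizing irreducible control flow

The control flow milestone in the timeline depends on telling reducible control flow graphs apart from irreducible ones. The sketch below is only an illustration of that distinction, using the textbook T1/T2 reduction (remove self-loops, merge a node into its unique predecessor). It is not the R600 structurization pass and it uses no LLVM API; the function name is_reducible and the dictionary-based graph representation are assumptions made up for this example.

```python
def is_reducible(cfg, entry):
    """Return True if the control flow graph is reducible.

    cfg is a hypothetical representation: a dict mapping each basic
    block name to a list of successor block names.
    """
    # Only blocks reachable from the entry block matter.
    reachable = set()
    stack = [entry]
    while stack:
        node = stack.pop()
        if node in reachable:
            continue
        reachable.add(node)
        stack.extend(cfg.get(node, ()))

    # Working copies of the successor and predecessor sets.
    succs = {n: set(cfg.get(n, ())) for n in reachable}
    preds = {n: set() for n in reachable}
    for n, targets in succs.items():
        for t in targets:
            preds[t].add(n)

    changed = True
    while changed and len(succs) > 1:
        changed = False
        for n in list(succs):
            # T1: drop a self-loop on n.
            if n in succs[n]:
                succs[n].discard(n)
                preds[n].discard(n)
                changed = True
            # T2: merge n into its unique predecessor p.
            if n != entry and len(preds[n]) == 1:
                (p,) = preds[n]
                succs[p].discard(n)
                for s in succs[n]:
                    preds[s].discard(n)
                    succs[p].add(s)
                    preds[s].add(p)
                del succs[n]
                del preds[n]
                changed = True

    # The graph is reducible iff it collapsed to the entry block alone.
    return len(succs) == 1


# A natural loop: reducible.
simple_loop = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}

# A cycle that can be entered at two different blocks: irreducible.
two_entry_cycle = {"entry": ["a", "b"], "a": ["b"], "b": ["a"]}
```

Calling is_reducible(simple_loop, "entry") and is_reducible(two_entry_cycle, "entry") separates the two cases. A graph of the second kind is what a structurization pass would have to transform (for example by duplicating blocks or introducing a dispatch variable) before it can be expressed with structured control flow.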