[cfe-dev] Backend for C and OpenCL

Wed Oct 5 11:55:02 PDT 2011

On 10/05/2011 06:53 PM, Alberto Magni wrote:
> Hi everybody,
>
> for a research project I would like to use LLVM to optimize
> OpenCL programs for GPUs.
>
> Due to the lack of open-source back-ends and runtimes for
> GPUs my idea is the following:
> 1) compile OpenCL C into LLVM-IR (from what I read on the ML
> full support is close, or at least foreseeable),
> 2) apply LLVM transformations to the bitcode,
> 3) generate the OpenCL C code from the optimized bitcode,
> 4) use the official (Nvidia, AMD, Intel, ....) OpenCL compilers
> and runtimes for the actual execution of the optimized code
>
> I know that the C backend is buggy and is no longer
> supported, but it still works with simple C programs.
> Remember that OpenCL programs are usually quite simple
> (no function pointers, etc.)
>
> The main features to be added to the backend are:
> 1) the "__kernel" keyword,
> 2) the four address spaces keywords
> 3) vector data types
> 4) the half keyword
>
> My idea is to extensively verify the functionality of the C-backend for
> C programs (similar to OpenCL-C ones) and possibly add the listed features.
>
> What do you think of this? Is it feasible?

Hi Alberto,

this depends on what you want to achieve and what kind of optimizations you
want to apply.

Your proposal suggests you want to transform OpenCL-C programs into
LLVM-IR and apply transformations at the LLVM-IR level. What kind of
LLVM-IR transformations are you planning to run? To my knowledge at
least the AMD, Intel and Apple OpenCL implementations use LLVM
internally, so the existing LLVM optimizations will not give you any
benefit, as they are already run in those OpenCL compilers.

Depending on what kind of optimizations you want to perform, several
approaches are possible. As Guoping Long suggested, you can
use clang to create an OpenCL AST and use its rewriter capabilities to
perform source-to-source transformations. This will be more of a
pattern-matching approach, but for research or for projects where you
can educate people to write canonical code this might be a good choice.
It is definitely an approach where it is easy to get access to
higher-level constructs like for-loops. On the other hand, analyses
like scalar evolution are not available at this layer.

Another possibility is to take an approach similar to Polly[1] (a 
project I work on). Here we use LLVM analysis passes to recover high 
level constructs from LLVM-IR and to subsequently apply higher level
transformations on the LLVM-IR.

If you want to apply your optimizations on LLVM-IR, the translation
OpenCL -> LLVM-IR should be straightforward. clang's OpenCL
support is pretty good and people continue to improve it. The way back
from LLVM-IR to OpenCL is more difficult. One approach you pointed out
is to generate OpenCL-C with the C backend. I must admit I never looked
into the C backend, but to my knowledge it works OK for selected
examples and has some problems that are regarded as unsolvable.
People propose to rewrite it as a regular LLVM backend. You probably
need to investigate yourself how much work it is to make it usable.
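
To make the forward direction (OpenCL-C -> LLVM-IR -> optimized LLVM-IR)
concrete, here is a minimal Python sketch that only assembles the two
command lines. It assumes a recent LLVM installation with `clang` and
`opt` on the PATH; the flags are those of current releases and may
differ from the 2011 toolchain. The helper names and file names are
hypothetical.

```python
import subprocess
from pathlib import Path


def clang_emit_ir_cmd(source, output, clang="clang"):
    """Command line asking clang to compile OpenCL C to textual LLVM-IR."""
    # "-x cl" forces the OpenCL C language mode regardless of file extension.
    return [clang, "-x", "cl", "-S", "-emit-llvm", str(source), "-o", str(output)]


def opt_cmd(ir_in, ir_out, passes="default<O2>", opt="opt"):
    """Command line asking opt to run a pass pipeline on textual LLVM-IR."""
    # Assumption: an opt binary that accepts the "-passes=" pipeline syntax.
    return [opt, "-S", f"-passes={passes}", str(ir_in), "-o", str(ir_out)]


def run_pipeline(source):
    """Run both steps; raises CalledProcessError if a tool fails."""
    source = Path(source)
    raw_ir = source.with_suffix(".ll")
    opt_ir = source.with_suffix(".opt.ll")
    subprocess.run(clang_emit_ir_cmd(source, raw_ir), check=True)
    subprocess.run(opt_cmd(raw_ir, opt_ir), check=True)
    return opt_ir
```

Calling `run_pipeline("kernel.cl")` would leave the optimized IR next to
the source file. The way back to OpenCL-C is not covered by this sketch.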

Another option is to pass LLVM-IR directly into the backends without 
ever regenerating OpenCL-C. This is a very interesting approach as most 
(all?) OpenCL implementations use LLVM and so you could skip the useless
LLVM-IR -> OpenCL-C -> LLVM-IR conversion.
As Justin pointed out, LLVM includes a PTX backend that could be used to
directly target NVIDIA hardware. Micah Villmow from AMD recently
submitted the first patches to open source the AMD-IL backend. As
getting open source support for AMD will take some time and it is
unknown how complete it will be, I will point you to another option.
OpenCL has support to export and reimport implementation-specific
binaries through clGetProgramInfo(CL_PROGRAM_BINARIES) and
clCreateProgramWithBinary(). The binary of the AMD OpenCL SDK is a normal
ELF file that contains a section called .llvmir. You should be able to
insert an optimized OpenCL program into the AMD SDK by exporting the
binary of the OpenCL-C program, replacing the ELF '.llvmir' section with
your optimized LLVM-IR and finally reimporting the changed binary
(let me know if you need help here). At the Euro-LLVM meeting it was
also discussed whether it is worthwhile to standardize the injection of
LLVM-IR into OpenCL backends. Several people seemed to be interested,
but some problems regarding non-target-agnostic LLVM-IR were also raised.
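
As a rough illustration of the export/patch/reimport idea above, the
following Python sketch locates a named section such as '.llvmir' inside
an ELF image that has already been exported to a bytes object. It only
reads the generic ELF header and section header table; the function name
is hypothetical, and nothing here is specific to, or verified against,
the AMD SDK binary layout.

```python
import struct


def find_elf_section(data, wanted):
    """Return (file_offset, size) of the section named `wanted`, or None."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    is_64 = data[4] == 2                     # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"    # EI_DATA: 1 = little endian

    # Read e_shoff, e_shentsize, e_shnum and e_shstrndx from the ELF header.
    if is_64:
        (shoff,) = struct.unpack_from(endian + "Q", data, 0x28)
        shentsize, shnum, shstrndx = struct.unpack_from(endian + "HHH", data, 0x3A)
    else:
        (shoff,) = struct.unpack_from(endian + "I", data, 0x20)
        shentsize, shnum, shstrndx = struct.unpack_from(endian + "HHH", data, 0x2E)

    def section(index):
        """Return (sh_name, sh_offset, sh_size) of one section header."""
        base = shoff + index * shentsize
        (name_off,) = struct.unpack_from(endian + "I", data, base)
        if is_64:
            offset, size = struct.unpack_from(endian + "QQ", data, base + 0x18)
        else:
            offset, size = struct.unpack_from(endian + "II", data, base + 0x10)
        return name_off, offset, size

    # Section names live in the section header string table.
    _, str_off, str_size = section(shstrndx)
    strtab = data[str_off:str_off + str_size]

    for i in range(shnum):
        name_off, offset, size = section(i)
        name = strtab[name_off:].split(b"\0", 1)[0]
        if name.decode("ascii", "replace") == wanted:
            return offset, size
    return None
```

With the returned offset and size the current section contents can be
sliced out of the image. Replacing them with a payload of a different
size additionally requires updating the section header and the offsets
of everything that follows, which this sketch does not attempt.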

Cheers
Tobi

P.S.: I am very interested in OpenCL optimizations. If possible, it
would be nice if you could point me to information about the stuff you plan
to work on.