[LLVMdev] [PROPOSAL] LLVM multi-module support
Tobias Grosser
tobias at grosser.es
Sun Jul 29 12:16:28 PDT 2012
On 07/26/2012 12:49 PM, Duncan Sands wrote:
> Hi Tobias, I didn't really get it. Is the idea that the same bitcode is
> going to be codegen'd for different architectures, or is each sub-module
> going to contain different bitcode? In the latter case you may as well
> just use multiple modules, perhaps in conjunction with a scheme to store
> more than one module in the same file on disk as a convenience.
Hi Duncan,
thanks for your reply.
The proposal allows both: sub-modules that contain different bitcode, and
sub-modules that are code generated differently.
Different bitcode may arise from sub-modules that represent different
program parts, but also when we create several sub-modules for a single
program part, e.g. to optimize it for specific hardware.
In the back-end, sub-modules could be code generated according to the
requirements of the run-time system that will load them. For NVIDIA
chips we could generate PTX; for AMD systems, AMD-IL may be an option.
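To illustrate the NVIDIA case, here is a rough sketch of how a back-end
could lower an extracted device sub-module to PTX with the NVPTX target.
This is illustrative only; the triple and CPU string are examples, and it
assumes the targets were initialized via InitializeAllTargets() and
friends:

    #include "llvm/Module.h"
    #include "llvm/PassManager.h"
    #include "llvm/Support/FormattedStream.h"
    #include "llvm/Support/TargetRegistry.h"
    #include "llvm/Support/raw_ostream.h"
    #include "llvm/Target/TargetMachine.h"
    #include "llvm/Target/TargetOptions.h"

    // Lower a device sub-module to PTX; for NVPTX, the "assembly" output
    // is the PTX text a CUDA runtime can consume.
    std::string emitPTX(llvm::Module &DeviceM) {
      std::string Error, PTX;
      const llvm::Target *T =
          llvm::TargetRegistry::lookupTarget("nvptx64-nvidia-cuda", Error);
      llvm::TargetOptions Options;
      llvm::TargetMachine *TM = T->createTargetMachine(
          "nvptx64-nvidia-cuda", /*CPU=*/"sm_20", /*Features=*/"", Options);

      llvm::raw_string_ostream OS(PTX);
      llvm::formatted_raw_ostream FOS(OS);
      llvm::PassManager PM;
      TM->addPassesToEmitFile(PM, FOS, llvm::TargetMachine::CGFT_AssemblyFile);
      PM.run(DeviceM);
      FOS.flush();
      return PTX;
    }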
You and several others (e.g. Justin) pointed out that multi-modules in
LLVM-IR (or the llvm.codegen intrinsic) just reinvent the tar archive
format. I can follow your reasoning here.
Thinking about how to add CUDA support to clang, one possible approach is
to modify clang to emit device and host code into separate modules,
compile each module separately, and then add logic to clang that merges
the two modules at the end. This is a very reasonable approach, and there
is no doubt that adding multi-module support to LLVM just to simplify
this single use case would not be the right thing to do.
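Just to make that idea concrete (module names and triples are made up for
illustration), the two-module approach amounts to something like:

    #include "llvm/LLVMContext.h"
    #include "llvm/Module.h"

    // Host and device code live in separate modules with their own target
    // triples; each is optimized and code generated on its own, and extra
    // driver logic merges the results at the end.
    void createModules(llvm::LLVMContext &Ctx) {
      llvm::Module *HostM = new llvm::Module("cuda.host", Ctx);
      HostM->setTargetTriple("x86_64-unknown-linux-gnu");

      llvm::Module *DeviceM = new llvm::Module("cuda.device", Ctx);
      DeviceM->setTargetTriple("nvptx64-nvidia-cuda");

      // ... clang's CodeGen would emit host IR into HostM and the kernels
      // into DeviceM.
    }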
With multi-module support I am aiming for something else. As you know,
LLVM allows optimizer plugins to be "-load"ed at run time, and every
LLVM-based compiler, be it clang/ghc/dragonegg/lli/..., can take
advantage of them with almost no source code changes. I believe this is a
very nice feature, as it allows new optimizations to be prototyped and
tested easily, without any changes to the core compilers themselves. This
works not only for simple IR transformations; even autoparallelisation
works well, as the calls to libgomp can easily be added.
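For reference, by plugins I mean the usual RegisterPass mechanism. A
minimal skeleton (pass name and behaviour are invented for the example)
looks like this and can be loaded into opt, clang or lli with -load:

    #include "llvm/Function.h"
    #include "llvm/Pass.h"
    #include "llvm/Support/raw_ostream.h"
    using namespace llvm;

    namespace {
    // Skeleton function pass; a real plugin would, e.g., insert the
    // libgomp calls mentioned above.
    struct MyParallelizer : public FunctionPass {
      static char ID;
      MyParallelizer() : FunctionPass(ID) {}

      virtual bool runOnFunction(Function &F) {
        errs() << "visiting: " << F.getName() << "\n";
        return false;  // this skeleton does not modify the IR
      }
    };
    }

    char MyParallelizer::ID = 0;
    static RegisterPass<MyParallelizer>
        X("my-parallelizer", "Example auto-parallelization pass");

Such a pass runs with e.g. "opt -load ./MyParallelizer.so
-my-parallelizer foo.ll", without touching the host compiler's sources.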
The next step we were looking into was automatically offloading some
calculations to an accelerator. This is actually very similar to OpenMP
parallelisation, except that, instead of calls to libgomp, calls to
libcuda or libopencl need to be scheduled. The only major difference is
that the kernel code is not just another function in the host module, but
an entirely new module. Hence, an optimizer somehow needs to extract
those modules and pass a reference to them to the CUDA or OpenCL runtime.
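On the runtime side this boils down to handing the extracted, code
generated kernel to the driver as a string. A sketch with the CUDA driver
API (error handling elided; the kernel name and launch geometry are
placeholders):

    #include <cuda.h>

    // Load PTX that an optimizer extracted from a kernel sub-module and
    // launch one of its functions via the CUDA driver API.
    void runKernel(const char *PTX) {
      CUdevice Dev;
      CUcontext Ctx;
      CUmodule Mod;
      CUfunction Kernel;

      cuInit(0);
      cuDeviceGet(&Dev, 0);
      cuCtxCreate(&Ctx, 0, Dev);

      cuModuleLoadData(&Mod, PTX);                  // JIT the PTX string
      cuModuleGetFunction(&Kernel, Mod, "kernel");  // placeholder name

      // No kernel parameters in this sketch; launch a single thread.
      cuLaunchKernel(Kernel, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0);
      cuCtxSynchronize();
    }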
The driving motivation for my proposal was to extend LLVM such that
optimization passes for heterogeneous architectures can be run in
LLVM-based compilers with little or no change to the compiler source
code. I think having this functionality will allow people to test new
ideas more easily and will avoid the need for each project to create its
own tool chain. It will also allow one optimizer to work with most tools
(clang/ghc/dragonegg/lli) without the need for larger changes.
From the discussion about our last proposal, the llvm.codegen()
intrinsic, I concluded that people are mostly concerned about
interpreting arbitrary strings embedded in an LLVM-IR file, and that
explicit LLVM-IR extensions were suggested as one possible solution. So I
was hoping this proposal could address some of the previously raised
concerns. However, apparently people do not really see a need for
stronger support for heterogeneous compilation directly within LLVM. Or,
the other way around, I fail to see how to achieve the same goals with
the existing infrastructure or with some of the suggestions people made.
I will probably need to understand some of the ideas pointed out in more
detail.
Thanks again for your feedback
Cheers
Tobi