[LLVMdev] [PROPOSAL] LLVM multi-module support
Tobias Grosser
tobias at grosser.es
Sun Jul 29 12:16:28 PDT 2012
On 07/26/2012 12:49 PM, Duncan Sands wrote:
> Hi Tobias, I didn't really get it. Is the idea that the same bitcode is
> going to be codegen'd for different architectures, or is each sub-module
> going to contain different bitcode? In the latter case you may as well
> just use multiple modules, perhaps in conjunction with a scheme to store
> more than one module in the same file on disk as a convenience.
Hi Duncan,
thanks for your reply.
The proposal allows both: sub-modules that contain different bitcode, and
sub-modules that are code generated differently.
Different bitcode may arise from sub-modules that represent different
program parts, but also when we create several sub-modules for a single
program part, e.g. to optimize it for specific hardware.
In the back-end, sub-modules could be code generated according to the
requirements of the run-time system that will load them. For NVIDIA
chips we could generate PTX; for AMD systems, AMD-IL may be an option.
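To illustrate the NVIDIA case, here is a rough sketch of how a back-end
could lower an extracted device sub-module to PTX with the NVPTX target.
This is illustrative only; the triple and CPU string are examples, and it
assumes the targets were initialized via InitializeAllTargets() and
friends:

    #include "llvm/Module.h"
    #include "llvm/PassManager.h"
    #include "llvm/Support/FormattedStream.h"
    #include "llvm/Support/TargetRegistry.h"
    #include "llvm/Support/raw_ostream.h"
    #include "llvm/Target/TargetMachine.h"
    #include "llvm/Target/TargetOptions.h"

    // Lower a device sub-module to PTX; for NVPTX, the "assembly" output
    // is the PTX text a CUDA runtime can consume.
    std::string emitPTX(llvm::Module &DeviceM) {
      std::string Error, PTX;
      const llvm::Target *T =
          llvm::TargetRegistry::lookupTarget("nvptx64-nvidia-cuda", Error);
      llvm::TargetOptions Options;
      llvm::TargetMachine *TM = T->createTargetMachine(
          "nvptx64-nvidia-cuda", /*CPU=*/"sm_20", /*Features=*/"", Options);

      llvm::raw_string_ostream OS(PTX);
      llvm::formatted_raw_ostream FOS(OS);
      llvm::PassManager PM;
      TM->addPassesToEmitFile(PM, FOS, llvm::TargetMachine::CGFT_AssemblyFile);
      PM.run(DeviceM);
      FOS.flush();
      return PTX;
    }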
You and several others (e.g. Justin) pointed out that multi-modules in
LLVM-IR (or the llvm.codegen intrinsic) just reinvent the tar archive
format. I can follow your reasoning here.
Thinking about how to add CUDA support to clang, one possible approach is
to modify clang to emit device and host code into separate modules,
compile each module separately, and then add logic to clang that merges
the two modules at the end. This is a very reasonable approach, and there
is no doubt that adding multi-module support to LLVM just to simplify
this single use case would not be the right thing to do.
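Just to make that idea concrete (module names and triples are made up for
illustration), the two-module approach amounts to something like:

    #include "llvm/LLVMContext.h"
    #include "llvm/Module.h"

    // Host and device code live in separate modules with their own target
    // triples; each is optimized and code generated on its own, and extra
    // driver logic merges the results at the end.
    void createModules(llvm::LLVMContext &Ctx) {
      llvm::Module *HostM = new llvm::Module("cuda.host", Ctx);
      HostM->setTargetTriple("x86_64-unknown-linux-gnu");

      llvm::Module *DeviceM = new llvm::Module("cuda.device", Ctx);
      DeviceM->setTargetTriple("nvptx64-nvidia-cuda");

      // ... clang's CodeGen would emit host IR into HostM and the kernels
      // into DeviceM.
    }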
With multi-module support I am aiming for something else. As you know,
LLVM allows optimizer plugins to be "-load"ed at run time, and every
LLVM-based compiler, be it clang/ghc/dragonegg/lli/..., can take
advantage of them with almost no source code changes. I believe this is a
very nice feature, as it allows new optimizations to be prototyped and
tested easily, without any changes to the core compilers themselves. This
works not only for simple IR transformations; even autoparallelisation
works well, as the calls to libgomp can easily be added.
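For reference, by plugins I mean the usual RegisterPass mechanism. A
minimal skeleton (pass name and behaviour are invented for the example)
looks like this and can be loaded into opt, clang or lli with -load:

    #include "llvm/Function.h"
    #include "llvm/Pass.h"
    #include "llvm/Support/raw_ostream.h"
    using namespace llvm;

    namespace {
    // Skeleton function pass; a real plugin would, e.g., insert the
    // libgomp calls mentioned above.
    struct MyParallelizer : public FunctionPass {
      static char ID;
      MyParallelizer() : FunctionPass(ID) {}

      virtual bool runOnFunction(Function &F) {
        errs() << "visiting: " << F.getName() << "\n";
        return false;  // this skeleton does not modify the IR
      }
    };
    }

    char MyParallelizer::ID = 0;
    static RegisterPass<MyParallelizer>
        X("my-parallelizer", "Example auto-parallelization pass");

Such a pass runs with e.g. "opt -load ./MyParallelizer.so
-my-parallelizer foo.ll", without touching the host compiler's sources.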
The next step we were looking into was automatically offloading some
calculations to an accelerator. This is actually very similar to OpenMP
parallelisation, except that, instead of calls to libgomp, calls to
libcuda or libopencl need to be scheduled. The only major difference is
that the kernel code is not just another function in the host module, but
an entirely new module. Hence, an optimizer somehow needs to extract
those modules and pass a reference to them to the CUDA or OpenCL runtime.
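On the runtime side this boils down to handing the extracted, code
generated kernel to the driver as a string. A sketch with the CUDA driver
API (error handling elided; the kernel name and launch geometry are
placeholders):

    #include <cuda.h>

    // Load PTX that an optimizer extracted from a kernel sub-module and
    // launch one of its functions via the CUDA driver API.
    void runKernel(const char *PTX) {
      CUdevice Dev;
      CUcontext Ctx;
      CUmodule Mod;
      CUfunction Kernel;

      cuInit(0);
      cuDeviceGet(&Dev, 0);
      cuCtxCreate(&Ctx, 0, Dev);

      cuModuleLoadData(&Mod, PTX);                  // JIT the PTX string
      cuModuleGetFunction(&Kernel, Mod, "kernel");  // placeholder name

      // No kernel parameters in this sketch; launch a single thread.
      cuLaunchKernel(Kernel, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0);
      cuCtxSynchronize();
    }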
The driving motivation for my proposal was to extend LLVM such that
optimization passes for heterogeneous architectures can be run in
LLVM-based compilers with little or no change to the compiler source
code. I think having this functionality will allow people to test new
ideas more easily and will avoid the need for each project to create its
own tool chain. It will also allow one optimizer to work with most tools
(clang/ghc/dragonegg/lli) without the need for larger changes.
From the discussion about our last proposal, the llvm.codegen()
intrinsic, I concluded that people are mostly concerned about
interpreting arbitrary strings embedded in an LLVM-IR file, and that
explicit LLVM-IR extensions were suggested as one possible solution. So I
was hoping this proposal could address some of the previously raised
concerns. However, apparently people do not really see a need for
stronger support for heterogeneous compilation directly within LLVM. Or,
the other way around, I fail to see how to achieve the same goals with
the existing infrastructure or with some of the suggestions people made.
I will probably need to understand some of the ideas pointed out in more
detail.
Thanks again for your feedback
Cheers
Tobi