[LLVMdev] [PROPOSAL] LLVM multi-module support

Thu Jul 26 00:19:27 PDT 2012

Hi Tobias, I didn't really get it.  Is the idea that the same bitcode is
going to be codegen'd for different architectures, or is each sub-module
going to contain different bitcode?  In the latter case you may as well
just use multiple modules, perhaps in conjunction with a scheme to store
more than one module in the same file on disk as a convenience.
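Duncan's alternative only needs a container that keeps several ordinary modules in one file. The following is a minimal sketch built on a hypothetical length-prefixed layout. The `MMOD` magic, the field widths, and the names `ModuleBlob`, `packModules` and `unpackModules` are all invented for illustration. This is not an existing LLVM file format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// One module blob: a name (e.g. its target triple) plus its serialized
// contents. The container does not care whether that is text or bitcode.
struct ModuleBlob {
  std::string name;
  std::string bytes;
};

// Hypothetical magic number; this is not an existing LLVM file format.
const std::string kMagic = "MMOD";

// Append `v` as `width` little-endian bytes.
void putUInt(std::string &out, std::uint64_t v, int width) {
  for (int i = 0; i < width; ++i)
    out.push_back(static_cast<char>((v >> (8 * i)) & 0xFF));
}

// Read `width` little-endian bytes at `pos`; nullopt if the buffer is short.
std::optional<std::uint64_t> getUInt(const std::string &in, std::size_t &pos,
                                     int width) {
  if (in.size() - pos < static_cast<std::size_t>(width)) return std::nullopt;
  std::uint64_t v = 0;
  for (int i = 0; i < width; ++i)
    v |= std::uint64_t(static_cast<unsigned char>(in[pos + i])) << (8 * i);
  pos += width;
  return v;
}

// Layout: magic, u32 count, then per module: u32 name length, name,
// u64 payload length, payload.
std::string packModules(const std::vector<ModuleBlob> &mods) {
  assert(mods.size() <= 0xFFFFFFFFu && "module count must fit in 32 bits");
  std::string out = kMagic;
  putUInt(out, mods.size(), 4);
  for (const ModuleBlob &m : mods) {
    assert(m.name.size() <= 0xFFFFFFFFu && "name must fit in 32 bits");
    putUInt(out, m.name.size(), 4);
    out += m.name;
    putUInt(out, m.bytes.size(), 8);
    out += m.bytes;
  }
  return out;
}

// Inverse of packModules; nullopt on a bad magic or truncated input.
std::optional<std::vector<ModuleBlob>> unpackModules(const std::string &in) {
  if (in.compare(0, kMagic.size(), kMagic) != 0) return std::nullopt;
  std::size_t pos = kMagic.size();
  std::optional<std::uint64_t> count = getUInt(in, pos, 4);
  if (!count) return std::nullopt;
  std::vector<ModuleBlob> mods;
  for (std::uint64_t i = 0; i < *count; ++i) {
    std::optional<std::uint64_t> nameLen = getUInt(in, pos, 4);
    if (!nameLen || in.size() - pos < *nameLen) return std::nullopt;
    std::string name = in.substr(pos, *nameLen);
    pos += *nameLen;
    std::optional<std::uint64_t> len = getUInt(in, pos, 8);
    if (!len || in.size() - pos < *len) return std::nullopt;
    mods.push_back({name, in.substr(pos, *len)});
    pos += *len;
  }
  return mods;
}
```

Round-tripping a host blob and a kernel blob through `packModules` and `unpackModules` returns the same two blobs. A truncated buffer yields `std::nullopt` instead of a partial result.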

Ciao, Duncan.

> A couple of weeks ago I discussed with Peter how to improve LLVM's
> support for heterogeneous computing. One weakness we (and others) have
> seen is the absence of multi-module support in LLVM. Peter came up with
> a nice idea for how to improve this. I would like to put this idea up
> for discussion.
>
> ## The problem ##
>
> LLVM-IR modules can currently only contain code for a single target
> architecture. However, there are multiple use cases where one
> translation unit could contain code for several architectures.
>
> 1) CUDA
>
> CUDA source files can contain both host and device code. The absence of
> multi-module support complicates adding CUDA support to clang, as clang
> would need to perform multi-module compilation on top of a single-module
> based compiler framework.
>
> 2) C++ AMP
>
> Like CUDA, C++ AMP [1] allows both host code and device code in the
> same source file. Even though C++ AMP is a Microsoft extension, the use
> case itself is relevant to clang. It would be great if LLVM provided
> infrastructure that lets front-ends easily target accelerators. This
> would probably enable a lot of interesting experiments.
>
> 3) Optimizers
>
> To fully automatically offload computations to an accelerator, an
> optimization pass needs to extract the computation kernels and schedule
> them as separate kernels on the device. Such kernels are normally
> LLVM-IR modules for different architectures. At the moment, passes have
> no way to create and store new LLVM-IR modules. There is also no way
> to reference kernel LLVM-IR modules from a host module (which is
> necessary to pass them to the accelerator run-time).
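The extraction step described above can be modeled with a few toy data structures. This is a minimal sketch only. `Function`, `Module`, `offloadTarget`, `kernelRefs` and `extractKernels` are hypothetical stand-ins, not LLVM's C++ API. A real pass would also have to move the callees and globals each kernel depends on.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for IR entities; hypothetical, not LLVM's C++ API.
struct Function {
  std::string name;
  std::string offloadTarget;  // empty: stays on the host; else a target triple
};

struct Module {
  std::string triple;
  std::vector<Function> functions;
  std::vector<std::string> kernelRefs;  // kernels the host refers to by name
};

// Move every function marked for offloading into a new module for its
// target (one module per distinct triple) and leave a by-name reference
// behind, so host code can later hand the kernel to a run-time.
std::map<std::string, Module> extractKernels(Module &host) {
  std::map<std::string, Module> kernels;  // keyed by target triple
  std::vector<Function> remaining;
  for (const Function &f : host.functions) {
    if (f.offloadTarget.empty()) {
      remaining.push_back(f);
      continue;
    }
    Module &dev = kernels[f.offloadTarget];
    dev.triple = f.offloadTarget;
    dev.functions.push_back({f.name, ""});  // native code inside its own module
    host.kernelRefs.push_back(f.name);
  }
  host.functions = std::move(remaining);
  return kernels;
}
```

After extraction, the host keeps only its own functions plus a list of kernel names. Those names are the references the proposal says are currently impossible to express between LLVM-IR modules.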
>
> ## Goals ##
>
> a) No major changes to existing tools and LLVM-based applications
>
> b) Human readable and writable LLVM-IR
>
> c) FileCheck testability
>
> d) Do not force a specific execution model
>
> e) Unlimited number of embedded modules
>
> ## Detailed Goals ##
>
> a)
>    o No changes should be required if a tool does not use multi-module
>      support. Every LLVM-IR file that is valid today should remain valid.
>
>    o Major tools should support basic heterogeneous modules without large
>      changes. For example, the following commands should work after
>      minor adaptations:
>
>      clang -S -emit-llvm -o out.ll
>      opt -O3 out.ll -o out.opt.ll
>      llc out.opt.ll
>      lli out.opt.ll
>      bugpoint -O3 out.opt.ll
>
> b) All (sub)modules should be directly human readable/writable.
>      There should be no need to extract single modules before modifying
>      them.
>
> c) The LLVM-IR generated from a heterogeneous multi-module should be
>      easy to test with FileCheck. The same holds if a multi-module is
>      the result of an optimization.
>
> d) In CUDA/OpenCL/C++ AMP, kernels are scheduled from within the host
>      code. This means arbitrary host code can decide under which
>      conditions kernels are scheduled for execution. It is therefore
>      necessary to reference individual sub-modules from within the host
>      module.
>
> e) CUDA/OpenCL allow an arbitrary number of kernels to be compiled and
>      scheduled. We do not want to put an artificial limit on the number
>      of modules they are represented in. This means a single embedded
>      submodule is not enough.
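Goals d) and e) together describe a host that can name any of an unbounded set of submodules and decide at run time whether to launch one. The sketch below assumes a hypothetical `KernelTable` on the host side. The `Launcher` callback stands in for an accelerator run-time entry point, and all three names (`KernelTable`, `Launcher`, `launchIf`) are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>

// Hypothetical host-side view of the embedded submodules, keyed by the
// name of the global that holds each one. A std::map has no fixed
// capacity, so any number of kernels can be registered (goal e).
class KernelTable {
public:
  using Launcher =
      std::function<void(const std::string &triple, const std::string &ir)>;

  void add(const std::string &global, const std::string &triple,
           const std::string &ir) {
    entries_[global] = Entry{triple, ir};
  }

  // Arbitrary host logic supplies `condition` (goal d). `launch` stands in
  // for an accelerator run-time entry point, which is an assumption here.
  bool launchIf(bool condition, const std::string &global,
                const Launcher &launch) const {
    if (!condition) return false;
    auto it = entries_.find(global);
    if (it == entries_.end()) return false;
    launch(it->second.triple, it->second.ir);
    return true;
  }

  std::size_t size() const { return entries_.size(); }

private:
  struct Entry {
    std::string triple;
    std::string ir;
  };
  std::map<std::string, Entry> entries_;
};
```

The lookup key is the name of the embedding global. This mirrors how host code would reference a submodule under the proposal.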
>
> ## Non Goals ##
>
> o Modeling sub-architectures on a per-function basis
>
> Functions could be specialized for a certain sub-architecture. This is
> helpful for having certain functions optimized, e.g. with AVX2 enabled,
> while the program as a whole is compiled for a more generic
> architecture. We do not address per-function annotations in this
> proposal.
>
> ## Proposed solution ##
>
> To bring multi-module support to LLVM, we propose to add a new type
> called 'llvmir' to LLVM-IR. It can be used to embed LLVM-IR submodules
> as global variables.
>
> ------------------------------------------------------------------------
> target datalayout = ...
> target triple = "x86_64-unknown-linux-gnu"
>
> @llvm_kernel = private unnamed_addr constant llvmir {
>     target triple = "nvptx64-unknown-unknown"
>     define internal ptx_kernel void @gpu_kernel(i8* %Array) {
>       ...
>     }
> }
> ------------------------------------------------------------------------
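The nested syntax above keeps each submodule readable and FileCheck-able in place. The printer below assumes a one-level, hypothetical in-memory form (`SubModule`, `HostModule`, `printMultiModule` are invented names) and emits that layout. It is an illustration of the proposed textual form, not LLVM's `AsmWriter`.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form of the proposal: each embedded submodule is
// attached to a named global of the proposed 'llvmir' type and carries its
// own target triple. One level of nesting is enough for host + kernels.
struct SubModule {
  std::string global;             // e.g. "@llvm_kernel"
  std::string triple;             // e.g. "nvptx64-unknown-unknown"
  std::vector<std::string> body;  // the submodule's IR, one line per entry
};

struct HostModule {
  std::string triple;
  std::vector<SubModule> subs;
  std::vector<std::string> body;  // the host's own IR, one line per entry
};

// Print the host in the nested syntax sketched in the proposal, so that
// every submodule stays readable, editable and FileCheck-able in place.
std::string printMultiModule(const HostModule &m) {
  std::ostringstream os;
  os << "target triple = \"" << m.triple << "\"\n";
  for (const SubModule &s : m.subs) {
    assert(!s.global.empty() && s.global[0] == '@' && "globals start with @");
    os << "\n" << s.global << " = private unnamed_addr constant llvmir {\n";
    os << "    target triple = \"" << s.triple << "\"\n";
    for (const std::string &line : s.body) os << "    " << line << "\n";
    os << "}\n";
  }
  for (const std::string &line : m.body) os << line << "\n";
  return os.str();
}
```

Because the output is plain text with a stable layout, a test could match a submodule's `target triple` line with an ordinary `CHECK:` directive.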
>
> By default, the global will be compiled to a string holding the
> submodule's LLVM-IR, stored in the object file. We could also consider
> translating it to PTX or AMD's HSA-IL [2], so that, e.g., PTX can be
> passed to a run-time library.
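One reading of the default lowering is that the embedded submodule's textual IR becomes an ordinary NUL-terminated byte array in the host module. The sketch below assumes that interpretation. The function names are hypothetical. The `[N x i8] c"..."` constant with `\XX` hex escapes is standard LLVM-IR syntax.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Escape text into the body of an LLVM-IR c"..." string constant: printable
// ASCII other than '"' and '\' is kept, every other byte becomes \XX.
std::string escapeForIR(const std::string &s) {
  std::string out;
  for (unsigned char c : s) {
    if (c >= 0x20 && c < 0x7F && c != '"' && c != '\\') {
      out.push_back(static_cast<char>(c));
    } else {
      char buf[4];  // backslash + two hex digits + NUL
      std::snprintf(buf, sizeof buf, "\\%02X", static_cast<unsigned>(c));
      out += buf;
    }
  }
  return out;
}

// Lower one embedded submodule to a NUL-terminated [N x i8] global that
// holds its textual IR, ready to be emitted into the host object file.
std::string lowerToStringGlobal(const std::string &global,
                                const std::string &submoduleIR) {
  assert(!global.empty() && global[0] == '@' && "globals start with @");
  std::size_t n = submoduleIR.size() + 1;  // +1 for the trailing NUL
  return global + " = private unnamed_addr constant [" + std::to_string(n) +
         " x i8] c\"" + escapeForIR(submoduleIR) + "\\00\"";
}
```

Emitting bitcode or PTX instead would change only the payload bytes. The `[N x i8]` global that carries them would stay the same.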
>
> From my point of view, Peter's idea lets us add multi-module support
> in a way that meets the goals described above. However, early feedback
> would be valuable before we design and implement it properly.
>
> Cheers
> Tobi
>
> [1] http://msdn.microsoft.com/en-us/library/hh265137%28v=vs.110%29
> [2] http://www.amd.com/us/press-releases/Pages/amd-arm-computing-innovation-2012june12.aspx
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>