[LLVMdev] [PROPOSAL] LLVM multi-module support
Duncan Sands
baldrick at free.fr
Thu Jul 26 04:23:24 PDT 2012
Hi Dmitry,
> In our project we combine regular binary code and LLVM IR code for kernels,
> embedded as a special data symbol of the ELF object. The LLVM IR for a kernel
> that exists at compile time is preliminary and may be optimized further at
> runtime (pointer analysis, Polly, etc.). During application startup, the
> runtime system builds an index of all kernel sources embedded in the
> executable. Host and kernel code interact by means of a special "launch" call,
> which does not simply optimize, compile, and execute the kernel, but first
> estimates whether that is worthwhile or whether it is better to fall back to
> the host code equivalent.
in your case it doesn't sound like any modifications to what a module can hold
are needed; it's more a question of building things on top of the existing
infrastructure.
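
For what it's worth, the runtime-side flow you describe can already be put
together from the existing C++ API. A minimal sketch only, assuming the kernel
IR is linked in as a blob delimited by hypothetical linker symbols
__kernel_ir_start/__kernel_ir_end (your actual lookup scheme will differ):

------------------------------------------------------------------------
#include "llvm/ADT/StringRef.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IRReader/IRReader.h"
#include "llvm/Support/MemoryBuffer.h"
#include "llvm/Support/SourceMgr.h"
#include "llvm/Support/raw_ostream.h"
#include <memory>

// Hypothetical linker-provided bounds of the embedded kernel IR blob.
extern const char __kernel_ir_start[];
extern const char __kernel_ir_end[];

// Parse the embedded IR back into a Module; the runtime can then optimize
// and JIT it, or decide to fall back to the host code path.
std::unique_ptr<llvm::Module> loadEmbeddedKernel(llvm::LLVMContext &Ctx) {
  llvm::StringRef Blob(__kernel_ir_start,
                       __kernel_ir_end - __kernel_ir_start);
  llvm::SMDiagnostic Err;
  // parseIR accepts both bitcode and textual IR.
  auto Kernel =
      llvm::parseIR(llvm::MemoryBufferRef(Blob, "embedded_kernel"), Err, Ctx);
  if (!Kernel)
    Err.print("launch", llvm::errs());
  return Kernel;
}
------------------------------------------------------------------------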
> The proposal made by Tobias is very elegant, but it seems to address the case
> where the host and sub-architecture code exist at the same time. May I kindly
> point out that, in our experience, the really efficient, deeply specialized
> sub-architecture code may simply not exist at compile time, while the generic
> baseline host code always can.
I can't help feeling that Tobias is reinventing "tar", only upside down, and
rather than stuffing an archive inside modules he should be stuffing modules
inside an archive. But most likely I just completely failed to understand
where he's going.
Ciao, Duncan.
>
> Best,
> - Dima.
>
> 2012/7/26 Duncan Sands <baldrick at free.fr>
>
> Hi Tobias, I didn't really get it. Is the idea that the same bitcode is
> going to be codegen'd for different architectures, or is each sub-module
> going to contain different bitcode? In the latter case you may as well
> just use multiple modules, perhaps in conjunction with a scheme to store
> more than one module in the same file on disk as a convenience.
>
> Ciao, Duncan.
>
> > a couple of weeks ago I discussed with Peter how to improve LLVM's
> > support for heterogeneous computing. One weakness we (and others) have
> > seen is the absence of multi-module support in LLVM. Peter came up with
> > a nice idea for how to improve this. I would like to put this idea up for
> > discussion.
> >
> > ## The problem ##
> >
> > LLVM-IR modules can currently only contain code for a single target
> > architecture. However, there are multiple use cases where one
> > translation unit could contain code for several architectures.
> >
> > 1) CUDA
> >
> > CUDA source files can contain both host and device code. The absence of
> > multi-module support complicates adding CUDA support to clang, as clang
> > would need to perform multi-module compilation on top of a single-module
> > based compiler framework.
> >
> > 2) C++ AMP
> >
> > C++ AMP [1] contains - similarly to CUDA - both host code and device
> > code in the same source file. Even though C++ AMP is a Microsoft
> > extension, the use case itself is relevant to clang. It would be great
> > if LLVM provided infrastructure such that front-ends could easily
> > target accelerators. This would probably yield a lot of interesting
> > experiments.
> >
> > 3) Optimizers
> >
> > To fully automatically offload computations to an accelerator, an
> > optimization pass needs to extract the computation kernels and schedule
> > them as separate kernels on the device. Such kernels are normally
> > LLVM-IR modules for different architectures. At the moment, passes have
> > no way to create and store new LLVM-IR modules. There is also no way
> > to reference kernel LLVM-IR modules from a host module (which is
> > necessary to pass them to the accelerator run-time).
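> >
> > For illustration only (a rough sketch with made-up names, not part of
> > the proposal): today a pass that wants to hand a kernel to the run-time
> > essentially has to serialize the new module to a string and stash it in
> > the host module as an opaque global, which the run-time must later
> > re-parse.
> >
> > ------------------------------------------------------------------------
> > #include "llvm/IR/Constants.h"
> > #include "llvm/IR/GlobalVariable.h"
> > #include "llvm/IR/Module.h"
> > #include "llvm/Support/raw_ostream.h"
> > #include <string>
> >
> > // Embed the textual IR of KernelM into HostM as a private constant so
> > // that a host-side run-time call can look it up by name at execution
> > // time.
> > static llvm::GlobalVariable *embedKernel(llvm::Module &HostM,
> >                                          const llvm::Module &KernelM) {
> >   std::string IR;
> >   llvm::raw_string_ostream OS(IR);
> >   KernelM.print(OS, /*AAW=*/nullptr);
> >   OS.flush();
> >
> >   llvm::Constant *Init =
> >       llvm::ConstantDataArray::getString(HostM.getContext(), IR);
> >   return new llvm::GlobalVariable(HostM, Init->getType(),
> >                                   /*isConstant=*/true,
> >                                   llvm::GlobalValue::PrivateLinkage,
> >                                   Init, ".gpu_kernel_ir");
> > }
> > ------------------------------------------------------------------------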
> >
> > ## Goals ##
> >
> > a) No major changes to existing tools and LLVM based applications
> >
> > b) Human readable and writable LLVM-IR
> >
> > c) FileCheck testability
> >
> > d) Do not force a specific execution model
> >
> > e) Unlimited number of embedded modules
> >
> > ## Detailed Goals ##
> >
> > a)
> > o No changes should be required if a tool does not use multi-module
> > support. Each LLVM-IR file that is valid today should remain valid.
> >
> > o Major tools should support basic heterogeneous modules without large
> > changes. Some of the commands that should work after minor
> > adaptations:
> >
> > clang -S -emit-llvm -o out.ll
> > opt -O3 out.ll -o out.opt.ll
> > llc out.opt.ll
> > lli out.opt.ll
> > bugpoint -O3 out.opt.ll
> >
> > b) All (sub)modules should be directly human readable/writable.
> > There should be no need to extract single modules before modifying
> > them.
> >
> > c) The LLVM-IR generated from a heterogeneous multi-module should
> > easily be 'FileCheck'able. The same should be true if a multi-module
> > is the result of an optimization.
> >
> > d) In CUDA/OpenCL/C++ AMP, kernels are scheduled from within the host
> > code. This means arbitrary host code can decide under which
> > conditions kernels are scheduled for execution. It is therefore
> > necessary to reference individual sub-modules from within the host
> > module.
> >
> > e) CUDA/OpenCL allow compiling and scheduling an arbitrary number of
> > kernels. We do not want to put an artificial limit on the number of
> > modules in which they are represented. This means a single embedded
> > submodule is not enough.
> >
> > ## Non Goals ##
> >
> > o Modeling sub-architectures on a per-function basis
> >
> > Functions could be specialized for a certain sub-architecture. This is
> > helpful for having certain functions optimized, e.g. with AVX2 enabled,
> > while the program as a whole is compiled for a more generic
> > architecture. We do not address per-function annotations in this
> > proposal.
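> >
> > (For reference only, and outside the scope of this proposal: a rough
> > source-level illustration of the kind of per-function specialization
> > meant here, using the 'target' function attribute as supported by
> > GCC/Clang.)
> >
> > ------------------------------------------------------------------------
> > // Sketch: the surrounding TU is built for generic x86-64, but this one
> > // function is allowed to use AVX2 instructions.
> > __attribute__((target("avx2")))
> > void add8(const float *A, const float *B, float *C) {
> >   for (int i = 0; i < 8; ++i) // candidate for AVX2 auto-vectorization
> >     C[i] = A[i] + B[i];
> > }
> > ------------------------------------------------------------------------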
> >
> > ## Proposed solution ##
> >
> > To bring multi-module support to LLVM, we propose to add a new type
> > called 'llvmir' to LLVM-IR. It can be used to embed LLVM-IR submodules
> > as global variables.
> >
> > ------------------------------------------------------------------------
> > target datalayout = ...
> > target triple = "x86_64-unknown-linux-gnu"
> >
> > @llvm_kernel = private unnamed_addr constant llvmir {
> >   target triple = "nvptx64-unknown-unknown"
> >   define internal ptx_kernel void @gpu_kernel(i8* %Array) {
> >     ...
> >   }
> > }
> > ------------------------------------------------------------------------
> >
> > By default the global will be compiled to an LLVM-IR string stored in
> > the object file. We could also think about translating it to PTX or
> > AMD's HSA-IL, so that e.g. the PTX can be passed to a run-time library.
> >
> > From my point of view, Peter's idea allows us to add multi-module
> > support in a way that reaches the goals described above. However, to
> > properly design and implement it, early feedback would be valuable.
> >
> > Cheers
> > Tobi
> >
> > [1] http://msdn.microsoft.com/en-us/library/hh265137%28v=vs.110%29
> > [2] http://www.amd.com/us/press-releases/Pages/amd-arm-computing-innovation-2012june12.aspx