[LLVMdev] [PATCH][RFC] Add llvm.codegen Intrinsic To Support Embedded LLVM IR Code Generation

dag at cray.com
Mon Apr 30 12:55:02 PDT 2012


Tobias Grosser <tobias at grosser.es> writes:

> To write optimizations that yield embedded GPU code, we also looked into 
> three other approaches:
>
> 1. Directly create embedded target code (e.g. PTX)
>
> This would mean the optimization pass extracts device code internally 
> and directly generates the relevant target code. This approach would 
> require our generic optimization pass to be directly linked with the 
> specific target back end. This is an ugly layering violation and, in 
> addition, it causes major trouble if the new optimization pass is 
> loaded dynamically.

IMHO it's a bit unrealistic to have a target-independent optimization
layer.  Almost all optimizations want to know target details at some
point.  I think we can and probably should support that.  We can allow
passes to fall back gracefully in cases where target information is
not available.

> 2. Extend the LLVM-IR files to support heterogeneous modules
>
> This would mean we extend LLVM-IR, such that IR for different targets
> can be stored within a single IR file. This approach could be integrated 
> nicely into the LLVM code generation flow and would yield readable 
> LLVM-IR even for the device code. However, it adds another level of 
> complexity to the LLVM-IR files and requires massive changes not only 
> in the LLVM code base but also in compilers built on top of LLVM-IR.

I don't think the code base changes are all that bad.  We have a number
of them to support generating code one function at a time rather than a
whole module together.  They've been sitting around waiting for us to
send them upstream.  It would be an easy matter to simply annotate each
function with its target.  We don't currently do that because we never
write out such IR files, but to me that seems like a simple problem to
solve.
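
To make that concrete, a per-function annotation might look roughly
like the sketch below.  The "target" string attribute and the triples
are hypothetical, just to illustrate the idea; this is not syntax that
LLVM IR defines today:

    ; One module holding both host and device code, with each function
    ; carrying a (hypothetical) "target" annotation that the code
    ; generator could dispatch on.
    define void @host_entry() "target"="x86_64-unknown-linux-gnu" {
      ret void
    }

    define void @device_kernel() "target"="nvptx64-nvidia-cuda" {
      ret void
    }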

> 3. Generate two independent LLVM-IR files and pass them around together
>
> The host and device LLVM-IR modules could be kept in separate files. 
> This has the benefit of being user readable and not adding additional 
> complexity to the LLVM-IR files themselves. However, separate files do 
> not provide information about how they are related. Which files are 
> kernel files, and how/where do they need to be loaded, ...? This 
> information could probably be put into metadata or hard coded into the 
> generic compiler infrastructure, but that would require significant 
> additional code.

I don't think metadata would work because it would not satisfy the "no
semantic effects" requirement.  We couldn't just drop the metadata and
expect things to work.
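
For illustration, the kind of metadata I imagine here might look
roughly like the following in the host module (the metadata name, file
name and kernel name are all made up).  Precisely because it carries
real semantics, an optimizer that dropped it would silently lose the
kernel association:

    ; Hypothetical named metadata tying the host module to its device
    ; counterpart.
    !llvm.embedded.kernels = !{!0}
    !0 = metadata !{metadata !"vector_add.kernel.ll", metadata !"vector_add"}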

> Another weakness of this approach is that the entire LLVM optimization 
> chain is currently built under the assumption that a single file/module 
> is passed around. This is most obvious with the 'opt | llc' idiom, but 
> in general every tool that currently exists would need to be adapted to 
> handle multiple files and would possibly even need semantic knowledge 
> about how to connect/use them together. Just running clang or 
> dragonegg with -load GPGPUOptimizer.so would not be possible.

Again, we have many of the changes to make this possible.  I hope to
send them for review as we upgrade to 3.1.

> All of the previous approaches require significant changes all over the 
> code base and would cause trouble with loadable optimization passes. The 
> intrinsic-based approach seems to address most of the previous problems.

I'm pretty uncomfortable with the proposed intrinsic.  It feels
tacked-on and not in the LLVM spirit.  We should be able to extend the
IR to support multiple targets.  We're going to need this kind of
support for much more than GPUs in the future.  Heterogeneous computing
is here to stay.
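
For reference, my reading of the proposal is an intrinsic of roughly
this shape (a sketch only; see Tobias's RFC for the exact declaration).
This is what strikes me as tacked-on: the device module travels through
the host IR as an opaque string rather than as IR the rest of the
infrastructure can see and transform:

    ; Rough sketch of the proposed intrinsic as I read it: it takes the
    ; device LLVM-IR as a string plus a target architecture name, and at
    ; code generation time the call is replaced by a pointer to the
    ; generated target code (e.g. a PTX string).
    declare i8* @llvm.codegen(i8*, i8*)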

                             -Dave


