[LLVMdev] [PATCH][RFC] Add llvm.codegen Intrinsic To Support Embedded LLVM IR Code Generation

Tobias Grosser tobias at grosser.es
Mon May 7 01:47:15 PDT 2012


On 04/30/2012 09:55 PM, dag at cray.com wrote:
> Tobias Grosser<tobias at grosser.es>  writes:
>
>> To write optimizations that yield embedded GPU code, we also looked into
>> three other approaches:
>>
>> 1. Directly create embedded target code (e.g. PTX)
>>
>> This would mean the optimization pass extracts device code internally
>> and directly generates the relevant target code. This approach would
>> require our generic optimization pass to be linked directly with the
>> specific target back end. This is an ugly layering violation and, in
>> addition, it causes major trouble if the new optimization pass needs
>> to be loaded dynamically.
>
> IMHO it's a bit unrealistic to have a target-independent optimization
> layer.  Almost all optimization wants to know target details at some
> point.  I think we can and probably should support that.  We can allow
> passes to gracefully fall back in the cases where target information is
> not available.

Yes, I agree it makes sense to make target information available to the 
optimizers. As you noted yourself, that is different from performing 
target code generation inside the optimizers.

>> 2. Extend the LLVM-IR files to support heterogeneous modules
>>
>> This would mean we extend LLVM-IR, such that IR for different targets
>> can be stored within a single IR file. This approach could be integrated
>> nicely into the LLVM code generation flow and would yield readable
>> LLVM-IR even for the device code. However, it adds another level of
>> complexity to the LLVM-IR files and requires massive changes not only
>> in the LLVM code base, but also in compilers built on top of LLVM-IR.
>
> I don't think the code base changes are all that bad.  We have a number
> of them to support generating code one function at a time rather than a
> whole module together.  They've been sitting around waiting for us to
> send them upstream.  It would be an easy matter to simply annotate each
> function with its target.  We don't currently do that because we never
> write out such IR files but it seems like a simple problem to solve to
> me.

Supporting several modules in one LLVM-IR file may not be too difficult, 
but getting this upstream may still be controversial. The bulk of the 
changes I anticipate are in the tools: at the moment, every tool expects 
a single module per LLVM-IR file. I pointed out the problems in llc and 
the codegen examples in my other mail.

>> 3. Generate two independent LLVM-IR files and pass them around together
>>
>> The host and device LLVM-IR modules could be kept in separate files.
>> This has the benefit of being user readable and not adding additional
>> complexity to the LLVM-IR files itself. However, separate files do not
>> provide information about how those files are related. Which files are
>> kernel files, how.where do they need to be loaded, ...? Also this
>> information could probably be put into meta-data or could be hard coded
>> into the generic compiler infrastructure, but this would require
>> significant additional code.
>
> I don't think metadata would work because it would not satisfy the "no
> semantic effects" requirement.  We couldn't just drop the metadata and
> expect things to work.

You are right; this solution requires semantic metadata, which is a 
non-trivial prerequisite.
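To illustrate why this is a problem: linking the host module to a separate device module via named metadata might look like the sketch below. The metadata names and layout are invented, but the point stands regardless of the exact scheme; dropping this metadata would silently break the program, which is exactly what the "no semantic effects" rule for metadata forbids:

```llvm
; Hypothetical named metadata in the host module tying it to a device
; module and its kernel entry point (names and structure invented for
; illustration only).
!llvm.device.modules = !{!0}
!0 = metadata !{metadata !"kernels_nvptx.ll", metadata !"vec_add_kernel"}
```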

>> Another weakness of this approach is that the entire LLVM optimization
>> chain is currently built under the assumption that a single file/module
>> is passed around. This is most obvious with the 'opt | llc' idiom, but
>> in general every existing tool would need to be adapted to handle
>> multiple files and would possibly even need semantic knowledge about
>> how to connect/use them. Just running clang or dragonegg with
>> -load GPGPUOptimizer.so would not be possible.
>
> Again, we have many of the changes to make this possible.  I hope to
> send them for review as we upgrade to 3.1.

Could you provide a list of the changes you have in the pipeline and a 
reliable timeline for when you will upstream them? How much additional 
work from other people would be required to make this a viable 
replacement for the llvm.codegen intrinsic?

>> All of the previous approaches require significant changes all over the
>> code base and would cause trouble with loadable optimization passes. The
>> intrinsic based approach seems to address most of the previous problems.
>
> I'm pretty uncomfortable with the proposed intrinsic.  It feels
> tacked-on and not in the LLVM spirit.  We should be able to extend the
> IR to support multiple targets.  We're going to need this kind of
> support for much more than GPUs in the future.  Heterogeneous computing
> is here to stay.

Where exactly do you see problems with this intrinsic? It is not meant 
to block further work on heterogeneous computing, but to let us 
gradually improve LLVM until it gains such features natively. In 
particular, it is a low-overhead solution that adds working 
heterogeneous compute capabilities for major GPU targets to LLVM today. 
This working solution can prepare the ground for more closely 
integrated solutions.
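For readers joining the thread late, the intrinsic-based approach embeds the device IR as a plain string in the host module and lowers it to target assembly during host code generation. The sketch below follows my reading of the RFC patch; the exact signature and lowering are best taken from the patch itself:

```llvm
; Device-code LLVM-IR embedded as a plain string in the host module.
@kernel_ir = private constant [16 x i8] c"; device IR ...\00"
@arch      = private constant [4 x i8] c"ptx\00"

; The back end replaces this call with a pointer to the generated
; target assembly (e.g. PTX) at host code-generation time.
declare i8* @llvm.codegen(i8*, i8*)

define i8* @get_ptx() {
entry:
  %ir   = getelementptr [16 x i8]* @kernel_ir, i32 0, i32 0
  %a    = getelementptr [4 x i8]* @arch, i32 0, i32 0
  %asmp = call i8* @llvm.codegen(i8* %ir, i8* %a)
  ret i8* %asmp
}
```

Because the device code travels inside an ordinary host module, every existing tool and the 'opt | llc' pipeline keep working unchanged, which is the main point of the proposal.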

Tobi


