[LLVMdev] [PATCH][RFC] Add llvm.codegen Intrinsic To Support Embedded LLVM IR Code Generation

Mon May 7 14:06:51 PDT 2012

On 05/07/2012 06:07 PM, dag at cray.com wrote:
> Tobias Grosser <tobias at grosser.es> writes:
>
>>> Doesn't LLVM support taking the address of a function in another address
>>> space?  If not it probably should.
>>
>> Hi Dave,
>> The llvm.codegen intrinsic seems the perfect match to build up such
>> experience. It requires no changes to LLVM-IR itself and only very
>> local changes to the generic back end infrastructure. It may not be
>> as generic as other solutions, but it is far from being an ugly
>> hack. On the contrary, it is a close match for OpenCL-like runtimes
>> and works well with the existing PTX back end.
>
> I'll bite my tongue on the designs of OpenCL and CUDA.  :)
>
> But regardless, if those are your targets you don't need llvm.codegen at
> all.

Why is it not needed? I don't see anything that could currently replace 
it. How can I create a loadable optimizer module that creates embedded 
PTX code without the llvm.codegen intrinsic?

>> Do you have definite plans to add heterogeneous computing
>> capabilities to LLVM-IR within the next couple (3-4) months? Will
>> these capabilities supersede the llvm.codegen intrinsic?
>
> No specific plans to change the IR.  We have not found a need for such
> changes on current architectures, as the runtimes provided with those
> architectures handle the ugly details.  I am thinking further into the
> future and what might be needed there.

OK. I am talking about something that could be available in LLVM within
the next few weeks.

>> In case such plans do not exist, what do you think about adding the
>> llvm.codegen() intrinsic for now? If mid-term plans exist for
>> heterogeneous extensions to LLVM-IR, we can document them alongside
>> the intrinsic.
>
> I think it's completely unnecessary if your goal is to get something
> working on current hardware.

Again, why is it unnecessary?

> We do have certain structural/software engineering changes to the
> implementation of LLVM's code generator that would be useful.  This
> primarily is the ability to completely process one function before
> moving on to the next.  This is important when dealing with heterogeneous
> systems, as one has to, for example, write out different asm for the
> various targets at a function granularity.  But that doesn't require any
> IR changes whatsoever.

At least for CUDA/OpenCL, the modules are entirely independent. Is such a
fine granularity really required?

Tobi