[LLVMdev] [PATCH][RFC] Add llvm.codegen Intrinsic To Support Embedded LLVM IR Code Generation
Tobias Grosser
tobias at grosser.es
Mon May 7 14:06:51 PDT 2012
On 05/07/2012 06:07 PM, dag at cray.com wrote:
> Tobias Grosser<tobias at grosser.es> writes:
>
>>> Doesn't LLVM support taking the address of a function in another address
>>> space? If not it probably should.
>>
>> Hi Dave,
>> The llvm.codegen intrinsic seems the perfect match to build up such
>> experience. It requires no changes to LLVM-IR itself and only very
>> local changes to the generic back end infrastructure. It may possibly
>> not be as generic as other solutions, but it is far from being an ugly
>> hack. Quite the contrary, it is a close match for OpenCL-like runtimes
>> and works well with the existing PTX back end.
>
> I'll bite my tongue on the designs of OpenCL and CUDA. :)
>
> But regardless, if those are your targets you don't need llvm.codegen at
> all.
Why is it not needed? I don't see anything that could currently replace
it. How can I create a loadable optimizer module that creates embedded
PTX code without the llvm.codegen intrinsic?
>> Do you have definitive plans to add heterogeneous computing
>> capabilities to LLVM-IR within the next couple (3-4) months? Will
>> these capabilities supersede the llvm.codegen intrinsic?
>
> No specific plans to change the IR. We have not found a need for such
> changes on current architectures, as the runtimes provided with those
> architectures handle the ugly details. I am thinking further into the
> future and what might be needed there.
OK. I am talking about something that could be available in LLVM within
the next few weeks.
>> In case such plans do not exist, what do you think about adding the
>> llvm.codegen() intrinsic for now? If mid-term plans exist for
>> heterogeneous extensions to LLVM-IR, we can document them along the
>> intrinsic.
>
> I think it's completely unnecessary if your goal is to get something
> working on current hardware.
Again, why is it unnecessary?
> We do have certain structural/software engineering changes to the
> implementation of LLVM's code generator that would be useful. This
> is primarily the ability to completely process one function before
> moving on to the next. This is important when dealing with heterogeneous
> systems, as one has to, for example, write out different asm for the
> various targets at a function granularity. But that doesn't require any
> IR changes whatsoever.
At least for CUDA/OpenCL the modules are entirely independent. Is such a
fine granularity really required?
Tobi