[LLVMdev] [PATCH][RFC] Add llvm.codegen Intrinsic To Support Embedded LLVM IR Code Generation

Sun Apr 29 07:26:55 PDT 2012

Hi,
On 2012-4-29, at 9:37 PM, Tobias Grosser wrote:

> On 04/29/2012 01:21 AM, Justin Holewinski wrote:
>> 
>> 
>> On Sat, Apr 28, 2012 at 8:27 AM, Tobias Grosser <tobias at grosser.es
>> <mailto:tobias at grosser.es>> wrote:
>>    regalloc= is different. It is global and consequently influences
>>    both host and device code generation. However, to me it is rather a
>>    debugging option. It is never set by clang and targets provide a
>>    reasonable default based on the optimization level. I believe we can
>>    assume that for our use case it is not set. In case it is really
>>    necessary to explicitly set the register allocator, the right
>>    solution would be to make regalloc a target option.
>> 
>> 
>> The regalloc= option was just an example of the types of flags that can
>> be passed to llc, which are handled as global options instead of target
>> options.
> 
> Yes, thanks for pointing us to this problem. For now I think we can ignore them as they are mostly debugging options and they can be included in the target options if needed.
> 
>> The implicit assumption seems to be that the host code wants the device
>> code as assembly text.  What happens when you need to link the device
>> binary and upload it separately?  Think automatic SPU codegen on Cell.
>>  Is it up to the host program to invoke the other target's linker?
> 
> OK, I get what you mean. The intrinsic currently targets the OpenCL/CUDA model, which is the most widely used. Something like Cell sounds interesting, but probably needs further thought. Even with OpenCL/CUDA,
> this intrinsic currently works only for PTX code generation, but I hope we can gain support for other GPU devices later on.
> 
>>    I agree that future work can be useful here. However, before
>>    spending a large amount of time to engineer a complex solution, I
>>    propose to start with the proposed light-weight approach. It is
>>    sufficient for our needs and will allow us to get the experience and
>>    infrastructure that can help us to choose and implement a more
>>    complex one later on.
>> 
>> 
>> I agree that this approach is the best way to get short-term results,
>> especially for the GSoC project.
> 
> OK, let's go ahead.
> 
> Yabin, can you update the patch with the following changes:
> 
> - Remove the Arch flag
> - Document that we require a triple
> - Add two new arguments that take a feature string and an mcpu
>  flag (can be set to "", which means we use the default)
OK. I will do that.
Thanks for all your comments.

best,
Yabin