[LLVMdev] [PATCH][RFC] Add llvm.codegen Intrinsic To Support Embedded LLVM IR Code Generation

Sun Apr 29 06:37:06 PDT 2012

On 04/29/2012 01:21 AM, Justin Holewinski wrote:
>
>
> On Sat, Apr 28, 2012 at 8:27 AM, Tobias Grosser <tobias at grosser.es> wrote:
>     regalloc= is different. It is global and consequently influences
>     both host and device code generation. However, to me it is rather a
>     debugging option. It is never set by clang and targets provide a
>     reasonable default based on the optimization level. I believe we can
>     assume that for our use case it is not set. In case it is really
>     necessary to explicitly set the register allocator, the right
>     solution would be to make regalloc a target option.
>
>
> The regalloc= option was just an example of the types of flags that can
> be passed to llc, which are handled as global options instead of target
> options.

Yes, thanks for pointing out this problem. For now I think we can
ignore such global options, as they are mostly debugging options, and
they can be included in the target options if needed.

> The implicit assumption seems to be that the host code wants the device
> code as assembly text.  What happens when you need to link the device
> binary and upload it separately?  Think automatic SPU codegen on Cell.
>   Is it up to the host program to invoke the other target's linker?

OK, I get what you mean. The intrinsic currently targets the
OpenCL/CUDA model, which is the most widely used. Something like Cell
sounds interesting, but probably needs further thought. Even with
OpenCL/CUDA, the intrinsic currently works only for PTX code generation,
but I hope we can gain support for other GPU devices later on.

>     I agree that future work can be useful here. However, before
>     spending a large amount of time to engineer a complex solution, I
>     propose to start with the proposed light-weight approach. It is
>     sufficient for our needs and will allow us to get the experience and
>     infrastructure that can help us to choose and implement a more
>     complex one later on.
>
>
> I agree that this approach is the best way to get short-term results,
> especially for the GSoC project.

OK, let's go ahead.

Yabin, can you update the patch with the following changes:

- Remove the Arch flag
- Document that we require a triple
- Add two new arguments that take a feature string and an mcpu
   flag (can be set to "", which means we use the default)
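
As a rough illustration of the requested argument handling, here is a
small self-contained C++ sketch. It only models the rule "the triple is
required, an empty mcpu or feature string means the target default".
All names in it (CodegenRequest, hasRequiredTriple, valueOrDefault,
resolve) are hypothetical and are not taken from the patch or from any
LLVM API.

```cpp
#include <cassert>
#include <string>

// Hypothetical container for the string arguments of the intrinsic.
struct CodegenRequest {
    std::string triple;    // required, must not be empty
    std::string mcpu;      // "" means: use the target's default CPU
    std::string features;  // "" means: use the target's default features
};

// The triple is the only argument without a usable default.
bool hasRequiredTriple(const CodegenRequest &req) {
    return !req.triple.empty();
}

// Picks the explicit value when one is given, otherwise the default.
std::string valueOrDefault(const std::string &value,
                           const std::string &targetDefault) {
    return value.empty() ? targetDefault : value;
}

// Fills in empty mcpu/features from the (assumed) target defaults.
CodegenRequest resolve(const CodegenRequest &req,
                       const std::string &defaultCpu,
                       const std::string &defaultFeatures) {
    assert(hasRequiredTriple(req) && "a target triple is required");
    return CodegenRequest{req.triple,
                          valueOrDefault(req.mcpu, defaultCpu),
                          valueOrDefault(req.features, defaultFeatures)};
}
```

How the defaults are actually obtained would be up to the target; the
sketch simply takes them as parameters.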

Cheers
Tobi