<div class="gmail_quote">On Tue, May 8, 2012 at 2:20 AM, Tobias Grosser <span dir="ltr"><<a href="mailto:tobias@grosser.es" target="_blank">tobias@grosser.es</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="im">On 05/08/2012 12:14 AM, <a href="mailto:dag@cray.com" target="_blank">dag@cray.com</a> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Tobias Grosser<<a href="mailto:tobias@grosser.es" target="_blank">tobias@grosser.es</a>>  writes:<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

I forgot to address this one.  With current OpenCL and CUDA<br>

specifications, there's no need to do multiple .o files.  In my mind,<br>

llc should output one .o (one .s, etc.).  Anything else wreaks havoc on<br>

build systems.<br>

</blockquote>

<br>

Yes, that's what I am advocating for. There is no need for all this<br>

complexity. Both standards store the embedded code as a string in the<br>

host module. That is exactly what the llvm.codegen intrinsic<br>

models. It requires zero further changes to the code generation<br>

backend.<br>

</blockquote>

<br>

But why do you need an intrinsic to do that?  Just generate the code to<br>

a file and suck it into a string, maybe with an external "linker" tool.<br>

<br>

If you just want something to work, that should be sufficient.  If you<br>

want some long-term design/implementation I don't think llvm.codegen is<br>

it.<br>

</blockquote>

<br></div>

OK. I think we are on the same track. Yes, there is no need for a lot of infrastructure. Storing PTX in a string in the host module is the only thing needed.<br>

<br>

So why the intrinsic? I want to create the PTX string from an LLVM-IR optimizer pass that can be loaded into clang, dragonegg, opt, etc.<br>

An LLVM-IR optimizer pass does not have access to the file system, and it cannot link to the LLVM back ends to create PTX directly. Creating PTX in an optimizer pass would be an ugly hack. The cleaner solution is to store an LLVM-IR string in the host module and to mark it with the llvm.codegen() intrinsic. When the module is processed by the backend, the string is automatically translated to PTX. This requires no additional file writing, introduces no layering violations, and seems to be very simple.<br>


<br>

I don't see a better way to translate LLVM-IR to PTX. Do you still believe introducing file writing to an optimizer module is a good and portable solution?<br></blockquote><div><br></div><div>Until any new infrastructure is implemented, I don't see it being a worse solution.  Don't get me wrong: I think the llvm.codegen() intrinsic is a fast way to get things up and running for the GSoC project, but I also agree with Dan and Evan that it's not appropriate for LLVM mainline. There are just too many subtle details, and it really only handles the case of host code needing the device code as text assembly.</div>

<div><br></div><div>To support opt-level transforms, you could just embed the generated IR as text in the module, then invoke a separate tool to extract it into a separate module.  The more I think about this, the more I become convinced that we could benefit from a module "container," similar to a Mac fat/universal binary.  Something like this probably wouldn't be too hard to implement; the main problem I see is what llc would output. Maybe a single llc invocation would only process one module in the container.</div>
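<div><br></div><div>As a rough sketch of the "separate tool" idea, and not an existing LLVM utility: the snippet below pulls an embedded IR string back out of a host module's textual IR. It assumes the device IR was stored as a global string constant in the usual c"..." form with \XX hex escapes. The global name device_ir and both helper names are hypothetical, chosen only for illustration.</div>

```python
import re


def decode_llvm_string(escaped):
    # Undo the escapes used inside an LLVM c"..." constant:
    # a backslash followed by two hex digits, or a doubled backslash.
    def unescape(match):
        token = match.group(1)
        return "\\" if token == "\\" else chr(int(token, 16))

    return re.sub(r"\\(\\|[0-9A-Fa-f]{2})", unescape, escaped)


def extract_embedded_ir(host_ir, global_name):
    # Find the named global's c"..." initializer on its definition line.
    # Quotes inside the payload are escaped as \22, so the first closing
    # quote ends the constant.
    pattern = r"@" + re.escape(global_name) + r'\s*=[^\n]*?c"([^"]*)"'
    match = re.search(pattern, host_ir)
    if match is None:
        raise KeyError(global_name)
    # Drop the trailing NUL terminator, if the string carries one.
    return decode_llvm_string(match.group(1)).rstrip("\x00")


if __name__ == "__main__":
    # Hypothetical host module holding a tiny device function as text.
    host = (
        '@device_ir = private constant [35 x i8] '
        'c"define void @k() {\\0A  ret void\\0A}\\0A\\00"\n'
    )
    print(extract_embedded_ir(host, "device_ir"))
```

<div>The extracted text could then be written to its own .ll file and handed to a separate llc invocation, which keeps file writing out of the optimizer pass itself.</div>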

<div><br></div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

Cheers<br>

Tobi<br>

</blockquote></div><br><br clear="all"><div><br></div>-- <br><br><div>Thanks,</div><div><br></div><div>Justin Holewinski</div><br>