<div class="gmail_extra"><div class="gmail_quote">On Mon, Apr 30, 2012 at 12:55 PM,  <span dir="ltr"><<a href="mailto:dag@cray.com" target="_blank">dag@cray.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="im">Tobias Grosser <<a href="mailto:tobias@grosser.es">tobias@grosser.es</a>> writes:<br>

<br>

> To write optimizations that yield embedded GPU code, we also looked into<br>

> three other approaches:<br>

><br>

> 1. Directly create embedded target code (e.g. PTX)<br>

><br>

> This would mean the optimization pass extracts device code internally<br>

> and directly generates the relevant target code. This approach would<br>

> require our generic optimization pass to be directly linked with the<br>

> specific target back end. This is an ugly layering violation and, in<br>

> addition, it causes major trouble if the new optimization is<br>

> to be dynamically loaded.<br>

<br>

</div>IMHO it's a bit unrealistic to have a target-independent optimization<br>

layer.  Almost all optimization wants to know target details at some<br>

point.  I think we can and probably should support that.  We can allow<br>

passes to gracefully fall back in the cases where target information is<br>

not available.<br>

<div class="im"><br>

> 2. Extend the LLVM-IR files to support heterogeneous modules<br>

><br>

> This would mean we extend LLVM-IR such that IR for different targets<br>

> can be stored within a single IR file. This approach could be integrated<br>

> nicely into the LLVM code generation flow and would yield readable<br>

> LLVM-IR even for the device code. However, it adds another level of<br>

> complexity to the LLVM-IR files and requires massive changes not only<br>

> in the LLVM code base but also in compilers built on top of<br>

> LLVM-IR.<br>

<br>

</div>I don't think the code base changes are all that bad.  We have a number<br>

of them to support generating code one function at a time rather than a<br>

whole module together.  They've been sitting around waiting for us to<br>

send them upstream.  It would be an easy matter to simply annotate each<br>

function with its target.  We don't currently do that because we never<br>

write out such IR files, but it seems like a simple problem to solve to<br>

me.<br></blockquote><div><br></div><div>If such changes are almost ready to be upstreamed, then great!  It just seems like a fairly non-trivial task to actually implement function-level target selection, especially when you consider function call semantics, taking the address of a function, etc.  If you have a global variable, what target "sees" it?  Does it need to be annotated along with the function?  Can functions from two different targets share this pointer?  At first glance, there seem to be many non-trivial issues that are heavily dependent on the nature of the target.  For Yabin's use case, the X86 portions need to be compiled to assembly, or even an object file, while the PTX portions need to be lowered to an assembly string and embedded in the X86 source (or written to disk somewhere).  If you're targeting Cell, in contrast, you'd want to compile both down to object files.</div>
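<div><br></div><div>A rough Python sketch of why per-function annotation is not the whole story, under loudly stated assumptions: it models a hypothetical in-memory module where each function carries its own target triple (Function and find_cross_target_uses are invented names, not LLVM API, and the triples are just example strings). Even this toy checker has to decide what a cross-target call means and which target owns a global referenced from both sides.</div>
<pre>
```python
from dataclasses import dataclass, field


@dataclass
class Function:
    # Hypothetical per-function annotation: each function names its own target.
    name: str
    target: str
    calls: list = field(default_factory=list)         # names of callee functions
    globals_used: list = field(default_factory=list)  # names of global variables


def find_cross_target_uses(functions):
    """Report calls and global-variable uses that span more than one target."""
    by_name = {f.name: f for f in functions}
    problems = []
    for f in functions:
        for callee in f.calls:
            # A call into another target cannot be an ordinary call instruction.
            if by_name[callee].target != f.target:
                problems.append(("call", f.name, callee))
    # Collect the set of targets that reference each global.
    users = {}
    for f in functions:
        for g in f.globals_used:
            users.setdefault(g, set()).add(f.target)
    for g, targets in sorted(users.items()):
        # A global seen from several targets has no single obvious owner.
        if len(targets) != 1:
            problems.append(("global", g, tuple(sorted(targets))))
    return problems


if __name__ == "__main__":
    module = [
        Function("main", "x86_64-unknown-linux-gnu",
                 calls=["kernel"], globals_used=["table"]),
        Function("kernel", "nvptx64-nvidia-cuda", globals_used=["table"]),
    ]
    for p in find_cross_target_uses(module):
        print(p)
```
</pre>
<div>The point is only that the annotation itself is cheap; the policy questions the checker surfaces (lowering of cross-target calls, ownership of shared globals) are where the real work lies.</div>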

<div><br></div><div>Don't get me wrong, I think this is something we need to do and the llvm.codegen intrinsic is a band-aid solution, but I don't see this as a simple problem.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div class="im"><br>

> 3. Generate two independent LLVM-IR files and pass them around together<br>

><br>

> The host and device LLVM-IR modules could be kept in separate files.<br>

> This has the benefit of being user-readable and not adding additional<br>

> complexity to the LLVM-IR files themselves. However, separate files do not<br>

> provide information about how those files are related. Which files are<br>

> kernel files, how/where do they need to be loaded, ...? This<br>

> information could probably be put into metadata or could be hard-coded<br>

> into the generic compiler infrastructure, but this would require<br>

> significant additional code.<br>

<br>

</div>I don't think metadata would work because it would not satisfy the "no<br>

semantic effects" requirement.  We couldn't just drop the metadata and<br>

expect things to work.<br>

<div class="im"><br>

> Another weakness of this approach is that the entire LLVM optimization<br>

> chain is currently built under the assumption that a single file/module<br>

> is passed around. This is most obvious with the 'opt | llc' idiom, but in<br>

> general every existing tool would need to be adapted to<br>

> handle multiple files and would possibly even need semantic knowledge<br>

> about how to connect/use them together. Just running clang or<br>

> DragonEgg with -load GPGPUOptimizer.so would not be possible.<br>

<br>

</div>Again, we have many of the changes to make this possible.  I hope to<br>

send them for review as we upgrade to 3.1.<br>

<div class="im"><br>

> All of the previous approaches require significant changes all over the<br>

> code base and would cause trouble with loadable optimization passes. The<br>

> intrinsic based approach seems to address most of the previous problems.<br>

<br>

</div>I'm pretty uncomfortable with the proposed intrinsic.  It feels<br>

tacked-on and not in the LLVM spirit.  We should be able to extend the<br>

IR to support multiple targets.  We're going to need this kind of<br>

support for much more than GPUs in the future.  Heterogeneous computing is<br>

here to stay.<br></blockquote><div><br></div><div>For me, the bigger question is: do we extend the IR to support multiple targets, or do we keep the one-target-per-module philosophy and derive some other way of representing how the modules fit together?  I can see pros and cons for both approaches.</div>

<div><br></div><div>What if instead of per-function annotations, we implement something like module file sections?  You could organize a module file into logical sections based on target architecture.  I'm just throwing that out there.</div>
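<div><br></div><div>To make the idea slightly more concrete, here is a rough Python sketch, not a proposal for LLVM's actual data structures: a hypothetical container where each section is an ordinary single-target module and cross-section references are recorded explicitly, so a driver could hand each section to the existing one-target tools. SectionedModule, Section, link, unresolved_links and split_for_codegen are all invented names for illustration.</div>
<pre>
```python
from dataclasses import dataclass, field


@dataclass
class Section:
    # One ordinary single-target module; existing passes could run on it unchanged.
    target: str
    functions: set = field(default_factory=set)


@dataclass
class SectionedModule:
    # Hypothetical container: named sections plus explicit cross-section links.
    sections: dict = field(default_factory=dict)
    links: list = field(default_factory=list)  # (from_section, symbol, to_section)

    def add_section(self, name, target, functions):
        self.sections[name] = Section(target, set(functions))

    def link(self, from_section, symbol, to_section):
        # Record that from_section refers to symbol defined in to_section.
        self.links.append((from_section, symbol, to_section))

    def unresolved_links(self):
        """Links whose symbol is not defined in the section they point to."""
        return [(src, sym, dst) for (src, sym, dst) in self.links
                if sym not in self.sections[dst].functions]

    def split_for_codegen(self):
        """Map each target to the function names its code generator would see."""
        out = {}
        for sec in self.sections.values():
            out.setdefault(sec.target, set()).update(sec.functions)
        return out


if __name__ == "__main__":
    m = SectionedModule()
    m.add_section("host", "x86_64-unknown-linux-gnu", ["main"])
    m.add_section("device", "nvptx64-nvidia-cuda", ["kernel"])
    m.link("host", "kernel", "device")
    print(m.unresolved_links())
    print(m.split_for_codegen())
```
</pre>
<div>The appeal of this shape is that each section keeps the one-target-per-module assumption, while the explicit links carry the "how do these fit together" information that separate files lack.</div>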

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

                             -Dave<br>

<div class="HOEnZb"><div class="h5">_______________________________________________<br>

LLVM Developers mailing list<br>

<a href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a>         <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>

<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>

</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><br><div>Thanks,</div><div><br></div><div>Justin Holewinski</div><br>

</div>