<div class="gmail_extra">Hi Justin,<br><br>Thanks very much for your comments.<br><br><div class="gmail_quote">2012/4/28 Justin Holewinski <span dir="ltr"><<a href="mailto:justin.holewinski@gmail.com" target="_blank">justin.holewinski@gmail.com</a>></span><br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="gmail_extra"><div class="gmail_quote"><div>On Fri, Apr 27, 2012 at 7:40 PM, Yabin Hu <span dir="ltr"><<a href="mailto:yabin.hwu@gmail.com" target="_blank">yabin.hwu@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><span style="font-family:nsimsun,monospace">The attached patch adds a new Intrinsic named "llvm.codegen" to support embedded LLVM IR code generation. </span><u style="font-family:nsimsun,monospace"></u><span style="font-family:nsimsun,monospace;text-align:left">The '</span><tt style="text-align:left">llvm.codegen</tt><span style="font-family:nsimsun,monospace;text-align:left">' intrinsic uses the LLVM back ends to generate code for embedded LLVM IR strings. The code generation target can be same or different to the one of the parent module. </span></div>
<div>
<p class="MsoNormal"><font face="nsimsun, monospace"><span style="background-color:transparent;text-align:left"><br></span></font></p><p class="MsoNormal"><font face="nsimsun, monospace" style="color:rgb(34,34,34)">The original motivation inspiring us to add this intrinsic, is to generate code for heterogeneous platform. A test case in the patch demos this. In the test case, on a X86 host, we use this intrinsic to transform an embedded LLVM IR into a string of PTX assembly. We can then employ a PTX execution engine ( on </font><span style="color:rgb(34,34,34);font-family:nsimsun,monospace">CUDA Supported GPU</span><font face="nsimsun, monospace" style="color:rgb(34,34,34)">) to execute the newly generated assembly and copy back the result later.</font></p>
</div></blockquote><div><br></div></div><div>I have to admit, I'm not sold on this solution. First, there is no clear way to pass codegen flags to the back-end. In PTX parlance, how would I embed an .ll file and compile to compute_13? </div>
</div></div></blockquote><div>We can handle this by providing a new argument (e.g. a string describing a properly configured Target Machine) instead of, or in addition to, the Arch type string argument.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="gmail_extra"><div class="gmail_quote"><div>Second, this adds a layer of obfuscation to the system. If I look at an .ll file, I expect to see all of the assembly in a reasonably clean syntax. If the device code is squashed into a constant array, it is much harder to read.</div>
</div></div></blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="gmail_extra"><div class="gmail_quote"><div></div><div>Is the motivation for the intrinsic simply to preserve the ability to pipe LLVM commands together on the command-line, e.g. opt | llc? I really feel that the cleaner solution is to split the IR into separate files, each of which can be processed independently after initial generation.</div>
</div></div></blockquote><div>Yes, it is. Preserving that ability is the main benefit we get from this intrinsic: we do not need to implement another compiler driver or JIT tool for our specific purpose. I agree with you that embedding LLVM IR harms the readability of the .ll file. </div>
<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="gmail_extra"><div class="gmail_quote"><div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<p class="MsoNormal"><span style="font-family:nsimsun,monospace">The usage of t</span><font face="nsimsun, monospace">his intrinsic is not limited to code generation for heterogeneous platform. It can also help lots of (run-time) optimization and security problems even when the code generation target is same as the one of the parent module.</font></p>
</blockquote><div><br></div></div><div>How does this help run-time optimization?</div></div></div></blockquote><div>We modelled the implementation of this intrinsic on LLVM's garbage-collector-related intrinsics, which support various GC strategies. It can help if the ASMGenerator in the patch is revised to accept optimization strategies provided by the user of the intrinsic; the intrinsic will then do whatever the user wants to the input code string. When running the code with JIT tools such as lli, we can choose an optimization strategy at run time. This is not supported yet, but we are trying to keep the design as general as we can. The essential functionality of this intrinsic is that it takes an input code string, transforms it into a new target-specific one, and then replaces the call to the intrinsic with the result.</div>
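<div><br></div><div>As a rough illustration of that last point, here is a toy sketch in Python. It is not the code in the patch, and nothing in it is LLVM API: the names GENERATORS, register_generator, toy_upper and lower_codegen_calls are hypothetical, the codegen("...", "...") call syntax is made up for the example, and the "back end" only upper-cases its input. It only shows the shape of the idea: look up a generator by name (in the spirit of the GC strategy registry), transform the embedded string, and substitute the result for the call.</div>
<pre>
```python
import re

# Hypothetical registry mapping an arch name to a generator function,
# loosely in the spirit of registering GC strategies by name. In the
# patch this role is played by the LLVM back ends, not Python callables.
GENERATORS = {}


def register_generator(arch):
    """Return a decorator that registers a generator under an arch name."""
    def wrap(fn):
        GENERATORS[arch] = fn
        return fn
    return wrap


@register_generator("toy-upper")
def toy_upper(ir_text):
    # Stand-in for a real back end: it just upper-cases the input string.
    return ir_text.upper()


# Toy call syntax, NOT real LLVM IR: codegen("embedded code", "arch")
CALL = re.compile(r'codegen\("([^"]*)",\s*"([^"]*)"\)')


def lower_codegen_calls(module_text):
    """Replace each toy codegen(...) call with the generated string."""
    def replace(match):
        ir_text, arch = match.group(1), match.group(2)
        generated = GENERATORS[arch](ir_text)  # KeyError if arch is unknown
        return '"' + generated + '"'
    return CALL.sub(replace, module_text)
```
</pre>
<div>A user-provided optimization strategy would, under the same assumption, just be one more entry registered in the table and selected by name.</div>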
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="gmail_extra"><div class="gmail_quote"><div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<p class="MsoNormal"><font face="nsimsun, monospace"><span style="text-align:left">Each call to the intrinsic has two arguments. One is the LLVM IR string. The other is the name of the target architecture. When running with tools like llc, lli, etc, this intrinsic transforms the input LLVM IR string to a new string of assembly code for the target architecture </span></font><span style="text-align:left;font-family:nsimsun,monospace">firstly. Then the call to the intrinsic is replaced by a pointer to the newly generated string. After this, we have in our module</span></p>
</div></blockquote><div><br></div></div><div>Is the Arch parameter to llvm.codegen really needed? Since codegen happens when lowering the intrinsic, the target architecture must be known. But if the target architecture is known, then it should be available in the triple for the embedded module.</div>
<div></div></div></div></blockquote><div>Yes. It is better that the target data is set correctly in the embedded module. It is the user's responsibility to do this.</div><div><br></div><div> </div><div>Thanks again!</div>
<div><br></div><div>best regards, </div><div>Yabin</div></div></div>