Hi Tobias,<br><br>> What is the benefit of having the kernel embedded as data symbol in the ELF object, in contrast to having it as a global variable<br><br>This is to support the conventional link step and LTO. During compilation we allow kernels to depend on each other, and these dependencies are resolved at link time. The whole process is built on top of gcc and its existing collect2/lto1 mechanisms. As a result, we have hybrid objects/libraries/binaries containing two independent representations: the regular binary output from gcc and the LLVM IR of the kernels, operated through their own entry points. By now the code is no longer in the data section but in a special one, similar to __gnu_lto_v1 for gcc's LTO.<br>
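A minimal sketch of the general idea, not our actual implementation: kernel records are placed in a dedicated ELF section and indexed at run time. It assumes GCC or Clang on an ELF platform with a linker that defines `__start_<name>`/`__stop_<name>` symbols for sections named like C identifiers (GNU ld does). The section name `kernel_ir`, the `kernel_entry` record, the `EMBED_KERNEL` macro and the placeholder IR strings are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record describing one embedded kernel. */
struct kernel_entry {
    const char *name; /* kernel identifier */
    const char *ir;   /* textual LLVM IR (placeholder strings below) */
};

/* Put one record into the dedicated section "kernel_ir" (assumed name).
 * "used" keeps the otherwise unreferenced object from being discarded. */
#define EMBED_KERNEL(ident, text)                                   \
    static const struct kernel_entry kernel_entry_##ident           \
        __attribute__((used, section("kernel_ir"))) = { #ident, text }

EMBED_KERNEL(saxpy, "define void @saxpy() { ret void }");
EMBED_KERNEL(stencil, "define void @stencil() { ret void }");

/* Section bounds provided by the linker; they exist because the
 * section name is a valid C identifier. */
extern const struct kernel_entry __start_kernel_ir[];
extern const struct kernel_entry __stop_kernel_ir[];

/* Number of kernel records the linker collected into the section. */
static size_t kernel_count(void)
{
    return (size_t)(__stop_kernel_ir - __start_kernel_ir);
}

/* Linear lookup by name; returns NULL if no such kernel is embedded. */
static const struct kernel_entry *find_kernel(const char *name)
{
    assert(name != NULL);
    for (const struct kernel_entry *k = __start_kernel_ir;
         k < __stop_kernel_ir; ++k) {
        if (strcmp(k->name, name) == 0)
            return k;
    }
    return NULL;
}
```

Because the linker concatenates same-named sections from all objects, records defined in different translation units end up in one contiguous range, so the lookup needs no explicit registration call.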

<br>One question I realized while replying to yours: do you see your team focusing more on infrastructure or on polyhedral analysis development?<br><br>The quality of CLooG/Polly is what we ultimately rely on in the _first_ place. All other things are _a_lot_ simpler. You will see: ecosystems, applications and testbeds will grow around it on their own, once the core concepts are strong. There is probably no need to spend resources on leading the way for them in engineering topics. But they may <span id="result_box" class="short_text" lang="en"><span class="hps">wither</span></span> soon if the math does not grow at the same speed. Just an opinion.<br>

<br>Best,<br>- D.<br><br><div class="gmail_quote">2012/7/29 Tobias Grosser <span dir="ltr">&lt;<a href="mailto:tobias@grosser.es" target="_blank">tobias@grosser.es</a>&gt;</span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="im">On 07/26/2012 04:12 PM, Dmitry N. Mikushin wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

In our project we combine regular binary code and LLVM IR code for<br>

kernels, embedded as a special data symbol of the ELF object. The LLVM IR<br>

for a kernel that exists at compile time is preliminary, and may be optimized<br>

further at runtime (pointer analysis, Polly, etc.). During<br>

application startup, the runtime system builds an index of all kernel<br>

sources embedded into the executable. Host and kernel code interact by<br>

means of a special "launch" call, which does not only<br>

optimize, compile, and execute the kernel, but first estimates whether<br>

doing so is worthwhile or whether it is better to fall back to the equivalent host code.<br>

<br>

The proposal made by Tobias is very elegant, but it seems to address<br>

the case where host and sub-architecture code exist at the same time.<br>

May I kindly point out that, in our experience, really efficient,<br>

deeply specialized sub-architecture code may simply not exist at<br>

compile time, while the generic baseline host code always can.<br>

</blockquote>

<br></div>

Hi Dmitry,<br>

<br>

the proposal did not mean to say that all code needs to be optimized and target code generated at compile time. You may very well retain some kernels as LLVM-IR code and pass this code to your runtime system (similar to how CUDA or OpenCL currently accept kernel code).<br>


<br>

Btw, one question I always wanted to ask: What is the benefit of having the kernel embedded as a data symbol in the ELF object, in contrast to having it as a global variable (which is then passed to the run-time)? I know Cell mainly used ELF symbols, but e.g. OpenCL reads kernels by passing a pointer to the kernel string to the run-time library. Can you point out the difference to me?<br>


<br>

Cheers and thanks<br>

Tobi<br>

</blockquote></div><br>