<div dir="ltr"><div>Hi,</div><div><br></div><div>Heterogeneous modules seem like an important feature when targeting accelerators.</div><div><br></div><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Jul 27, 2020 at 11:01 PM Johannes Doerfert via llvm-dev &lt;<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">TL;DR<br>
-----<br>
<br>
Let's allow merging two LLVM-IR modules for different targets (with<br>
compatible data layouts) into a single LLVM-IR module to facilitate<br>
host-device code optimizations.<br></blockquote><div><br></div><div>My main question is about the requirement for compatible data layouts: isn't it too limiting in practice?</div><div>I understand that this is much easier to implement in LLVM today, but it may box us in with respect to what can be supported in the future.</div><div>Have you looked into what it would take to have heterogeneous modules where each target has its own DL?</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<br>
<br>
Wait, what?<br>
-----------<br>
<br>
Given an offloading programming model of your choice (CUDA, HIP, SYCL,<br>
OpenMP, OpenACC, ...), the current pipeline will most likely optimize<br>
the host and the device code in isolation. This is problematic as it<br>
makes everything from simple constant propagation to kernel<br>
splitting/fusion painfully hard. The proposal is to merge host and<br>
device code in a single module during the optimization steps. This<br>
should not induce any cost (if people don't use the functionality).<br>
<br>
<br>
But how do heterogeneous modules help?<br>
--------------------------------------<br>
<br>
Assuming we have heterogeneous LLVM-IR modules we can look at<br>
accelerator code optimization as an interprocedural optimization<br>
problem. You basically call the "kernel" but you cannot inline it. So<br>
you know the call site(s) and arguments, can propagate information back<br>
and forth (=constants, attributes, ...), and modify the call site as<br>
well as the kernel simultaneously, e.g., to split the kernel or fuse<br>
consecutive kernels. Without heterogeneous LLVM-IR modules we can still do<br>
all of this, but it requires a lot more machinery. Given abstract call sites<br>
[0,1] and enabled interprocedural optimizations [2], host-device<br>
optimizations inside a heterogeneous module are really not (much)<br>
different than any other interprocedural optimization.<br>
<br>
[0] <a href="https://llvm.org/docs/LangRef.html#callback-metadata" rel="noreferrer" target="_blank">https://llvm.org/docs/LangRef.html#callback-metadata</a><br>
[1] <a href="https://youtu.be/zfiHaPaoQPc" rel="noreferrer" target="_blank">https://youtu.be/zfiHaPaoQPc</a><br>
[2] <a href="https://youtu.be/CzWkc_JcfS0" rel="noreferrer" target="_blank">https://youtu.be/CzWkc_JcfS0</a><br>
<br>
<br>
Where are the details?<br>
----------------------<br>
<br>
This is merely a proposal to get feedback. I talked to people before and<br>
got mixed results. I think this can be done in an "opt-in" way that is<br>
non-disruptive and without penalty. I sketched some ideas in [3] but<br>
*THIS IS NOT A PROPER PATCH*. If there is interest, I will provide more<br>
thoughts on design choices and potential problems. Since there is not<br>
much, I was hoping this would be a community effort from the very<br>
beginning :)<br>
<br>
[3] <a href="https://reviews.llvm.org/D84728" rel="noreferrer" target="_blank">https://reviews.llvm.org/D84728</a><br>
<br>
<br>
But MLIR, ...<br>
-------------<br>
<br>
I imagine MLIR can be used for this and there are probably good reasons<br>
to do so. We might not want to do it *only* there, for mainly the<br>
same reasons other things are still developed at the LLVM-IR level. Feel<br>
free to ask though :)</blockquote><div> </div><div>(+1: MLIR is not intended to be a reason not to improve LLVM!)<br></div><div><br></div><div>-- </div><div>Mehdi</div><div><br></div></div></div>