[llvm-dev] [RFC] Heterogeneous LLVM-IR Modules

Mehdi AMINI via llvm-dev llvm-dev at lists.llvm.org
Tue Jul 28 11:03:33 PDT 2020


Hi,

Heterogeneous modules seem like an important feature when
targeting accelerators.

On Mon, Jul 27, 2020 at 11:01 PM Johannes Doerfert via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> TL;DR
> -----
>
> Let's allow merging two LLVM-IR modules for different targets (with
> compatible data layouts) into a single LLVM-IR module to facilitate
> host-device code optimizations.
>

The main question I have concerns this restriction on the datalayout:
isn't it too limiting in practice?
I understand that this is much easier to implement in LLVM today, but it
may box us in with respect to what can be supported in the future.
Have you looked into what it would take to have heterogeneous modules in
which each target keeps its own DL?
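
To make the datalayout concern concrete, here is a minimal sketch that compares two datalayout strings spec by spec. It is a toy parser, not LLVM's own; the helper names are hypothetical, and the two strings are only written in the style of an x86-64 host and an NVPTX device (the exact strings depend on the LLVM version, so treat them as assumptions). What counts as "compatible" is not defined in this thread, so the sketch only lists where the layouts disagree.

```python
def parse_datalayout(dl):
    """Split a datalayout string into {spec_name: spec_value} (simplified)."""
    specs = {}
    for spec in dl.split("-"):
        if spec in ("e", "E"):
            # Endianness is a bare letter with no ':' payload.
            specs["endian"] = spec
        else:
            # Everything before the first ':' names the spec, e.g. "i64" or "p270".
            name, _, value = spec.partition(":")
            specs[name] = value
    return specs


def layout_differences(host_dl, device_dl):
    """Return the sorted spec names on which the two layouts disagree."""
    host = parse_datalayout(host_dl)
    device = parse_datalayout(device_dl)
    return sorted(k for k in host.keys() | device.keys()
                  if host.get(k) != device.get(k))


# Illustrative strings only (assumed, not taken from the RFC).
HOST_DL = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
DEVICE_DL = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64"

print(layout_differences(HOST_DL, DEVICE_DL))
```

Even when endianness and the common integer alignments agree, the remaining specs can differ, which is where a single shared DL becomes a constraint.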


>
>
> Wait, what?
> -----------
>
> Given an offloading programming model of your choice (CUDA, HIP, SYCL,
> OpenMP, OpenACC, ...), the current pipeline will most likely optimize
> the host and the device code in isolation. This is problematic as it
> makes everything from simple constant propagation to kernel
> splitting/fusion painfully hard. The proposal is to merge host and
> device code in a single module during the optimization steps. This
> should not induce any cost (if people don't use the functionality).
>
>
> But how do heterogeneous modules help?
> --------------------------------------
>
> Assuming we have heterogeneous LLVM-IR modules we can look at
> accelerator code optimization as an interprocedural optimization
> problem. You basically call the "kernel" but you cannot inline it. So
> you know the call site(s) and arguments, can propagate information back
> and forth (=constants, attributes, ...), and modify the call site as
> well as the kernel simultaneously, e.g., to split the kernel or fuse
> consecutive kernels. Without heterogeneous LLVM-IR modules we can still
> do all of this, but it requires a lot more machinery. Given abstract call sites
> [0,1] and enabled interprocedural optimizations [2], host-device
> optimizations inside a heterogeneous module are really not (much)
> different than any other interprocedural optimization.
>
> [0] https://llvm.org/docs/LangRef.html#callback-metadata
> [1] https://youtu.be/zfiHaPaoQPc
> [2] https://youtu.be/CzWkc_JcfS0
>
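
A minimal sketch of the interprocedural view described above, using a toy model rather than LLVM APIs. `KernelLaunch` and `constant_args` are hypothetical names; `None` stands for an argument whose value is only known at run time. The idea shown is just the first step: if every host call site passes the same constant for an argument, that constant can be propagated into the kernel.

```python
from dataclasses import dataclass


@dataclass
class KernelLaunch:
    """One host-side call site that launches a device kernel (toy model)."""
    kernel: str
    args: tuple  # constants, or None for values unknown at compile time


def constant_args(launches, kernel):
    """Return {arg_index: value} for arguments that are the same known
    constant at every call site of `kernel`."""
    sites = [l.args for l in launches if l.kernel == kernel]
    if not sites:
        return {}
    first = sites[0]
    return {i: v for i, v in enumerate(first)
            if v is not None and all(s[i] == v for s in sites)}


launches = [
    KernelLaunch("saxpy", (1024, 2.0, None)),
    KernelLaunch("saxpy", (1024, None, None)),
    KernelLaunch("scale", (512, None)),
]

# Argument 0 of "saxpy" is 1024 at both call sites, so it is propagatable;
# argument 1 is constant at only one site and is therefore left alone.
print(constant_args(launches, "saxpy"))
```

With host and device code in separate modules, the call sites and the kernel body are never visible to the same pass, so even this simple check needs extra machinery.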
>
> Where are the details?
> ----------------------
>
> This is merely a proposal to get feedback. I talked to people before and
> got mixed results. I think this can be done in an "opt-in" way that is
> non-disruptive and without penalty. I sketched some ideas in [3] but
> *THIS IS NOT A PROPER PATCH*. If there is interest, I will provide more
> thoughts on design choices and potential problems. Since there is not
> much, I was hoping this would be a community effort from the very
> beginning :)
>
> [3] https://reviews.llvm.org/D84728
>
>
> But MLIR, ...
> -------------
>
> I imagine MLIR can be used for this and there are probably good reasons
> to do so. We might not want to do it *only* there, for mostly the same
> reasons other things are still developed at the LLVM-IR level. Feel
> free to ask though :)


(+1: MLIR is not intended to be a reason not to improve LLVM!)

-- 
Mehdi