[llvm-dev] [RFC] Heterogeneous LLVM-IR Modules

Mon Jul 27 23:00:02 PDT 2020

TL;DR
-----

Let's allow merging two LLVM-IR modules for different targets (with
compatible data layouts) into a single LLVM-IR module to facilitate
host-device code optimizations.

Wait, what?
-----------

Given an offloading programming model of your choice (CUDA, HIP, SYCL,
OpenMP, OpenACC, ...), the current pipeline will most likely optimize
the host and the device code in isolation. This is problematic as it
makes everything from simple constant propagation to kernel
splitting/fusion painfully hard. The proposal is to merge host and
device code into a single module during the optimization steps. This
should not induce any cost for people who don't use the functionality.

But how do heterogeneous modules help?
--------------------------------------

Assuming we have heterogeneous LLVM-IR modules we can look at
accelerator code optimization as an interprocedural optimization
problem. You basically call the "kernel" but you cannot inline it. So
you know the call site(s) and arguments, can propagate information back
and forth (=constants, attributes, ...), and modify the call site as
well as the kernel simultaneously, e.g., to split the kernel or fuse
consecutive kernels. Without heterogeneous LLVM-IR modules we can still
do all of this, but it requires a lot more machinery. Given abstract
call sites [0,1] and enabled interprocedural optimizations [2],
host-device optimizations inside a heterogeneous module are really not
(much) different from any other interprocedural optimization.

[0] https://llvm.org/docs/LangRef.html#callback-metadata
[1] https://youtu.be/zfiHaPaoQPc
[2] https://youtu.be/CzWkc_JcfS0
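
To make the "call you cannot inline" view concrete, here is a minimal
sketch in plain C++. It is not taken from the patch or from any real
offloading runtime: scale_kernel, launch, and run_host are hypothetical
names, and launch only stands in for a runtime entry point that an
abstract call site would describe. The point is that the only call site
passes the constant 2.0f, which an interprocedural pass could propagate
into the kernel if both sides lived in one module.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical "device" kernel: scales each element in place. In a real
// offloading model this function would be compiled for the accelerator.
void scale_kernel(float *data, std::size_t n, float factor) {
  for (std::size_t i = 0; i < n; ++i)
    data[i] *= factor;
}

using KernelFn = void (*)(float *, std::size_t, float);

// Hypothetical stand-in for a runtime launch call. The host never calls
// the kernel directly; it hands the kernel and its arguments to the
// runtime, which forwards them. This is the indirection an abstract
// call site is meant to make visible to the optimizer.
void launch(KernelFn kernel, float *data, std::size_t n, float factor) {
  assert(kernel != nullptr);
  kernel(data, n, factor);
}

// Host code: the only "call site" of scale_kernel passes factor = 2.0f.
// With host and device code in one module, that constant could be
// propagated into the kernel body.
std::vector<float> run_host(std::vector<float> v) {
  launch(scale_kernel, v.data(), v.size(), 2.0f);
  return v;
}
```

With host and device code optimized in isolation, the kernel has to
treat factor as unknown, even though every launch in this sketch passes
the same value.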

Where are the details?
----------------------

This is merely a proposal to get feedback. I talked to people before and
got mixed results. I think this can be done in an "opt-in" way that is
non-disruptive and without penalty. I sketched some ideas in [3] but
*THIS IS NOT A PROPER PATCH*. If there is interest, I will provide more
thoughts on design choices and potential problems. Since there is not
much there yet, I was hoping this could be a community effort from the
very beginning :)

[3] https://reviews.llvm.org/D84728

But MLIR, ...
-------------

I imagine MLIR can be used for this and there are probably good reasons
to do so. We might not want to do it *only* there, for mostly the same
reasons other things are still developed at the LLVM-IR level. Feel
free to ask though :)

Thanks,
   Johannes