[llvm-dev] [RFC] Heterogeneous LLVM-IR Modules
Johannes Doerfert via llvm-dev
llvm-dev at lists.llvm.org
Thu Jul 30 05:57:05 PDT 2020
[off topic] I'm not a fan of the "reply-to-list" default.
Thanks for the feedback! More below.
On 7/30/20 6:01 AM, David Chisnall via llvm-dev wrote:
> On 28/07/2020 07:00, Johannes Doerfert via llvm-dev wrote:
>> TL;DR
>> -----
>>
>> Let's allow merging two LLVM-IR modules for different targets (with
>> compatible data layouts) into a single LLVM-IR module to facilitate
>> host-device code optimizations.
>
> I think it's worth taking a step back here and thinking through the
> problem. The proposed solution makes me nervous because it is quite a
> significant change to the compiler flow that comes from thinking of
> heterogeneous optimisation as a fat LTO problem, when to me it feels
> more like a thin LTO problem.
>
> At the moment, there's an implicit assumption that everything in a
> Module will flow to the same CodeGen back end. It can make global
> assumptions about cost models, can inline everything, and so on.
>
FWIW, I would expect that we split the module *before* the codegen stage
such that the back end doesn't have to deal with heterogeneous modules
(right now).
I'm not sure about cost models and such, though. As far as I know, we
don't make global decisions anywhere, but I might be wrong. Put
differently, I hope we don't make global decisions, as it seems quite
easy to perturb the result with unrelated code changes.
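To make "split before codegen" concrete, here is a minimal Python sketch under stated assumptions. It is a toy model, not LLVM's API: `Function`, `HeteroModule`, the per-function `triple` field, and `split_by_target` are hypothetical names. Each function carries its target triple, and the combined module is partitioned into one homogeneous piece per target, so each back end only ever sees code for its own target.

```python
from dataclasses import dataclass, field


@dataclass
class Function:
    """Hypothetical stand-in for an IR function tagged with its target triple."""
    name: str
    triple: str


@dataclass
class HeteroModule:
    """Hypothetical container holding functions for several targets at once."""
    functions: list = field(default_factory=list)


def split_by_target(module):
    """Partition a heterogeneous module into one function list per target triple."""
    per_target = {}
    for fn in module.functions:
        per_target.setdefault(fn.triple, []).append(fn)
    return per_target


hetero = HeteroModule([
    Function("main", "x86_64-unknown-linux-gnu"),
    Function("kernel", "nvptx64-nvidia-cuda"),
    Function("helper", "x86_64-unknown-linux-gnu"),
])
parts = split_by_target(hetero)
print({triple: [f.name for f in fns] for triple, fns in parts.items()})
# {'x86_64-unknown-linux-gnu': ['main', 'helper'], 'nvptx64-nvidia-cuda': ['kernel']}
```

A real splitter would also have to handle cross-target references, e.g. a host launch site that names a device kernel; this sketch ignores them.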
> It sounds as if we have a couple of use cases:
>
> - Analysis flow between modules
> - Transforms that modify two modules
>
Yes! Notably the first bullet is bi-directional and cyclic ;)
> The first case covers the motivating example of constant
> propagation. For this, the right approach feels like something like
> ThinLTO, where you can collect in one module the fact that a kernel is
> invoked only with specific constant arguments in the host module and
> consume that result in the target module.
>
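The ThinLTO-style flow described in the quote can be sketched as a summary built on the host side and consumed on the device side. This is a hypothetical Python model, not ThinLTO's actual summary format: `summarize_launches`, `constants_for`, and the (kernel, args) encoding are made up for illustration. Note that information here only flows one way, from host to device.

```python
def summarize_launches(launches):
    """Host side: record, per kernel, which arguments are constant at every launch.

    `launches` is a list of (kernel_name, args) pairs collected from host code.
    An argument is an int when it is a known constant and None otherwise.
    All launches of one kernel are assumed to pass the same number of arguments.
    """
    summary = {}
    for kernel, args in launches:
        if kernel not in summary:
            summary[kernel] = list(args)
            continue
        known = summary[kernel]
        for i, arg in enumerate(args):
            # Keep a constant only if every launch agrees on it.
            if known[i] != arg:
                known[i] = None
    return summary


def constants_for(summary, kernel, params):
    """Device side: map parameter names to the constants the summary proved."""
    values = summary.get(kernel, [])
    return {p: c for p, c in zip(params, values) if c is not None}


launches = [("saxpy", [2, None, 1024]), ("saxpy", [2, None, 512])]
summary = summarize_launches(launches)
print(constants_for(summary, "saxpy", ["a", "x", "n"]))  # {'a': 2}
```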
Except that you can have cyclic dependencies, which make this
problematic again. You might not propagate constants from the device
module to the host one, but knowing that memory is only read/written on
the device is very interesting on the host side: you can avoid memory
copies, remove globals, etc. That is just what comes to mind right away.
The proposed heterogeneous modules should not limit you to "monolithic
LTO", or "thin LTO" for that matter.
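The cyclic case can be illustrated with a toy fixpoint iteration. This is a hypothetical Python model, not LLVM code: buffers are plain strings, and `optimize_jointly` with its two rules is made up for illustration. The host-side rule keeps a device-to-host copy only for buffers the device writes and the host later reads; the device-side rule deletes stores that nobody observes. The host decision depends on what the device writes and the device decision depends on what the host copies back, so the two rules are iterated until neither changes anything.

```python
def optimize_jointly(device_writes, device_reads, host_reads):
    """Iterate host- and device-side rules until neither changes anything.

    device_writes: kernel name -> set of buffers the kernel stores to
    device_reads:  buffers that some kernel reads on the device
    host_reads:    buffers the host reads after the kernels finish
    Returns (copy_back, device_writes) at the fixpoint.
    """
    writes = {k: set(bs) for k, bs in device_writes.items()}
    # Start conservatively: copy every written buffer back to the host.
    copy_back = set().union(*writes.values())
    changed = True
    while changed:
        changed = False
        # Host side: only copy back buffers the device writes and the host reads.
        written = set().union(*writes.values())
        new_copy_back = written & host_reads
        if new_copy_back != copy_back:
            copy_back, changed = new_copy_back, True
        # Device side: a store with no device reader and no copy back is dead.
        for kernel, bs in writes.items():
            live = {b for b in bs if b in device_reads or b in copy_back}
            if live != bs:
                writes[kernel], changed = live, True
    return copy_back, writes


copy_back, writes = optimize_jointly({"k": {"tmp", "out"}}, set(), {"out"})
print(copy_back, writes)  # {'out'} {'k': {'out'}}
```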
> The second example is what you'd need for things like kernel fusion,
> where you need to both combine two kernels in the target module and
> also modify the callers to invoke the single kernel and skip some data
> flow. For this, you need a kind of pass that can work over things that
> begin in two modules.
>
Right. Splitting, fusing, moving code, etc. all require you to modify
both modules at the same time. Even if you only modify one module, you
want information from both, in either direction.
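Kernel fusion shows why one transformation has to edit both sides at once. The following hypothetical Python sketch (not LLVM code; `fuse_adjacent_launches` and the list/dict encoding are assumptions) rewrites back-to-back host launches of two kernels into one launch and adds the fused kernel on the device side in the same step. It assumes element-wise kernels with identical launch configurations, where each thread only touches its own elements, so simply concatenating the bodies is legal; a real pass would have to check that.

```python
def fuse_adjacent_launches(host_launches, device_kernels, first, second):
    """Fuse every back-to-back launch of `first` followed by `second`.

    host_launches:  ordered list of kernel names launched by the host
    device_kernels: kernel name -> list of body statements
    Both structures are updated in place, mirroring a pass that must edit
    the host side and the device side of one module together.
    """
    fused = f"{first}__{second}"
    rewritten = []
    i = 0
    while i < len(host_launches):
        if host_launches[i:i + 2] == [first, second]:
            rewritten.append(fused)
            i += 2
        else:
            rewritten.append(host_launches[i])
            i += 1
    if fused in rewritten:
        # Device side: the fused kernel runs both bodies back to back.
        device_kernels[fused] = device_kernels[first] + device_kernels[second]
    host_launches[:] = rewritten
    return fused


host = ["scale", "add", "scale"]
device = {"scale": ["x[i] *= 2"], "add": ["x[i] += y[i]"]}
fused = fuse_adjacent_launches(host, device, "scale", "add")
print(host)           # ['scale__add', 'scale']
print(device[fused])  # ['x[i] *= 2', 'x[i] += y[i]']
```

The original kernels are left in place; a later cleanup could delete any that are no longer launched.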
> It seems that a less invasive change would be:
>
> - Use ThinLTO metadata for the first case, extend it as required.
> - Add a new kind of ModuleSetPass that takes a set of Modules and is
> allowed to modify both.
>
> This avoids any modifications for the common (single-target) case, but
> should give you the required functionality. Am I missing something?
>
This is similar to what Renato suggested early on. In addition to the
"ThinLTO metadata" inefficiencies outlined above, the problem I have
with the second part is that it requires writing completely new passes
in a different style than anything we have. It is certainly a
possibility, but we can probably do it without any changes to the
infrastructure.
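For comparison, the "ModuleSetPass" idea from the quoted proposal might look roughly like the following. This is a hypothetical Python sketch, not an existing LLVM interface: `ModuleSetPass`, `DropUnlaunchedKernels`, and the dict-based module encoding are invented for illustration. The example pass deletes device kernels that the host module never launches, which requires looking at both modules together.

```python
from abc import ABC, abstractmethod


class ModuleSetPass(ABC):
    """Hypothetical pass kind that sees a set of modules and may modify any of them."""

    @abstractmethod
    def run(self, modules):
        """Transform `modules` (target triple -> module) in place; return True if changed."""


class DropUnlaunchedKernels(ModuleSetPass):
    """Example pass: delete device kernels that the host module never launches."""

    def __init__(self, host_triple):
        self.host_triple = host_triple

    def run(self, modules):
        launched = set(modules[self.host_triple]["launches"])
        changed = False
        for triple, module in modules.items():
            if triple == self.host_triple:
                continue
            # Collect first, then delete, so the dict is not mutated while iterating.
            dead = [name for name in module["kernels"] if name not in launched]
            for name in dead:
                del module["kernels"][name]
            changed = changed or bool(dead)
        return changed


HOST = "x86_64-unknown-linux-gnu"
modules = {
    HOST: {"launches": ["k1"]},
    "nvptx64-nvidia-cuda": {"kernels": {"k1": ["ret"], "k2": ["ret"]}},
}
changed = DropUnlaunchedKernels(HOST).run(modules)
print(changed, list(modules["nvptx64-nvidia-cuda"]["kernels"]))  # True ['k1']
```

Such a pass has to handle module boundaries explicitly, which is the "different style" concern raised above.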
In addition to the analysis/optimization infrastructure reasons, I would
like to point out that this would make our toolchains a lot simpler. We
have some embedding of device code in host code right now (on every
level), and things like LTO for all offloading models would become much
easier if we distributed the heterogeneous modules instead of yet another
embedding. I might be biased by the way the "clang offload bundler" is
used right now for OpenMP, HIP, etc., but I would very much like to
replace that with a "clean" toolchain that performs as much LTO as
possible, at least for the accelerator code.
I hope this makes some sense, feel free to ask questions :)
~ Johannes
> David