[llvm-dev] [RFC] Heterogeneous LLVM-IR Modules

Tue Jul 28 13:03:42 PDT 2020

On Tue, 28 Jul 2020 at 20:44, Johannes Doerfert
<johannesdoerfert at gmail.com> wrote:
> What I (tried to) describe is that you can pass an array of structs via
> a CUDA memcpy (or similar) to the device and then expect it to be
> accessible as an array of structs on the other side. I can imagine this
> property doesn't hold for *every* programming model, but the question is
> if we need to support the ones that don't have it. FWIW, I don't know if
> it is worth to build up a system that can allow this property to be
> missing or if it is better to not allow such systems to opt-in to the
> heterogeneous module merging. I guess we would need to list the
> programming models for which you cannot reasonably expect the above to
> work.

Right, this is the can of worms I think we won't see before it hits
us. My main concern is that allowing for opaque interfaces to be
defined means we'll be able to do almost anything around such simple
constraints, and the code won't be heavily tested around it (because
it's really hard to test those constraints).

For example, one constraint is: functions that cross the DL barrier
(i.e. call functions in another DL) must marshal the arguments in such
a way that the size in bytes is exactly what the callee expects, given
its DL.

This is somewhat easy to verify, but it's not enough to guarantee that
the alignment of internal elements, structure layout, padding, etc.
make sense in the target. Unless we write code that packs/unpacks the
data, we cannot guarantee it is what we expect. And writing unpack code
on a GPU may not even be meaningful. And it can change from one GPU
family to another, or from one API to another.
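
To make that concrete, here is a minimal sketch (plain C++, no LLVM
APIs) of why a size check alone is too weak. The `Field` and `Layout`
types and both alignment tables are made up for illustration and do not
correspond to any real target's data layout; they only show that two
layouts can agree on the total size while disagreeing on field offsets.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical description of one struct field under a given data layout.
struct Field {
  std::size_t size;   // size in bytes
  std::size_t align;  // required alignment in bytes
};

struct Layout {
  std::vector<std::size_t> offsets;  // byte offset of each field
  std::size_t size = 0;              // total size, including tail padding
};

// Round `offset` up to the next multiple of `align`.
std::size_t alignTo(std::size_t offset, std::size_t align) {
  return (offset + align - 1) / align * align;
}

// Natural (C-like) layout: each field goes at the next offset that
// satisfies its alignment; the total is padded to the largest alignment.
Layout computeLayout(const std::vector<Field> &fields) {
  Layout out;
  std::size_t offset = 0, maxAlign = 1;
  for (const Field &f : fields) {
    offset = alignTo(offset, f.align);
    out.offsets.push_back(offset);
    offset += f.size;
    if (f.align > maxAlign) maxAlign = f.align;
  }
  out.size = alignTo(offset, maxAlign);
  return out;
}

// The "easy" check: only the total byte size is compared.
bool sameSize(const Layout &a, const Layout &b) { return a.size == b.size; }

// The stronger check: every field must also sit at the same offset.
bool sameLayout(const Layout &a, const Layout &b) {
  return a.size == b.size && a.offsets == b.offsets;
}

// struct { i16; i16; i64 } under two made-up sets of alignment rules.
Layout hostLayout() { return computeLayout({{2, 2}, {2, 2}, {8, 8}}); }
Layout deviceLayout() { return computeLayout({{2, 4}, {2, 4}, {8, 4}}); }
```

With these made-up rules, `sameSize(hostLayout(), deviceLayout())`
holds while `sameLayout(...)` does not, because the second field lands
at a different offset on each side. A verifier that only compares sizes
would accept this pair.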

Makes sense?

> I think that a multi-DL + multi-triple design seems like a good
> candidate.

I agree. Multiple-DL is something that comes and goes in the community
and so far the "consensus" has been that data layout is hard enough as
it is. I've always been keen on having it, but not keen on making it
happen (and fixing all the bugs that will come with it). :D

Another problem we haven't even considered is where the triple will
come from and in which form. As you know, triples don't usually mean
anything without further context, and that context isn't present in
the triple or the DL. It's lowered by the front-end into snippets
of code (pack/unpack, shift/mask, pad/store/pass pointer) or thunks
(EH, default class methods).

Once it's lowered, fine, DL should be mostly fine because everything
will be lowered anyway. But how will the user identify code from
multiple different front-ends in the same IR module? If we restrict
ourselves to a single front-end, then we'll need one front-end to
rule them all, and that would be counterproductive (and a fairly
limited scope for such a large change).

I fear the infrastructure issues around getting the code inside the
module will be more complicated (potentially intractable) than
whatever we have to deal with once we have a multi-DL module...

> I am in doubt about the "simpler" part but it's an option.

That's an understatement. :)

But I think it's important to understand why, if only to make
multiple-DL modules more appealing.

> The one disadvantage I see is that we have to change the way passes work in this
> setting versus the single module setting.

Passes will already have to change, as they can't look at every
function or every call if those belong to a different DL. Probably a
simpler change, though.
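
As a rough illustration of that kind of change, the sketch below
filters out functions and calls that belong to a different DL before a
pass touches them. It is plain C++ with made-up `Fn` and `CallSite`
types, not the LLVM pass API, and the DL strings are treated as opaque
tags rather than real data layout strings.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for IR entities; these are not LLVM classes.
struct Fn {
  std::string name;
  std::string dataLayout;  // opaque tag identifying the function's DL
};

struct CallSite {
  const Fn *caller;
  const Fn *callee;
};

// Functions that a pass running under `passDL` is allowed to inspect.
std::vector<const Fn *> visibleFunctions(const std::vector<Fn> &module,
                                         const std::string &passDL) {
  std::vector<const Fn *> result;
  for (const Fn &f : module)
    if (f.dataLayout == passDL)
      result.push_back(&f);
  return result;
}

// True when caller and callee live in different DLs, i.e. the call is
// one that a DL-unaware transformation (inlining, say) must leave alone.
bool crossesDLBarrier(const CallSite &cs) {
  return cs.caller->dataLayout != cs.callee->dataLayout;
}
```

A pass would then iterate over `visibleFunctions(module, passDL)` only
and skip any call for which `crossesDLBarrier` returns true.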

cheers,
--renato