[llvm-dev] [RFC] Heterogeneous LLVM-IR Modules

Thu Jul 30 09:44:45 PDT 2020

On 7/30/20 11:11 AM, Renato Golin wrote:
 > On Thu, 30 Jul 2020 at 16:58, Johannes Doerfert
 > <johannesdoerfert at gmail.com> wrote:
 >> I mean, you can put the command line string that set the options into
 >> the first place, right? That is as long as it initially was, or maybe I
 >> am missing something.
 >
 > Options change with time, and this would make the IR incompatible
 > across releases without intentionally doing so.

You could arguably be forgiving when parsing these, so you might lose
some options if you mix IR across releases, but right now you cannot
express this at all. I mean, the IR looks as if it captures the entire
state, but it does not quite. As a use case, the question of how to
reproduce `clang -O3` with `opt` comes up every month or so on the
list. Let's table this for now, as it seems unrelated to this proposal.

 >> To recap things that might "differ" from the original proposal:
 >>    - We          want multiple target triples.
 >>    - We probably want multiple data layouts.
 >>    - We probably want multiple pass pipelines, with different (cmd
 >>      line) options and such.
 >>    - We might want to make modules self contained wrt. target options
 >>      such that you can create TTI and friends w/o repeating driver
 >>      options.
 >
 > The extent of the separation is what made me suggest that it might be
 > easier, in the end, to carry multiple modules, from different
 > front-ends, through multiple pipelines but interacting with each
 > other.
 >
 > I guess this is why David made a parallel with LTO, as this ends up as
 > being a multi-device LTO in a sense. I think that will be easier and
 > much less intrusive than rewriting the global context, target flags,
 > IR annotation, data layout assumptions, target triple parsing, target
 > options bundling, etc.

It is definitely multi-device (link-time) optimization. The link-time
part is somewhat optional and might be misleading, given the popularity
of single-source programming models for accelerators. The "thinLTO"
idea would also not be sufficient for everything we hope to do; the
two-module approach would be, though.

What if we don't rewrite these things but still merge the modules?
Let me explain ;)

(I use "`opt` invocation" below as a placeholder, for lack of a better
  term, knowing that it is not (only) the `opt` tool we are talking about.)

The problem is that the `opt` invocation is primed for a single target:
everything (pipeline, TTI, flags, ...) exists only once, right?
I imagine the two-module approach would run two `opt` invocations, one
per module, which we would synchronize at some point to do cross-module
optimizations. Given that we can run two `opt` invocations, and we
assume a pass can work with two modules, i.e., two sets of everything,
why do we need the separation? From a tooling perspective, I think a
single module makes things easier. That said, it should not preclude us
from running two separate `opt` invocations on it. So we don't rewrite
everything; instead we "just" need to duplicate all the information in
the IR such that each `opt` invocation can extract its respective set
of values and run on its respective set of global symbols. This would
reduce the new stuff to more or less what we started with: a device
triple & DL, and a way to link each global symbol to a device triple &
DL. It is the two-module approach, but with "co-located" modules ;)
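To make the "co-located modules" idea a bit more concrete, here is a
minimal Python sketch. It is purely a hypothetical model, not LLVM
code: `CoLocatedModule`, `DeviceTarget`, `run_single_target_pipeline`,
and `cross_device_view` are made-up names, and the data layout strings
are placeholders. The container carries one (triple, DL) pair per
device plus a per-global link to one of them; a single-target "`opt`
invocation" only ever sees its own slice, while a synchronization point
can look at all slices at once.

```python
# Hypothetical model of "co-located" modules; none of these names are LLVM APIs.
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class DeviceTarget:
    triple: str       # e.g. "nvptx64-nvidia-cuda"
    data_layout: str  # placeholder string, not a real data layout


@dataclass
class GlobalSymbol:
    name: str
    device: int  # index into CoLocatedModule.targets: the symbol-to-device link


@dataclass
class CoLocatedModule:
    targets: list[DeviceTarget]
    globals: list[GlobalSymbol] = field(default_factory=list)

    def add_global(self, name: str, device: int) -> None:
        # Every global must be linked to exactly one known (triple, DL) pair.
        if not 0 <= device < len(self.targets):
            raise IndexError(f"unknown device index {device}")
        self.globals.append(GlobalSymbol(name, device))

    def symbols_for(self, device: int) -> list[str]:
        # The slice of global symbols that belongs to one device.
        return [g.name for g in self.globals if g.device == device]


def run_single_target_pipeline(module: CoLocatedModule, device: int) -> dict:
    """Stand-in for one `opt` invocation: primed for exactly one target,
    it extracts that target's triple, DL, and globals and ignores the rest."""
    target = module.targets[device]
    return {
        "triple": target.triple,
        "data_layout": target.data_layout,
        "symbols": module.symbols_for(device),
    }


def cross_device_view(module: CoLocatedModule) -> dict[str, list[str]]:
    """Stand-in for the synchronization point: a cross-module pass that
    sees every device's slice ("two sets of everything") at once."""
    return {t.triple: module.symbols_for(i) for i, t in enumerate(module.targets)}


if __name__ == "__main__":
    m = CoLocatedModule([
        DeviceTarget("x86_64-unknown-linux-gnu", "<host DL>"),
        DeviceTarget("nvptx64-nvidia-cuda", "<device DL>"),
    ])
    m.add_global("main", 0)
    m.add_global("launch_kernel", 0)
    m.add_global("kernel", 1)
    print(run_single_target_pipeline(m, 0))
    print(run_single_target_pipeline(m, 1))
    print(cross_device_view(m))
```

The point of the model is that the per-target pipeline does not change:
each invocation still sees exactly one triple, one DL, and one set of
globals; only the container and the symbol-to-device link are new.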

WDYT?

~ Johannes

P.S. This is really helpful, but I won't give up on the idea so easily.
      If I do, I have to implement cross-module optimizations, and I
      would rather not ;)