[llvm-dev] [RFC] Heterogeneous LLVM-IR Modules

Thu Jul 30 14:17:16 PDT 2020

On 7/30/20 1:02 PM, Johannes Doerfert wrote:
 >
 > On 7/30/20 12:41 PM, Renato Golin wrote:
 > > On Thu, 30 Jul 2020 at 17:46, Johannes Doerfert
 > > <johannesdoerfert at gmail.com> wrote:
 > >> So we don't rewrite
 > >> everything but instead "just" need to duplicate all the information in
 > >> the IR such that each `opt` invocation can extract its respective set
 > >> of values and run on the respective set of global symbols. This would
 > >> reduce the new stuff to more or less what we started with: device triple
 > >> & DL, and a way to link global symbol to a device triple & DL. It is the
 > >> two module approach but with "co-located" modules ;)
 > >
 > > I think you're being overly optimistic in hoping the "triple+DL"
 > > representation will be enough to emulate multi-target.
 >
 > I might be optimistic about the impact such a representation has but I
 > don't see how it should not be enough. I argue we keep *all* the
 > information that we currently have in two modules but put it in a single
 > one. What else is there (in the IR)? With two `opt` invocations you
 > don't miss out on flags and target information either. At the end of the
 > day I am suggesting to have a single `llvm::Module` with the same
 > information that was in multiple ones before. That will require us to
 > allow duplicates for all "global" entities (triple, DL, module metadata,
 > ...) and to tie global symbols to such entities.
 >
 > Summarized, this approach would keep the modules logically separate,
 > e.g., we would have different namespaces for globals, the pass pipelines
 > actually separate, and only co-locate the representation to simplify
 > tooling in various places. For a single-device invocation of `opt`, the
 > module should not "behave" any differently than the one that you get if
 > you extract the code for that device first (or not merge it in the first
 > place).
 >

I thought about this a bit more and I came to the conclusion that this
discussion needs some way of verifying certain assumptions (of mine), at
least to some degree. One of the key questions is: Can we merge two
modules, run the optimizations on both, split them, and get the same
output as if we had run the optimizations on the two modules in isolation?

So is,
   llvm-link m1.bc m2.bc -o inp.bc
   opt -device=1 -O3 inp.bc -o mid.bc
   opt -device=2 -O3 mid.bc -o out.bc
   llvm-unlink -o m1.bc -o m2.bc -i out.bc
equivalent to
   opt -O3 m1.bc -o m1.bc
   opt -O3 m2.bc -o m2.bc
or not, assuming m1 and m2 have different targets and/or data layouts.

I'd say this is something worth trying out ;)
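
To make the property concrete, here is a toy model in Python. It is not
LLVM code and only a sketch under stated assumptions: `Module`, `link`,
`optimize_device`, `unlink` and the `leaky` switch are hypothetical
stand-ins for llvm-link, `opt -device=N` and llvm-unlink, and a "body" is
just a string. The check holds by construction when each device is
optimized against its own triple, and it fails as soon as a pass consults
a single module-wide triple, which is the kind of problem the real
experiment would need to surface.

```python
from dataclasses import dataclass


@dataclass
class Module:
    """Toy stand-in for one LLVM module: a triple plus its global symbols."""
    triple: str
    symbols: dict  # symbol name -> body (a plain string in this model)


def optimize(module, triple=None):
    """Toy 'opt -O3': a deterministic rewrite that depends on the triple."""
    triple = triple or module.triple
    rewritten = {name: f"{body} [O3 for {triple}]"
                 for name, body in module.symbols.items()}
    return Module(module.triple, rewritten)


def link(*modules):
    """Toy 'llvm-link': co-locate modules, one namespace per device id."""
    return {device: m for device, m in enumerate(modules, start=1)}


def optimize_device(hetero, device, leaky=False):
    """Toy 'opt -device=N': optimize one device, leave the others alone.

    leaky=True models a pass that consults a single module-wide triple
    (device 1's) instead of the triple tied to the symbols it rewrites.
    """
    triple = hetero[1].triple if leaky else None
    result = dict(hetero)
    result[device] = optimize(hetero[device], triple)
    return result


def unlink(hetero):
    """Toy 'llvm-unlink': split the co-located modules again."""
    return [hetero[device] for device in sorted(hetero)]


def round_trip_matches(m1, m2, leaky=False):
    """Merge, optimize per device, split; compare with isolated runs."""
    merged = link(m1, m2)
    merged = optimize_device(merged, 1, leaky)
    merged = optimize_device(merged, 2, leaky)
    return unlink(merged) == [optimize(m1), optimize(m2)]


# Both modules define "helper"; the per-device namespaces keep them apart.
host = Module("x86_64-unknown-linux-gnu",
              {"main": "call helper", "helper": "ret"})
device = Module("nvptx64-nvidia-cuda",
                {"kernel": "call helper", "helper": "ret"})

print(round_trip_matches(host, device))              # True
print(round_trip_matches(host, device, leaky=True))  # False
```

With the real tools the comparison would be on the emitted IR rather
than on Python values, but the shape of the check stays the same.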

My plan is to extend the patch I have so we can link and unlink two
modules, and allow opt to run on a selected device in a heterogeneous
module. With that setup we should be able to verify basic assumptions
about the feasibility and maybe shed some light on problems that we (or
I) haven't foreseen yet.

If anyone has ideas or comments on this, or even wants to help, please
let me know :)

~ Johannes

 > > It may work for the cases you care about but it will create a host of
 > > corner cases that the community will have to maintain with no tangible
 > > additional benefit to a large portion of it.
 >
 > I would like to think that a large portion of the community actually
 > benefits; at least everyone that cares about accelerators, which is a
 > growing fraction I would assume. The reason I started this thread is to
 > discuss use and corner cases, I think it is working so far. If you feel
 > there are (known) corner cases that I somehow ignore, please let me
 > know. Similarly, I don't want to ignore any use case that is brought
 > forward. However, I am aware that the interesting corner cases might yet
 > be unknown and will only reveal themselves as we go.
 >
 >
 > > But I'd like to hear the opinion of others on the subject before
 > > making up my own mind...
 >
 > I think more input would be great :)
 >
 > ~ Johannes
 >
 >