[cfe-dev] [RFC][OpenMP][CUDA] Unified Offloading Support in Clang Driver

Samuel F Antao via cfe-dev cfe-dev at lists.llvm.org
Mon Mar 21 16:43:02 PDT 2016


Hi all,

Thanks everyone for engaging in this thread. After a nice long discussion,
it’s probably time to capture all the suggestions and produce a summary.

===========================

Things that I understand are settled:

===========================

   - The driver implementation has to be flexible so as to allow a device
   toolchain to use macros and headers from the host toolchain and vice versa.
   - The driver implementation should be modular and easy to extend to new
   programming models, each with its own Action dependence graph (I am
   proposing an implementation for this in the patches I sent in the meantime).
   - The driver Action should reflect the programming models in use (the
   patches I sent in the meantime observe this constraint).
   - File names used by save-temps need to reflect the programming model
   and contain “host” or “device” because, in general, the same toolchain
   can be used for host and device.
   - It is important to support separate compilation to prevent users from
   having to change complex build systems.
   - When we have code for different devices or different programming
   models, it is not expected that the tools for a single programming model or
   device would work well on the resulting binary/file. That doesn’t mean that
   we should not pursue interoperability for the cases where it can be
   achieved.
   - We probably want to adopt a set of unified group options to specify
   offloading devices, and programming-model-specific options, but that can be
   discussed separately.


==============================

Not settled - intermediate file bundling:

==============================

Bundling is what has generated the most discussion and the most conflicting
opinions. Just to make sure we are all on the same page, for:

a.c -> "clang -c" -> X -> “clang"  -> Y

the issue we need to settle is “what is the format of X?”. I don’t think
there is much to discuss about the format of Y — it has to be whatever the
programming model implementation needs, usually a bunch of device code in
some section whose pointer is passed to some runtime library to be
registered and loaded.

Let me also clarify the terminology I am using (embed vs. bundle) because,
reading through the previous emails, it may have been confusing:

In Y, the device code is functionally related to the host code because the
former is referred to by the latter in the form of some pointer (the address
of a section) that is registered with the host runtime library. The process
that generates Y is what I am calling “embed”.

In X, the device code does not need to be functionally related to the host
code. Both device and host code should, however, be organized in a way
(binary sections or some other sort of metadata) that makes the extraction
of code for specific devices easy to accomplish. The process that generates
X is what I am calling “bundle”.


So, "what is the *bundle* format of X?”. There are two proposals: “use the
host object format - ELF” or  “use a target agnostic format”. As far as I
know, both would work.

The former seems to be the one with more supporters in this thread. I’ll
try to address the pros and cons of each with some implementation
considerations. For “host object format” I am mostly referring to ELF,
because it is the format I have the most experience with, and the only one
for which I can commit to contributing an implementation.

============

Pros/Cons

============

ELF Pros:

   - It is what other compilers are using (except for GCC, which bundles IR).
   - A host-device object could be linked by other compilers that may not
   be aware of how device code is bundled (what sections are used, what
   symbols should be defined). Therefore, the resulting binary may not be
   fully functional (loading the CUDA module would fail; the host versions
   of the kernels would be used in OpenMP), but it can be executed.
   - It is a format that is already handled by the compiler, so we wouldn’t
   be inventing a new format.

ELF Cons:

   - Works only for object formats (I don’t think anyone was proposing to
   use ELF to bundle text/IR formats).
   - Extraction of device code is not straightforward. A user interested in
   doing that would have to learn what an ELF file is and what sections are
   being used.
   - The bundling implementation has to be aware of the specifics of the
   programming model (however, adopting a generic ELF bundle format would
   alleviate this).
   - Custom code to deal with binary formats other than ELF is required.

---

Target/format agnostic Pros:

   - Would work for any OS and target.
   - Format can be made compatible with some widely used bundling tool
   (e.g. tar).
   - Format can be chosen so that the device components can be easily
   extracted by some widely available tool.
   - Adding support for a new device wouldn’t require revisiting the
   bundling implementation.

Target/format agnostic Cons:

   - Have to invent a new format...
   - … or use an existing one that may not be portable across OSes (having
   a tool in clang that deals with that format would alleviate this).
   - Cannot link the bundled file with other host code as is.

============

Implementation

============

In order to obtain Y from X, the device components of X have to be
extracted, forwarded to the device linker, and the result then embedded in
the host binary. The embedding can be implemented with a linker script if
ld is the linker. But the extraction from X has to be enabled by a tool — I
don’t see a better way to do it.

ELF: The tool has to know how to navigate ELF files. Possible solutions:

a) Use something based on libelf - this would require changes in the build
system to check for that dependency, and I am not sure whether there is a
licensing issue here as well. Possibly not a viable solution.

b) Attempt to use the Object library in LLVM to scan the ELF files in a
new tool.

c) Add a new mode to llvm-readobj to extract binary sections from the
object file. We wouldn’t require an extra tool; llvm-readobj would be *the*
tool.

In any case, the ELF bundling would have to use a naming convention for
the sections, something like
“.clang.offload.<programming-model>.<device-triple>.<architecture>”.
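With such a convention, a user could in principle extract a device
component with stock binutils. A sketch (the concrete section name below
is hypothetical, but follows the proposed scheme):

  # Dump the OpenMP nvptx64 device component of a bundled object into a
  # raw binary file, addressing it by its conventional section name.
  objcopy -O binary \
    --only-section=.clang.offload.openmp.nvptx64-nvidia-cuda.sm_35 \
    a.o a-device-openmp-nvptx64.bin

This also illustrates the ELF “Cons” above: the user has to know the
section naming scheme and have an ELF-aware tool at hand.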

---

Target/format agnostic: A new tool would have to be implemented for this.
We would have to decide on a format — that would be the hard part. Then,
regardless of the format, this is all about having a header implemented
with the bytes in the right place, similar to what I proposed in
http://reviews.llvm.org/D13909. The header could be made compatible with a
widely used tool to enable easy extraction of device components.
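To illustrate, the header could be as simple as the following sketch
(field names and sizes are assumptions for illustration, not the exact
layout proposed in D13909):

  #include <cstdint>

  // One record per bundled component (host or device).
  struct BundleEntry {
    uint64_t Offset;     // Byte offset of the component in the bundle.
    uint64_t Size;       // Size of the component in bytes.
    char Kind[64];       // E.g. "host-ppc64le" or "openmp-nvptx64".
  };

  // Fixed header at the start of the bundle, followed by the entry
  // table and then the component payloads at the recorded offsets.
  struct BundleHeader {
    char Magic[24];      // E.g. "CLANG_OFFLOAD_BUNDLE".
    uint64_t NumEntries; // Number of BundleEntry records that follow.
  };

If the header and padding are laid out to match tar entries, an ordinary
tar implementation could list and extract the components directly.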

==========================

Considerations about embedding

==========================

As I mentioned before, embedding can be controlled with a linker script if
using ld (see the sketch after the list below). However:

   - linker script directives are not fully supported by gold.
   - I don’t know whether a linker script can easily be used with link.exe
   or other linkers people may be interested in.
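For reference, this is the kind of ld script I have in mind (a sketch
only; the symbol and section names follow the OpenMP example quoted later
in this thread and are illustrative):

  /* Wrap the linked nvptx64 device image in a host section, bracketed by
     the start/end symbols the offloading runtime looks up. */
  TARGET(binary)
  INPUT(a.out-device-openmp-nvptx64)
  SECTIONS
  {
    .omp_offloading :
    ALIGN(0x10)
    {
      PROVIDE_HIDDEN(.omp_offloading.img_start.nvptx64 = .);
      a.out-device-openmp-nvptx64
      PROVIDE_HIDDEN(.omp_offloading.img_end.nvptx64 = .);
    }
  }
  INSERT BEFORE .data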

If, because of these constraints, we end up having a tool for embedding
(creating host-linkable objects with the required symbols and the device
code in the right section), that tool can be the same as the bundling tool
under a different mode. The complexity of an embedding tool is low:
generate a piece of IR that defines the right symbols and respective
initializers, and call the right backend to generate an object.
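As an illustration of the IR such a tool could emit for the OpenMP case
(a sketch; the image bytes, symbol names, and section name are
placeholders):

  ; Define the start/end symbols around a device image so that the host
  ; linker produces an object the offloading runtime can register.
  @.omp_offloading.img_start.nvptx64 = constant [4 x i8] c"\DE\AD\BE\EF", section ".omp_offloading.nvptx64"
  @.omp_offloading.img_end.nvptx64 = constant [0 x i8] zeroinitializer, section ".omp_offloading.nvptx64"

Running this through the host backend yields a host-linkable object that
contains only device code and the required symbols.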


==========================

Hope I captured everything; if not, let me know.

My opinion:

i) I find it very convenient to have the ability to edit intermediate
files and be able to generate a binary from there. Editing a piece of IR or
asm is something that I and other people I work with use to explore
performance improvements and fix bugs. So I don’t see this as a minor
feature.

ii) Having something that works in all cases and then specializing for a
given format seems to be a natural way to build things.

iii) If no one else cares about i) and host-object bundling is the way to
go, I can work on the implementation for ELF (I think solution c) above is
the one with the most merit) and let someone else contribute support for
other object formats. If so, is the section naming I proposed above a good
one?


Let me know your thoughts so that we can settle this last pending issue.

Thanks!

Samuel

2016-03-15 19:45 GMT-04:00 Justin Lebar via cfe-dev <cfe-dev at lists.llvm.org>:

> > I don't have any other tool that would be compatible with a bundle of
> all four object files, no matter how I do it,
>
> I think we've given up on letting existing tools (e.g. objdump) work
> well on our host+device*n object files, see discussions earlier in the
> thread.  Even I, probably the loudest supporter of this on thread,
> think that it's probably not an important case that cuobjdump work
> sensibly when you are embedding two different nvptx bundles into one
> object file, since this is an edge case that most people aren't going
> to encounter.
>
> > Creating linkable host objects shouldn't be that hard (clang-bundler
> -embed).
>
> This seems to be basically the same as embedding the device code into
> the object file -- just a different format.
>
> > If the bundler uses a target agnostic format, my example above would
> just work regardless of the compiler phases -- -S, -emit-ast, -emit-llvm
> would work just fine.
>
> What I believe we do with CUDA compilation is say that all of these
> output code just for the host, unless you specify a device
> architecture, in which case you can only specify one.  It seems to
> work OK.  I'll grant the "wouldn't it be nice if" we could output a
> single file containing .s files for all archs, but I'm not convinced it
> goes beyond wouldn't-it-be-nice.
>
> > - Extending the bundler to enable compatibility modes can then be
> discussed on a per-case basis if there are a lot of users asking for that
> feature.
>
> There are, on this very thread.  :)  I don't think in any case that
> these compatibility modes can be a second-thought -- they'll need to
> be first-class citizens in the design.
>
> The whole first half of this mail seems quite complicated to me,
> frankly.  Even if it may be convenient for complex compilation modes
> like the openmp + cuda one you described, it may not be convenient for
> simpler modes, which I expect will be far more common.
>
> -Justin
>
> On Tue, Mar 15, 2016 at 2:37 PM, Samuel F Antao <sfantao at us.ibm.com>
> wrote:
> > Let me provide a more concrete example of what I had in mind (sorry for
> the
> > long email...):
> >
> > Lets assume I have two source files a.cu and b.cu, both with OpenMP and
> CUDA
> > kernels and several device functions. Device functions defined in a.cu
> are
> > used in b.cu and vice-versa.
> >
> > I have a system that consists of a ppc64le host, an nvptx64 device and a
> > ppc64le accelerator. The exact details of this system are not relevant,
> I am
> > just using this example because it is something supported by the
> prototype
> > we have in github and exemplifies that the same target can be both host
> and
> > device (in general we can have AMD GPUs, DSPs, FPGAs, each with its own
> > binary formats).
> >
> > ###
> > Case A: We pass all the source files and compile in a single run of
> clang.
> >
> > A1) clang invocation (the openmp options are just an example, *.cu
> extension
> > tells the driver we should compile for CUDA as well):
> >   clang -target ppc64le -fopenmp -fomptargets=nvptx64,ppc64le a.cu b.cu
> >
> > A2) driver commands:
> >   // Obtain cuda obj for a.cu
> >   clang -cc1 -fcuda-is-device -target nvptx64 -S a.cu -o
> > a-device-cuda-nvptx64.s
> >   ptxas -c a-device-cuda-nvptx64.s -o a-device-cuda-nvptx64.o
> >
> >   // Obtain openmp nvptx64 obj for a.cu
> >   clang -cc1 -fopenmp-is-device  -target nvptx64 -S a.cu -o
> > a-device-openmp-nvptx64.s
> >   ptxas -c a-device-openmp-nvptx64.s -o a-device-openmp-nvptx64.o
> >
> >   // Obtain openmp ppc64le device obj for a.cu
> >   clang -cc1 -fopenmp-is-device -target ppc64le -c a.cu -o
> > a-device-openmp-ppc64le.o
> >
> >   // Obtain openmp ppc64le host obj for a.cu
> >   clang -cc1 -fopenmp -target ppc64le -c a.cu -o a-host-cuda-openmp.o
> >
> >   // Repeat everything  for b.cu
> >
> >   // Obtain cuda device image:
> >   nvlink a-device-cuda-nvptx64.o b-device-cuda-nvptx64.o -o
> > a.out-device-cuda-nvptx64
> >
> >    // Obtain openmp nvptx device image:
> >   nvlink a-device-openmp-nvptx64.o b-device-openmp-nvptx64.o -o
> > a.out-device-openmp-nvptx64
> >
> >   // Obtain openmp ppc64le device image (making it a shared library is
> just
> > easier to load, but not required):
> >   ld -shared a-device-openmp-ppc64le.o b-device-openmp-ppc64le.o -o
> > a.out-device-openmp-ppc64le
> >
> >   // Create a host compatible object for each device (this is done at
> link
> > time only, the user is not expected to deal with this tool - the name of
> the
> > tool and commands are just an example).
> >   // I am calling this "embed-mode" but is not really embedding, is
> generate
> > an host object that solely contains device code...
> >   clang-bundler -embed -host ppc64le -models=cuda,openmp,openmp
> > -targets=nvptx64,nvptx64,ppc64le
> >
> -inputs=a.out-device-cuda-nvptx64,a.out-device-openmp-nvptx64,a.out-device-openmp-ppc64le
> >
> -outputs=a.out-device-cuda-nvptx64.o,a.out-device-openmp-nvptx64.o,a.out-device-openmp-ppc64le.o
> >
> >   // Generate the host binary
> >   ld a-host-cuda-openmp.o a.out-device-cuda-nvptx64.o
> > a.out-device-openmp-nvptx64.o a.out-device-openmp-ppc64le.o -o a.out
> >
> > ###
> > Case B: We do separate compilation.
> > B1) clang invocation (the openmp options are just an example, *.cu
> extension
> > tells the driver we should compile for CUDA as well):
> >   clang -c -target ppc64le -fopenmp -fomptargets=nvptx64,ppc64le a.cu
> b.cu
> >   clang -target ppc64le -fopenmp -fomptargets=nvptx64,ppc64le a.o b.o
> >
> > B2) driver commands - 1st run:
> >
> >    // Same thing as in A2) up to the nvlink command
> >
> >    // Create the bundles a.o and b.o so that the user sees a single file.
> >    clang-bundler -bundle -models=host,cuda,openmp,openmp
> > -targets=nvptx64,nvptx64,ppc64le
> >
> -inputs=a-host-cuda-openmp.o,a-device-cuda-nvptx64.o,a-device-openmp-nvptx64.o,a-device-openmp-ppc64le.o
> > -outputs=a.o
> >    clang-bundler -bundle -models=host,cuda,openmp,openmp
> > -targets=nvptx64,nvptx64,ppc64le
> >
> -inputs=b-host-cuda-openmp.o,b-device-cuda-nvptx64.o,b-device-openmp-nvptx64.o,b-device-openmp-ppc64le.o
> > -outputs=b.o
> >
> > B2) driver commands - 2nd run:
> >    // Attempt to unbundle the inputs because the user is asking for
> > offloading support and they are not source files. If the bundler
> understands
> > the file is a bundle, it creates the individual files; otherwise it
> > creates empty files for the devices.
> >    clang-bundler -unbundle -models=host,cuda,openmp,openmp
> > -targets=nvptx64,nvptx64,ppc64le -outputs
> >
> =a-host-cuda-openmp.o,a-device-cuda-nvptx64.o,a-device-openmp-nvptx64.o,a-device-openmp-ppc64le.o
> > -inputs=a.o
> >    clang-bundler -unbundle -models=host,cuda,openmp,openmp
> > -targets=nvptx64,nvptx64,ppc64le -outputs
> >
> =b-host-cuda-openmp.o,b-device-cuda-nvptx64.o,b-device-openmp-nvptx64.o,b-device-openmp-ppc64le.o
> > -inputs=b.o
> >
> >   // Same thing as in A2, starting from the nvlink commands.
> >
> > ###
> > A few more comments/observations:
> >
> > - I don't have any other tool that would be compatible with a bundle of
> all
> > four object files, no matter how I do it, this is just not
> supported by
> > any other compiler and I'm not doing anything wrong as per the
> programming
> > models specification. But, I have tools compatible with each object.
> > That is why I agree we should give the user the ability to extract
> > objects. I agree with Justin that extracting stuff from tar is more
> > convenient than doing it from ELF (I don't think most users will know
> > how to do it from ELF, and it is very likely they don't even know what
> > ELF is; for them it is just a binary, so not exposing these details is
> > desirable). I am not trying to
> push
> > for a specific format, I just think it should be target agnostic.
> >
> > - Creating linkable host objects shouldn't be that hard (clang-bundler
> > -embed). For CUDA I can create an IR file that defines
> > "__cuda_fatbin_wrapper" with the bytes coming from the fat binary (that
> > global would have to be made external though). For OpenMP it would do
> > something similar but for ".omp_offloading.img_start/end.<device>". The
> > resulting IR could then be converted into an object by calling the host
> > backend. The tool itself wouldn't care about obj formats.
> >
> > - If the bundler uses a target agnostic format, my example above would
> just
> > work regardless of the compiler phases -- -S, -emit-ast, -emit-llvm
> > would work just fine.
> >
> > - Extending the bundler to enable compatibility modes can then be
> discussed
> > on a per-case basis if there are a lot of users asking for that feature.
> If
> > that is to be supported, someone with experience with that format could
> > contribute it and there is nothing in the design that prevents it.
> Expecting
> > to have something that works for all obj formats and device
> > combinations from the very beginning does not seem reasonable. My plan of
> action
> > was to contribute something that works and can scale to as many use
> cases as
> > possible. I can't, however, contribute specific code for every possible
> > device, in particular those that are not supported in clang yet.
> >
> > - The user does not have to deal with the bundling tool at any time if we
> > adopt a format that makes it easy to extract objects. I don't know if one
> > should completely avoid having a tool (again, the user doesn't have to
> > use it, but can find it in the install tree of clang). An alternative
> is
> > supporting something like clang -ccbundle. Maybe other people are aware
> > of the precedents in clang?
> >
> > About shared libraries: I am not sure what the concern with shared
> > libraries is. They would be created as binaries are, and would contain
> > bundles on them. Nothing that is proposed here precludes that. Supporting
> > device-side shared libraries depends on whether the runtime library
> > implements a loader or not. It is very possible I am missing something
> > here...
> >
> > Thanks!
> > Samuel
> >
> > 2016-03-15 14:36 GMT-04:00 Eric Christopher via cfe-dev
> > <cfe-dev at lists.llvm.org>:
> >>
> >>
> >>
> >> On Tue, Mar 15, 2016 at 12:25 AM C Bergström <cfe-dev at lists.llvm.org>
> >> wrote:
> >>>
> >>> Afaik - The way that each vendor *embeds* is a little different and
> >>> non-conflicting. (AMD vs NV vs Intel) Embedding for each vendor in a
> >>> single object may work. Wouldn't this achieve the desired goal without
> >>> the bundling? Bundling is a non-solution - it just delays dealing
> >>> with the offload target code, at which point you have to extract and
> >>> jump around to make sure it all magically works (fragile - This side
> >>> up)
> >>>
> >>
> >> I'm somewhat in agreement here. An alternate idea is to just "do what
> >> we've always done" and have compatibility modes where we try to do
> what's
> >> expected for the particular end use that we're trying to copy. That
> would
> >> get us maximum flexibility (at the cost of a number of implementations,
> but
> >> abstraction hopefully ftw?).
> >>
> >>>
> >>> With embedding approach we have all the pieces to at least read/embed
> >>> - it's just dealing with elf/macho/pxe objects and data sections.
> >>>
> >>
> >> Agreed.
> >>
> >>>
> >>> Also there's the question of how to deal with shared libraries - Even
> >>> if initially it's not a concern, eventually someone will request it.
> >>> If this isn't ever going to be supported or a concern it's not valid
> >>> of course..
> >>>
> >>
> >> I'm sure someone is going to want it eventually.
> >>
> >> -eric
> >>
> >>>
> >>> On Tue, Mar 15, 2016 at 2:54 PM, Justin Lebar via cfe-dev
> >>> <cfe-dev at lists.llvm.org> wrote:
> >>> >> I agree with Justin when he says that supporting all possible
> >>> >> combinations of host-device bundling, each with a custom format
> adds a lot
> >>> >> of complexity and does not necessarily help the user.
> >>> >
> >>> > Maybe it would be helpful to speak concretely about this.  What is a
> >>> > specific case you're concerned about?
> >>> >
> >>> > Frankly if we try to build a thing that does everything -- both
> embeds
> >>> > and bundles -- I'm concerned that will be much more complex than
> >>> > either approach by itself.  But maybe there's something about
> bundling
> >>> > code for many different devices that's qualitatively different than
> >>> > bundling code for one or just a few devices?  For example, are there
> >>> > extant formats that we would like to be compatible with that are
> >>> > incompatible with each other?  Again, I think it would be helpful to
> >>> > be concrete.
> >>> >
> >>> >> Using a header compatible with tar is not something hard to do,
> having
> >>> >> the bundler tool as part of clang ensures that it works across
> operating
> >>> >> systems too.
> >>> >
> >>> > This is a minor point, but -- without getting myself back into the
> >>> > minefield of discussing whether using the tar format is or is not
> good
> >>> > -- if we do use said format, given what I understand at the moment,
> >>> > I'm not in favor of building our own user-facing tool, if that's
> >>> > possible.  If you're on Windows and want to inspect a tar file you
> >>> > can download 7z.  It's no different than downloading objdump.  But
> we
> >>> > can postpone this discussion until later; I suspect it won't be
> >>> > necessary, given the pushback to using tar in the first place.
> >>> >
> >>> >> this leaves room to have custom-format bundling activated with
> options
> >>> >> for the cases (would have to be evaluated one by one) that would
> greatly
> >>> >> benefit from interoperability.
> >>> >
> >>> > This is, as I read it, exactly what everyone else has been arguing so
> >>> > strenuously against (and what I conceded to).
> >>> >
> >>> > On Mon, Mar 14, 2016 at 7:25 PM, Samuel F Antao <sfantao at us.ibm.com>
> >>> > wrote:
> >>> >> Hi all,
> >>> >>
> >>> >> I decided to take a shot at a possible implementation for the part of
> >>> >> this
> >>> >> proposal that I think is more consensual (i.e. the part that does
> not
> >>> >> relate
> >>> >> with the bundling). I posted three patches
> >>> >> (http://reviews.llvm.org/D18170,
> >>> >> http://reviews.llvm.org/D18171, http://reviews.llvm.org/D18172)
> with a
> >>> >> possible implementation, so that we have something more concrete to
> >>> >> discuss.
> >>> >> Let me know your thoughts.
> >>> >>
> >>> >> Going back to the bundling discussion:
> >>> >>
> >>> >> I agree with Justin when he says that supporting all possible
> >>> >> combinations
> >>> >> of host-device bundling, each with a custom format adds a lot of
> >>> >> complexity
> >>> >> and does not necessarily help the user. Therefore, I think it
> >>> >> reasonable to have intermediate files bundled in some already
> >>> >> existing format (say tar)
> >>> >> that is agnostic of the programming model. Actually, that was my
> >>> >> motivation
> >>> >> when I proposed the custom format in the bundler.
> >>> >>
> >>> >> When I look at all the different concerns I think that a possible
> >>> >> solution
> >>> >> is to have a bundler with three operation modes:
> >>> >>  i) "Embed": it generates a host object that contains the device
> image
> >>> >> and
> >>> >> properly defines the symbols a programming model requires. Therefore
> >>> >> it can
> >>> >> be linked with host objects successfully. This file is never exposed
> >>> >> to the
> >>> >> user unless save-temps is used.
> >>> >> ii) "Bundle": Combines host and device object using, by default, a
> >>> >> format
> >>> >> easy to interact with that is agnostic of the programming model.
> >>> >> iii) "Unbundle": The inverse of ii), it assumes the input uses that
> >>> >> default
> >>> >> format.
> >>> >>
> >>> >> Using a header compatible with tar is not something hard to do,
> having
> >>> >> the
> >>> >> bundler tool as part of clang ensures that it works across operating
> >>> >> systems
> >>> >> too. At the same time, this leaves room to have custom-format
> bundling
> >>> >> activated with options for the cases (would have to be evaluated one
> >>> >> by one)
> >>> >> that would greatly benefit from interoperability.
> >>> >>
> >>> >> Does this sound reasonable?
> >>> >>
> >>> >> Thanks!
> >>> >> Samuel
> >>> >>
> >>> >> 2016-03-10 13:59 GMT-05:00 Justin Lebar via cfe-dev
> >>> >> <cfe-dev at lists.llvm.org>:
> >>> >>>
> >>> >>> > Justin, is this convincing enough?
> >>> >>>
> >>> >>> Okay, okay.  Uncle.
> >>> >>>
> >>> >>> There are two things here that I find convincing.
> >>> >>>
> >>> >>> 1) Although we're not going to be compatible with the details of
> >>> >>> some,
> >>> >>> if not all, other compilers' formats, we can at least be compatible
> >>> >>> with the spirit by using object files as opposed to tar.
> >>> >>>
> >>> >>> 2) The postscript in Andrey's e-mail:
> >>> >>>
> >>> >>> > Re: objdump doesn't understand ELF format with code for multiple
> >>> >>> > targets. The same is true for fat executable files as well,
> isn't it?
> >>> >>> > So if we
> >>> >>> > need to teach objdump how to recognize fat files, we already have
> >>> >>> > this
> >>> >>> > problem.
> >>> >>>
> >>> >>> It's probably much more important that objdump work on executables
> >>> >>> than on object files, since if you have object files, you can
> >>> >>> probably
> >>> >>> recompile with -save-temps, but if you only have an executable, you
> >>> >>> don't necessarily have access to intermediate files, or even a
> >>> >>> compiler for the relevant architecture, much less the specific
> >>> >>> compiler which generated the executable.
> >>> >>>
> >>> >>> Shoving device code into the host *executable* seems unavoidable.
> >>> >>> I'm
> >>> >>> still not thrilled with doing the same for object files -- it still
> >>> >>> feels like we're using ELF when we actually want an archive format
> --
> >>> >>> but (1) makes it less bad.
> >>> >>>
> >>> >>> -Justin
> >>> >>>
> >>> >>> On Wed, Mar 9, 2016 at 5:59 AM, Andrey Bokhanko
> >>> >>> <andreybokhanko at gmail.com> wrote:
> >>> >>> > All,
> >>> >>> >
> >>> >>> > I asked Intel GCC guys who implemented OpenMP offloading support
> in
> >>> >>> > GCC,
> >>> >>> > and
> >>> >>> > as they told me, GCC also employs option #4 from Hal's list -- it
> >>> >>> > puts
> >>> >>> > both
> >>> >>> > host and target code in a single ELF file. "Code" in GCC case is
> >>> >>> > always
> >>> >>> > GCC's IR (Gimple), though -- they require GCC invocation from
> >>> >>> > linker in
> >>> >>> > order to produce a multi-target executable. This makes GCC
> >>> >>> > non-interoperable
> >>> >>> > with any other offloading compiler and effectively produces its
> own
> >>> >>> > standard.
> >>> >>> >
> >>> >>> > Thus, prior art from:
> >>> >>> > * nvcc
> >>> >>> > * Pathscale
> >>> >>> > * GCC
> >>> >>> > * ICC
> >>> >>> >
> >>> >>> > indicates only one direction -- compiler driver produces a single
> >>> >>> > object
> >>> >>> > file with target code embedded in data section.
> >>> >>> >
> >>> >>> > Justin, is this convincing enough? I don't see any good reasons
> why
> >>> >>> > clang
> >>> >>> > should go against what every other compiler on the planet does.
> >>> >>> >
> >>> >>> > Re: objdump doesn't understand ELF format with code for multiple
> >>> >>> > targets.
> >>> >> > The same is true for fat executable files as well, isn't it? So if
> we
> >>> >>> > need
> >>> >>> > to
> >>> >>> > teach objdump how to recognize fat files, we already have this
> >>> >>> > problem.
> >>> >>> >
> >>> >>> > Yours,
> >>> >>> > Andrey
> >>> >>> > =====
> >>> >>> > Software Engineer
> >>> >>> > Intel Compiler Team
> >>> >>> >
> >>> >>> >
> >>> >>
> >>> >>
> >>
> >>
> >>
> >
>