[cfe-dev] [RFC][OpenMP][CUDA] Unified Offloading Support in Clang Driver

Tue Mar 15 00:25:18 PDT 2016

AFAIK, the way that each vendor *embeds* is a little different and
non-conflicting (AMD vs NV vs Intel). Embedding for each vendor in a
single object may work. Wouldn't this achieve the desired goal without
the bundling? Bundling is a non-solution - it just delays dealing
with the offload target code, at which point you have to extract and
jump around to make sure it all magically works (fragile - This side
up).

With the embedding approach we have all the pieces to at least read/embed
- it's just dealing with ELF/Mach-O/PE objects and data sections.
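
To make the "one object, one section per vendor" idea concrete, here is a
toy sketch in Python. It is not real object-file handling: a dict stands in
for the section table, and the section names and the embed/extract helpers
are hypothetical, made up for illustration rather than taken from any
vendor's actual convention.

```python
# Toy model only: a dict stands in for an object file's section table.
# The section names below are hypothetical, not real vendor conventions.
VENDOR_SECTIONS = {
    "nvidia": ".offload.nvidia",
    "amd": ".offload.amd",
    "intel": ".offload.intel",
}


def embed(sections, vendor, image):
    """Return a copy of `sections` with `image` added under the vendor's section."""
    name = VENDOR_SECTIONS[vendor]
    if name in sections:
        # Each vendor owns exactly one section, so a second embed is an error.
        raise ValueError(f"section {name} already present")
    updated = dict(sections)
    updated[name] = bytes(image)
    return updated


def extract(sections, vendor):
    """Return the device image embedded for `vendor`, or None if absent."""
    return sections.get(VENDOR_SECTIONS[vendor])


# A stand-in host object with two ordinary sections.
host_object = {".text": b"\x90\x90", ".data": b""}

# Embed two placeholder device images side by side in the same object.
fat_object = embed(host_object, "nvidia", b"PTX...")
fat_object = embed(fat_object, "amd", b"HSACO...")
```

Because each vendor writes to its own section name, the images sit side by
side without a separate bundle/unbundle step; a real implementation would
do the same thing against actual section tables.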

Also there's the question of how to deal with shared libraries - even
if initially it's not a concern, eventually someone will request it.
If this is never going to be supported, the concern is moot, of
course.

On Tue, Mar 15, 2016 at 2:54 PM, Justin Lebar via cfe-dev
<cfe-dev at lists.llvm.org> wrote:
>> I agree with Justin when he says that supporting all possible combinations of host-device bundling, each with a custom format, adds a lot of complexity and does not necessarily help the user.
>
> Maybe it would be helpful to speak concretely about this.  What is a
> specific case you're concerned about?
>
> Frankly if we try to build a thing that does everything -- both embeds
> and bundles -- I'm concerned that will be much more complex than
> either approach by itself.  But maybe there's something about bundling
> code for many different devices that's qualitatively different than
> bundling code for one or just a few devices?  For example, are there
> extant formats that we would like to be compatible with that are
> incompatible with each other?  Again, I think it would be helpful to
> be concrete.
>
>> Using a header compatible with tar is not something hard to do; having the bundler tool as part of clang ensures that it works across operating systems too.
>
> This is a minor point, but -- without getting myself back into the
> minefield of discussing whether using the tar format is or is not good
> -- if we do use said format, given what I understand at the moment,
> I'm not in favor of building our own user-facing tool, if that's
> possible.  If you're on Windows and want to inspect a tar file, you
> can download 7z.  It's no different than downloading objdump.  But we
> can postpone this discussion until later; I suspect it won't be
> necessary, given the pushback to using tar in the first place.
>
>> this leaves room to have custom-format bundling activated with options for the cases (would have to be evaluated one by one) that would greatly benefit from interoperability.
>
> This is, as I read it, exactly what everyone else has been arguing so
> strenuously against (and what I conceded to).
>
> On Mon, Mar 14, 2016 at 7:25 PM, Samuel F Antao <sfantao at us.ibm.com> wrote:
>> Hi all,
>>
>> I decided to take a shot at a possible implementation for the part of this
>> proposal that I think is more consensual (i.e. the part that does not relate
>> to the bundling). I posted three patches (http://reviews.llvm.org/D18170,
>> http://reviews.llvm.org/D18171, http://reviews.llvm.org/D18172) with a
>> possible implementation, so that we have something more concrete to discuss.
>> Let me know your thoughts.
>>
>> Going back to the bundling discussion:
>>
>> I agree with Justin when he says that supporting all possible combinations
>> of host-device bundling, each with a custom format, adds a lot of complexity
>> and does not necessarily help the user. Therefore, I think it is reasonable to
>> have intermediate files bundled in some existing format (say tar)
>> that is agnostic of the programming model. Actually, that was my motivation
>> when I proposed the custom format in the bundler.
>>
>> When I look at all the different concerns I think that a possible solution
>> is to have a bundler with three operation modes:
>>  i) "Embed": It generates a host object that contains the device image and
>> properly defines the symbols a programming model requires. Therefore it can
>> be linked with host objects successfully. This file is never exposed to the
>> user unless -save-temps is used.
>> ii) "Bundle": Combines host and device objects using, by default, a format
>> that is easy to interact with and agnostic of the programming model.
>> iii) "Unbundle": The inverse of ii); it assumes the input uses that default
>> format.
>>
>> Using a header compatible with tar is not something hard to do; having the
>> bundler tool as part of clang ensures that it works across operating systems
>> too. At the same time, this leaves room to have custom-format bundling
>> activated with options for the cases (would have to be evaluated one by one)
>> that would greatly benefit from interoperability.
>>
>> Does this sound reasonable?
>>
>> Thanks!
>> Samuel
>>
>> 2016-03-10 13:59 GMT-05:00 Justin Lebar via cfe-dev
>> <cfe-dev at lists.llvm.org>:
>>>
>>> > Justin, is this convincing enough?
>>>
>>> Okay, okay.  Uncle.
>>>
>>> There are two things here that I find convincing.
>>>
>>> 1) Although we're not going to be compatible with the details of some,
>>> if not all, other compilers' formats, we can at least be compatible
>>> with the spirit by using object files as opposed to tar.
>>>
>>> 2) The postscript in Andrey's e-mail:
>>>
>>> > Re: objdump doesn't understand ELF format with code for multiple
>>> > targets. The same is true for fat executable files as well, isn't it? So if we
>>> > need to teach objdump how to recognize fat files, we already have this
>>> > problem.
>>>
>>> It's probably much more important that objdump work on executables
>>> than on object files, since if you have object files, you can probably
>>> recompile with -save-temps, but if you only have an executable, you
>>> don't necessarily have access to intermediate files, or even a
>>> compiler for the relevant architecture, much less the specific
>>> compiler which generated the executable.
>>>
>>> Shoving device code into the host *executable* seems unavoidable.  I'm
>>> still not thrilled with doing the same for object files -- it still
>>> feels like we're using ELF when we actually want an archive format --
>>> but (1) makes it less bad.
>>>
>>> -Justin
>>>
>>> On Wed, Mar 9, 2016 at 5:59 AM, Andrey Bokhanko
>>> <andreybokhanko at gmail.com> wrote:
>>> > All,
>>> >
>>> > I asked the Intel GCC guys who implemented OpenMP offloading support in
>>> > GCC, and as they told me, GCC also employs option #4 from Hal's list -- it
>>> > puts both host and target code in a single ELF file. "Code" in GCC's case
>>> > is always GCC's IR (Gimple), though -- they require GCC invocation from
>>> > the linker in order to produce a multi-target executable. This makes GCC
>>> > non-interoperable with any other offloading compiler and effectively
>>> > produces its own standard.
>>> >
>>> > Thus, prior art from:
>>> > * nvcc
>>> > * Pathscale
>>> > * GCC
>>> > * ICC
>>> >
>>> > indicates only one direction -- the compiler driver produces a single
>>> > object file with target code embedded in a data section.
>>> >
>>> > Justin, is this convincing enough? I don't see any good reasons why clang
>>> > should go against what every other compiler on the planet does.
>>> >
>>> > Re: objdump doesn't understand ELF format with code for multiple targets.
>>> > The same is true for fat executable files as well, isn't it? So if we need
>>> > to teach objdump how to recognize fat files, we already have this problem.
>>> >
>>> > Yours,
>>> > Andrey
>>> > =====
>>> > Software Engineer
>>> > Intel Compiler Team
>>> >
>>> >
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
>>