[llvm-dev] [cfe-dev] [Openmp-dev] RFC: Proposing an LLVM subproject for parallelism runtime and support libraries
Jason Henline via llvm-dev
llvm-dev at lists.llvm.org
Mon Mar 28 14:47:39 PDT 2016
Hi Sergos,
> Did I get it right that SE interfaces are bound to the stream that is
> passed as an argument? As far as I can see, the stream is an abstraction of
> the target, hence data transfers for a particular stream are limited to this
> stream? As for the libomptarget implementation, the data once offloaded can
> be reused in all offload entries, without additional data transfer. Is that
> possible in the SE approach?
If I understand your interpretation of streams, it does not match my
understanding. SE follows the CUDA meaning of "stream". I think of a stream
as a "work queue" and each device can have several active streams. Memory
space on the device does not belong to any stream, so any stream can access
it. The thing that does belong to the stream is the "task" of copying the
data from one place to another (or other tasks such as running a kernel).
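A minimal sketch of this view of streams, written as a toy in-process model in Python rather than against the real StreamExecutor or CUDA API. The Device and Stream classes, their method names, and the buffer names are all hypothetical and exist only for illustration. The point is that the copy and kernel tasks are queued on a stream, while the buffers live on the device and can be used from any stream:

```python
from collections import deque


class Device:
    """Toy device: one memory space shared by every stream created on it."""

    def __init__(self):
        self.memory = {}  # buffer name -> list of values (hypothetical layout)
        self.streams = []

    def create_stream(self):
        stream = Stream(self)
        self.streams.append(stream)
        return stream


class Stream:
    """Toy stream: an in-order work queue bound to one device."""

    def __init__(self, device):
        self.device = device
        self.tasks = deque()

    def enqueue_copy_to_device(self, name, host_data):
        snapshot = list(host_data)  # capture the host data at enqueue time

        def task():
            # The copy *task* belongs to this stream; the buffer does not.
            self.device.memory[name] = list(snapshot)

        self.tasks.append(task)
        return self

    def enqueue_kernel(self, kernel, *buffer_names):
        def task():
            # A kernel may touch any device buffer, whichever stream filled it.
            kernel(*(self.device.memory[n] for n in buffer_names))

        self.tasks.append(task)
        return self

    def synchronize(self):
        # Run queued tasks in FIFO order; a real device would run them
        # asynchronously and this call would only wait for completion.
        while self.tasks:
            self.tasks.popleft()()


def double_in_place(buf):
    for i, value in enumerate(buf):
        buf[i] = value * 2


def sum_into(src, dst):
    dst[0] = sum(src)


device = Device()
s1, s2 = device.create_stream(), device.create_stream()

# Stream 1 copies "x" to the device and runs a kernel on it.
s1.enqueue_copy_to_device("x", [1, 2, 3]).enqueue_kernel(double_in_place, "x")
s1.synchronize()

# Stream 2 reuses "x" with no second host-to-device copy of that data.
s2.enqueue_copy_to_device("total", [0]).enqueue_kernel(sum_into, "x", "total")
s2.synchronize()

result = device.memory["total"][0]
```

In this model the data offloaded through the first stream stays available to the second, which is the kind of reuse the question above asks about.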
> Regarding storing kernels in memory or in a file: the design was
> originally to provide offload entries within the same object file as the
> host code. It is intended to ease adoption of the heterogeneous approach:
> there should be no changes to build scripts. The resulting
> executable/library obtained from the build should be self-contained, and the
> user will have no extra problems with target object/file availability at
> runtime.
Yes, I think the in-memory model is much nicer, but requires compiler
support. SE has modes with and without compiler support and so it can
handle storing kernels in files as well as in memory. You are right that
using files requires users to change build files; that's part of the reason
we want clang to be able to emit SE calls. That way the kernel can be
stored in memory and the user won't have to think much about it.
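A rough sketch of the difference between the two modes, again as a toy Python model and not the StreamExecutor API. The loader functions, the EMBEDDED_KERNEL bytes, and the file name are hypothetical placeholders. It only illustrates why the file-based mode adds a build and deployment step that the in-memory mode avoids:

```python
import os
import tempfile

# Hypothetical stand-in for compiled device code; not real kernel code.
EMBEDDED_KERNEL = b"// device code for the kernel would go here"


def load_kernel_from_memory(blob):
    """In-memory mode: the bytes ship inside the host binary."""
    return bytes(blob)


def load_kernel_from_file(path):
    """File mode: the build must produce and deploy this file separately."""
    with open(path, "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as workdir:
    kernel_path = os.path.join(workdir, "kernel.bin")  # hypothetical file name

    # Without the extra build/deploy step the file is absent at runtime.
    try:
        load_kernel_from_file(kernel_path)
        file_load_failed = False
    except FileNotFoundError:
        file_load_failed = True

    # The extra step the user's build scripts would have to perform.
    with open(kernel_path, "wb") as f:
        f.write(EMBEDDED_KERNEL)
    from_file = load_kernel_from_file(kernel_path)

# The embedded copy needs no file at all.
from_memory = load_kernel_from_memory(EMBEDDED_KERNEL)
```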
-Jason
On Mon, Mar 28, 2016 at 2:31 PM Sergey Ostanevich <sergos.gnu at gmail.com>
wrote:
> Jason,
>
> Did I get it right that SE interfaces are bound to the stream that is
> passed as an argument? As far as I can see, the stream is an abstraction of
> the target, hence data transfers for a particular stream are limited to this
> stream? As for the libomptarget implementation, the data once offloaded can
> be reused in all offload entries, without additional data transfer. Is that
> possible in the SE approach?
>
> Regarding storing kernels in memory or in a file: the design was
> originally to provide offload entries within the same object file as the
> host code. It is intended to ease adoption of the heterogeneous approach:
> there should be no changes to build scripts. The resulting
> executable/library obtained from the build should be self-contained, and the
> user will have no extra problems with target object/file availability at
> runtime.
>
> Sergos.
>
>
> On Mon, Mar 28, 2016 at 9:47 PM, Jason Henline via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>> Alexandre,
>>
>> Thanks for further shedding some light on the way OpenMP handles
>> dependencies between tasks. I'm sorry for leaving that out of my document;
>> it was just because I didn't know much about the way OpenMP handled its
>> workflows.
>>
>> On Mon, Mar 28, 2016 at 11:43 AM Jason Henline <jhen at google.com> wrote:
>>
>>> Hi Carlo,
>>>
>>> Thanks for helping to clarify this point about libomptarget vs
>>> liboffload; I have been getting confused about it myself. I think the open
>>> question concerns libomptarget not liboffload (others can correct me if I
>>> have misunderstood). My analysis from looking through the code was that
>>> libomptarget had some similarities with the platform support in SE, so I
>>> just wanted to consider how those two libraries compared. I didn't do a
>>> comparison with liboffload.
>>>
>>> On Mon, Mar 28, 2016 at 11:11 AM Carlo Bertolli <cbertol at us.ibm.com>
>>> wrote:
>>>
>>>> Hi
>>>>
>>>> Reading through the comments: both Chris and Chandler referred to
>>>> liboffload, while I thought the subject of conversation was libomptarget
>>>> and SE.
>>>> I am being picky about names because liboffload is a library available
>>>> as part of omp (llvm's openmp runtime library) that, I believe, only
>>>> targets Intel Xeon Phi.
>>>>
>>>> Did you mean liboffload or libomptarget?
>>>>
>>>>
>>>> Thanks
>>>>
>>>> -- Carlo
>>>>
>>>>
>>>> From: Alexandre Eichenberger via Openmp-dev <openmp-dev at lists.llvm.org>
>>>> To: jhen at google.com
>>>> Cc: llvm-dev at lists.llvm.org, cfe-dev at lists.llvm.org,
>>>> openmp-dev at lists.llvm.org
>>>> Date: 03/28/2016 01:44 PM
>>>>
>>>>
>>>> Subject: Re: [Openmp-dev] [cfe-dev] RFC: Proposing an LLVM subproject
>>>> for parallelism runtime and support libraries
>>>>
>>>> Sent by: "Openmp-dev" <openmp-dev-bounces at lists.llvm.org>
>>>> ------------------------------
>>>>
>>>>
>>>>
>>>>
>>>> Jason,
>>>>
>>>> I concur with your decision since OMP and StreamExecutor fundamentally
>>>> differ in how dependences between consecutive tasks are expressed. OMP uses
>>>> task dependences to express ordering constraints between tasks that execute
>>>> on the host and/or on a particular device. Obviously, a stream is a DAG but
>>>> with very specific constraints (one linear ordering per stream), whereas
>>>> the DAGs generated by OMP dependences are arbitrary. This is not a judgment
>>>> statement, as in many ways streams are much more friendly to GPUs; it is
>>>> just that the OMP and StreamExecutor "language experts" settled
>>>> on different points in the language expressivity/efficiency trade-off.
>>>>
>>>> I read your blog on the similarities and differences with great
>>>> interest. I may venture to add another overlooked difference: OMP maps
>>>> objects with reference counts (e.g. the first time an object is mapped, its
>>>> ref count is zero, and the alloc on the device and the memory copy will
>>>> occur; further nested maps will not generate any alloc and/or
>>>> communication). In summary, OMP primarily uses a dictionary of mapped
>>>> variables to manage allocation and data transfer, whereas StreamExecutor
>>>> appears to explicitly allocate and move data.
>>>>
>>>> Thanks for your work on this, much appreciated
>>>>
>>>> Alexandre
>>>>
>>>>
>>>> -----------------------------------------------------------------------------------------------------
>>>> Alexandre Eichenberger, Master Inventor, Advanced Compiler Technologies
>>>> - research: compiler optimization (OpenMP, multithreading, SIMD)
>>>> - info: alexe at us.ibm.com http://www.research.ibm.com/people/a/alexe
>>>> - phone: 914-945-1812 (work) 914-312-3618 (cell)
>>>>
>>>>
>>>> ----- Original message -----
>>>> From: Jason Henline via Openmp-dev <openmp-dev at lists.llvm.org>
>>>> Sent by: "Openmp-dev" <openmp-dev-bounces at lists.llvm.org>
>>>> To: Andrey Bokhanko <andreybokhanko at gmail.com>, Chandler Carruth <
>>>> chandlerc at google.com>
>>>> Cc: llvm-dev <llvm-dev at lists.llvm.org>, cfe-dev <cfe-dev at lists.llvm.org>,
>>>> "openmp-dev at lists.llvm.org" <openmp-dev at lists.llvm.org>
>>>> Subject: Re: [Openmp-dev] [cfe-dev] RFC: Proposing an LLVM subproject
>>>> for parallelism runtime and support libraries
>>>> Date: Mon, Mar 28, 2016 12:38 PM
>>>>
>>>> I did a more thorough read through liboffload and wrote up a more
>>>> detailed doc describing how StreamExecutor platforms relate to libomptarget
>>>> RTL interfaces. The doc also describes why the lack of support for streams
>>>> in libomptarget makes it impossible to implement some of the most important
>>>> StreamExecutor platforms in terms of libomptarget (
>>>> https://github.com/henline/streamexecutordoc/blob/master/se_and_openmp.rst).
>>>> When I was originally optimistic about using liboffload to implement
>>>> StreamExecutor platforms, I was not aware of this issue with streams.
>>>> Thanks to Carlo Bertolli for bringing this to my attention.
>>>>
>>>> After having looked in detail at the liboffload code, it sounds like
>>>> the best thing to do at this point is to keep StreamExecutor and liboffload
>>>> separate, but to leave the door open to implement future StreamExecutor
>>>> platforms in terms of liboffload. From the recent messages on this subject
>>>> from Carlo and Andrey it seems like there is a general consensus on this,
>>>> so I would like to move forward with the StreamExecutor project in this
>>>> spirit.
>>>>
>>>> On Tue, Mar 15, 2016 at 5:09 PM Jason Henline <jhen at google.com> wrote:
>>>>
>>>> I created a GitHub repo that contains the documentation I have been
>>>> creating for StreamExecutor.
>>>> https://github.com/henline/streamexecutordoc
>>>>
>>>> It contains the design docs from the original email in this thread,
>>>> and it contains a new doc I just made that gives a more detailed sketch of
>>>> the StreamExecutor platform plugin interface. This shows which methods must
>>>> be implemented to support a new platform in StreamExecutor, or to provide a
>>>> new implementation for an existing platform (e.g. using liboffload to
>>>> implement the CUDA platform).
>>>>
>>>> I wrote up this doc in response to a lot of good questions I am
>>>> getting about the details of how StreamExecutor might work with the code
>>>> OpenMP already has in place.
>>>>
>>>> Best Regards,
>>>> -Jason
>>>>
>>>> On Tue, Mar 15, 2016 at 12:28 PM Andrey Bokhanko
>>>> <andreybokhanko at gmail.com> wrote:
>>>> Hello Chandler,
>>>>
>>>> On Tue, Mar 15, 2016 at 1:44 PM, Chandler Carruth via Openmp-dev
>>>> <openmp-dev at lists.llvm.org> wrote:
>>>> It seems like if the OpenMP folks want to add a liboffload
>>>> plugin to StreamExecutor, that would be an awesome additional platform, but
>>>> I don't see why we need to force the coupling here.
>>>>
>>>> Let me give you a reason: while the user-facing sides of StreamExecutor
>>>> and OpenMP are quite different (and each warrants its place under the
>>>> sun!), SE's internal offloading interface and liboffload are doing exactly
>>>> the same thing. Why would we want to duplicate code? As previous replies
>>>> demonstrated, SE can't serve OpenMP's needs, while the liboffload API seems
>>>> to be general enough to serve SE well (though this has to be verified, of
>>>> course -- as I understand, Jason is going to do this).
>>>>
>>>> Sure, there is no "must have need" to couple SE and liboffload, but
>>>> this sounds like a solid software engineering decision to me. Or, quoting
>>>> Jason, who said this much better than me:
>>>>
>>>> > Although OpenMP and StreamExecutor support different programming models,
>>>> > some of the work they perform under the hood will likely be very similar.
>>>> > By sharing code and domain expertise, both projects will be improved and
>>>> > strengthened as their capabilities are expanded. The StreamExecutor
>>>> > community looks forward to much collaboration and discussion with OpenMP
>>>> > about the best places and ways to cooperate.
>>>>
>>>> Hope to see you tomorrow!
>>>>
>>>> Yours,
>>>> Andrey
>>>> =====
>>>> Software Engineer
>>>> Intel Compiler Team
>>>>
>>>> _______________________________________________
>>>> Openmp-dev mailing list
>>>> Openmp-dev at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev
>>>>
>>>>
>>>>
>>>>
>>>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
>>