<div dir="ltr">Hi Chris,<div><br></div><div>I agree with Andrey when he says this should be a separate discussion. </div><div><br></div><div>I think that aiming for a library that supports every possible programming model would take a long time, as it requires a lot of consensus, notably from those who already maintain programming models in Clang (e.g. CUDA). We should try to have something incremental.</div><div><br></div><div>I'm happy to discuss and learn more about the design and code you would like to contribute to this, but I think you should post it in a different thread.</div><div><br></div><div>Thanks,</div><div>Samuel</div></div><div class="gmail_extra"><br><div class="gmail_quote">2016-03-03 11:20 GMT-05:00 C Bergström <span dir="ltr"><<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On Thu, Mar 3, 2016 at 10:19 PM, Ronan Keryell <<a href="mailto:ronan@keryell.fr">ronan@keryell.fr</a>> wrote:<br>

>>>>>> On Thu, 3 Mar 2016 18:19:43 +0700, C Bergström via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org">cfe-dev@lists.llvm.org</a>> said:<br>

><br>

>     C> On Thu, Mar 3, 2016 at 5:50 PM, Ronan KERYELL via cfe-dev<br>

>     C> <<a href="mailto:cfe-dev@lists.llvm.org">cfe-dev@lists.llvm.org</a>> wrote:<br>

><br>

>     >> Just to be sure to understand: you are thinking about being able<br>

>     >> to outline several "languages" at once, such as CUDA *and*<br>

>     >> OpenMP, right ?<br>

>     >><br>

>     >> I think it is required for serious applications. For example, in<br>

>     >> the HPC world, it is common to have hybrid multi-node<br>

>     >> heterogeneous applications that use MPI+OpenMP+OpenCL for<br>

>     >> example. Since MPI and OpenCL are just libraries, there is only<br>

>     >> OpenMP to off-load here. But if we move to OpenCL SYCL instead<br>

>     >> with MPI+OpenMP+SYCL then both OpenMP and SYCL have to be managed<br>

>     >> by the Clang off-loading infrastructure at the same time and be<br>

>     >> sure they combine gracefully...<br>

>     >><br>

>     >> I think your second proposal about (un)bundling can already<br>

>     >> manage this.<br>

>     >><br>

>     >> Otherwise, what about the code outlining itself used in the<br>

>     >> off-loading process? The code generation itself requires to<br>

>     >> outline the kernel code to some external functions to be compiled<br>

>     >> by the kernel compiler. Do you think it is up to the programmer<br>

>     >> to re-use the recipes used by OpenMP and CUDA for example or it<br>

>     >> would be interesting to have a third proposal to abstract more<br>

>     >> the outliner to be configurable to handle globally OpenMP, CUDA,<br>

>     >> SYCL...?<br>

><br>

>     C> Some very good points above and back to my broken record..<br>

><br>

>     C> If all offloading is done in a single unified library -<br>

>     C> a. Lowering in LLVM is greatly simplified since there's ***1***<br>

>     C> offload API to be supported. A region that's outlined for SYCL,<br>

>     C> CUDA or something else is essentially the same thing. (I do<br>

>     C> realize that some transformation may be highly target specific,<br>

>     C> but to me that's more target hw driven than programming model<br>

>     C> driven)<br>

><br>

>     C> b. Mixing CUDA/OMP/ACC/Foo in theory may "just work" since the<br>

>     C> same runtime will handle them all. (With the limitation that if<br>

>     C> you want CUDA to *talk to* OMP or something else there needs to<br>

>     C> be some glue.  I'm merely saying that 1 application could mix multiple<br>

>     C> models in a way that won't conflict)<br>

><br>

>     C> c. The driver doesn't need to figure out do I link against some<br>

>     C> or a multitude of combining/conflicting libcuda, libomp,<br>

>     C> libsomething - it's liboffload - done<br>

><br>

> Yes, a unified target library would help.<br>

><br>

>     C> The driver proposal and the liboffload proposal should imnsho be<br>

>     C> tightly coupled and work together as *1*. The goals are<br>

>     C> significantly overlapping and relevant. If you get the liboffload<br>

>     C> OMP people to make that more agnostic - I think it simplifies the<br>

>     C> driver work.<br>

><br>

> So basically it is about introducing a fourth unification: liboffload.<br>

><br>

> A great unification sounds great.<br>

> My only concern is that if we tie everything together, it would increase<br>

> the entry cost: all the different components should be ready in<br>

> lock-step.<br>

> If there is already a runtime available, it would be easier to start<br>

> with and develop the other part in the meantime.<br>

> So from a pragmatic agile point-of-view, I would prefer not to impose a<br>

> strong unification.<br>

<br>

</div></div>I think I may not be explaining this clearly - let me elaborate by example a bit below<br>

<span class=""><br>

> In the proposal of Samuel, all the parts seem independent.<br>

><br>

>     C>   ------ More specific to this proposal - device<br>

>     C> linker vs host linker. What do you do for IPA/LTO or whole<br>

>     C> program optimizations? (Outside the scope of this project.. ?)<br>

><br>

> Ouch. I did not think about it. It sounds like science-fiction for<br>

> now. :-) Probably outside the scope of this project..<br>

<br>

</span>It should certainly not be science fiction or an after-thought. I<br>

won't go into shameless self promotion, but there are certainly useful<br>

things you can do when you have a "whole device kernel" perspective.<br>

<br>

To digress into the liboffload component of this (sorry)<br>

what we have today is basically liboffload/src/all source files mucked together<br>

<br>

What I'm proposing would look more like this<br>

<br>

liboffload/src/common_middle_layer_glue # to start this may be "best effort"<br>

liboffload/src/omp # This code should exist today, but ideally should<br>

build on top of the middle layer<br>

liboffload/src/ptx # this may exist today - not sure<br>

liboffload/src/amd_gpu # probably doesn't exist, but<br>

wouldn't/shouldn't block anything<br>

liboffload/src/phi # may exist in some form<br>

liboffload/src/cuda # may exist in some form outside of the OMP work<br>

<br>

The end result would be liboffload.<br>

<br>

The code above and below the common middle layer API is programming-model or<br>

hardware specific, respectively. To add a new hw backend you just implement the<br>

things the middle layer needs. To add a new programming model you<br>

build on top of the common layer. I'm not trying to force<br>

anyone/everyone to switch to this now - I'm hoping that by being a<br>

squeaky wheel this isolation of design and layers is there from the<br>

start - even if not perfect. I think it's sloppy to not consider this<br>

actually. LLVM's code generation is clean and has a nice separation<br>

per target (for the most part) - why should the offload library have<br>

bad design which just needs to be refactored later? I've seen others<br>

in the community beat up Intel to force them to have higher quality<br>

code before inclusion... some of this may actually be just minor<br>

refactoring to come close to the target. (No pun intended)<br>
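<br>
A minimal sketch of the layering described above, written in Python only for<br>
brevity. Every name here (Backend, HostSimBackend, offload_map) is hypothetical<br>
and not taken from any existing liboffload code; the "device" is simulated in<br>
host memory. It only illustrates that a hw backend implements the middle layer,<br>
while a programming model is written once on top of it.<br>
<pre>
```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical middle-layer contract; a hw backend implements only this."""

    @abstractmethod
    def alloc(self, count): ...

    @abstractmethod
    def write(self, handle, data): ...

    @abstractmethod
    def launch(self, kernel, handle): ...

    @abstractmethod
    def read(self, handle): ...


class HostSimBackend(Backend):
    """Stand-in 'device' that keeps buffers in host memory (illustration only)."""

    def __init__(self):
        self.buffers = {}

    def alloc(self, count):
        # Hand out a new integer handle for a zero-filled buffer.
        handle = len(self.buffers)
        self.buffers[handle] = [0] * count
        return handle

    def write(self, handle, data):
        self.buffers[handle][: len(data)] = data

    def launch(self, kernel, handle):
        # A real backend would run device code; here the kernel is a Python callable.
        self.buffers[handle] = [kernel(x) for x in self.buffers[handle]]

    def read(self, handle):
        return list(self.buffers[handle])


def offload_map(backend, kernel, values):
    """Programming-model layer: written once against Backend, not per device."""
    handle = backend.alloc(len(values))
    backend.write(handle, values)
    backend.launch(kernel, handle)
    return backend.read(handle)


result = offload_map(HostSimBackend(), lambda x: x * 2, [1, 2, 3])
```
</pre>
Under these assumptions, supporting another device means adding one more<br>
Backend subclass, and offload_map (standing in for an OMP- or CUDA-style front<br>
layer) does not change.<br>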

-------------<br>

If others become open to this design - I'm happy to contribute more<br>

tangible details on the actual middle API.<br>

<br>

The objects which the driver has to deal with may, and probably do,<br>

overlap to some extent with the objects that liboffload has to load or<br>

deal with. Is there an API the driver can hook into to magically<br>

handle that, or is it all per-device and one-off?<br>

<div class="HOEnZb"><div class="h5">

</div></div></blockquote></div><br></div>