<html><head><meta http-equiv="Content-Type" content="text/html charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Apr 22, 2016, at 3:24 PM, Chandler Carruth <<a href="mailto:chandlerc@gmail.com" class="">chandlerc@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div class="gmail_quote"><div dir="ltr" class="">On Fri, Apr 22, 2016 at 3:05 PM Mehdi Amini <<a href="mailto:mehdi.amini@apple.com" class="">mehdi.amini@apple.com</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br class="">

> On Apr 22, 2016, at 3:01 PM, Chandler Carruth <<a href="mailto:chandlerc@gmail.com" target="_blank" class="">chandlerc@gmail.com</a>> wrote:<br class="">

><br class="">

> I feel like this thread got a bit stalled. I'd like to pick it up and try to suggest a path forward.<br class="">

><br class="">

> I don't hear any real objections to the overall idea of having an LLVM subproject for parallelism runtimes and support libraries. I think we should get that created.<br class="">

<br class="">

I think it should be clarified whether "parallelism runtimes and support libraries" are intended to expose user-level APIs or if these are intended to expose APIs for the compiler-generated code (this may be part of your point about "writing up its charter, scope" but I also think it shouldn't be underestimated as a task so I called it out).<br class=""></blockquote><div class=""><br class=""></div><div class="">Absolutely. I think that needs to be clearly spelled out.</div><div class=""><br class=""></div><div class="">Personally, I'd like to see the subproject open to *both*. Here are some libraries I would love to see (but don't necessarily have concrete plans around):</div><div class="">- A nice vectorized math library</div><div class="">- Linear algebra libraries like BLAS implementations or such</div><div class="">- Highly tuned FFT or other domain-specific libraries for GPUs. Essentially the same as the vectorized math libraries but for GPUs and slightly higher level.</div><div class="">- Stream executor </div><div class="">- Any generic components of the OpenMP libraries.</div><div class=""><br class=""></div><div class="">Clearly each of these would need to be discussed on a case-by-case basis, but there seems to be a healthy mixture of both user-level APIs and compiler-level APIs. I would suggest criteria for being here along the lines of:</div><div class=""><br class=""></div><div class="">- Includes compiler-targeted APIs (maybe in addition to user-level APIs, maybe even with overlap), or</div><div class="">- Leverages compiler details for its implementation (for example, using vector extensions we know LLVM supports), or</div><div class="">- Wants to use compiler-specific packaging techniques or other integration techniques (for example shipping as bitcode), or</div><div class="">- Helps support compiler or programming language functionality</div><div class=""><br class=""></div><div class="">The first three here seem clear cut to me. 
If any part of the library is intended to be callable by the compiler, it's a good fit. SE has such interfaces. Vectorized math libraries do too, etc. If the implementation of the library really wants to use compiler internals like our vector math extensions, again, I think it makes sense to keep it reasonably co-located with the compiler.</div><div class=""><br class=""></div><div class="">The last seems a bit tricky, but I think it's really important. Currently, CUDA provides a pretty big programming surface, and having a well-tuned BLAS or FFT implementation, for example, that integrates with CUDA is pretty important. Similarly in the future, we expect C++ to get lots of parallel standard library interfaces, potentially even BLAS-looking ones, and we might want a good parallel BLAS implementation or other very fundamental parallel library implementation to use when implementing them.</div><div class=""><br class=""></div><div class="">But at the same time, I think it's really important to have a clear place where any library here ties back into the compiler ecosystem and/or the programming language ecosystem that are the core of LLVM.</div><div class=""><br class=""></div><div class="">Does this seem like it's going in the right direction?</div></div></div></div></blockquote><div><br class=""></div><div>Yes.</div><div>I just think we need to be careful about having clear layering/decoupling between the various pieces of the libraries (I'm not sure if low-level/high-level is the right distinction, for instance; it would require some thought), but the LLVM community is usually pretty good at this (even if the recent "discussions" around lld indicated it is not always a given).</div><div><br class=""></div><div>-- </div><div>Mehdi</div><div><br class=""></div><div><br class=""></div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="gmail_quote"><div class=""> (Jason can probably take on the non-trivial task of writing 
this up more formally and make sure it is clearly documented.)</div><div class=""><br class=""></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br class="">

Otherwise your plan sounds good to me.<br class="">

<br class="">

--<br class="">

Mehdi<br class="">

<br class="">

<br class="">

<br class="">

><br class="">

> I don't actually see any real objections to StreamExecutor being one of the runtimes. There are some interesting questions however:<br class="">

> - Is there common code in the OpenMP runtime that could be unified with this?<br class="">

> - Could OpenMP end up using SE or some common shared library between them as a basis for offloading?<br class="">

> - Would it instead make more sense to have the OpenMP offload library be a plugin for StreamExecutor?<br class="">

><br class="">

> I don't know the answer to any of these really, but I also don't think that they should prevent us from making progress here. And I think if anything, they'll become easier to answer if we do.<br class="">

><br class="">

> So my suggestion would be:<br class="">

> 1) Create the broader-scoped LLVM subproject, including writing up its charter, scope, plans, etc.<br class="">

><br class="">

> 2) Add stream executor to it<br class="">

><br class="">

> 3) Initially, leave the OpenMP offloading stuff targeted at OpenMP. Then, as it evolves, consider moving it to be another runtime in the broad project if and when it makes sense.<br class="">

><br class="">

> 4) As both OpenMP and SE evolve and see some use in the project, evaluate whether there is a common core that makes sense to extract. If so, do it and rebase them appropriately.<br class="">

><br class="">

><br class="">

> Does this make sense? Are there objections to moving forward here?<br class="">

<br class="">

</blockquote></div></div>

</div></blockquote></div><br class=""></body></html>