[cfe-dev] [libc++] Working on the parallel STL algorithms

C Bergström via cfe-dev cfe-dev at lists.llvm.org
Tue May 16 09:57:53 PDT 2017

On Wed, May 17, 2017 at 12:20 AM, Hal Finkel <hfinkel at anl.gov> wrote:

> On 05/16/2017 02:54 AM, C Bergström wrote:
>> On Tue, May 16, 2017 at 2:50 PM, Hal Finkel via cfe-dev <
>> cfe-dev at lists.llvm.org> wrote:
>>> Hi, Erik,
>>> That's great!
>>> Gor, Marshall, and I discussed this after some past committee meeting. We
>>> wanted to architect the implementation so that we could provide different
>>> underlying concurrency mechanisms, including:
>>>    a. A self-contained thread-pool-based implementation using a
>>> work-stealing scheme.
>>>    b. An implementation that wraps Grand Central Dispatch (for Mac and
>>> any other platforms providing libdispatch).
>>>    c. An implementation that uses OpenMP.
>> Sorry to butt in, but I'm kinda curious how these will be substantially
>> different under the hood.
> No need to be sorry; this is a good question. I think that there are a few
> high-level goals here:
>  1. Provide a solution that works for everybody
>  2. Take advantage of compiler technology as appropriate
>  3. Provide useful interoperability. In practice: don't oversubscribe the
> system.
> The motivation for providing an implementation based on a libc++ thread
> pool is to satisfy (1). Your suggestion of using our OpenMP runtime's
> low-level API directly is a good one. Personally, I really like this idea.
> It does imply, however, that organizations that distribute libc++ will also
> end up distributing libomp. If libomp has matured (in the open-source
> sense) to the point where this is a suitable solution, then we should do
> this. As I recall, however, we still have at least several organizations
> that ship Clang/LLVM/libc++-based toolchains that don't ship libomp, and I
> don't know how generally comfortable people will be with this dependency.
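
The backend split in (a) to (c) can be sketched in C++ as an algorithm written once against a small `parallel_for` interface, with the backend selected by a tag type. This is a minimal sketch under stated assumptions, not libc++ code: `serial_backend`, `thread_backend`, `parallel_for`, and `transform_inplace` are hypothetical names, and the thread backend spawns fresh threads on every call instead of keeping the persistent work-stealing pool that option (a) describes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical backend tags (not libc++ names).
struct serial_backend {};
struct thread_backend {
    unsigned num_threads;
};

// Serial fallback: run body(i) for every i in [first, last).
template <class Body>
void parallel_for(serial_backend, std::size_t first, std::size_t last, Body body) {
    for (std::size_t i = first; i < last; ++i) body(i);
}

// Thread backend: split [first, last) into contiguous chunks, one per thread.
// A real option (a) would reuse a persistent work-stealing pool instead of
// creating threads on each call.
template <class Body>
void parallel_for(thread_backend be, std::size_t first, std::size_t last, Body body) {
    assert(be.num_threads > 0);
    if (first >= last) return;
    const std::size_t n = last - first;
    const std::size_t workers = std::min<std::size_t>(be.num_threads, n);
    const std::size_t chunk = (n + workers - 1) / workers;  // ceiling division
    std::vector<std::thread> threads;
    threads.reserve(workers);
    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t lo = first + w * chunk;
        const std::size_t hi = std::min(last, lo + chunk);
        if (lo >= hi) break;  // fewer chunks than workers
        threads.emplace_back([=] {
            for (std::size_t i = lo; i < hi; ++i) body(i);
        });
    }
    for (auto& t : threads) t.join();
}

// An algorithm written once against the backend interface; each index is
// touched by exactly one thread, so there is no data race.
template <class Backend, class T, class F>
void transform_inplace(Backend be, std::vector<T>& v, F f) {
    parallel_for(be, 0, v.size(), [&](std::size_t i) { v[i] = f(v[i]); });
}
```

For example, `transform_inplace(thread_backend{4}, v, [](int x) { return x * x; })` squares each element of a `std::vector<int>` using up to four threads, and `serial_backend{}` gives the same result. An OpenMP or libdispatch backend would, under this design, be one more `parallel_for` overload.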

If "people" aren't comfortable with llvm-openmp, then kick it out as a
project. I use it, and I know other projects that use it just fine. I can
maybe claim the title of OpenMP hater, and yet I don't know any legitimate
reason against having this as a dependency. It's a portable parallel
runtime that exposes an API and works. I hope someone speaks up about
specific concerns if they exist.
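
To illustrate keeping libomp an optional dependency, here is a minimal sketch that uses the runtime's public `omp.h` API when the build enables OpenMP (the `_OPENMP` macro is defined when compiling with `-fopenmp`) and falls back to `std::thread::hardware_concurrency()` otherwise. `worker_count` is a hypothetical helper, not part of libc++ or llvm-openmp; `omp_get_max_threads` is the standard OpenMP runtime routine.

```cpp
#include <cassert>
#include <thread>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: ask the OpenMP runtime how many threads it would use,
// or fall back to the standard library when the build has no OpenMP (and
// therefore no libomp link dependency).
inline unsigned worker_count() {
#ifdef _OPENMP
    const int n = omp_get_max_threads();  // public omp.h API
    return n > 0 ? static_cast<unsigned>(n) : 1u;
#else
    const unsigned n = std::thread::hardware_concurrency();  // 0 means "unknown"
    return n > 0 ? n : 1u;
#endif
}
```

The same source then builds with or without `-fopenmp`; only the OpenMP build links against libomp.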

> That having been said, to point (2), using the OpenMP compiler directives
> is superior to calling the low-level API directly. OpenMP directives do
> translate into API calls, as you point out, but they also provide
> optimization hints to the compiler (e.g. about lack of loop-carried
> dependencies). Over the next couple of years, I expect to see a lot more in
> the compiler optimization capabilities around OpenMP (and perhaps other
> parallelism) directives (parallel-region fusion, etc.). OpenMP also
> provides a standard way to access many of the relevant vectorization hints,
> and taking advantage of this is useful for compiling with Clang and also
> other compilers.

If projects can't even ship the llvm-openmp runtime, then I have a very strong
concern with bootstrap dependencies which may start relying on external

Further, I'm not sure I understand your point here. The directives wouldn't
be in the end user code, but would be in the STL implementation side.
Wouldn't that implementation stuff be fixed, with an abstract layer exposed
to the end user? It almost sounds like you're expressing the benefits of
OMP here and not the parallel STL side. (Hmm... in the distance I
hear "premature optimization is the root of all evil.")
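
The layering described here can be sketched as follows: the OpenMP directive lives in a library-internal helper, and the caller sees only an ordinary function template. This is a hedged illustration, assuming a hypothetical `par_for_each` entry point and `detail::omp_for_each` helper rather than the real `std::for_each(std::execution::par, ...)` machinery. Built with `-fopenmp`, `#pragma omp parallel for` asserts to the compiler that the iterations are independent and may run concurrently; built without it, the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace detail {
// Library-internal loop (hypothetical). The directive states that iterations
// have no loop-carried dependencies; without -fopenmp the compiler ignores
// the pragma and this is an ordinary serial loop.
template <class RandomIt, class F>
void omp_for_each(RandomIt first, RandomIt last, F f) {
    const std::ptrdiff_t n = last - first;
#pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        f(first[i]);
    }
}
}  // namespace detail

// What the user sees: no pragmas, just an algorithm-like entry point.
template <class RandomIt, class F>
void par_for_each(RandomIt first, RandomIt last, F f) {
    detail::omp_for_each(first, last, f);
}
```

A caller writes `par_for_each(v.begin(), v.end(), [](int& x) { x *= 3; });` and never touches OpenMP directly; the body only has to be safe to run concurrently on distinct elements.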

Once LLVM OpenMP can properly handle things like nested parallelism and a few
more advanced features, all this might be fun. (We can go down a big
list if anyone wants to digress.)

> Regarding why you'd use GCD on Mac, and similarly why it is important for
> many users to use OpenMP underneath, it is important, to the extent
> possible, to use the same underlying thread pool as other things in the
> application. This is to avoid over-subscription and other issues associated
> with conflicting threading runtimes. If parts of the application are
> already using GCD, then we probably want to do this too (or at least not
> compete with it). Otherwise, OpenMP's runtime is probably better ;)
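
The "same underlying thread pool" idea can be sketched as a process-wide pool that every parallel algorithm submits to, so that two components cannot each start `hardware_concurrency()` workers of their own. This is a minimal sketch under assumptions: `thread_pool` and `shared_pool` are hypothetical names, the pool is a plain FIFO queue with no work stealing, and it does nothing to coordinate with GCD or libomp threads already in the process, which is the harder problem being discussed.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal fixed-size pool (hypothetical; FIFO queue, no work stealing).
class thread_pool {
public:
    explicit thread_pool(unsigned n) {
        assert(n > 0);
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }
    ~thread_pool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;
        }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
    }
    thread_pool(const thread_pool&) = delete;
    thread_pool& operator=(const thread_pool&) = delete;

    // Enqueue a task; the returned future delivers its result.
    template <class F>
    auto submit(F f) -> std::future<decltype(f())> {
        using R = decltype(f());
        auto task = std::make_shared<std::packaged_task<R()>>(std::move(f));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(m_);
            tasks_.emplace([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }

    std::size_t size() const { return workers_.size(); }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
                if (stop_ && tasks_.empty()) return;  // drain queue before exiting
                job = std::move(tasks_.front());
                tasks_.pop();
            }
            job();  // run outside the lock
        }
    }

    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex m_;
    std::condition_variable cv_;
    bool stop_ = false;
};

// One process-wide pool shared by every parallel algorithm.
inline thread_pool& shared_pool() {
    static thread_pool pool(std::max(1u, std::thread::hardware_concurrency()));
    return pool;
}
```

Because `shared_pool()` returns a function-local static, every caller in the process gets the same workers, and the thread count stays bounded no matter how many algorithms run.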

Again, isn't this detail invisible to the end user? We pick an implementation
that makes sense. If other applications use GCD and we use OpenMP, and
multiple thread-heavy applications are running, over-subscription would be
a kernel issue and not a userland one. I don't see how you can always avoid that
situation, and creating two implementations to try seems kinda funny. BTW,
GCD is a marketing term and libdispatch is really what I'm talking about
here. It's been quite a while since I worked hands-on with it, but I wonder
how much the API overlaps with similar interfaces in llvm-openmp. If the
interfaces are similar and the "cost" in terms of complexity is low, who
cares, but I don't remember that being the case. (Side note: I worked on an
older version of libdispatch and ported it to Solaris. I also played around
with and benchmarked OMP tasks lowered directly to libdispatch calls
across multiple platforms. At the time, our runtime always beat it in
performance. Maybe newer versions of libdispatch are better.)

I'm not trying to be combative, but your points just don't make sense to
me... (I take the blame; I must be missing something.)
All this aside, I'm happy to help if needed, with GPU (NVIDIA or AMD) support
and/or an implementation directly on the llvm-openmp runtime API. I've been
involved with sorta similar projects (C++ AMP) and, based on that experience,
may be able to help avoid some gotchas.
