[llvm-dev] [cfe-dev] [Openmp-dev] RFC: Proposing an LLVM subproject for parallelism runtime and support libraries

Thu Jun 2 05:38:29 PDT 2016

----- Original Message -----
> From: "C Bergström" <cbergstrom at pathscale.com>
> To: "Chandler Carruth" <chandlerc at google.com>
> Cc: "Hal Finkel" <hfinkel at anl.gov>, "llvm-dev" <llvm-dev at lists.llvm.org>, "openmp-dev" <openmp-dev at lists.llvm.org>,
> "cfe-dev" <cfe-dev at lists.llvm.org>
> Sent: Thursday, June 2, 2016 4:02:44 AM
> Subject: Re: [cfe-dev] [Openmp-dev] [llvm-dev] RFC: Proposing an LLVM subproject for parallelism runtime and support
> libraries
> 
> On Thu, Jun 2, 2016 at 4:08 PM, Chandler Carruth
> <chandlerc at google.com> wrote:
> > When considering adding a new project to LLVM, I think it is
> > important to
> > consider contributors to the project as a whole and not just to
> > OpenMP,
> > offloading, or any other single part of LLVM. That's why this
> > thread is on
> > llvm-dev, cfe-dev, and openmp-dev.
> >
> > On Wed, Jun 1, 2016 at 10:21 PM C Bergström
> > <cbergstrom at pathscale.com>
> > wrote:
> >>
> >> (who has
> >> contributed to OMP btw, most of the ARM port, cmake stuff and lots
> >> of
> >> code reviews)
> >
> >
> > I'm aware of your contributions to OMP, and very much appreciate
> > them. I was
> > one of the people very interested in CMake support, and I think the
> > AArch64
> > port is great. My only statement was "more significant
> > contributors", and I
> > think that is accurate. I'm sorry if this was confusing, or I gave
> > any
> > impression that your contributions are not appreciated. That was
> > not my
> > intent. It also has no bearing on the merits of your technical
> > feedback
> > (which is excellent and appreciated) only on how we make a decision
> > when
> > there are differences of technical opinion or judgement.
> >
> >>
> >> and Intel - (who has significantly contributed to OMP).
> >
> >
> > Also just for clarification, I am definitely interested in Intel
> > contributors' opinions here. My impression from the emails was that
> > the
> > clarifications around scope and role of this project had largely
> > addressed
> > their concerns. If that's not correct, I've just misunderstood and
> > look
> > forward to clarification. =]
> 
> I don't want to mince words
> -------------
> My goal and scope is not just clang-world, but contributions from the
> whole damn programming model and offloading space - Where possible I
> try to reduce confusion and improve the quality of what we have. I'm
> not tactful, but I do try my best to get people to play nice when and
> where possible. Work goals, deadlines and lots of things get in the
> way of people trying to meet their own internal objectives while, at
> the same time, meeting objectives on a much grander scale. I think this
> is
> a regression on a grander scale. HOWEVER, I think it has the
> potential
> to bring more people aligned.
> 
> Nobody wants to get caught up in long email threads, but your
> approach
> on this just doesn't seem to be "unite" - it seems way more focused
> on
> things "your way". Smart people tend to lean towards feeling their
> approach is more correct than another, but that's sometimes limited
> by
> experience. Without more projects using SE, it just seems to me
> Google
> is dumping this somewhere, since if it goes anywhere else it'll just
> die.

Given that SE is part of TensorFlow, it seems unlikely to die regardless.

> Long threads can end up being ignored.. I'm curious what other people
> from ARM think - A drive-by comment from Renato may not constitute a
> larger view. What about NVIDIA or Cray or PGI, who may not be direct
> contributors to the project, but are certainly stakeholders. What
> about Tom from AMD? He's certainly a major contributor.. I'd not
> bring
> them into this to block you or other crap tactics, but honest and
> sincere feedback. I'm relentless because I CARE.
> 
> If we stop looking at our feet and look out a bit further - we have a
> parallel runtime which handles both onloading and offloading. It
> should cover SIMD, thread parallelism, tasks (which are just wrappers
> around pthreads) and GPU threads (however you want to describe those)
> - We all probably agree we want less code duplication, right?? (If we
> can't agree on this then the rest of my argument is moot)
> 
> Again I'm injecting my personal experience and biases... forgive me
> while I give a bit more context..
> 
> When starting @pathscale I inherited an OpenMP runtime, which then
> and
> now is pretty high performance for OpenMP 2.x-2.5. (We may have open
> sourced this code at some point, I forget

You did; it was one of the first things we worked on together, and if nothing else, a copy ended up here:

  https://github.com/jeffhammond/libopenmp

>) Basic linear scalability
> for a pthreads wrapper on a typical Intel machine. I didn't and don't see
> people using a lot of OMP3 in the wild and investing further just
> wasn't a high priority. OMP3 brought tasks and a few other things of
> interest.
> 
> As OMP continued to move forward, the checkbox of support became
> less a pragmatic necessity and more a marketing one. Coincidentally,
> Intel open sourced their runtime... we ditched our runtime and
> migrated over. The migration was for the most part relatively
> painless
> since the interfaces to our runtime for OMP and Intel's weren't all
> that drastically different.. So we added the parsing/sema and some
> extra bits and voila we had OMP3 support. The Intel runtime does a
> pretty respectable job on performance, but the cost is the code is
> really nasty to read. Our old runtime was dead simple by comparison,
> but OMP features are likely to blame for some % of the added
> complexity.
> 
> OMP saw OpenACC and others handling GPU offloading and started
> duplicating effort and going down that path.. In parallel to the OMP
> stuff above we already had GPU offloading in a *different* runtime.
> When we added OMP4 support and offloading - we stayed with our
> GPU/offloading library and continued to use the Intel runtime for OMP
> CPU.. When we added OpenACC *HOST* onloading (ARMv8 across *many*
> cores) I made the mistake of using our GPU/offloading library. We
> did this because it already supported the OpenACC interfaces we
> needed. To further go down the rabbit hole of my failure, we debugged
> all the performance regressions and ended up making it *faster* than
> the Intel OMP runtime on cpu scaling... (Doing this was not easy...)
> 
> You may not care one bit about my internal woes, but this experience
> has left an impression.
> ---------------
> Bottom line: If you don't plan things carefully from the start -
> you'll end up with a big mess of complicated *** and lots of
> duplicated code. Your project is basically throwing away many
> man-years of investment by not leveraging the llvm openmp runtime
> more. Short term it may seem easy to do your own thing, but long term
> it will likely complicate things for everyone around here who cares.
> Users will not benefit from 2 CPU parallel runtimes and multiple
> offloading libraries. I can point to a few offloading libraries
> already and most aren't production quality - I suspect because it's
> not a team effort, but 1-2 guys hacking here and there.

You make a lot of good points here, although I'll point out that the OpenMP offloading library is new. I think a major challenge will be getting good code reviews for it, more so than properly leveraging previous investment in the host OpenMP runtime.

> I would love to have an in-depth and friendly conversation (probably
> offline) about the nice things about SE. (Notice in all my emails I
> didn't trash the programming model as garbage) Personally, I'd love
> to
> take the good things from SE and see how they can be incorporated
> into
> existing standards.

I see no better way to encourage that kind of interaction than by having a group of people actively involved in OpenMP standardization and development engaged in technical discussions with the SE developers. From our previous e-mail exchange, I understand that you may disagree.

Thanks again,
Hal

> If that fails, I concede and yet another
> "standard" is born.
> 
> I don't know if you can take your Google hat off and look big
> picture,
> but do you see what I'm getting at here?
> 
> Side note: have you talked with Ian Buck or others on the NVIDIA CUDA
> team? I'm sure they would love better C++ support in CUDA. I bet your
> feedback is quite valuable..
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory