[cfe-dev] [Openmp-dev] [llvm-dev] RFC: Proposing an LLVM subproject for parallelism runtime and support libraries
C Bergström via cfe-dev
cfe-dev at lists.llvm.org
Thu Jun 2 02:02:44 PDT 2016
On Thu, Jun 2, 2016 at 4:08 PM, Chandler Carruth <chandlerc at google.com> wrote:
> When considering adding a new project to LLVM, I think it is important to
> consider contributors to the project as a whole and not just to OpenMP,
> offloading, or any other single part of LLVM. That's why this thread is on
> llvm-dev, cfe-dev, and openmp-dev.
>
> On Wed, Jun 1, 2016 at 10:21 PM C Bergström <cbergstrom at pathscale.com>
> wrote:
>>
>> (who has
>> contributed to OMP btw, most of the ARM port, cmake stuff and lots of
>> code reviews)
>
>
> I'm aware of your contributions to OMP, and very much appreciate them. I was
> one of the people very interested in CMake support, and I think the AArch64
> port is great. My only statement was "more significant contributors", and I
> think that is accurate. I'm sorry if this was confusing, or I gave any
> impression that your contributions are not appreciated. That was not my
> intent. It also has no bearing on the merits of your technical feedback
> (which is excellent and appreciated) only on how we make a decision when
> there are differences of technical opinion or judgement.
>
>>
>> and Intel - (who has significantly contributed to OMP).
>
>
> Also just for clarification, I am definitely interested in Intel
> contributors' opinions here. My impression from the emails was that the
> clarifications around scope and role of this project had largely addressed
> their concerns. If that's not correct, I've just misunderstood and look
> forward to clarification. =]
I don't want to mince words.
-------------
My goal and scope are not just clang-world, but contributions from the
whole damn programming-model and offloading space. Where possible I
try to reduce confusion and improve the quality of what we have. I'm
not tactful, but I do try my best to get people to play nicely when
and where possible. Work goals, deadlines, and lots of other things
get in the way when people try to meet their own internal objectives
while also serving objectives on a much grander scale. I think this
proposal is a regression at that grander scale. HOWEVER, I think it
has the potential to bring more people into alignment.
Nobody wants to get caught up in long email threads, but your
approach here just doesn't seem to be "unite"; it seems far more
focused on doing things "your way". Smart people tend to feel their
approach is more correct than the alternatives, but that confidence
is sometimes limited by experience. Without more projects using
StreamExecutor (SE), it just seems to me that Google is dumping this
somewhere, since if it goes anywhere else it'll just die.
Long threads can end up being ignored. I'm curious what other people
from ARM think; a drive-by comment from Renato may not represent the
larger view. What about NVIDIA, Cray, or PGI, who may not be direct
contributors to the project but are certainly stakeholders? What
about Tom from AMD? He's certainly a major contributor. I wouldn't
bring them into this to block you or for other crap tactics, but for
honest and sincere feedback. I'm relentless because I CARE.
If we stop looking at our feet and look a bit further out: we have a
parallel runtime which handles both onloading and offloading. It
should cover SIMD, thread parallelism, tasks (which are just wrappers
around pthreads), and GPU threads (however you want to describe
those). We all probably agree we want less code duplication, right?
(If we can't agree on this, then the rest of my argument is moot.)
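To make concrete what I mean by one runtime serving both onloading
and offloading, here's a rough, purely hypothetical C sketch. None of
these names (rt_parallel_for, kernel_fn, the device_id convention)
come from any real runtime; it's just an illustration of a single
launch entry point that runs work on host pthreads today and could
hand the same kernel to a device plugin (think libomptarget-style
target plugins) tomorrow:

/* Hypothetical sketch only -- not an existing runtime API. */
#include <pthread.h>
#include <stdio.h>

typedef void (*kernel_fn)(int index, void *arg);

struct chunk { kernel_fn fn; void *arg; int begin; int end; };

static void *run_chunk(void *p) {
    struct chunk *c = (struct chunk *)p;
    for (int i = c->begin; i < c->end; ++i)
        c->fn(i, c->arg);
    return NULL;
}

/* One launch interface; device_id < 0 means host onloading. */
static void rt_parallel_for(int device_id, int n, kernel_fn fn, void *arg)
{
    if (device_id >= 0) {
        /* An offload path would dispatch through a device plugin
           here; this sketch only implements the host path. */
        fprintf(stderr, "no device plugin in this sketch\n");
        return;
    }
    enum { NTHREADS = 4 };
    pthread_t tid[NTHREADS];
    struct chunk c[NTHREADS];
    int per = (n + NTHREADS - 1) / NTHREADS;
    for (int t = 0; t < NTHREADS; ++t) {
        c[t].fn = fn;
        c[t].arg = arg;
        c[t].begin = t * per;
        c[t].end = (t + 1) * per > n ? n : (t + 1) * per;
        pthread_create(&tid[t], NULL, run_chunk, &c[t]);
    }
    for (int t = 0; t < NTHREADS; ++t)
        pthread_join(tid[t], NULL);
}

/* Toy kernel: y[i] = 2*x[i] + y[i], x and y packed in one buffer. */
static void saxpy(int i, void *arg)
{
    float *v = (float *)arg;   /* v[0..1023] = x, v[1024..2047] = y */
    v[1024 + i] = 2.0f * v[i] + v[1024 + i];
}

int main(void)
{
    static float data[2048];
    for (int i = 0; i < 1024; ++i) {
        data[i] = 1.0f;
        data[1024 + i] = 2.0f;
    }
    rt_parallel_for(/*device_id=*/-1, 1024, saxpy, data);
    printf("y[0] = %f\n", data[1024]);   /* expect 4.000000 */
    return 0;
}

The point isn't the code itself; it's that OMP, OpenACC, and an
SE-style C++ wrapper could all sit on top of one such layer instead
of each shipping its own thread pool and offload plumbing.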
Again I'm injecting my personal experience and biases... forgive me
while I give a bit more context..
When starting @pathscale I inherited an OpenMP runtime, which then
and now is pretty high performance for OpenMP 2.x-2.5. (We may have
open sourced this code at some point, I forget.) It was a pthreads
wrapper with basically linear scalability on a typical Intel machine.
I didn't and don't see people using a lot of OMP3 in the wild, so
investing further just wasn't a high priority. OMP3 brought tasks and
a few other things of interest.
As OMP continued to move forward, the checkbox of supporting it
became less a pragmatic necessity and more a marketing one.
Coincidentally, Intel open sourced their runtime... we ditched ours
and migrated over. The migration was for the most part relatively
painless, since the interfaces to our OMP runtime and Intel's weren't
all that drastically different. So we added the parsing/sema and some
extra bits, and voila, we had OMP3 support. The Intel runtime does a
pretty respectable job on performance, but the cost is that the code
is really nasty to read. Our old runtime was dead simple by
comparison, though OMP features are likely to blame for some
percentage of the added complexity.
OMP saw OpenACC and others handling GPU offloading and started
duplicating effort down that path. In parallel to the OMP work above,
we already had GPU offloading in a *different* runtime. When we added
OMP4 support and offloading, we stayed with our GPU/offloading
library and continued to use the Intel runtime for OMP on the CPU.
When we added OpenACC *HOST* onloading (ARMv8 across *many* cores), I
made the mistake of using our GPU/offloading library. We did this
because it already supported the OpenACC interfaces we needed. To go
further down the rabbit hole of my failure: we debugged all the
performance regressions and ended up making it *faster* than the
Intel OMP runtime on CPU scaling... (Doing this was not easy.) You
may not care one bit about my internal woes, but this experience has
left an impression.
---------------
Bottom line: if you don't plan things carefully from the start,
you'll end up with a big mess of complicated *** and lots of
duplicated code. Your project is basically throwing away many
man-years of investment by not leveraging the LLVM OpenMP runtime
more. Short term it may seem easier to do your own thing, but long
term it will likely complicate things for everyone around here who
cares. Users will not benefit from two CPU parallel runtimes and
multiple offloading libraries. I can already point to a few
offloading libraries, and most aren't production quality; I suspect
that's because each is not a team effort, but 1-2 guys hacking here
and there.
I would love to have an in-depth and friendly conversation (probably
offline) about the nice things in SE. (Notice that in all my emails I
haven't trashed the programming model as garbage.) Personally, I'd
love to take the good things from SE and see how they can be
incorporated into existing standards. If that fails, I'll concede and
yet another "standard" is born.
I don't know if you can take your Google hat off and look at the big
picture, but do you see what I'm getting at here?
Side note: have you talked with Ian Buck or others on the NVIDIA CUDA
team? I'm sure they would love better C++ support in CUDA. I bet your
feedback would be quite valuable.