[cfe-dev] parallel C++

Keane, Erich via cfe-dev cfe-dev at lists.llvm.org
Wed Nov 28 13:20:38 PST 2018


Presumably, you have ~4 choices here:


1-      Figure out how to do the work on your own.

2-      Figure out how to convince enough people to help you implement it for free.

3-      Pay a bunch of people to implement it for you.

4-      Figure out how to convince a company/multiple companies to either:

a.       Pay a bunch of people to implement it for you.

b.      Pay a bunch of people to put together a standards document for a new language, then presumably pay a bunch of people to implement it.

From: cfe-dev [mailto:cfe-dev-bounces at lists.llvm.org] On Behalf Of Edward Givelberg via cfe-dev
Sent: Wednesday, November 28, 2018 1:15 PM
To: jfbastien at apple.com
Cc: cfe-dev at lists.llvm.org
Subject: Re: [cfe-dev] parallel C++


I like Clang a lot, but it seems to me that creating a serious prototype needs more than
one person's effort. I feel that the best approach may be to first create a dialect of C++ and gauge its adoption by the community.

Is LLVM/Clang suitable for the project?
Naively, it seems to me that LLVM is a sequential VM, so perhaps its architecture needs to be extended.
I am proposing an intermediate representation which encodes object operations,
let's call it IOR. IOR is translated to interconnect hardware instructions, as well as LLVM's IR.
I am proposing a dedicated back end to generate code for the interconnect fabric.
I am just throwing out some ideas here, but people here should know better than I do what needs to be done. I'd like to know what people think about this.

Since I don't think I can do this on my own, I'm interested in any ideas about how this
could be done. I think that many different kinds of companies should be interested in this for different reasons.
For example, if you are planning to manufacture a 1000-core CPU, you have a problem programming it. You are probably implementing some kind of communications library for
the network on the chip, but with a compiler like this you'll be able to
(1) run standard sequential C++ code, where the compiler allocates objects onto many different cores. This may not be much slower than sequential execution, and it would be possible to run many such applications simultaneously, so there should be large savings in electricity.
(2) do programming in parallel C++

On Wed, Nov 28, 2018 at 12:27 AM JF Bastien <jfbastien at apple.com> wrote:
AFAICT, it seems like what you’re suggesting to build isn’t a good fit for C++. I may be wrong.

If you think it should in fact be how C++ evolves, you should take your proposal to the C++ standards committee. To do so, you really need to dig into the references folks have sent and integrate them into your thinking, which you seem on track to do. I also suggest having some believable implementation, because that type of idea basically doesn’t see the light of day without someone committing to implementing it.

If you think it’s a new language you’re creating, do you think LLVM has the right model for what you’re trying to do? Again, it seems like implementation experience would help figure that out.



On Nov 27, 2018, at 5:12 PM, Edward Givelberg via cfe-dev <cfe-dev at lists.llvm.org> wrote:

Bruce, Jeff, Bryce,

I am just a user of C++ and am not familiar with all the work that goes into language development and standardization, so this is certainly interesting to me.
Perhaps due to my being an outsider, and unaware of all the difficulties,
my proposal is radical.
It is not about extending C++, but redefining it at its core.
The only definition of an object that I found in the standard is "a region of storage".
Perhaps there is a better definition somewhere. I'd be interested to read it.
I am proposing that an object is a virtual machine.
When I talk about a remote pointer, it is not just another pointer into some memory.
It is a way to access a virtual machine. I don't know what the architecture will look like,
and I don't know what memory is or where it is.
If I may be so rude, I propose to get rid of the C++ memory model and the C++ abstract machine entirely. I did not know about the existence of SG1, the C++ subcommittee for concurrency and parallelism. An object-oriented language (like C++) is conceptually inherently parallel, and it is bizarre that for several decades it has been a sequential language. SG1 should not even exist, or it should be the committee for the whole of C++ language.

I have sketched a trivial abstract model of object-oriented computation in my paper. It does not mention memory and it does not mention network. It doesn't even mention a processor. I propose it as a starting point for a more detailed and workable model.
I wrote in my paper that
"We perceive the world primarily as a world of objects,
a world where a multitude of objects interact simultaneously.
From the programmer's point of view
interactions between objects are meaningful and memorable,
unlike interactions of processes exchanging messages."
The fact that C++ is a sequential language is kinda perverse!

To summarize, I am not proposing a mechanism, a language extension, a library, a tool, a framework or a paradigm. I am proposing to redefine the existing language.
I know it sounds completely crazy and impractical, but I believe that continuing to build on the current base is not going to work. The hardware can only go in the direction of parallelism. There is a need to build desktops with thousands, or millions, of processors, and the obstacle is programmability. The sequential framework of C++ won't work, and adding threads, language extensions, libraries, tools is futile.

The reason that I so presumptuously named my paper as a solution to the problem of parallel programming is that while my proposal is radical, I also think that a fairly smooth and gradual transition is possible from sequential C++ to parallel C++. Parallel interpretation will break a lot of code, but a lot of code can be translated to a parallel interpretation automatically, or with a relatively small programming effort. Moreover, I have shown in my paper that you can run your sequential code on parallel hardware without breaking it, and since the hardware will be more energy efficient, you will be able to run your code at a lower cost.

Bryce, thanks for the pointers. I am looking over them.
Again, thanks to everybody for their remarks. The topic is too complex for one man, and this input has helped me tremendously already.



On Tue, Nov 27, 2018 at 6:36 PM Bryce Adelstein Lelbach aka wash <brycelelbach at gmail.com> wrote:
> I propose to introduce remote pointers into C++. I am very surprised nobody thought of this before.

0.) People have thought of this before. The earliest work I know of
was published in 1993.
1.) People have proposed this before.*

I read your paper this afternoon. I think your paper needs to explain
how it differentiates itself from the plethora of related work in this
problem space.

SG1, the C++ subcommittee for concurrency and parallelism, has decided
to not pursue "inaccessible memory" (e.g. remote memory) at this time;
we'd like to tackle affinity, memory access, and heterogeneous memory
first. This decision was made via a straw poll at the 2017 Toronto
committee meeting:

Straw poll: In the near term, ignore inaccessible memory except for
cross-process sharing on the same device.
SF F N A SA   (Strongly in Favor / in Favor / Neutral / Against / Strongly Against)
 7 7 5 2 1

Before any work can be done on extending C++ to support diverse memory
kinds, we must first explore how the C++ memory model and abstract
machine will be impacted - what breaks, what needs to be extended,
etc. We still don't have a proposal for that. I would welcome an
SG1-targeted paper on that subject matter.

*: I don't have a full list of prior proposals in this space, but
here's some material to get you started:

https://wg21.link/P0009
https://wg21.link/P0234
https://wg21.link/P0367
https://wg21.link/P0567
https://wg21.link/P0687
https://wg21.link/P0796
https://wg21.link/P1026
http://www.nersc.gov/assets/Uploads/IJHPCA-paper.pdf
http://stellar.cct.lsu.edu/pubs/pgas14.pdf
http://charm.cs.illinois.edu/newPapers/93-02/paper.pdf
On Tue, Nov 27, 2018 at 8:14 AM Edward Givelberg via cfe-dev
<cfe-dev at lists.llvm.org> wrote:
>
> Jeff,
> Multi-core CPUs and all the associated software technologies (shared memory, threads, etc) are a technological dead end.
> I argue more than that: all software technologies that use processes
> are dead on arrival. This includes the technologies you mention in
> your presentation
> https://www.ixpug.org/images/docs/KAUST_Workshop_2018/IXPUG_Invited2_Hammond.pdf
> People got used to processes over decades, so when they talk about parallelism they immediately talk about processes, but this is the root of the problem. I propose object-level parallelism. An object is more than a process. It is a virtual machine.
>
> I propose to introduce remote pointers into C++. I am very surprised nobody thought of this before. I'd be curious to know how much work
> people think it would be to do this in LLVM. I know it may be possible to introduce something like remote_ptr, but I don't think it is a good idea.
>
> I am also proposing a new way of executing code, which I call causal asynchronous execution. I'd like to know if people find it natural to write code like this.
>
>
> On Mon, Nov 26, 2018 at 10:26 PM Jeff Hammond <jeff.science at gmail.com> wrote:
>>
>> I’ll probably have more detailed comments later but the related work you may wish to consider includes:
>> - UPC and Berkeley UPC++
>> - Charm++
>> - HPX from LSU
>> - DASH (http://www.dash-project.org/)
>> - MADNESS (https://arxiv.org/abs/1507.01888)
>>
>> There are quite a few dead parallel C++ dialects from the last millennium but it’s probably not worth your time to find and read about them.
>>
>> I’m very glad that you used MPI as your communication runtime. This will save you lots of pain.
>>
>> Jeff
>>
>> On Mon, Nov 26, 2018 at 2:57 PM Edward Givelberg via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>>>
>>>
>>> Chris Lattner suggested that I post to this mailing list.
>>>
>>> I used Clang/LLVM to build a prototype for parallel
>>> interpretation of C++. It's based on the idea that C++
>>> objects can be constructed remotely, and accessed via
>>> remote pointers, without changing the C++ syntax.
>>> I am a mathematician, not an expert on compilers.
>>> I am proposing changes to the C++ standard and to the
>>> compiler architecture, so I'm very interested to hear from
>>> experts.
>>> My paper is
>>> https://arxiv.org/abs/1811.09303
>>> Best regards,
>>> Ed
>>>
>>> -----------------------------------------------------------------
>>> A solution to the problem of parallel programming
>>> E. Givelberg
>>>
>>> The problem of parallel programming is the most important
>>> open problem of computer engineering.
>>> We show that object-oriented languages, such as C++,
>>> can be interpreted as parallel programming languages,
>>> and standard sequential programs can be parallelized automatically.
>>> Parallel C++ code is typically more than ten times shorter than
>>> the equivalent C++ code with MPI.
>>> The large reduction in the number of lines of code in parallel C++
>>> is primarily due to the fact that communications instructions,
>>> including packing and unpacking of messages, are automatically
>>> generated in the implementation of object operations.
>>> We believe that implementation and standardization of parallel
>>> object-oriented languages will drastically reduce the cost of
>>> parallel programming.
>>> This work provides the foundation for building a new computer
>>> architecture, the multiprocessor computer, including
>>> an object-oriented operating system and more energy-efficient,
>>> and easily programmable, parallel hardware architecture.
>>> The key software component of this architecture is a compiler
>>> for object-oriented languages.  We describe a novel compiler
>>> architecture with a dedicated back end for the interconnect fabric,
>>> making the network a part of a multiprocessor computer,
>>> rather than a collection of pipes between processor nodes.
>>> Such a compiler exposes the network hardware features
>>> to the application, analyzes its network utilization, optimizes the
>>> application as a whole, and generates the code for the
>>> interconnect fabric and for the processors.
>>> Since the information technology sector's electric power consumption
>>> is very high, and rising rapidly, implementation and widespread
>>> adoption of multiprocessor computer architecture
>>> will significantly reduce the world's energy consumption.
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
>> --
>> Jeff Hammond
>> jeff.science at gmail.com
>> http://jeffhammond.github.io/
>



--
Bryce Adelstein Lelbach aka wash
ISO C++ Committee Member
HPX and Thrust Developer
CUDA Convert and Reformed AVX Junkie
--


