[cfe-dev] parallel C++

Tue Nov 27 12:15:17 PST 2018

Arthur,

Your post is very helpful to me, with the exception of remarks like
"if you don't understand why I wrote `remote_ptr<T>` instead of `T*`,
you're probably in trouble already, C++-language-wise"
because although you cannot avoid using something like remote_ptr today,
I don't understand why the ordinary star cannot be interpreted as a remote
pointer.
In fact, I am sure it can be.
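
To make concrete what "something like remote_ptr" has to do today, here is a minimal sketch. It assumes an in-process `Node` class stands in for the remote machine; `Node`, `create`, `invoke`, `call`, and `remote_sum` are hypothetical names invented for this illustration, not taken from the paper or from Arthur's post. The handle stores an id rather than an address, and every access is a message executed where the object lives. Interpreting the ordinary star as a remote pointer would presumably mean having the compiler generate this kind of forwarding for `T*` itself.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <utility>

// Simplified version of the Object class from the quoted example.
struct Object {
    int a, b, c, d;
    Object(int a, int b, int c, int d) : a(a), b(b), c(c), d(d) {}
    int method() const { return a + b + c + d; }
};

// Hypothetical stand-in for one remote machine: it owns the objects
// and hands out integer ids instead of addresses.
template <class T>
class Node {
    std::unordered_map<std::size_t, std::unique_ptr<T>> objects_;
    std::size_t next_id_ = 0;
public:
    template <class... Args>
    std::size_t create(Args&&... args) {
        std::size_t id = next_id_++;
        objects_.emplace(id, std::make_unique<T>(std::forward<Args>(args)...));
        return id;
    }
    // The "message": run f on the object where it lives, return the result by value.
    template <class F>
    auto invoke(std::size_t id, F f) -> decltype(f(std::declval<T&>())) {
        return f(*objects_.at(id));
    }
    void destroy(std::size_t id) { objects_.erase(id); }
    std::size_t live() const { return objects_.size(); }
};

// The handle: a (node, id) pair. No T* or T& ever reaches the caller.
template <class T>
class remote_ptr {
    Node<T>* node_;
    std::size_t id_;
public:
    remote_ptr(Node<T>& node, std::size_t id) : node_(&node), id_(id) {}
    template <class F>
    auto call(F f) const -> decltype(f(std::declval<T&>())) {
        return node_->invoke(id_, f);
    }
    void destroy() { node_->destroy(id_); }
};

// Create an object on the node, call a method on it, destroy it.
int remote_sum(Node<Object>& node) {
    remote_ptr<Object> p(node, node.create(1, 2, 3, 4));
    int x = p.call([](Object& o) { return o.method(); });
    p.destroy();
    return x;
}
```

With a `Node<Object> node;`, calling `remote_sum(node)` sums 1, 2, 3 and 4 on the "remote" side and returns the result by value. The sketch deliberately has no `operator*`, because returning an `Object&` is exactly the part that has no obvious remote meaning.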

The rest of your post actually provides reasons why it should not be
possible, but I think it is worth tackling them. I think it is a good list.
As I wrote to you in private, I think there are many issues that I am not
addressing in my paper, and there are likely many issues that I am missing
entirely. The reason I posted the paper here was to get this kind of
feedback.

The first issue, loss of connectivity, is an operating system issue. It is
external to the application. A similar issue is: what if the application
deadlocks? The problem is that on clusters you don't have a concept of an
application, and you don't have an OS. I define an application as a
collection of objects. You need an OS that can track applications (=
collections of objects). In that sense the situation is not different from
a single CPU where processes can hang. In the new OS you get applications
that hang.
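
As a rough sketch of the bookkeeping such an OS or runtime layer might need, consider tracking every outstanding call by id so that a reply arriving after the caller has given up can be recognized and dropped. `CallTracker` and its member names are hypothetical, and the sketch is deterministic on purpose: no real network or timer is involved.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Hypothetical per-caller table of remote calls that still await a reply.
class CallTracker {
    std::unordered_set<std::uint64_t> pending_;
    std::uint64_t next_id_ = 1;
    int stale_replies_ = 0;
public:
    // Start a remote call and remember that a reply is expected.
    std::uint64_t begin_call() {
        std::uint64_t id = next_id_++;
        pending_.insert(id);
        return id;
    }
    // The caller gave up (for example, connectivity was lost): forget the call.
    void time_out(std::uint64_t id) { pending_.erase(id); }
    // A reply arrived. Returns true if someone is still waiting for it;
    // otherwise the reply is counted as stale and dropped.
    bool deliver_reply(std::uint64_t id) {
        if (pending_.erase(id) == 1) return true;
        ++stale_replies_;
        return false;
    }
    int stale_replies() const { return stale_replies_; }
};
```

Dropping a late reply only protects the caller. It does not undo whatever the callee already did, which is the part of the "resume later" problem that this kind of table cannot address.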

How does the remote machine get a copy of the code? Good question, lots of
possible answers. Let's start with the simplest: the compiler does it, like
in MPI.

About destructors: yes, like constructors they run on the hardware where
the remote object lives. As you say, it is likely that there are lots of
dependencies that force sequential rather than parallel execution. Well,
that's life. We get parallelism elsewhere.
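
The sequential ordering can be seen in a small sketch. It assumes each handle's destructor would cost one blocking round trip in a real system; here the destructor only appends to a log, and `RemoteLog`, `RemoteHandle`, `work`, and `run` are hypothetical names. C++ destroys locals in reverse order of construction during stack unwinding, one after another.

```cpp
#include <cassert>
#include <vector>

// Stands in for the remote machine: records which objects were destroyed, in order.
struct RemoteLog {
    std::vector<int> destroyed;
};

// Hypothetical owning handle to a remote object.
class RemoteHandle {
    RemoteLog* log_;
    int id_;
public:
    RemoteHandle(RemoteLog& log, int id) : log_(&log), id_(id) {}
    RemoteHandle(const RemoteHandle&) = delete;
    RemoteHandle& operator=(const RemoteHandle&) = delete;
    // In a real system this would be one blocking round trip to the owner.
    ~RemoteHandle() { log_->destroyed.push_back(id_); }
};

// Creates three remote objects and then fails.
inline void work(RemoteLog& log) {
    RemoteHandle a(log, 1);
    RemoteHandle b(log, 2);
    RemoteHandle c(log, 3);
    throw 0;  // unwinding destroys c, then b, then a
}

// Runs work(), swallows the exception, and reports the destruction order.
inline std::vector<int> run(RemoteLog& log) {
    try {
        work(log);
    } catch (int) {
    }
    return log.destroyed;
}
```

Under that assumption, an exception that unwinds past n remote handles waits for n round trips in sequence before the handler runs.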

The problem of references is interesting. I'm not sure I understand what
you mean. I mention this very briefly in my paper. When you pass an object
by reference, you will need to copy the entire object to the remote
processor and then copy the entire object back at the end. I suppose this
sounds monstrous to you. In some cases it may be possible to send less than
the full object, but this is a matter of optimization. Basically, there is
no choice: your objects cannot "all live in the single CPU", and you cannot
pretend that you can modify a remote object without copying it. I think
this is a big topic for discussion.
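
One reason this deserves discussion is that copying an object over and copying it back is not always equivalent to passing a reference. The sketch below simulates the copy-in/copy-out scheme locally; `Counter`, `bump_both`, and `bump_both_copy_restore` are hypothetical names and nothing is actually sent anywhere. When both parameters alias the same object, the two schemes give different answers.

```cpp
#include <cassert>

struct Counter {
    int value = 0;
};

// Ordinary C++ reference semantics: both parameters may name the same object.
inline void bump_both(Counter& x, Counter& y) {
    x.value += 1;
    y.value += 1;
}

// Simulated remote call: copy each argument in, run on the copies, copy back.
inline void bump_both_copy_restore(Counter& x, Counter& y) {
    Counter cx = x;     // copy-in (would be marshalled to the remote processor)
    Counter cy = y;
    bump_both(cx, cy);  // "remote" execution sees only the copies
    x = cx;             // copy-out
    y = cy;
}
```

Calling `bump_both(a, a)` on a fresh `Counter a` leaves `a.value == 2`, while `bump_both_copy_restore(a, a)` leaves it at 1, because each copy is incremented once and the second write-back overwrites the first. A scheme that maps `T&` onto copying would need a rule for this case.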

On subobjects: yes, where subobjects live is a good question. I think
this can also be relegated to the compiler and the OS. (In other words, let
other people solve this... :-))

Seriously, I'd like to think of a desktop computer with many thousands of
processors (or cores), and I'd like to run code on such a thing. For this,
all ideas of shared memory and processes are irrelevant. I am proposing a
way to write ordinary code, which is almost like the code that you're used
to, but you will probably need to make some changes.
Basically, you'll write your code as a collection of many objects that talk
to each other. Because there are so many processors in the system, you will
not be handling how and where these objects live: the compiler and the OS
will do this for you. They will map millions of your objects to thousands
of processors.
The concerns that you raise about speed are very serious. I think it should
be possible to run such applications as fast as, and perhaps faster than,
they run now, but I admit this is a difficult and huge project. However, I
think the benefit of lower power consumption is huge, and this is why I said
that this can affect the world's energy consumption. It may sound bombastic,
but this is likely easier to achieve than actual speed-up from
parallelization.

Thank you very much for your input.

On Tue, Nov 27, 2018 at 1:52 PM Arthur O'Dwyer <arthur.j.odwyer at gmail.com>
wrote:

> On Tue, Nov 27, 2018 at 12:41 PM Edward Givelberg via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>>
>> About remote pointers: my question is specifically why can't we write
>> Object * some_object = new Object(a, b, c, d);
>> in C++ where some_object is not an ordinary pointer, but a remote pointer?
>>
>
> I gave you a couple of trouble areas in my private-email response, but
> I'll repeat them for the record here too.
>
> Circa 2012 I worked for a startup [...] allowing Objective-C objects to
> live anywhere, even on other machines. Then we used hooks in the
> Objective-C runtime to intercept messages passed to those objects, marshal
> them, and transfer them across the wire to where the object lived. We were
> doing this for the purpose of "Mobile Device Management" — that is, we
> wanted to allow an employer to provide basically an "iPhone app as a
> service," so that the app would run on the employer's server but all the
> UIKit objects would live on the employee's mobile device.
> We had two problems with this:
> - First, what does the app do when you go into a tunnel and lose
> connectivity? You have something that looks to the program like a method
> call — say, `result = object->ExecuteMethod(some, parameters)` — which can
> return a value, or throw an exception, or abort; but which can also "hang"
> due to a lost network connection. And if the caller treats this as a
> TimeoutException, then we have the problem that the callee might
> unexpectedly "resume" sometime later with a return value that the caller
> (who has unwound the stack and moved on) is no longer equipped to deal
> with. `ExecuteMethod` is acting spookily like a coroutine, here.
> - Second, how does the marshaller deal with memory, and how does it deal
> with behaviors? We need to be able to implement `qsort` across a wire
> boundary. That means we need to be able to pass an arbitrarily large chunk
> of memory (the array to sort), and we also need to be able to pass a
> function pointer (the comparator). These are both extremely intractable
> problems. Our startup solved these problems by cheating. You need to solve
> them for real.
>
>
> To elaborate on the "behaviors" part: Let's suppose I have
>
> class Object {
>     int a, b, c, d;
> public:
>     Object(int a, int b, int c, int d) : a(a), b(b), c(c), d(d) {}
>     virtual int method() { return a+b+c+d; }
>     ~Object();
> };
>
> int test() {
>     remote_ptr<Object> some_object = handwave(new Object(1,2,3,4));
>     int x = some_object->method();
>     if (x < 0) throw "oops";
>     return x;
> }
>
> First of all, if you don't understand why I wrote `remote_ptr<T>` instead
> of `T*`, you're probably in trouble already, C++-language-wise.
> Second, you're trying to make `method` execute on the remote machine,
> right? How does the remote machine get a copy of the code of
> `Object::method`?
> Third, when we hit the `throw` and unwind the stack, destructors get
> called. `Object` has a non-virtual destructor. Where does it run: on our
> machine, or on the remote machine? Presumably it must run on the remote
> machine, which means our stack-unwind is held up waiting for the result of
> each destructor we have to run (and those destructors must happen in
> serial, not in parallel).
> Fourth, consider
>
>     void helper(Object& o) {
>        o.Object::method();
>     }
>     void nexttest() {
>         remote_ptr<Object> some_object = handwave(new Object(1,2,3,4));
>         Object onstack(5,6,7,8);
>         helper(*some_object);
>         helper(onstack);
>     }
>
> Any attempt to invent non-trivial "fancy pointers" needs to come with a
> full-fledged idea of how to invent "fancy references," or it will not be
> able to get off the ground in C++. (See P0773R0
> <http://open-std.org/JTC1/SC22/WG21/docs/papers/2017/p0773r0.html>.)
> Also, I snuck a non-virtual method call in there; you need a way to handle
> that.
> This problem is easier in Objective-C, where every handle is the same kind
> of pointer and every method call is virtual by definition. It is *very
> hard* in C++. And when I say "very hard," I'm fairly confident that I
> mean "impossible."
>
> Fifth, consider subobjects of remote objects:
>
>     struct FifthObject { Object m{1,2,3,4}; };
>     void fifthtest() {
>         remote_ptr<FifthObject> some_object = handwave(new FifthObject());
>         helper(some_object->m);
>     }
>
> Finally, even if you invent a new system for dealing with remote objects,
> you still have to figure out where the code is going to physically run: on
> which CPU, which process, which thread... avoiding cache ping-pong(*)...
> all this real-world-multi-processing kind of stuff. And you must be able to
> handle it all with *zero runtime cost*, or else people concerned about
> performance will just go under you and use those "dead-end" but efficient
> mechanisms, leaving your neat abstraction without a userbase.
>
> (* — In a domain without global shared memory, the analogue of "cache
> ping-ponging" would be "marshalling the entire array across a wire boundary
> every time someone calls `qsort`." We hit this problem, and, as I said,
> solved it by cheating on the demo.)
>
> –Arthur
>