[LLVMdev] Integer handling

OvermindDL1 overminddl1 at gmail.com
Tue Sep 30 01:28:04 PDT 2008


On Mon, Sep 29, 2008 at 9:48 PM, Matt Giuca <mattgiuca at gmail.com> wrote:
> It seems like your language is very high level indeed (it almost sounds
> dynamic). If you can pass arbitrary messages without needing to statically
> know the type of anything, and pass data transparently from one machine to
> another.
It is in no way dynamic; everything is compiled down.  All Actors are
fully compiled code, and each could be akin to its own Process.  Each
Actor has its own event loop, parses its own messages, and so on.
When a Message is sent you just build up a structure and send it off.
At compile time the packing of that structure into a bitstream is
worked out, with some metadata at the front describing the structure
(it uses 4 or 5 bits per structure 'element', so there is a little
overhead, but very little).  The Message is then sent off and the
receiving Actor will receive it at some point (the Actor model uses
unbounded nondeterminism; yes, try imagining that).  When it does
receive it, its event loop parses it like this (this is more
Erlang-ish syntax, since I have not finalized the syntax of the
receive loop in my language):
receive ->
    // calculate something from newData and send the 'Resp'onse back
    // to the sender
    {"Calc", MyDataType newData, PID sender} ->
        sender.sendMessage({"Resp", SomeFunc(newData)});
    // they just want the current data status sent back to them as a
    // 'Resp'onse
    {"Get", PID sender} ->
        sender.sendMessage({"Resp", currentData});

Here the {"Calc", MyDataType newData, PID sender} pattern will match
any structure that has three elements, where the first is an i8 array
of 4 elements with specific values (here matching "Calc"; you could
just as well use an integer or anything else), the second element is
of type MyDataType (maybe an i32, for example), and the third element
is of type PID (a link to another Actor).  You can guess how the
second pattern is matched.  At compile time this whole structure is
compiled down to its base type and a comparison tree of matching
elements is built up (in this case the first element is the only one
actually 'compared'; the others are matched on type alone).  If a
match succeeds, the variables named in the pattern are bound to the
values at those positions in the message (if no variable name is
given, the type is still matched at that point, but nothing is
bound).  So if the sender originally sent a message like:
    anActor.send({"Calc", i32(18), self});

then a struct of { [i8 x 4], i32, {someOtherStructThatRepresentsAPid} }
is built, compiled down to a bitstream, and sent out.  If the receive
loop above instead matched something like:

    {i8[4] theName, MyDataType, PID}

then you would get the array bound as theName and could ignore the
other two elements.  If a message does not match anything right now
(an Actor can have many such receive switches for receiving different
messages at different times), it is stored in a queue for the Actor.
(Not really a Queue: in the Actor world a new Actor is created that
does nothing but hold the message and resend it out every little bit.
Actors can also be 'link'ed in such a way that if one dies, all
'link'ed Actors die too with a death message, which can be handled to
do some cleanup but cannot be ignored.)
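
Just to make the compile-time matching concrete, here is a rough C++
sketch of the kind of code the compiler might emit for the {"Calc",
MyDataType newData, PID sender} pattern above (the Element/PID
representation and the tag enum are made up for the example, not what
my compiler actually produces):

    #include <cstdint>
    #include <cstring>
    #include <vector>

    // Illustrative runtime representation of one message element.
    enum class Tag : uint8_t { I8Array4, I32, Pid };

    struct PID { uint64_t node, id; };

    struct Element {
        Tag tag;
        union {
            char    bytes[4];  // i8 x 4
            int32_t i32;       // MyDataType assumed to be an i32 here
            PID     pid;
        };
    };

    using Message = std::vector<Element>;

    // Comparison tree for: {"Calc", MyDataType newData, PID sender}
    // Element 0 is compared by value; elements 1 and 2 only by type.
    bool matchCalc(const Message& msg, int32_t& newData, PID& sender) {
        if (msg.size() != 3)                      return false;
        if (msg[0].tag != Tag::I8Array4)          return false;
        if (std::memcmp(msg[0].bytes, "Calc", 4)) return false; // value compare
        if (msg[1].tag != Tag::I32)               return false; // type-only match
        if (msg[2].tag != Tag::Pid)               return false; // type-only match
        newData = msg[1].i32;   // bind the named variables
        sender  = msg[2].pid;
        return true;
    }

    int main() {
        Message msg(3);
        msg[0].tag = Tag::I8Array4; std::memcpy(msg[0].bytes, "Calc", 4);
        msg[1].tag = Tag::I32;      msg[1].i32 = 18;
        msg[2].tag = Tag::Pid;      msg[2].pid = {0, 7};
        int32_t newData; PID sender;
        return matchCalc(msg, newData, sender) ? 0 : 1;
    }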

In reality, Messages are not actually compressed down to a bitstream
unless they need to be.  If the target Actor is on the local machine
and is not in a running state, the Message is handed over to it
directly; if it is in a running state, a new Actor is created that
keeps trying to send it (the scheduler sends that Actor periodic
updates so it keeps retrying), so all of this is basically just the
copy of a pointer to the message struct.  Otherwise, if the PID
refers to something that cannot be passed directly, like another CPU
on a non-shared-memory system, another process, or something across
the Internet, the Message is bitstreamed and sent to the handler for
the destination (another Actor that takes a bitstream in a message,
figures out the remote PID for it, and sends it off to a
corresponding handler on the other machine, maybe even hopping across
multiple machines like a router, until it is eventually
de-bitstreamed and passed to the receiving Actor).
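
As a rough illustration of that send path, here is a minimal C++
sketch (all the names, the lookup table, and the stubbed helpers are
hypothetical; they just show the local-versus-remote decision, not my
actual runtime):

    #include <cstdint>
    #include <cstdio>
    #include <unordered_map>
    #include <vector>

    struct Message   { std::vector<int32_t> elements; };  // stand-in payload
    struct BitStream { std::vector<uint8_t> bytes; };

    struct Actor {
        bool running = false;
        std::vector<Message*> mailbox;
        void enqueue(Message* m) { mailbox.push_back(m); } // pointer handoff
    };

    // Hypothetical runtime pieces, stubbed out for illustration only.
    std::unordered_map<uint64_t, Actor> localActors;

    Actor* lookupLocal(uint64_t pid) {
        auto it = localActors.find(pid);
        return it == localActors.end() ? nullptr : &it->second;
    }

    BitStream encode(const Message& m) {   // a real encoder would also emit
        BitStream bs;                      // the 4-5 metadata bits per element
        for (int32_t e : m.elements)
            for (int i = 0; i < 4; ++i)
                bs.bytes.push_back((e >> (8 * i)) & 0xFF);
        return bs;
    }

    void spawnRetrySender(Actor*, Message*) { /* helper Actor retries later */ }

    void forwardToRemoteHandler(uint64_t pid, const BitStream& bs) {
        std::printf("to remote pid %llu: %zu bytes\n",
                    (unsigned long long)pid, bs.bytes.size());
    }

    void send(uint64_t pid, Message* m) {
        if (Actor* local = lookupLocal(pid)) {
            if (!local->running) local->enqueue(m);         // just a pointer copy
            else                 spawnRetrySender(local, m);
        } else {
            forwardToRemoteHandler(pid, encode(*m));        // serialize for remote
        }
    }

    int main() {
        localActors[1];          // pretend Actor 1 lives on this machine, idle
        Message msg{{18}};
        send(1, &msg);           // local: direct handoff
        send(42, &msg);          // not local: bitstreamed and forwarded
    }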

So there is matching involved (as in many dynamic functional
languages), but it is all resolved at compile time: all comparison
and matching functions become base instructions that test each type,
usually with a single test per element.  The first element will
probably just become a switch test, since the majority of the time,
in a well-made Actor system, that is the only part that has to be
compared to determine what a message is; if you are smart and use
integers, or an array of chars that fits into a machine word so an
integer compare can be used, a tag like "Calc" is exactly the size of
an i32, which makes for a nice, fast compare.
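
For example, a 4-character tag really can be compared as one machine
word; a tiny C++ sketch of just that trick (assuming a little-endian
machine for the byte packing):

    #include <cstdint>
    #include <cstring>

    // Pack a 4-char tag into one 32-bit word so matching is a single
    // compare (byte order matches little-endian memory layout).
    constexpr uint32_t tag(char a, char b, char c, char d) {
        return  uint32_t(uint8_t(a))       | uint32_t(uint8_t(b)) << 8 |
                uint32_t(uint8_t(c)) << 16 | uint32_t(uint8_t(d)) << 24;
    }

    int dispatch(const char name[4]) {
        uint32_t word;
        std::memcpy(&word, name, 4);       // reinterpret the [i8 x 4] as an i32
        switch (word) {                    // one switch decides the message kind
            case tag('C','a','l','c'): return 1;
            case tag('G','e','t',' '): return 2;
            default:                   return 0;   // unknown: leave it queued
        }
    }

    int main() { return dispatch("Calc") == 1 ? 0 : 1; }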

The Actor model has been pretty well researched but really only sees
heavy use in the telecom industry, which quite literally relies on
100% uptime.  The flagship Erlang system is the one that runs
Ericsson, and it has had 4 minutes of downtime in over ten years
thanks to being well designed.  If code needs to be updated, you can
just start a new node, link it to a global PID to handle all messages
of whatever Actor, and send a kill message to the old Actor, which
dies once it finishes handling the messages it is currently dealing
with.  When an Actor dies, it can send a death message to other
Actors that registered themselves as listeners with it.  By default
in Erlang, when an Actor creates another Actor the two are
auto-linked; you have to explicitly unlink them if you do not want
one dying Actor to bring down the whole set.  By explicitly unlinking
them you know where you are setting safety boundaries in the system,
so you can register an Actor that, for example, does nothing but wait
for a death message from such a 'system' of linked Actors and, if
they die, records the death reason and reconstructs the system as
necessary.  These are usually called Monitor or Guard Actors,
depending on their exact purpose.
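
The link/death-message idea is simple enough to sketch; this is not
Erlang's implementation or mine, just a toy C++ illustration of the
propagation rule described above:

    #include <cstdio>
    #include <set>
    #include <string>

    struct Actor {
        std::string      name;
        std::set<Actor*> links;      // bidirectional links to other Actors
        bool             isMonitor;  // Monitors record deaths instead of dying
        bool             dead;

        void linkTo(Actor& other) {
            links.insert(&other);
            other.links.insert(this);
        }

        void die(const std::string& reason) {
            if (dead) return;
            dead = true;
            std::printf("%s died: %s\n", name.c_str(), reason.c_str());
            auto linked = links;     // copy, since cascading deaths edit the sets
            links.clear();
            for (Actor* a : linked) {
                a->links.erase(this);
                if (a->isMonitor)
                    std::printf("%s recorded: %s died (%s)\n",
                                a->name.c_str(), name.c_str(), reason.c_str());
                else
                    a->die("linked Actor " + name + " died");  // death cascades
            }
        }
    };

    int main() {
        Actor worker{"worker", {}, false, false};
        Actor helper{"helper", {}, false, false};
        Actor guard {"guard",  {}, true,  false};
        worker.linkTo(helper);   // linked by default: one death kills both
        worker.linkTo(guard);    // the guard only records the reason (a Monitor)
        worker.die("crash");
    }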

It is actually a very interesting way to program.  I learned Erlang
about a year back and loved the style (although I hate the
'functional' programming), and I could find no other language that
used the Actor model (the object-oriented model took over because of
C++ and languages like that, which kind of killed off the Actor model
except for Erlang).  Erlang is also an utter horror to integrate with
C++ applications: the C++ application has to 'pretend' to be an
endpoint node, handle all of the special Erlang types, do the pattern
matching explicitly, and so on.  I already have a setup API for
integrating my language into a C++ app directly so functions can be
registered and so forth (my preferred way; a rough sketch of what
that could look like follows this paragraph).  So now my Actor
language can call exposed C/C++ functions directly.  If it were a
game, for example, there could be an Actor that just takes messages
from all the 'object/Actors' of the game world about their position
and other rendering updates, and passes them to the Renderer to tell
it to update.  There can be Actors that handle specific Zones, a
whole hierarchy of message passing, where things only handle what
needs to be handled.  The nice thing about this model, for example if
you implemented an MMO server in it (*hack*cough*), is that it could
be kept up persistently, with no downtime needed for patches, and the
single system can handle the entire game world regardless of the
number of players.  If you need more server power you can literally
just toss on another few computers, link them into the system, and
let some Monitor Actors notice that there is a lot of unused CPU
power on those systems and start sending code over to be compiled and
some Actors started.  If the new servers have access to the outside,
the Monitors can start up some more Actors on those machines to
handle more players and have them register with a single login system
so they can handle more player flow.  Have a good fiber network
back-end and you have a robust, scalable, fast system.
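
To be clear, this is not my actual registration API, just a
hypothetical sketch of what exposing a C++ function to Actors could
look like (every name and signature here is invented for
illustration):

    #include <cstdio>
    #include <functional>
    #include <string>
    #include <unordered_map>

    // Hypothetical host-side registry: the embedding C++ app exposes
    // functions by name, and compiled Actors call them through this table.
    struct HostAPI {
        std::unordered_map<std::string, std::function<void(int, int)>> fns;

        void registerFn(const std::string& name,
                        std::function<void(int, int)> f) {
            fns[name] = std::move(f);
        }
        void call(const std::string& name, int a, int b) {
            auto it = fns.find(name);
            if (it != fns.end()) it->second(a, b);
        }
    };

    int main() {
        HostAPI api;
        // The game registers a renderer update; an Actor later calls it.
        api.registerFn("Renderer.Move", [](int id, int x) {
            std::printf("entity %d -> x=%d\n", id, x);
        });
        api.call("Renderer.Move", 7, 120);  // what a position-update Actor would do
    }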

Even just on a single computer, a program made in the Actor model can
scale to any number of CPUs, so it is 'future-proof' with respect to
how CPUs are advancing.  Even if the CPUs have different capabilities
(and LLVM supports compiling to those other instruction sets), you
can have specialized Actors running on them (like the Cell processor
with its little secondary SPUs with non-shared memory but a
message-passing central bus; the Actor model represents this style of
CPU perfectly, unlike C++).

Er, I think I made this too long.  Either way, nothing is really
dynamic in this setup; everything is pretty well set in stone at
compile time.  I am mostly making the language for myself, but a
couple of people have shown interest in it, so I am trying to make it
easier to use for others rather than just me.  I did not describe a
great amount of detail about how Messages are handled, but I would
guess you could glean most of that from what was already stated.

As you can see though, it is best to have only built-in system types.
Letting the user create custom types means the bitstream encoding
becomes more complex, hence slower, and it means more code has to be
linked in; if a new node is set up, even more code has to be sent
over, rather than just a file describing how an Actor works.
Currently I do support an "alias" keyword that can map a complex type
to a simple name, but something made with that name and something
made with its base type are identical in all operations and are not
treated any differently, just like the typedef keyword in C/C++
(unlike the typedef keyword in D, which actually makes a completely
new, non-compatible type; my alias keyword works like typedef in
C/C++ or the alias keyword in D).  It is just for ease of typing.
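
For comparison, this is the C/C++ typedef behavior the alias keyword
mirrors (plain C++, nothing from my language):

    #include <cstdint>

    typedef uint32_t Distance;     // C/C++ typedef: just another name
    using   Velocity = uint32_t;   // equivalent modern C++ alias

    uint32_t add(uint32_t a, uint32_t b) { return a + b; }

    int main() {
        Distance d = 10;
        Velocity v = 32;
        // Aliases and the base type mix freely with no conversion, which is
        // how the proposed 'alias' keyword behaves (unlike D's typedef,
        // which would make Distance a distinct, incompatible type).
        return add(d, v) == 42 ? 0 : 1;
    }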

Anyone feel free to pick this all apart though.  Any problems, bugs,
bad designs, etc... all need to be figured out before I get too far
into this.


