[LLVMdev] LLVM based Virtual Machine "Environment" idea sanity check.

Tue Sep 5 22:29:20 PDT 2006

> If you don't mind my asking, can you tell us a little more about your 
> overall goal for this project?
> [snip]
> I can't really give any
> feedback on which is more appropriate until I know more about the 
> constraints of your project.
> 
> -- John T.

Hopefully I can explain my project more fully (without being too wordy):

"CRAZY CRACKPOT IDEA"

What I want to do is create an idealized "processing node."  This 
virtual machine would include:

1) A processing core
2) Access to "permanent storage"
3) Access to a "Network device"

With any luck, you should be able to run this virtual "processing node" 
on an old Pentium II, on an Apple G3, on a soon-to-be-extinct PS2 
(because everyone'll be selling them to get the PS3), on a Gumstix 
Waysmall computer... basically anything that you can get your hands on.

Each of the computers above would run the "idealized processing node" in 
their native operating system.

An outside observer would see a group of heterogeneous computers 
networked with each other through the internet.

An "inside" observer would see a group of homogeneous "processing nodes" 
networked with each other through Jabber (see below).

"WHAT I WANT TO USE"

I'd like to use LLVM for the Processing Core part of this "idealized 
processing node."  This would allow me to exploit the programming 
language frontends available and the optimizing and JIT compiling 
abilities on the backend.  The key here is that the "node operator" 
might want to help a project, but might not want to fully trust the code 
for the project.  For example, if I run Folding@home, I have to fully 
trust the programmers of the Folding@home client... trust that they 
haven't hidden something evil in their program, or left a hidden 
vulnerability in it that hurts me.

However, if the Folding@home process were running in a virtual machine, 
they could run just about any code they wanted in it, without me having 
to worry (in general).  Of course, this is where JIT compiling should 
help out...  Basically, the process should have access to the computing 
abilities of the host... but no access to any of the host's actual 
hardware devices.  As far as this hypothetical Folding@home process is 
concerned... it's running directly on a simple LLVM processor with a 
hard drive and a NIC... that's it.
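
To make the "only two devices" idea concrete, here is a minimal sketch 
in Python.  It is NOT LLVM: the tiny stack machine, the instruction 
names, and the Node/VirtualDisk/VirtualNic classes are all made up for 
illustration.  The only point is that guest code can reach the outside 
world through the virtual disk and the virtual NIC and nothing else.

```python
# Toy stand-in for the "processing core". This is NOT LLVM; it is a
# hypothetical stack machine used only to illustrate device gating.

class VirtualDisk:
    """The guest's only storage: a private mapping of block -> value."""
    def __init__(self):
        self.blocks = {}

    def write(self, index, value):
        self.blocks[index] = value

    def read(self, index):
        return self.blocks.get(index, 0)


class VirtualNic:
    """The guest's only network access: an outbox the host can inspect."""
    def __init__(self):
        self.outbox = []

    def send(self, value):
        self.outbox.append(value)


class Node:
    """A processing node: a core plus exactly two virtual devices."""
    def __init__(self):
        self.disk = VirtualDisk()
        self.nic = VirtualNic()
        self.stack = []

    def run(self, program):
        # Each instruction is a tuple: (opcode, optional arguments...).
        for op, *args in program:
            if op == "push":
                self.stack.append(args[0])
            elif op == "add":
                b, a = self.stack.pop(), self.stack.pop()
                self.stack.append(a + b)
            elif op == "disk_write":
                self.disk.write(args[0], self.stack.pop())
            elif op == "disk_read":
                self.stack.append(self.disk.read(args[0]))
            elif op == "net_send":
                self.nic.send(self.stack.pop())
            else:
                # Anything not listed above simply does not exist
                # as far as the guest is concerned.
                raise ValueError(f"illegal instruction: {op}")
```

A guest program such as [("push", 2), ("push", 3), ("add",), 
("net_send",)] can compute and talk to the virtual NIC, but an 
instruction that tries to name a host resource is rejected because the 
machine has no such instruction.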

Also, the Folding@home team currently has to cope with each operating 
system's way of handling files and permissions.  My "processing node" 
would give them a complete "raw hard drive" that they could write to, 
without worrying about permissions.  They could use whatever data 
storage scheme they found useful, and I would know that no matter what 
they wrote, they couldn't clobber any of MY data.  Worst case, their 
hard drive image could fill my physical hard drive... but that can be 
remedied in the somewhat harsh but forceful way of killing the offending 
process and deleting their hard drive image.
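
A minimal sketch of such a "raw hard drive," assuming it is stored as a 
single image file on the host.  The RawDisk class, the 512-byte block 
size, and the method names are assumptions for illustration, not part 
of any existing tool.  Capping the image at a fixed number of blocks is 
one way to keep it from filling the physical disk in the first place, 
and destroy() is the "harsh but forceful" remedy.

```python
import os

BLOCK_SIZE = 512  # assumed block size, chosen only for this sketch


class RawDisk:
    """Fixed-size block device backed by one host file (hypothetical)."""

    def __init__(self, path, num_blocks):
        self.path = path
        self.num_blocks = num_blocks
        # Create the image if it is missing, then force it to exactly
        # the capped size. New space reads back as zero bytes.
        with open(path, "ab"):
            pass
        os.truncate(path, num_blocks * BLOCK_SIZE)

    def _check(self, index):
        # The guest can only name blocks inside its own image.
        if not 0 <= index < self.num_blocks:
            raise IndexError(f"block {index} is outside the image")

    def read_block(self, index):
        self._check(index)
        with open(self.path, "rb") as image:
            image.seek(index * BLOCK_SIZE)
            return image.read(BLOCK_SIZE)

    def write_block(self, index, data):
        self._check(index)
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block long")
        with open(self.path, "r+b") as image:
            image.seek(index * BLOCK_SIZE)
            image.write(data)

    def destroy(self):
        # Delete the whole image, e.g. after killing the guest.
        os.remove(self.path)
```

Because every access goes through a block index that is checked 
against the image size, nothing the guest writes can land outside its 
own image file.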

As for Networking... I was thinking of using something high level, like 
Jabber.  While Jabber was designed for instant messaging... it could 
easily be used for "interprocess communication."  The hypothetical 
Folding@home client would contact "ProjectManager@foldingathome.org" to 
get its data.  The processing node's "Jabber ID" would be something 
like "node@domain/foldingathome".  The server could send updates to the 
node, and the node could send updates to the server... all without 
worrying about "IP addresses."
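
As a rough illustration of the addressing, a Jabber ID has the shape 
local@domain/resource, where the local part and the resource are 
optional.  The sketch below only splits an address into those three 
parts; the Jid type and parse_jid function are hypothetical names, and 
the character validation and normalization that real Jabber software 
performs are left out.

```python
from typing import NamedTuple, Optional


class Jid(NamedTuple):
    local: Optional[str]     # e.g. "node"
    domain: str              # e.g. "domain"
    resource: Optional[str]  # e.g. "foldingathome"


def parse_jid(text):
    """Split 'local@domain/resource' into parts (no validation)."""
    # The resource is everything after the first "/".
    bare, slash, resource = text.partition("/")
    # The local part is everything before the first "@" of the rest.
    local, at, domain = bare.partition("@")
    if not at:
        # No "@" present: the whole bare address is the domain.
        local, domain = None, bare
    if not domain or local == "":
        raise ValueError(f"not a usable Jabber ID: {text!r}")
    return Jid(local, domain, resource if slash else None)
```

With this split, one project could run several jobs on the same node 
and tell them apart by resource alone.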

The general all around idea *from the user's side* is to make a way for 
a node to be able to run trusted/untrusted code as fast as possible in 
as safe a way as possible.

The general all around idea *from the project developer's side* is to 
have a system that allows the developer to write one version of a client 
and run it on as many "virtual processing nodes" as possible.

The general all around idea *from the point of view of the program being 
run* is that it's the only process running on top of very simple 
"hardware"... an LLVM processor, a hard drive, and a NIC that speaks Jabber.

My goal is to write the blank-slate VM that is as simple and powerful as 
possible... and let the application writers string the "processing 
nodes" up how they want.  If they want a more complex Interprocess 
Communication... they can write it on top of the 
plain-text-through-Jabber protocol (maybe with XML or YAML or something 
even more exciting).  Or they can write a library that makes a full UNIX 
style file system on top of my raw-block "hard drive."  They can make 
the nodes run single file or in a tree or any way they want... I'm just 
building the "hardware platform."
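
For instance, an application writer could layer structured messages on 
top of the plain-text channel roughly like this.  It is a sketch only: 
the "payload" and "field" element names and both function names are 
invented for the example, and XML is just one of the encodings 
mentioned above.

```python
import xml.etree.ElementTree as ET


def encode_message(kind, fields):
    """Pack a message kind and string fields into one line of XML text."""
    root = ET.Element("payload", {"kind": kind})
    for name, value in fields.items():
        child = ET.SubElement(root, "field", {"name": name})
        child.text = str(value)
    # encoding="unicode" returns a str, ready to send as plain text.
    return ET.tostring(root, encoding="unicode")


def decode_message(text):
    """Recover (kind, fields) from text produced by encode_message."""
    root = ET.fromstring(text)
    fields = {f.get("name"): f.text or "" for f in root.findall("field")}
    return root.get("kind"), fields
```

The "hardware" never needs to know about this layer; it only ever 
carries the resulting text.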

As for the "processing node" itself... the user can kill it, or restart 
it if it seems to have become "hung"... or the server requesting the 
processing job could ask to have its job reset or killed.  The most 
important thing is that a badly written program can "nuke" and "crash" 
the VM, but it shouldn't "nuke" or "crash" the host computer.
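
One simple way to get the kill/restart behavior is to run each VM as a 
separate operating system process and supervise it from outside.  The 
sketch below assumes that design; run_guest is a hypothetical helper, 
and the time limit is an arbitrary way of deciding that a guest is 
"hung."  Note that a separate process by itself is not a sandbox: it 
only covers the hang/crash case, not access to the host's devices.

```python
import subprocess


def run_guest(argv, timeout_seconds):
    """Run a guest VM process; kill it if it exceeds the time limit.

    Returns the guest's exit code, or None if it had to be killed.
    """
    proc = subprocess.Popen(argv)
    try:
        return proc.wait(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # The guest looks hung: kill it and reap the process.
        proc.kill()
        proc.wait()
        return None
```

A guest that crashes just produces a nonzero exit code for the 
supervisor to look at, and a guest that loops forever is killed when 
its time runs out; in both cases the host keeps running.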

This is where I'm stumped... how can I use LLVM inside my "simplified 
processing node," allowing it only access to my "virtual devices," 
getting it to run "as fast as possible" with as little possibility of it 
clobbering the host computer should something untoward happen to the 
LLVM code?  I'm still sifting through the documentation and feel like 
I'm 90% of the way to understanding how LLVM works as a Virtual Machine, and 
not a Compiler backend or Intermediate Language.  Or, if LLVM isn't my 
"perfect match," where should I look?

-- 
   +----+       Shawn Boles        +------+/
  /(  )/|     "Chief Engineer"     |     \/
+----+ +      AutoDMC Labs        |Cert. |
|oooo|/                           |Video |
+----+   autodmc at autodmclabs.com  +------+