[LLVMdev] Secure Virtual Machine

Fri Jun 15 11:46:16 PDT 2007

Sandro Magi wrote:
> Let me cut it down to the core problem: I'm asking about the
> feasibility of extending LLVM with constructs to manage separate
> heaps. Given my current understanding of LLVM, I can see this done in
> two ways:
>   
If you just need to partition the heap into multiple heaps, then the
easiest thing to do would be to replace the malloc/free instructions
with calls to library functions that implement allocation and
deallocation for your segmented heap.

For example, in the Automatic Pool Allocation work
(http://llvm.org/pubs/2005-05-21-PLDI-PoolAlloc.html), we have an LLVM
pass that changes malloc instructions:

%tmp = malloc struct {i8}

... into calls to an allocation function that takes a pool identifier
and an allocation size as arguments (in this work, we segregated the
heap based upon pointer analysis results):

%tmp = call %poolalloc (sbyte * PoolID, uint 8)

The poolalloc function is then implemented as a run-time library
(written in C) that is compiled and linked into the program (either as a
native code library or an LLVM bytecode library).

You could do something similar to implement multiple heaps.
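
As a rough illustration of what such a run-time library might look
like, here is a minimal sketch in C of an allocator that takes a heap
identifier and a size. The names (Pool, pool_init, pool_alloc,
pool_destroy) and the fixed-size bump-allocation strategy are
assumptions made for this example; this is not the interface of the
actual poolalloc runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define POOL_ALIGN 16  /* assumed alignment for every allocation */

/* Hypothetical heap descriptor: one fixed-size region per heap. */
typedef struct Pool {
    unsigned char *base;  /* start of the region backing this heap */
    size_t size;          /* total bytes in the region */
    size_t used;          /* bytes handed out so far */
} Pool;

/* Back the heap with `size` bytes. Returns 0 on success, -1 on failure. */
int pool_init(Pool *p, size_t size) {
    assert(p != NULL);
    p->base = malloc(size);
    p->size = size;
    p->used = 0;
    return p->base ? 0 : -1;
}

/* Bump-allocate `n` bytes from the named heap; NULL if it is exhausted. */
void *pool_alloc(Pool *p, size_t n) {
    assert(p != NULL);
    if (n > SIZE_MAX - (POOL_ALIGN - 1))
        return NULL;  /* rounding up would overflow */
    size_t rounded = (n + POOL_ALIGN - 1) & ~(size_t)(POOL_ALIGN - 1);
    if (rounded > p->size - p->used)
        return NULL;  /* this heap is full; other heaps are unaffected */
    void *result = p->base + p->used;
    p->used += rounded;
    return result;
}

/* Release the whole heap at once. */
void pool_destroy(Pool *p) {
    assert(p != NULL);
    free(p->base);
    p->base = NULL;
    p->size = 0;
    p->used = 0;
}
```

A transformation pass would then rewrite each allocation site into a
call such as pool_alloc(&some_pool, size), choosing the pool however
your partitioning policy dictates. Because each heap has its own
region, exhausting one heap does not affect allocations from another.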

Your proposed methods below (adding intrinsics or new core instructions)
would work too, but using memory allocator functions does the same thing
with less work.  Adding intrinsics or new core instructions is only
useful in a few rare cases, such as when you need special code generator
support or need to extend the type system.
> 1. Add heap management instructions to the core instructions, modify
> allocation routines to explicitly name heaps or modify the runtime to
> rebind the allocation routines depending on some VM-level context that
> names a heap (thread-local storage?).
>
> 2. Add intrinsics to start a new heap (via a new ExecutionEngine?).
> This would involve modifying the VM to accept allocation primitives as
> function pointers.
>
> So a program or language with real-time constraints where an
> incremental GC is preferable, and where an efficient, non-incremental
> GC is used for other tasks, can be expressed as partitioned heaps each
> with their own GC.
>   
Doing GC may require using the LLVM GC intrinsics as described in this
document (http://llvm.org/docs/GarbageCollection.html), but just
segmenting the heap into multiple heaps should not require any new
instructions or intrinsics to be added.
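
To make that concrete, the "context that names a heap" idea from
option 1 can be sketched entirely in library code. The following C
fragment is only an illustration under assumed names (Heap,
heap_enter, heap_malloc, heap_free). It keeps the current heap in a
C11 thread-local variable and charges each allocation to it. The
per-heap byte limit is an assumption added here, motivated by the
memory DoS concern in the original message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-heap accounting record. */
typedef struct Heap {
    size_t limit;   /* maximum bytes this heap may have outstanding */
    size_t in_use;  /* bytes currently allocated from this heap */
} Heap;

/* Header stored in front of each block so free can find the owner.
   The union keeps the user pointer suitably aligned. */
typedef union Header {
    struct { Heap *owner; size_t size; } info;
    max_align_t align;
} Header;

/* The heap that allocations on this thread are charged to. */
static _Thread_local Heap *current_heap = NULL;

/* Switch the calling thread to heap `h`; returns the previous heap. */
Heap *heap_enter(Heap *h) {
    Heap *prev = current_heap;
    current_heap = h;
    return prev;
}

/* Allocate `n` bytes from the current heap, or NULL if over its limit. */
void *heap_malloc(size_t n) {
    Heap *h = current_heap;
    if (h == NULL || n > h->limit - h->in_use)
        return NULL;
    if (n > SIZE_MAX - sizeof(Header))
        return NULL;  /* header plus payload would overflow */
    Header *hdr = malloc(sizeof(Header) + n);
    if (hdr == NULL)
        return NULL;
    hdr->info.owner = h;
    hdr->info.size = n;
    h->in_use += n;
    return hdr + 1;
}

/* Return a block to whichever heap it was allocated from. */
void heap_free(void *ptr) {
    if (ptr == NULL)
        return;
    Header *hdr = (Header *)ptr - 1;
    assert(hdr->info.owner != NULL);
    hdr->info.owner->in_use -= hdr->info.size;
    free(hdr);
}
```

In this sketch the runtime would call heap_enter whenever it switches
to code that belongs to a different heap. The blocks themselves still
come from the C library's malloc; only the accounting is partitioned.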

-- John T.
> Sandro
>
> On 6/2/07, Sandro Magi <naasking at gmail.com> wrote:
>   
>> Many VMs focus on performance, optimizations, memory consumption, etc.
>> but very few, if any, focus on fault isolation and security. Given
>> memory safety, any VM reduces to capability security, which is
>> sufficient to implement most security policies of interest; however,
>> most such VMs still ignore two main attack vectors from malicious
>> code: DoS attack on memory allocation, and DoS against the CPU.
>>
>> I've been mulling over how LLVM could be extended to provide a degree
>> of isolation from these two attack vectors [3].
>>
>> Preventing a DoS against memory allocation involves controlling access
>> to allocation in some way. Fine-grained control over every single
>> allocation is likely infeasible [1]. Similarly, preventing a DoS
>> against the CPU involves controlling the execution time of certain
>> code blocks, by introducing concurrency or flow control of some sort.
>>
>> There is a single abstraction which has solved the above two problems
>> for over 40 years: the process, which provides an isolated memory
>> space, and an independently schedulable execution context.
>>
>> A VM process would run in its own heap and manage its own memory. The
>> memory allocation routines are scoped to the process, which can itself
>> potentially call out to a "space bank" to allocate more space for its
>> heap. Memory faults in a process can be handled by "keepers" [4].
>>
>> Concurrency is still an open question, because a kernel thread per VM
>> process is actually overkill. A mix of kernel threads and Erlang-style
>> preemptive green threads might be optimal, but this isn't the
>> interesting part of the proposal IMO.
>>
>> There must also be some sort of interprocess communication (IPC),
>> either via copying between heaps, or an "exchange heap". The exchange
>> heap is the approach taken by the Singularity OS [2] where they add
>> "software isolated processes" to the .NET VM and make it an operating
>> system.
>>
>> There are two approaches I currently foresee for adding process
>> constructs to LLVM:
>>
>> 1. Add process management instructions to the core instructions, and
>> modify the runtime to rebind the allocation routines depending on some
>> VM-level context that names which process is actually executing
>> (perhaps in thread-local storage).
>>
>> 2. Add intrinsics to launch an entirely new VM instance
>> (ExecutionEngine?) as if it were the process. This would involve
>> modifying the VM to accept allocation primitives as function pointers,
>> and potentially adding some scheduling awareness.
>>
>> At the moment, I'm not primarily interested in making LLVM itself a
>> secure VM, but I think that too might be possible, and suggests
>> possible future work.
>>
>> For instance, unsafe pointer operations can be made safe if the
>> casting operation from integer to pointer implements a dynamic check
>> that it's within the bounds of the heap. This is potentially an
>> expensive operation, but such casts only penalize heavily unsafe
>> programs, which should hopefully be rare. I believe LLVM programs that
>> do not use these casting instructions are inherently memory safe, so
>> they incur no such penalties (please correct me if I'm wrong). Using
>> this approach, LLVM could support the safe execution of unsafe
>> programs by running them in an isolated VM process.
>>
>> Alternately, one could actually launch the unsafe code in a completely
>> separate OS process with a new LLVM instance, and the VM-level IPC
>> instructions would transparently perform OS-level IPC to the separate
>> process. This maintains the isolation properties, with the full
>> execution speed (no need for dynamic heap bound checks), at the cost
>> of using slightly heavier OS processes.
>>
>> Any comments on the feasibility of this approach? I'm definitely not
>> familiar with the LLVM internals, and I wrote the above given only my
>> understanding from reading the LLVM reference manual.
>>
>> Sandro
>>
>> [1] except perhaps using some sort of region-based approach with
>> region inference, etc. I'm still reading the literature on this.
>> [2] http://research.microsoft.com/os/singularity/
>> [3] I realize that LLVM is unsafe in other ways, but I believe it
>> currently lacks even the base constructs necessary to build a
>> secure VM on top of it.
>> [4] I can explain space banks and keepers concepts further, but just
>> think of them as stateful exception handlers specific to a process.
>> The concepts come from the KeyKOS/EROS and Coyotos secure operating
>> systems.
>>
>>     