[PATCH] Expose custom MC-JIT memory allocation through the C API

Thu May 16 16:36:30 PDT 2013

On May 16, 2013, at 3:20 PM, Sean Silva <silvas at purdue.edu> wrote:

> On Thu, May 16, 2013 at 3:10 PM, Filip Pizlo <fpizlo at apple.com> wrote:
> AFAIK, this is no less robust than an opaque struct. Both handle added functions gracefully, and neither can handle removed functions gracefully unless we do something crazy. The un-opaque struct just makes writing the code a bit easier, both for LLVM and for the client. But that's just my opinion. :-)
> 
> Unfortunately, it can break in the presence of shared libraries:
> 
> shared library A, compiled against LLVM API v1:
> void runMCJIT(void) {
>   LLVMMCJITMemoryManagerFunctions MCJMM; // v1
>   setupMCJMM(&MCJMM);
> }
> 
> shared library B, recently upgraded, compiled against LLVM API v2:
> void setupMCJMM(LLVMMCJITMemoryManagerFunctions *p) {
>   p->FunctionAddedInV2 = ...; // BOOM, overwriting caller's stack
> }
> 
> Avoiding this kind of issue would force the API clients to replicate the brittle sizeof checking in their own code.

Why would you do this?  I.e., why would you ever want the initialization of the MCJIT memory management functions to span multiple shared libraries?

Lots of C APIs are built around public C structs and use the same idiom I used.  BerkeleyDB does this, and so do a number of libc APIs.  Of course, when writing C code, if you #include a header that defines a struct and pass a pointer to that struct to a separately compiled shared library that may have seen a different version of the header, you have to take precautions.  I'm not sure this concern should drive the design here, particularly when it makes no sense for two different shared libraries to cooperatively set up a struct of callbacks for the MCJIT.
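The usual precaution is the sizeof checking mentioned in the quoted message: the struct carries the size its creator compiled against, and any code built against a newer header checks that size before touching a field. A minimal sketch in C follows. `MemoryManagerFunctions`, its fields, `MMF_V1_SIZE`, `setup_callbacks`, and `copy_callbacks` are all hypothetical illustrations, not the LLVM C API, and the sketch assumes fields are only ever appended, never removed or reordered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical v2 layout of a public callback struct. The v1 header ended
 * after `deallocate`; v2 appended `finalize`. Fields are only ever appended. */
typedef struct {
    size_t size;                      /* sizeof the struct as the caller saw it */
    void *(*allocate)(size_t bytes);
    void (*deallocate)(void *ptr);
    int (*finalize)(void);            /* new in v2 */
} MemoryManagerFunctions;

/* Size a caller compiled against the v1 header would report (assumes no
 * trailing padding, which holds for these pointer-sized fields on common ABIs). */
#define MMF_V1_SIZE offsetof(MemoryManagerFunctions, finalize)

/* True if `field` lies entirely within the caller-reported size. */
#define MMF_HAS_FIELD(p, field) \
    ((p)->size >= offsetof(MemoryManagerFunctions, field) + sizeof((p)->field))

static void *my_allocate(size_t bytes) { return malloc(bytes); }
static void my_deallocate(void *ptr) { free(ptr); }
static int my_finalize(void) { return 0; }

/* "Shared library B", built against v2: fill in only the fields that exist
 * in the caller's struct instead of writing past its end. */
void setup_callbacks(MemoryManagerFunctions *p) {
    assert(p != NULL && p->size >= sizeof p->size);
    if (MMF_HAS_FIELD(p, allocate))   p->allocate = my_allocate;
    if (MMF_HAS_FIELD(p, deallocate)) p->deallocate = my_deallocate;
    if (MMF_HAS_FIELD(p, finalize))   p->finalize = my_finalize;
}

/* Consumer side: copy only the bytes the caller has, so callbacks added in
 * later versions stay NULL and the consumer can fall back to a default. */
MemoryManagerFunctions copy_callbacks(const MemoryManagerFunctions *src) {
    MemoryManagerFunctions out = {0};
    size_t n = src->size < sizeof out ? src->size : sizeof out;
    memcpy(&out, src, n);
    out.size = sizeof out;
    return out;
}
```

A caller built against v1 would pass a struct whose `size` is `MMF_V1_SIZE`; `setup_callbacks` then leaves `finalize` alone rather than overwriting the caller's stack. The cost is that every field access, in every component that touches the struct, needs such a guard, which is why keeping initialization in one place keeps the bookkeeping manageable.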

> 
> Slightly related: Have you considered the security implications of using LLVM in the browser? For example, AFAIK most LLVM code is not written to properly handle OOM conditions and may be a vector for security issues. See <http://comments.gmane.org/gmane.comp.compilers.clang.devel/19981>, which is about clang, but I think also extends to LLVM itself. tl;dr, Doug Gregor says: "a safety-critical application should not be using Clang in-process".

"Safety-critical" is a term of art for systems whose failure could cause a person to die.  Nobody will die if the browser's WebContent process crashes because of an unhandled OOM, for example one caused by a JavaScript program with a large hot function that LLVM couldn't compile in the available memory.

Of course we care about stability, security, and compliance; but WebKit already pretends that memory allocations succeed, and will crash if they don't.  That's not a security vulnerability.  It's a pragmatic choice: WebKit already does so many crazy things that making it safety-critical would be a gargantuan task, probably beyond what any browser vendor could do today.
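Pretending that allocations succeed amounts to routing allocations through a wrapper that terminates the process on failure instead of returning NULL, so no caller ever carries an out-of-memory path. A minimal sketch in C, assuming a plain `malloc`-based wrapper; `checked_malloc` is a hypothetical name, not WebKit's or LLVM's actual allocator.

```c
#include <stdio.h>
#include <stdlib.h>

/* Allocate `bytes` or terminate the process: callers never see NULL, so
 * there are no OOM-handling paths to get wrong. */
void *checked_malloc(size_t bytes) {
    /* malloc(0) may legally return NULL, so request at least one byte. */
    void *p = malloc(bytes == 0 ? 1 : bytes);
    if (p == NULL) {
        fprintf(stderr, "fatal: out of memory allocating %zu bytes\n", bytes);
        abort();  /* deliberate crash at a known point instead of limping on */
    }
    return p;
}
```

The design choice is that an allocation failure becomes an immediate crash at the allocation site rather than a NULL pointer flowing into code that was never written to handle it.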

> Since LLVM is being used to compile code that interacts with the page, sandboxing LLVM will not necessarily be enough protection since an exploit could potentially hijack the compilation and return malicious code.

True.  This is a good point.  My own view is that LLVM is already hardened enough, in terms of codegen soundness, that it won't be the longest pole in the tent.

-Filip
