[LLVMdev] [PATCH][RFC] Add llvm.codegen Intrinsic To Support Embedded LLVM IR Code Generation

Wed May 9 14:15:52 PDT 2012

On May 9, 2012, at 2:12 AM, Tobias Grosser wrote:

> 
>>> That's why I was asking where you see the possibility of illegal/malicious code. You did not really explain it yet, and I would be more than happy to understand such a problem. From my point of view, embedded and host module code are both compiled at the same time and are both checked by the LLVM bitcode verifier. How could this introduce any malicious code that could not be introduced by normal LLVM-IR?
>> 
>> You're adding a feature that embeds code inside a module. When the module is loaded, is the string going to be verified? How are users of LLVM IR able to ensure the embedded string is safe? I am not saying it cannot be done. This feature just increases the risk, and that again raises the bar for acceptance.
> 
> What do you mean by verified? How is normal LLVM-IR verified?
> What do you mean by ensuring an embedded string is safe? How do you ensure normal LLVM-IR is safe?
> 
> The only existing kind of verification I am aware of is the '-verify' pass that checks an LLVM-IR module. This pass is run over the embedded module at the same time as target code is generated for the host function. In case the verification fails, no target code is generated and an empty string is returned. In case target code is generated, it is stored back in memory. It can obviously be executed through a function pointer, but this is no different from executing code that is stored in memory through other means.
> 
> I am kind of surprised security is a concern here. If we really want to do a proper risk analysis, we should first define the security guarantees LLVM gives. I am kind of surprised such security guarantees exist. To me, securely verifying LLVM-IR is difficult for reasons other than this intrinsic. Google PNaCl does, for good reasons, not rely on LLVM to provide security guarantees.
> 
> Still, if this is a concern we could make this intrinsic a target option that is disabled by default.

You are missing the point. Don't think in terms of existing implementations. Don't think in terms of clang or other static compilers. There are plenty of systems out there which use LLVM. There can be plenty of different ways to verify / check LLVM IR. We don't know about them.

An LLVM bitcode module as it is today is a representation of some program. It has semantics that are clearly defined by its instructions and data. Now you want to embed some other programs in strings. That makes the IR inherently harder to understand; it's riskier by definition. Of course systems which use LLVM can solve this problem. But it's a big fundamental change, and I (and other people on this thread) have pointed out that the benefits are just not worth it.

Evan