[LLVMdev] [PATCH][RFC] Add llvm.codegen Intrinsic To Support Embedded LLVM IR Code Generation

Wed May 9 02:12:26 PDT 2012

On 05/08/2012 09:35 PM, Evan Cheng wrote:
>
> On May 8, 2012, at 2:08 AM, Tobias Grosser wrote:
>
>> On 05/08/2012 05:13 AM, Evan Cheng wrote:
>>> Sorry Tobias, I'm not in favor of this change. From what I can tell, this enables some features which can be implemented via other means. It adds all kinds of complexity to LLVM and I'm also highly concerned about bitcode that can embed illegal (or worse, malicious) code using this feature.
>>
>> Hi Evan,
>>
>> there is no need to force this change in. I am rather trying to understand the shortcomings of my approach and look for possible better solutions.
>
> Hi Tobias,
>
> When you are proposing a significant extension to LLVM, the burden is on the person who is proposing the change to convince folks there is a significant advantage to LLVM developers / users who rely on LLVM mainline.

Hi Evan,

thanks for replying. I understand that the burden is on me. I have no
intention of pushing a patch into LLVM that people strongly disagree
with.

Still, to be able to propose something more sensible we need
feedback pointing out the problems of this patch and what would be a
better solution. We have not received this kind of feedback yet, probably
because we did not describe the patch and our requirements well enough.

>> That's why I was asking you where you see the possibility of illegal/malicious code. You did not really explain it yet and I would
>> be more than happy to understand such a problem. From my point of view, embedded and host module code are both compiled at the same time and are both checked by the LLVM bitcode verifier. How could this introduce any malicious code that could not be introduced by normal LLVM-IR?
>
> You're adding a feature that embeds code inside a module. When the module is loaded, is the string going to be verified? How are users of LLVM IR able to ensure the embedded string is safe? I am not saying it cannot be done. This feature just increases the risk and that again raises the bar for acceptance.

What do you mean by verified? How is normal LLVM-IR verified?
What do you mean by ensuring an embedded string is safe? How do you 
ensure normal LLVM-IR is safe?

The only existing kind of verification I am aware of is the '-verify'
pass that checks an LLVM-IR module. This pass is run over the embedded
module at the same time as target code is generated for the host
function. If verification fails, no target code is generated and an
empty string is returned. If target code is generated, it is stored
back in memory. It can obviously be executed through a function
pointer, but this is no different from executing code that is stored
in memory through other means.

I am kind of surprised security is a concern here. If we really want to
do a proper risk analysis, we should first define the security
guarantees LLVM gives. I would be surprised if such security guarantees
exist. To me, securely verifying LLVM-IR is difficult for reasons other
than this intrinsic. Google's PNaCl does, for good reasons, not rely on
LLVM to provide security guarantees.

Still, if this is a concern we could make this intrinsic a target option 
that is disabled by default.

>> In terms of complexity: the only alternative proposal I have heard of was making LLVM-IR multi-module aware or adding multi-module support to all LLVM-IR tools. Both of these changes are way more complex than the codegen intrinsic. Actually, they are so complex that I doubt they can be implemented any time soon. What is the simpler approach you are talking about?
>
> We don't need multi-module either. The system you are designing should be able to handle multiple bitcode files with multiple modules. I don't claim to know the specifics of your projects. But it seems to me you want to add this new complexity to LLVM to simplify your tools (single .o rather than multiple). Given how specific your need is, it's just not appropriate for LLVM mainline.

I can follow this argument. We do not want to include project-specific
features in LLVM if such features are not needed by a broader audience.
Such features should rather be implemented outside of LLVM.
This has worked especially well, as LLVM provides features to make such
external implementations possible, and in some rare cases (calling
conventions) it includes project-specific patches to allow such projects
to use a vanilla LLVM installation.

This intrinsic was proposed in the very same light. I think it would be 
nice to _significantly_ facilitate the development of optimizers that 
target GPGPU accelerators. Yabin's project would use this, but I am 
convinced a wider audience could use this.

A design that handles multiple bitcode files does not seem like a good
option. It would require large changes to all projects that want to use
such optimizers, would not work with a vanilla clang installation, and
would cause further problems with JIT compilation.

It seems we were not able to convince enough people that such an 
extension is useful. I think this was partially because we did not 
explain our project and the codegen intrinsic well enough and also 
because there is no actual use case yet. For now, further discussions 
seem pointless. Thanks for your comments!

Cheers
Tobi