[LLVMdev] Extending LLVM for high-level types

Thu Jan 13 00:46:14 PST 2011

Alexandre Cossette wrote:
> Hi all,
>
> I'm designing a programming language named C³ (or C3). I'm already using LLVM as a back-end for my prototype compiler and it's wonderful to use. Thanks for such a great system!
>
> I now have more ambitious goals and I would like to use the LLVM IR as my internal C³ IR.

Absolutely not.

In short, LLVM is its own language. You don't need to extend LLVM IR to 
support your programming language any more than you need to extend x86 
processors to support it.

Carrying that support in LLVM itself would be a burden. For starters,
LLVM's types are purely based on the storage that they back. Most
languages use types to provide static program safety, or possibly
semantics (i.e., + means string concatenation on strings but addition
on integers). LLVM doesn't do that. Further, our types are uniqued such
that any two types with the same in-memory representation have the same
LLVM type; we don't discard names, but we don't preserve a distinction
because there isn't any distinction to preserve. That in turn allows us
to do fast structural comparisons using a pointer comparison.
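
To make the uniquing point concrete, here is a rough C++ sketch of the
idea. It is a toy model and not LLVM's actual implementation: the names
Type, TypeContext, getInt, getStruct and intern are all made up for
illustration. The context hands out exactly one object per structure,
so structural equality reduces to pointer equality.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <tuple>
#include <utility>
#include <vector>

// Toy model of structurally uniqued types. Hypothetical names throughout;
// this only illustrates the technique and is not LLVM's real code.
struct Type {
    enum Kind { Integer, Struct };
    Kind kind;
    unsigned bits;                      // meaningful when kind == Integer
    std::vector<const Type*> elements;  // meaningful when kind == Struct
};

class TypeContext {
public:
    // Return the one and only integer type of the given width.
    const Type* getInt(unsigned bits) {
        assert(bits > 0 && "integer types need a nonzero width");
        return intern(Type::Integer, bits, {});
    }

    // Return the one and only struct type with these element types.
    const Type* getStruct(const std::vector<const Type*>& elements) {
        return intern(Type::Struct, 0, elements);
    }

private:
    // Elements are already uniqued, so their addresses identify them.
    using Key = std::tuple<int, unsigned, std::vector<std::uintptr_t>>;

    const Type* intern(Type::Kind kind, unsigned bits,
                       const std::vector<const Type*>& elements) {
        std::vector<std::uintptr_t> ids;
        for (const Type* e : elements)
            ids.push_back(reinterpret_cast<std::uintptr_t>(e));
        Key key(static_cast<int>(kind), bits, ids);

        auto it = table_.find(key);
        if (it != table_.end())
            return it->second.get();  // same structure: reuse the object

        auto fresh = std::make_unique<Type>(Type{kind, bits, elements});
        const Type* result = fresh.get();
        table_.emplace(std::move(key), std::move(fresh));
        return result;
    }

    std::map<Key, std::unique_ptr<Type>> table_;
};
```

In this model, a front-end asking for a "Meters" struct and a "Point"
struct that both contain two 32-bit integers gets the same pointer back
from getStruct, which is why no by-name distinction survives.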

Then we'd have to extend core passes like mem2reg, GVN, and all of their
dependencies. These are performance-critical pieces of kit, and we
categorically reject any attempt to push in pieces of infrastructure
that won't be needed by all users. Put another way, if I want to use
LLVM for C code on a cell phone, I shouldn't need to pay the
memory/execution-time price for your LLVM changes to support C³.

Finally, you haven't detailed what benefit you expect out of your 
proposal. Why can't you just lower to the existing IR and get the same 
optimizations out of it? What optimizations aren't possible and why? Can 
we tackle those issues instead? We've gotten very far by designing 
extensions to LLVM which are language-agnostic and can be used by any 
client. For example, if your language has alias analysis optimizations
that rely on high-level type information, LLVM has a TBAA (type-based
alias analysis) design that you could employ to give LLVM the
additional information it needs to optimize with.
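
As a rough illustration of the kind of information TBAA carries, the
sketch below models a front-end's types as nodes in a tree and treats
two memory accesses as possibly aliasing only when one node is an
ancestor of the other (or they are the same node). This is a simplified
model written as plain C++, not LLVM's metadata API; TBAANode,
isAncestorOrSelf and mayAlias are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Simplified, hypothetical model of a TBAA-style type tree.
struct TBAANode {
    std::string name;
    const TBAANode* parent;  // nullptr for the root of the tree
};

// True if `ancestor` is `node` itself or lies on the path from `node`
// up to the root.
bool isAncestorOrSelf(const TBAANode* ancestor, const TBAANode* node) {
    for (const TBAANode* n = node; n != nullptr; n = n->parent)
        if (n == ancestor)
            return true;
    return false;
}

// Two accesses may alias only if their type nodes are related by
// ancestry; unrelated siblings are assumed never to alias.
bool mayAlias(const TBAANode* a, const TBAANode* b) {
    assert(a != nullptr && b != nullptr);
    return isAncestorOrSelf(a, b) || isAncestorOrSelf(b, a);
}
```

A front-end would build one such tree from its own type system and tag
each load and store with a node. With sibling nodes for, say, an
integer type and a float type, mayAlias returns false for that pair,
while the root conservatively aliases everything below it.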

Sorry to sound so negative, but I'm confident that LLVM can provide you 
with the same generated code quality in the same execution time, only 
through a different design than you propose. If you can show us missed 
optimizations (or bad compile time problems) when using the naive 
approach of lowering your high-level types to LLVM's low-level types,
please let us know so we can solve them case-by-case!

Nick

> C³ is designed to support what I call "value-oriented programming" and
> it fits naturally with the design of LLVM. The idea is to apply
> SSA-based optimizations on user-defined types.
>
> I would like to know if you think this plan makes sense:
> - Add a new derived type that is uniqued by name for C³ types
> - Add new intrinsic functions for C³ expressions with special semantics
> - Emit this "extended LLVM" from my abstract syntax tree
> - Run the mem2reg pass as is for SSA construction
> - Run optimization passes that can run as is with the new type (like GVN?)
> - Run a new pass that lowers the extended LLVM to normal LLVM
> - Run (or rerun) normal LLVM optimization passes
> - Emit native code using normal LLVM
> - Profit!
>
> Alex