[LLVMdev] Parametric polymorphism

Wed Feb 18 06:53:13 PST 2009

On Tuesday 17 February 2009 17:26:13 DeLesley SpamBox wrote:
> I'm a newcomer to llvm, but what you've done so far is very impressive.
> Llvm is a godsend to anybody who is attempting to implement their own
> their own language.  :-)  My company is considering using llvm as the
> backend for a small matlab-like language for scientific computation;

Very interesting. My company is prototyping a high-level VM built upon LLVM 
that is also designed for scientific computing.

> our other option is MSIL.

We decided not to target MSIL because it will be impossible to compete with 
Microsoft's F#. Building upon LLVM offers *much* better performance and 
platform independence. If the start of this year has been anything to go by, 
LLVM also has much better commercial potential and I believe we could be 
earning more revenue from it than from .NET in 12 months time.

> After reading through the documentation, I noticed that llvm seems to
> have one major limitation -- the lack of parametric polymorphism.  I would
> like to compile code such as the following:
>
> max <T extends Comparable>(T a, T b) {
>   if (a > b) return a; else return b;
> }
>
> There are, of course, various ways to implement the above code.  I could
> compile the above function to a fully generic version with boxed arguments,
> but that is very slow for scalar types.  I could also take the C++ template
> route, and generate different IR code for every type instantiation. 
> However, I have spent way too much time fighting with templates and code
> bloat to like that idea.

I am using the latter approach and it is easy to implement and works well so 
far, although our test base in tiny so bloat is not a problem.

> I believe that type instantiation should ideally be handled by llvm, rather
> than the high-level language.  First of all, there are a lot of
> optimization passes
> that are type-invariant; it would be nice to be able to partially
> optimize the code
> before instantiation.  Second, type substitution is very similar to many of
> the other optimizations that llvm already does, such as inlining, constant
> propagation, and so on.  And third, I am planning to use llvm in JIT mode,
> and it just makes more sense (to me)  to instantiate such functions on
> demand, at run-time.

Excellent idea.

> Are there any plans to add such capability to llvm?

I do not believe so.

> How difficult would it be to add such a capability to llvm?  I was thinking
> of marking type variables like T as opaque types for the initial codegen,
> and then writing a custom pass that instantiates them to real types. 
> However, I don't know if that would confuse or break other parts of the
> compiler infrastructure; parametric polymorphism is not necessarily a
> trivial modification.

What complications do you forsee?

> My personal background is in type theory; I received my doctorate from the
> functional programming group at the University of Edinburgh.  I love the
> fact that llvm uses a typed assembly language, but the actual type system
> that is currently used is pretty limited; it seems to be mostly a copy of
> C.

Yes. LLVM has augmented the functionality required for C with some low-level 
but critical features like tail call elimination but it does not implement 
any of the higher-level features you may have expected. After all, it is the 
LLvm. ;-)

LLVM is ideal for building HLVMs and CLRs though.

> I know that compatibility with C is very important to llvm for obvious
> reasons, but IMHO, the single biggest problem in making different languages
> talk to one another is the type system.  I'm not a fan of Microsoft's
> common type system because it's far too OOP-centric, but the basic idea is
> a good one.

Absolutely.

> I think llvm would really benefit from having a much stronger, 
> but still low-level
> and language-neutral type theory; it would enable cross-platform
> multi-language libraries to be developed in any language, and then
> distributed as llvm IR.

The prospect of higher-level VMs is certainly the single most compelling 
aspect of LLVM for me (and many other people, I believe) but I am not 
convinced that such functionality deserves a place in LLVM itself.

You can implement parametric polymorphism easily on top of LLVM today. LLVM's 
optimizations will be run (probably redundantly) on multiple instantiations 
but the HLVM will also be rewriting code, e.g. to interoperate with the GC 
and to perform optimizations like instantiating higher-order functions for a 
given function argument.

Rather than trying to push high-level type system features into the LLVM I 
would opt for an HLVM that provided not only a decent type system but also 
garbage collection, reflection and any other high-level features of interest. 
I would also prioritize getting HLVM 1.0 shipped over the technical aspects 
of integration otherwise you are likely to end up with nothing.

> (The other major necessity is an accurate and 
> high-performance garbage collector, but the intrinsics for that are already
> in place.)

I heard that LLVM's GC API is immature and largely untested so I chose to 
assume an uncooperative environment instead.

> Is anyone on this list familiar with System F, System F_sub, or System
> F^\omega_sub?

I am only vaguely aware of System F because I use MLs extensively and HM is a 
relative.

> They comprise the basic, standard theories of parametric 
> polymorphism used in the academic world, and have been around for about
> 20 years.  You can obviously get more sophisticated, but the System-F
> series of calculi have the advantage that they are simple, well-known,
> off-the-shelf solutions.  Pick one, plug it into llvm, and you have a type
> system that can compete with the JVM or .NET in terms of functionality,
> without being OOP centric or sacrificing language neutrality.  (System F is
> low-level -- OOP can be
> easily implemented on top of it).

Sounds like a dream come true. Where's the catch? ;-)

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e