[LLVMdev] Parametric polymorphism

Tue Feb 17 09:26:13 PST 2009

I'm a newcomer to llvm, but what you've done so far is very impressive.
Llvm is a godsend to anybody who is attempting to implement their own
their own language.  :-)  My company is considering using llvm as the
backend for a small matlab-like language for scientific computation; our
other option is MSIL.

After reading through the documentation, I noticed that llvm seems to
have one major limitation -- the lack of parametric polymorphism.  I would
like to compile code such as the following:

max <T extends Comparable>(T a, T b) {
  if (a > b) return a; else return b;
}

There are, of course, various ways to implement the above code.  I could
compile the above function to a fully generic version with boxed arguments,
but that is very slow for scalar types.  I could also take the C++ template
route, and generate different IR code for every type instantiation.  However,
I have spent way too much time fighting with templates and code bloat to
like that idea.

I believe that type instantiation should ideally be handled by llvm, rather
than the high-level language.  First of all, there are a lot of
optimization passes
that are type-invariant; it would be nice to be able to partially
optimize the code
before instantiation.  Second, type substitution is very similar to many of the
other optimizations that llvm already does, such as inlining, constant
propagation, and so on.  And third, I am planning to use llvm in JIT mode, and
it just makes more sense (to me)  to instantiate such functions on demand, at
run-time.

Are there any plans to add such capability to llvm?  I tried looking through the
list archives for any discussion, but the archives are not searchable (or I have
not figured out how to search them) so I didn't find much; feel free
to point me
to the proper place.

How difficult would it be to add such a capability to llvm?  I was thinking of
marking type variables like T as opaque types for the initial codegen, and then
writing a custom pass that instantiates them to real types.  However, I don't
know if that would confuse or break other parts of the compiler infrastructure;
parametric polymorphism is not necessarily a trivial modification.

My personal background is in type theory; I received my doctorate from the
functional programming group at the University of Edinburgh.  I love the fact
that llvm uses a typed assembly language, but the actual type system that
is currently used is pretty limited; it seems to be mostly a copy of C.

I know that compatibility with C is very important to llvm for obvious reasons,
but IMHO, the single biggest problem in making different languages talk to
one another is the type system.  I'm not a fan of Microsoft's common type
system because it's far too OOP-centric, but the basic idea is a good one.
I think llvm would really benefit from having a much stronger, but
still low-level
and language-neutral type theory; it would enable cross-platform multi-language
libraries to be developed in any language, and then distributed as llvm IR.
(The other major necessity is an accurate and high-performance garbage
collector, but the intrinsics for that are already in place.)

Is anyone on this list familiar with System F, System F_sub, or System
F^\omega_sub?  They comprise the basic, standard theories of parametric
polymorphism used in the academic world, and have been around for about
20 years.  You can obviously get more sophisticated, but the System-F series
of calculi have the advantage that they are simple, well-known, off-the-shelf
solutions.  Pick one, plug it into llvm, and you have a type system that can
compete with the JVM or .NET in terms of functionality, without being OOP
centric or sacrificing language neutrality.  (System F is low-level --
OOP can be
easily implemented on top of it).

  -DeLesley