[LLVMdev] Parametric polymorphism

Wed Feb 18 12:32:29 PST 2009

On Wed, Feb 18, 2009 at 12:32, DeLesley SpamBox
<delesley.spambox at googlemail.com> wrote:
>> I think the problem is deeper than that, in that LLVM has no official
>> concept of a subtype, so I don't see how the idea of polymorphism
>> could be defined in it.
>
> Parametric polymorphism is different from subtype polymorphism; you
> can have one without the other.  Parametric polymorphism just means
> that you can use type variables (like T) in the IR, which are later
> instantiated
> to actual types.
>

I was thinking of the "T extends Comparable" part, which does involve
subtype polymorphism.  Apologies if I'm getting terms mixed up.

>
> True, but I'm not worried about that; method tables are easy to add once type
> substitutions are allowed.  Dropping down to a mythical low-level language
> that I call template C, we would have:
>
>  struct ComparableDict<T> {
>    bool (*equals)(T a, T b);
>    bool (*leq)(T a, T b);
>    bool (*geq)(T a, T b);
>  }
>
>  T max<T>(ComparableDict<T>* dict, T a, T b) {
>    if ((*dict->geq)(a,b)) return a; else return b;
>  }
>

What do the parametrized types give you that you don't get from using
opaque instead of T?  Type checking has already passed by the time it
reaches this level, and you would simply add the appropriate bitcasts
when generating the functions to be added to the dictionaries.  You
could even safely bitcast integral and floating-point values to pass
them through the opaque to avoid boxing.

>
> In order to get good performance, you will need to instantiate max
> with both a type
> T, and the dictionary for T.  In other words, max must be partially
> evaluated with
> respect to a given type -and- a given dictionary.  While this is
> obviously a complication,
> partial evaluation with respect to constant arguments is something that llvm is
> already quite capable of.  If there is not a pass that does it
> already, I could easily write
> one.  The problem is that I have no way of declaring a parameterized type or a
> parameterized function within the llvm IR.
>

I wonder if the "instantiation" could be done instead as a normal pass
taking advantage of a general call-site-context-sensitive
inter-procedural points-to analysis.  Getting rid of the indirection
in the generic dictionary seems like the same problem as
devirtualizing method calls, so a unified solution would be nice.

On Wed, Feb 18, 2009 at 14:53, DeLesley Hutchins
<delesley.spambox at googlemail.com> wrote:
>
> Moreover, specialization should really be done at the codegen level
> in order to do it properly.  C++ templates are a great example of
> why *NOT* to do specialization within the high-level language.
>

But specialization (in the C++ template sense) is also a great example
of why it's needed in the host language, as is overloading.

>
> The generated code for getSecond() needs to know the size of T in
> order to calculate the correct offset into the record.  However,
> we still don't need to generate a bazillion specialized copies;
> we can simply pass the size of T as an argument:
>
>  void getSecond(int sizeT, void* pair, void* result) {
>    return memcpy(result, ((char*) pair) + sizeT, sizeT);
>  }
>

Of course, that only works for POD types, so you still need a
different instantiation for std::string, std::vector, ...

There's no possibility of implementing C++'s complicated lookup and
resolution rules down at the LLVM level, and since you need the
instantiations just to parse C++, using it as an example for why LLVM
should do instantiation seems flawed.

Once you restrict yourself to generics, you're only ever passing
pointers around, which, as you said, is the easy case (relatively),
since you don't need the type information at all once past the
front-end.

Thanks for putting up with the newbie,
~ Scott