[LLVMdev] Hooking the global symbol resolver

Thu Mar 27 09:21:17 PDT 2008

On Thu, 2008-03-27 at 07:44 -0400, Gordon Henriksen wrote:
> In the context of a static compiler, I would recommend that you
> implement your own “on the side” symbol table in order to track this
> state and perform on-demand instantiation as required. It is
> worthwhile to consider the LLVM module to be a passive output sink,
> not an active object.

I think I understand what you are saying, but let me delve into this a
little further. My main point is that the technique we are after is
generally useful for languages having well-integrated template
mechanisms. The approach you advocate may still be the best approach,
but I'ld like to make sure that we are having the same conversation.

Let me illustrate the problem concretely.

Consider the BitC primitive definition of lists:

  (defunion (list 'a)
    (cons 'a (list 'a))
    nil)

Modulo surface syntax, this looks on first inspection to be exactly like
the corresponding definitions in ML or Haskell. But in ML or Haskell
there is only one run-time "expansion" of this type, because all of the
possible element types occupy exactly one fundamental unit of storage.
This is not true for BitC types. On a 32-bit machine having suitable
alignment, the above definition guarantees that variables and bindings
of type

  (list int64)

are pointers to boxes whose first element is 64 bits (the int64) and
whose second element is 32 bits (the next pointer). That is: the type
must be instantiated with different concrete representations for
different uses. Since we had to do this anyway, we use the same
technique to resolve type classes, thereby eliminating the need for
dictionary pointers. The BitC procedure:

  (define (add-one x) (+ x 1))

has the type:

  (forall (Arith 'a) (fn ('a) 'a))

which (given our instantiation approach) is best imagined to be a
template-like construct -- albeit one that is fully checked for
consistency and whose expansion is known not to fail (barring memory
exhaustion within the compiler).

When a client program uses a dynamic library providing these sorts of
constructs, we have two choices:

  1. Generate them statically at static link time. This works,
     but it is not robust across dynamic library version changes
     (which is an endemic problem in languages that support both
     templates and dynamic linking).

  2. Generate them dynamically at load time (or on first call).
     This is what we want to do.

That is: we want to use a continuous compilation strategy, which is
precisely what LLVM is supposedly attempting to achieve.

If we adopt the approach that you suggest, we will end up implementing
our own "generate on demand" infrastructure that has to operate in
collusion with the dynamic loader. We know how to do that, but it is a
moderately dicey business. Basically we have to run a pre-resolver
before we emit code to an LLVM module, after which LLVM will run a
second resolver. I certainly agree that this will work.

But I didn't have in mind originally to view Modules as active things.
It was not my intention to "extend" a module in mid compile. Rather, it
was my thought that we could provide the unresolved symbol by emitting
and compiling a second module. Where the LLVM infrastructure currently
has something like this:

  if (!(sym = llvm_resolve_global(GlobalScope, symName)))
    some_failure_action();

it would now look something like:

  sym = llvm_resolve_global(GlobalScope, symName);
  if (!sym && frontend_has_symbol_generator
      && frontend_generate_symbol(symname))
    // Note: if frontend_generate_symbol() has succeeded, it will have
    // constructed some LLVM Module and called the LLVM compiler to
    // admit it, with the consequence that GlobalScope will have been
    // updated to contain a binding for the desired symbol.
    sym = llvm_resolve_global(GlobalScope, symName);
  if (!sym)
    some_failure_action();

I don't think that it's any more complicated than that. This is
basically the test that our static polyinstantiator runs right now.

Note: I'm still not claiming that this is a good approach in the context
of LLVM. I don't have my head wrapped around LLVM enough to have an
opinion about that. I simply wanted to make sure that the question was
clearly framed.

I do, provisionally, think that this particular hook is consistent with
the notion of continuous compilation. It seems (to me) necessary (and
perhaps even sufficient) to let the front end participate in the
continuousness of continuous compilation.

shap