[LLVMdev] recursive nested functions

Fri Jun 25 14:20:56 PDT 2010

On Jun 25, 2010, at 1:46 PM, Nicolas Ojeda Bar wrote:

> Hello,
> 
> This is a little off-topic. But I am writing a compiler to llvm ir
> for a language that admits recursive nested functions and am stuck
> as to how to translate them. Concretely, I'm trying to lift them all
> to the topmost level and pass all their free variables explicitly as
> arguments. To do this, I have to determine all their free variables
> in their bodies. In particular when I come across a function call,
> I have to add _its_ free variables to the list of free variables of
> the enclosing function. You can see how this will not work if we
> allow recursive functions.

> 
> Any pointers would be greatly appreciated.
> 
> Thanks!
> N

In an earlier life, I built a commercial Pascal to C++ translator, called unimaginatively p2c, that was used by most of the top Mac application developers to port their Object Pascal programs to C++ when Apple forced them during the Motorola 68XXX to PowerPC transition. Since Object Pascal (like most Pascals) permitted nested function (or procedure) definitions and C++ does not, I had to address this very issue. 

Since we were translating commercial code, and our own database engine code, we needed the resulting C++ to be as close to the original as possible so it could be maintained. We even preserved comment positions and conditional compilation switches when possible.

If I recall, I handled the nested procedure/function issue by keeping track of the particular scope for each of the free variables used within inner functions during identifier resolution. I used a linked set of symbol tables and pushed a new symbol table for each new nested procedure. This made it easy to determine where the free variable was defined. If it was defined in an enclosing function, I marked the entry for that symbol to indicate that the symbol was required in a nested function.

I then created structs which held the free variables accessed by the nested functions and passed a pointer to the struct into each of the nested functions. I changed all uses of the variables to reference the struct members at the outer level, and then passed in a pointer to the struct into inner scoped functions. A double-nested function would have two struct pointers if it accessed free variables from both of its enclosing functions.  I then changed all the accesses to the free variables within the nested procedures/functions to reference the struct members through the pointer.

This ended up producing fairly efficient code that was pretty easy to understand when looking at the resulting C++ code.

- Curtis