[LLVMdev] MachineOperand: GlobalAddress vs. ExternalSymbol

Sat Jun 19 06:41:01 PDT 2004

Chris Lattner wrote:

> > And why isn't it possible to just make those functions known to LLVM?
> > After all, *I think*, if this function is to be called, it should be
> > declared in assembler, and so you have to pass some information about
> > those functions to the code printer. (Of course, it's possible to just
> > directly print the declarations, but that's scary).
>
> If you wanted to do that, it would be fine.  Be aware that the code
> generators are set up as function passes though, so you would have to
> insert all function prototypes in the doInitialization(...) method of the
> function pass: you can't just do it on the fly from runOn*Function.
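As an illustration of the ordering constraint Chris describes, here is a minimal sketch. It assumes hypothetical stand-in types (`MiniFunction`, `MiniModule`, `MiniAsmPrinter`) rather than the real LLVM classes. A function-pass-style printer sees the whole module only in `doInitialization`, so that is where every external prototype has to be declared. The emitted `extern ...: label;` text follows the assembler syntax used later in this thread.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the LLVM classes discussed above; these are
// NOT the real LLVM API, just enough structure to show the pass ordering.
struct MiniFunction {
  std::string name;
  bool isExternal;  // true for a prototype such as "declare int %printf(...)"
};

struct MiniModule {
  std::vector<MiniFunction> functions;
};

// A function-pass-style printer: doInitialization sees the whole module once,
// runOnFunction sees one function at a time.
class MiniAsmPrinter {
 public:
  std::string out;

  void doInitialization(const MiniModule &m) {
    // Emit every external declaration up front, before any function body,
    // because runOnFunction never gets to see the whole module.
    for (const MiniFunction &f : m.functions)
      if (f.isExternal)
        out += "extern " + f.name + ": label;\n";
  }

  void runOnFunction(const MiniFunction &f) {
    if (f.isExternal) return;  // nothing to emit for a prototype
    out += f.name + ":\n";     // body emission is elided in this sketch
  }
};
```

A driver calls `doInitialization` once and then `runOnFunction` for each function, so all declarations precede the first function label.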

Yes, I understand that.

> The real reason that we aren't doing this currently is that we don't want
> code generators to be hacking on the LLVM module.  This greatly interferes
> with JIT-style multi-pass optimization and other things.  Unfortunately,
> we are a long way from this though, as the lowering passes hack on the
> LLVM code, and other things do as well.  Unless you have a good reason to do
> so, I would suggest trying to use MO_ExternalSymbol just to make future
> refactoring easier.

I think I more or less understand this motivation.

> > There's another issue I don't understand. The module consists of
> > functions and constants. I'd expect that external function declarations
> > are also constants, with an appropriate type. However, it seems they are not
> > included in [Module::gbegin(), Module::gend()]; instead, they are Function
> > objects with isExternal set to true.
>
> Module::gbegin/gend iterate over the global variables, and ::begin/end
> iterate over the functions, some of which may be prototypes.  Function
> prototypes aren't really any more "constant" than other functions are.
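The layout Chris describes can be modelled roughly as follows. This is a sketch with hypothetical types (`GVar`, `Func`, `ModuleModel`), not the LLVM `Module` class. The point it shows is that a client looking for external function declarations has to walk the function list and test `isExternal`, not the global-variable list.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the Module layout described above (not the LLVM API):
// global variables live in one list, functions (including prototypes) in another.
struct GVar { std::string name; };
struct Func { std::string name; bool isExternal; };

struct ModuleModel {
  std::vector<GVar> globals;    // roughly what Module::gbegin()/gend() walk
  std::vector<Func> functions;  // roughly what Module::begin()/end() walk

  // Collect the names of all external function prototypes. They are found
  // in the function list, not in the global-variable list.
  std::vector<std::string> externalFunctions() const {
    std::vector<std::string> names;
    for (const Func &f : functions)
      if (f.isExternal) names.push_back(f.name);
    return names;
  }
};
```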

I disagree. Say there's a declaration of the external function "printf". Then it's
just a constant global address. In assembler it will be

   extern printf: label;

which is not that different from the assembler for other constants. For example,
for an external data reference I have to produce the same assembler.

BTW, there's an inconsistency in how the X86 backend handles constants and functions.
Consider:

%.str_1 = constant [11 x sbyte] c"'%c' '%c'\0A\00"

implementation   ; Functions:

declare int %printf(sbyte*, ...)

int %main() {
entry:
        %tmp.0.i = call int (sbyte*, ...)*
        %printf( sbyte* getelementptr ([11 x sbyte]*  %.str_1, long 0, l
        ret int 0
}

The assembly produced by the X86 backend is:

        call printf
........
        .globl _2E_str_1
        .data
        .align 1
        .type _2E_str_1,@object
        .size _2E_str_1,11
_2E_str_1:

That is, the name of ".str_1" is mangled, but the name of the function is not. I
don't see the reason for handling these two kinds of names differently.
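For reference, the output above is consistent with a scheme that replaces every character not allowed in an assembler identifier by `_XX_`, where `XX` is its hex code ('.' is 0x2E). The sketch below is an assumption that reproduces that one observed example; `mangleName` is a hypothetical helper, not the actual code of LLVM's NameMangler.

```cpp
#include <cassert>
#include <cctype>
#include <cstdio>
#include <string>

// Hypothetical escaping scheme that reproduces ".str_1" -> "_2E_str_1".
std::string mangleName(const std::string &name) {
  std::string result;
  for (unsigned char c : name) {
    if (std::isalnum(c) || c == '_') {
      result += static_cast<char>(c);  // safe in an assembler identifier
    } else {
      char buf[8];
      // Replace the character by "_XX_", XX being its uppercase hex code.
      std::snprintf(buf, sizeof buf, "_%02X_", static_cast<unsigned>(c));
      result += buf;
    }
  }
  return result;
}
```

Under this scheme, a name such as `printf` that contains only identifier characters comes out unchanged, while `.str_1` becomes `_2E_str_1`.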

> > To me this seems a bit confusing; it would be clearer if there were plain
> > functions with bodies and everything else were a GlobalValue.
>
> The reason that we don't want to do this is that it makes it more
> difficult to create a function and then fill in its body.  Currently when
> you create a function, you get a prototype.  When you fill in its body,
> you now have a defined function.  In your scheme, the function prototype
> and defined function objects would be different: to go from one to the
> other, you would have to delete the object and reallocate it.
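Chris's point can be illustrated with a sketch that uses hypothetical types (`FunctionModel`, `BasicBlockModel`), not LLVM's `Function` class. If "is a prototype" is derived from "has no basic blocks", adding the first block turns the same object into a definition, so nothing has to be deleted and reallocated.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model (not LLVM's real classes): a function is a prototype
// exactly when it has no basic blocks.
struct BasicBlockModel { std::string label; };

class FunctionModel {
 public:
  explicit FunctionModel(std::string n) : name(std::move(n)) {}

  bool isExternal() const { return blocks.empty(); }

  // Filling in the body mutates this object in place; pointers held by
  // callers and call sites stay valid.
  void addBlock(const std::string &label) { blocks.push_back({label}); }

  std::string name;

 private:
  std::vector<BasicBlockModel> blocks;
};
```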

Can't you store all functions in the list of global values? That would be
quite clear: all top-level module elements are global values, and are present
in the global list.

The functions list can contain either all functions, with or without bodies,
or only those with bodies. In the latter case, when you create a function, it's
added only to the global values list. When you add the first basic block, it's
also added to the list of functions.
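The second variant of this proposal could be sketched as follows. The types are hypothetical (`Entity`, `ProposedModule`) and the sketch is not how LLVM stores modules. Every top-level entity goes into `globals`, and a function is registered in `functions` when its first basic block is added.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of the proposed layout (not LLVM's actual design).
struct Entity {
  Entity(std::string n, bool fn) : name(std::move(n)), isFunction(fn) {}
  std::string name;
  bool isFunction;
  int numBlocks = 0;
};

struct ProposedModule {
  std::vector<Entity *> globals;    // all top-level values
  std::vector<Entity *> functions;  // only functions that have a body

  void addTopLevel(Entity *e) { globals.push_back(e); }

  void addBlock(Entity *f) {
    assert(f->isFunction);
    // The first block turns a prototype into a definition, so the function
    // joins the functions list at that point.
    if (f->numBlocks++ == 0) functions.push_back(f);
  }
};
```

One consequence, as Chris notes, is that a definition keeps its identity: the same object simply appears in one more list.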

> > Another question is about SymbolTable. Is it true that it's a mapping
> > from names to objects in the Module, and that all objects accessible via
> > SymbolTable are either in the list of functions or in the list of global
> > values?
>
> Yup.  There are also function-local symbol tables as well.
>
> I wouldn't recommend depending too much on the names, because LLVM has an
> unusual mechanism where it allows objects with different types to have
> the same name.  This means you can have:
>
> int %foo(int %X) { ret int %X }
> float %foo(float %X) { ret float %X }
>
> In the context of a code generator, you should use the NameMangler
> interface to make everything just work.
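To see why a mangler is needed, note that two functions named `%foo` with different types cannot both be emitted as the assembler symbol `foo`. The sketch below shows one possible disambiguation (a numeric suffix for later types). `UniqueNamer` and its suffix scheme are hypothetical and are not the NameMangler's actual algorithm.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical sketch: give each (name, type) pair a distinct, stable
// assembler symbol. The suffix scheme is an assumption for illustration.
class UniqueNamer {
 public:
  std::string nameFor(const std::string &name, const std::string &type) {
    std::pair<std::string, std::string> key(name, type);
    auto it = assigned_.find(key);
    if (it != assigned_.end()) return it->second;  // same value, same symbol
    int &uses = uses_[name];
    // The first value keeps its name; later ones with a different type
    // get a numeric suffix so the assembler never sees a duplicate.
    std::string result = uses == 0 ? name : name + "_" + std::to_string(uses);
    ++uses;
    assigned_[key] = result;
    return result;
  }

 private:
  std::map<std::pair<std::string, std::string>, std::string> assigned_;
  std::map<std::string, int> uses_;
};
```

A real mangler would also have to make sure a suffixed name like `foo_1` cannot collide with a value that is genuinely called `foo_1`; this sketch does not handle that.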
>
> If you're doing something else and think you need the symbol table, please
> let me know.  Clients of the SymbolTable class are extremely rare (by
> design).  The SymbolTable class is mostly an internal class that is
> automagically used by the system to provide naming invariants and allow
> efficient lookup for the rare clients that need it.

Thanks for the explanation. I don't have a use for SymbolTable yet; I was just
wondering if I have to use it for something ;-)

>
> -Chris