[LLVMdev] Re: gcc like attributes and annotations

Wed Mar 1 11:36:00 PST 2006

> thanks for your reply.

Sorry for the delay, I've been buried in email lately.

>>> When translating a complex c application to llvm bytecodes, some
>>> semantics are lost:
>>>
>> LLVM 1.6 and the "new front-end" already handle this right.  Here's the
>> bugzilla bug corresponding to it:
>> http://llvm.cs.uiuc.edu/bugs/show_bug.cgi?id=659
>
> Great! The bug information is rather scarce. I would be interested in how
> you implemented it. Did you add another bytecode entry for the section
> value mapping? Is it possible to add attributes to other elements like
> functions as well?

Yes, it was added to the .ll/.bc formats:
http://llvm.cs.uiuc.edu/docs/LangRef.html#globalvars
http://llvm.cs.uiuc.edu/docs/BytecodeFormat.html#globalinfo

> Did you think about a mapping of common attributes on different
> platforms? For instance, the DllMain entry point under Win32 and
> __attribute__((constructor)) under Linux.

__attribute__((constructor)) is handled with the llvm.global_ctors global 
variable (even with LLVM 1.6); try it out.
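
For illustration, here is a minimal source-level sketch of what "try it out" 
could look like. It assumes a GCC-compatible front-end that accepts the 
__attribute__((constructor)) extension; the names g_initialized, 
init_before_main and wasInitialized are made up for this example.

```cpp
#include <cassert>

// Constant-initialized to false, then set by the constructor function below
// during program startup, before main() begins.
static bool g_initialized = false;

// GCC/Clang extension: ask the toolchain to call this function at startup.
// A front-end targeting LLVM is expected to list it in llvm.global_ctors.
__attribute__((constructor))
static void init_before_main() {
    g_initialized = true;
}

// Hypothetical helper so other code can observe whether the hook ran.
bool wasInitialized() { return g_initialized; }
```

Calling wasInitialized() from main() should return true if the constructor 
hook survived the trip through the compiler.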

>>> I would generally be interested and could contribute to extending LLVM
>>> by allowing more Annotations than currently possible.

> Okay, so I have quite the opposite attitude to the LLVM team on
> that issue :-)

I don't follow.

>> At one point in time, Value was annotatable.  The problem with this was
>> twofold:
>>
>> 1. This bloats every value in the system by adding an extra pointer.
>> 2. These annotations would get stale and not be updated correctly.
>>
>> The problem is basically that adding annotations really amounts to
>> extending the LLVM IR, and making it look like something simple doesn't
>> make it easier to deal with.  For example, if you add an "I'm special"
>> attribute to an instruction, then the function is cloned by some pass,
>> is that attribute copied or not?  What if it is deleted, moved,
>> rearranged, etc?  Further, how can annotations be serialized to .ll
>> files and .bc files?  In llvm, we always want "opt -pass1 -pass2" to be
>> the same as "opt -pass1 | opt -pass2", which would break if annotations
>> can't be serialized (which they can't currently).
>>
>
> I get you 100% here. But as you say later in the mail, a lot of information
> is kept in runtime std::map<Value*,foo> tables. That is really
> handy at runtime, but I *had* serialization in mind when I was thinking
> about Annotations.

Okay, if you want to serialize/deserialize, they become much more 
palatable, the implementation just gets stickier.

> I see annotations as a way to serialize some extra
> information with the bytecode without having to extend/change the core
> classes. The best way to implement them at runtime is to use some kind of
> std::map subscripting, plus the additional benefit that you can
> serialize it to the bytecode. Perhaps the best of both worlds.

That's fine, but don't think that makes them solve all of the problems. 
Again, there is still the updating issue.
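
A minimal sketch of the updating issue, assuming annotations live in a 
std::map side table keyed by pointer. Nothing here is LLVM code: Value is a 
stand-in struct, and AnnotationMap, cloneValue and lookup are hypothetical 
names.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for an IR value; not the real llvm::Value.
struct Value {
    std::string name;
};

// Side table of annotations, keyed by the address of the annotated value.
using AnnotationMap = std::map<const Value*, std::string>;

// A "pass" that clones a value. It knows nothing about the side table,
// so the clone starts out with no annotation at all.
std::unique_ptr<Value> cloneValue(const Value& v) {
    return std::unique_ptr<Value>(new Value{v.name + ".clone"});
}

// Returns the annotation for v, or an empty string if none is recorded.
std::string lookup(const AnnotationMap& notes, const Value* v) {
    auto it = notes.find(v);
    return it == notes.end() ? std::string() : it->second;
}
```

Annotating a value and then cloning it leaves lookup() returning an empty 
string for the clone. Deleting the original has the opposite problem: its 
entry stays in the map under a dangling key unless every pass remembers to 
erase it.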

> Two things here:
> (1) Annotations should not be something which really changes the meaning
> of a Value/Type. All the passes should work without the annotation.

Okay, what use are they then?

Note that source language types are not unique in LLVM, and they shouldn't 
be even with annotations.  For example:

struct X { int A; };
struct Y { int B; };

Both X and Y map to the same LLVM Type.  This cannot change.
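
To make the point concrete, here is a toy model of structural type 
uniquing. It is only a sketch and not LLVM's implementation: Type and 
TypeTable are hypothetical, and the addTypeName/getTypeByName members 
merely borrow the names of the Module methods mentioned in this thread.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical miniature struct type: element types only, no field names.
struct Type {
    std::vector<std::string> elements;
};

class TypeTable {
    // One Type object per distinct element list (structural uniquing).
    std::map<std::vector<std::string>, Type> uniqued_;
    // Symbol table: many names may point at the same Type.
    std::map<std::string, const Type*> names_;

public:
    // Returns the unique Type for this element list, creating it if needed.
    const Type* getStruct(const std::vector<std::string>& elems) {
        auto it = uniqued_.find(elems);
        if (it == uniqued_.end())
            it = uniqued_.emplace(elems, Type{elems}).first;
        return &it->second;
    }

    void addTypeName(const std::string& name, const Type* t) {
        names_[name] = t;
    }

    // Returns nullptr when the name is unknown.
    const Type* getTypeByName(const std::string& name) const {
        auto it = names_.find(name);
        return it == names_.end() ? nullptr : it->second;
    }
};
```

In this model, requesting { int } twice yields the same pointer, so 
registering it as both "struct.X" and "struct.Y" gives two names for one 
object. Looking either name up cannot tell you which source type was meant.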

> (2) I think annotations are a handy way to augment the bytecode without
> changing the bytecode format. It gives people the freedom to add some
> extra information. This is also interesting since changing the
> bytecode/adding fields to Value/... is often not a real option since one
> wants to work with production core libraries. (like I do now).

Okay.

> Perhaps the thing could be solved by adding policy statements to
> annotations. I could imagine the inventor of an Annotation should think
> about how the annotation should behave during optimisation/change. So
> the annotation should have a policy field which defaults to DontCare. In
> that case the user of the Annotation cannot be sure that it will get
> retained or something like that.

Personally, I see annotations as a convenient way to do experiments and 
allow rapid development.  If we decide that a feature makes sense in the 
LLVM IR long term, it should be added as a first class feature of it.
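
For experimenting with the policy idea quoted above, one possible shape is 
sketched below. This is purely hypothetical: only the DontCare default comes 
from the proposal, while CopyOnClone, Annotation and annotationsForClone are 
invented for the example and exist nowhere in LLVM.

```cpp
#include <cassert>
#include <string>
#include <vector>

// DontCare is the default named in the proposal; CopyOnClone is a
// hypothetical second policy added for this sketch.
enum class Policy { DontCare, CopyOnClone };

struct Annotation {
    std::string text;
    Policy policy = Policy::DontCare;
};

// Returns the annotations a cloned value would inherit: only those whose
// author asked for them to be copied. DontCare annotations are dropped.
std::vector<Annotation> annotationsForClone(const std::vector<Annotation>& src) {
    std::vector<Annotation> kept;
    for (const Annotation& a : src)
        if (a.policy == Policy::CopyOnClone)
            kept.push_back(a);
    return kept;
}
```

Even in this small form, every transformation that clones, moves or deletes 
a value would have to call something like annotationsForClone, which is the 
same maintenance burden as a first-class IR feature.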

>>> %struct.A = type { int }
>>> %struct.B = type { int }
>>>
>>> BTW: How would one generate a type alias like the above through the LLVM
>>> API?
>>
>>
>> Add two entries to the module symbol table for the same Type using
>> Module::addTypeName.
>
> Very interesting. Do I then have to fetch the type by calling
> Module::getTypeByName to get a second Type pointer?

Again, see above, there is no way to distinguish between two source level 
types that have the same structure.

> since I saw the llvm-gcc generates code like:
>
> %pa = alloca %struct.A
> %pb = alloca %struct.B
>
> this means that the AllocaInst must have knowledge of two types, which
> is only possible with two different pointers, right?

This is an implementation detail of the old llvm-gcc that breaks with the 
new one.  Do not depend on it.

>>> But sometimes it would be interesting to actually get symbol information
>>> about a type being used, without the need for full-featured debug
>>> information à la DWARF.

>> This isn't something you can do, this is far more tricky than you make
>> it out to be. :)

> Hehe, I know. Certainly if someone like you says that. But *if* the
> front end is aware of the annotation, which would be doable, and the
> annotations are serializable in bytecode, then one would have this
> information during LLVM bytecode processing as well.

Yes.  However, there would be no way to keep isomorphic LLVM types 
separate.  This dramatically limits the usefulness of what you're trying 
to do.

> One could also emit the symbolic information like relocation 
> information in a .section or, as I am currently working with the JIT, 
> use the JITs information about the annotations to get the symbolic 
> information.

I don't understand.

>>> This could be also solved by introducing Annotable.
>>> For instance, if the alloca/malloc/.. instruction would get an Annotation
>>> about a symbolic type which could look like:
>>>
>>> { ("x",int) }
>>>
>>> One could use the DEF/USE and operand information in the byte code to
>>> know which symbolic field was accessed for instance through
>>> getelementptr.
>>
>>
>> Again, this is effectively extending the LLVM IR.  Calling it an
>> 'annotation' doesn't make it simpler. :)  Also, the front-end would have
>> to be modified to generate the annotation.
>
> See above. Every piece of information must be understood in order to be usable.
> But I would do it as an annotation since it is just additional meta
> information and the program would perfectly run without the information.

Hopefully I made the issue more clear above.

-Chris

-- 
http://nondot.org/sabre/
http://llvm.org/