[LLVMdev] Re: gcc like attributes and annotations

Fri Feb 24 10:56:52 PST 2006

Hi Chris!

Thanks for your reply.
First of all, I did not know about the history of the Annotation stuff.
For me, Annotable was one way to realize these things. As I see it
now, it is more likely that Annotable will vanish completely soon.
This is interesting to me.

Chris Lattner wrote:
> On Fri, 24 Feb 2006, Jakob Praher wrote:

> 
>> When translating a complex c application to llvm bytecodes, some
>> semantics are lost:
>>
> LLVM 1.6 and the "new front-end" already handle this right.  Here's the
> bugzilla bug corresponding to it:
> http://llvm.cs.uiuc.edu/bugs/show_bug.cgi?id=659

Great! The bug report is rather sparse, though. I would be interested in
how you implemented it. Did you add another bytecode entry for the
section value mapping? Is it possible to add attributes to other
elements, like functions, as well?

Did you think about mapping common attributes across different
platforms? For instance, the DllMain entry point under Win32 and
__attribute__((constructor)) under Linux.

> 
>>
>> I would generally be interested and could contribute to extending LLVM
>> by allowing more Annotations than currently possible.
Okay, so my attitude towards that issue is quite the opposite of the
LLVM team's :-)

> 
> At one point in time, Value was annotatable.  The problem with this was
> twofold:
> 
> 1. This bloats every value in the system, by adding an extra pointer.
> 2. These annotations would get stale and not be updated correctly.
>
> The problem is basically that adding annotations really amounts to
> extending the LLVM IR, and making it look like something simple doesn't
> make it easier to deal with.  For example, if you add an "I'm special"
> attribute to an instruction, then the function is cloned by some pass,
> is that attribute copied or not?  What if it is deleted, moved,
> rearranged, etc?  Further, how can annotations be serialized to .ll
> files and .bc files?  In llvm, we always want "opt -pass1 -pass2" to be
> the same as "opt -pass1 | opt -pass2", which would break if annotations
> can't be serialized (which they can't currently).
>

I am with you 100% here. But as you say later in the mail, a lot of
information is kept in some runtime std::map<Value*,foo>. That is really
handy at runtime, but I *had* serialization in mind when I was thinking
about Annotations. I see annotations as a way to serialize some extra
information with the bytecode without having to extend/change the core
classes. The best way to implement them at runtime is probably some kind
of std::map lookup, with the additional benefit that the map can be
serialized to the bytecode. Perhaps the best of both worlds.

Two things here:
(1) Annotations should not be something which really changes the meaning
of a Value/Type. All the passes should work without the annotation.

(2) I think annotations are a handy way to augment the bytecode without
changing the bytecode format. It gives people the freedom to add some
extra information. This is also interesting since changing the
bytecode or adding fields to Value/... is often not a real option when
one wants to work with production core libraries (like I do now).

Perhaps this could be solved by adding policy statements to
annotations. I could imagine that the inventor of an Annotation has to
think about how the annotation should behave under optimisation/change.
So the annotation should have a policy field which defaults to DontCare.
In that case the user of the Annotation cannot be sure that it will be
retained.
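
To make the policy idea concrete, here is a minimal C++ sketch. All
names in it (AnnotationPolicy, CopyOnClone, transferOnClone, the
stand-in Value struct) are made up for illustration and are not part of
LLVM; only DontCare comes from the proposal above. It assumes that
annotations live in a side map keyed by the value's address and that a
cloning pass calls a helper afterwards.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical sketch only: none of these names exist in LLVM.
enum class AnnotationPolicy {
  DontCare,    // default: passes may silently drop the annotation
  CopyOnClone  // carry the annotation over when the value is cloned
};

struct Annotation {
  std::string Payload;
  AnnotationPolicy Policy = AnnotationPolicy::DontCare;
};

// Stand-in for llvm::Value; only its address (identity) matters here.
struct Value {};

using AnnotationMap = std::map<const Value *, Annotation>;

// To be called by a (hypothetical) pass after it has cloned From into To.
// Returns true if the annotation was carried over to the clone.
bool transferOnClone(AnnotationMap &Map, const Value *From, const Value *To) {
  assert(From != To && "a clone must be a distinct value");
  auto It = Map.find(From);
  if (It == Map.end() || It->second.Policy != AnnotationPolicy::CopyOnClone)
    return false;
  Annotation Copy = It->second;  // copy first, then insert under the new key
  Map[To] = Copy;
  return true;
}
```

With this, an annotation left at the default DontCare simply does not
follow the clone, while one marked CopyOnClone does. Deletion, moving
and so on would need their own policies; the sketch covers cloning only.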

The discussion going on around the higher-level VMs (like Gilad
Bracha's paper on pluggable type systems) gives some hints about the
difficulties of changing instruction sets over time. I think core system
functionality is invariant, but meta information that is not essential
for the application to work should be pluggable too.

> As a historical curiosity, Function still needs to be annotatable due to
> the LLVM code generator relying on it.  This will be fixed in LLVM 1.8
> and Function will not be annotatable anymore.
> 
> If you *really* just want per-pass local data, you should just use an
> std::map from the Value* to your data.

Why not see Annotations as the means to serialize these maps? Maybe we
could add an Annotations table that maps Value types to ConstantPool
entries or something like that. This would also make things easier for
LLVM libraries in other languages.
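
A rough C++ sketch of what I mean by serializing such a map. This is
not LLVM code: Value is a stand-in struct, SideTable, writeTable and
readTable are hypothetical names, and a plain text format stands in for
a real bytecode table. The assumption is that every value has a stable
slot number (here: its position in a vector), so that the map can be
keyed by slot on disk and by pointer in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for llvm::Value; hypothetical, only its address is used.
struct Value {};

using SideTable = std::map<const Value *, std::string>;

// Write one "slot payload" line per annotated value. Slot numbers are the
// positions in Order, which plays the role of a value numbering.
// Payloads are assumed to be non-empty, single-line strings.
std::string writeTable(const std::vector<const Value *> &Order,
                       const SideTable &Table) {
  std::ostringstream OS;
  for (std::size_t Slot = 0; Slot < Order.size(); ++Slot) {
    auto It = Table.find(Order[Slot]);
    if (It != Table.end())
      OS << Slot << ' ' << It->second << '\n';
  }
  return OS.str();
}

// Rebuild the pointer-keyed map against a (possibly freshly loaded)
// set of values that uses the same slot numbering.
SideTable readTable(const std::vector<const Value *> &Order,
                    const std::string &Text) {
  SideTable Table;
  std::istringstream IS(Text);
  std::size_t Slot;
  std::string Payload;
  while (IS >> Slot && std::getline(IS >> std::ws, Payload)) {
    assert(Slot < Order.size() && "slot out of range");
    Table[Order[Slot]] = Payload;
  }
  return Table;
}
```

A pass would keep using the std::map at runtime, and only the writer
and reader need to know about the table. Values without an entry cost
nothing in the file.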

> 
>> %struct.A = type { int }
>> %struct.B = type { int }
>>
>> BTW: How would one generate a type alias like the above through the LLVM
>> API?
> 
> 
> Add two entries to the module symbol table for the same Type using
> Module::addTypeName.

Very interesting. Do I then have to look the type up by calling
Module::getTypeByName to get a second Type pointer?

I ask because I saw that llvm-gcc generates code like:

%pa = alloca %struct.A
%pb = alloca %struct.B

Does this mean that the AllocaInst must know about two types, which is
only possible with two different pointers?
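
To make the question concrete, here is a toy C++ model of the other
possible reading, where both names are just symbol table entries for
one and the same Type pointer. ToyModule and its members are stand-ins
written for this mail; they only borrow the method names from the
discussion and say nothing about how the real Module behaves.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Toy stand-in, not the LLVM class.
struct Type {};

struct ToyModule {
  std::map<std::string, const Type *> TypeNames;

  // Registers Name for Ty. In this toy, returns false if the name is taken.
  bool addTypeName(const std::string &Name, const Type *Ty) {
    assert(Ty && "cannot name a null type");
    return TypeNames.insert(std::make_pair(Name, Ty)).second;
  }

  // Returns the type registered under Name, or nullptr if there is none.
  const Type *getTypeByName(const std::string &Name) const {
    auto It = TypeNames.find(Name);
    return It == TypeNames.end() ? nullptr : It->second;
  }
};
```

In this model, looking up "struct.A" and "struct.B" after registering
both for the same type gives back the identical pointer, so an
instruction holding that pointer could not tell which name was meant.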

> 
>> But sometimes it would be interesting to actually get symbol information
>> about a type being used, without the need for full featured debug
>> information ala DWARF.
> 
> 
> This isn't something you can do, this is far more tricky than you make
> it out to be. :)
> 
Hehe, I know. Certainly if someone like you says so. But *if* the
front end is aware of the annotation, which would be doable, and the
annotations are serializable in bytecode, then one would have this
information during LLVM bytecode processing as well. One could also emit
the symbolic information, like relocation information, in a .section, or
(as I am currently working with the JIT) use the JIT's knowledge of the
annotations to get at the symbolic information.

>> This could be also solved by introducing Annotable.
>> For instance if the alloca/malloc/.. instruction would get an Annotation
>> about a symbolic type which could look like:
>>
>> { ("x",int) }
>>
>> One could use the DEF/USE and operand information in the byte code to
>> know which symbolic field was accessed for instance through
>> getelementptr.
> 
> 
> Again, this is effectively extending the LLVM IR.  Calling it an
> 'annotation' doesn't make it simpler. :)  Also, the front-end would have
> to be modified to generate the annotation.

See above. Any piece of information must be understood in order to be
usable. But I would do it as an annotation, since it is just additional
meta information and the program would run perfectly well without it.

> 
>> I don't know how you feel about that, but I think there would be many
>> circumstances where Annotations could help getting more information out
>> of the bytecode.
> 
> 
> While I understand the general utility of annotations, the LLVM
> Annotation facility has several problems (some of which are described
> above) that make them not work well in practice.  Even if they did, they
> would still have the "updating" class of problems, which I'm not sure
> how to solve.
> 
> If you think that this is something that would be really useful, you can
> come up with solutions for these issues, and you're willing to implement
> it, then this is the right place to talk about the design of the new
> facility. :)
Hehe. Yes. I am just getting comfortable with the framework and I think
it is very nice. If I have some more points (which I hope I will), I
will definitely talk to you.

-- Jakob
> 
> -Chris
>