[LLVMdev] Re: gcc like attributes and annotations

Sun Feb 26 14:21:20 PST 2006

Hi Mike,

I hope you are doing well with the LLVM gcjx backend. I am currently
writing an LLVM backend for a C-like tracing language (like D in
dtrace), so I am very interested in this area. Do you currently keep your
work in a repository? (Maybe, as Tom suggested, gcjx.sf.net would be an
easy start, since it would not require gcc committer status.) I am keen
on getting LLVM support for gcj. Maybe we could also wrap the LLVM
infrastructure in CNI so that the ecj compiler could target LLVM more
easily.

Mike Emmel wrote:
> This is an interesting thread.
Thank you.

> 
> I think this would also help with compiling scripting languages such
> as JavaScript, Python, etc. We could keep the high-level metadata and
> runtime binding info as language-specific bytecode in the file and
> just have the parts that are easy to represent as compilable in the
> main object sections. There is no intrinsic reason for all the runtime
> type information to get compiled into the core object module. Also, I
> could bypass code that's difficult to compile and just stuff its
> bytecode into this section. So I think this really helps with partial
> compilation and with supporting languages that have complex runtimes.
> The LLVM bytecode section would just get a stub runtime upcall for code
> that is not compiled.
> 

Hmm. I am not sure I understand you 100% here. I think the most
interesting use for annotations is augmenting the information at one
specific point in the bytecode, for instance when you want to label
exactly one particular Value.

If, on the other hand, you are developing higher-level constructs like a
symbolic dispatch facility for dynamic languages, you could just as well
put the information in C-like data structures. Even *if* you want to add
raw bytecode to the modules (which I think is better associated
externally, the way gcj-dbtool does it), you just need some kind of
blob, not really annotations.

By the way, I am very interested in the dynamic languages project you
mention. Do you have any dynamic language frontends in use? The PyPy
project, I think, is targeting LLVM too. I could imagine that this meta
information is just stored in plain C structures.

> For java for example this would probably be the compiled parts with
> stubs and a regular classfile for the runtime data with compiled
> functions converted to native.

Hmm. Maybe we should follow the gcj approach, or at least use an
interchangeable metadata spec. At last year's FOSDEM there was a short
discussion about a more general metadata format, which would make
gcj-generated object code self-contained, so that the class files are
not needed when compiling. Currently, AFAIK, gcj uses Class structures
to represent the runtime meta information (for getClass() and reflection
as well as for the indirect dispatch). You could model this approach. I
think class-level metadata in special ELF sections, for instance, could
make the gcj-generated code more abstract, in the sense that external
tools like linkers could understand the format. Since LLVM supports
special sections now, we could use a similar approach here.

> 
> In the short term I think I'll simply use the class file format in my
> native compiled classes
> and wait and see how this turns out. I've been stuck thinking about
> this for two months.
So you are currently compiling class files to LLVM modules and you place
the Java .class file information in the LLVM bytecode too? Or am I
missing something here?
> 
--Jakob
> 
> 
> On 2/25/06, Jakob Praher <jp at hapra.at> wrote:
> 
>>Hi Reid,
>>
>>Reid Spencer wrote:
>>
>>>I have some thoughts on this too ..
>>>
>>
>>Great!
>>
>>
>>>On Fri, 2006-02-24 at 19:56 +0100, Jakob Praher wrote:
>>>
>>>
>>>>I get you 100% here. But as you say later in the mail, much of this
>>>>information is kept in some runtime std::map<Value*,foo>, which is really
>>>>handy at runtime, but I *had* serialization in mind when I was thinking
>>>>about Annotations. I see annotations as a way to serialize some extra
>>>>information with the bytecode without having to extend/change the core
>>>>classes. The best way to implement this at runtime is to use some kind of
>>>>std::map subscripting, plus the additional benefit that you can
>>>>serialize it to the bytecode. Perhaps the best of both worlds.
>>>>
>>
>>...
>>
>>>As Chris mentioned, I would prefer that we keep annotations out of the
>>>core IR altogether as they are fraught with problems that are not easy
>>>to resolve. However, I understand where you're coming from in wanting to
>>>keep additional information with the bytecode. I have wanted the same
>>>thing for use by front end or specialized tools. For example an IDE that
>>>could keep track of source information or a language that needs special
>>>passes that can only be done at link time.
>>>
>>
>>Yes.
>>
>>
>>>In thinking about the "right" way to do this, I came up with the idea of
>>>a single "blob" of data that could be appended to a Module. This single
>>>"annotation" would always be ignored by LLVM, would not require
>>>significant additional space to construct, and there is already a
>>>mechanism for constructing the information via the bytecode reader's
>>>handler interface (might need some extension).
>>>
>>
>>As far as locality is concerned, perhaps it would make sense to attach
>>such a blob to every primary object (module, function), so that
>>annotations that only apply to a certain function can be stored directly
>>in the function. That would make certain collisions easier to resolve.
>>
>>
>>>This is simply a way of making that std::map of information embeddable
>>>in the bytecode. It means the information is stored in one additional
>>>bytecode block (at the end) where it doesn't have any impact on LLVM
>>>(JIT/storage/etc).  The only question is: how do multiple tools avoid
>>>collision in this approach. Some kind of registry or partitioning of the
>>>data could likely solve that.
>>>
>>
>>Yes, that sounds like a doable approach. But I would not write arbitrary
>>binary data into the blob; I would use an LLVM type encoding/table
>>approach instead. Many annotations are simple types or can be composed
>>of simple types, and people should be encouraged to store data in a way
>>that makes it possible to read it without library code. If you just
>>serialize C++ structs, you end up relying heavily on the code that wrote
>>them, which makes it harder for tools to introspect annotations. Java's
>>annotations rely on simple types on the same principle, and I think it
>>is the right way for most things. There could be an opaque type for more
>>complex information, whose use should be discouraged.
>>
>>This would also make it possible to use a triple of
>>Value, AnnotationType, Name to match the Annotation, which helps to
>>solve the collision problem too.
>>
>>The lookup mechanism could look up by any element of the triple:
>>- Target Value
>>- AnnotationType
>>- Name
>>
>>NULL values are wildcards.
>>
>>So you could say:
>>
>>Give me all annotations for a Value*
>>
>>/// Function local annotations
>>Value* v = ...
>>vector< const Annotation *>  &ans = curFunction->lookupAnnotation( v,
>>NULL, NULL);
>>
>>Or based on a specific type:
>>
>>/// Module wide annotations
>>AnnotationType *type = ...
>>vector< const Annotation *> &ans = module->lookupAnnotation( v, type, NULL );
>>
>>This is just a random thought, though.
>>
>>
>>>>>As a historical curiosity, Function still needs to be annotatable due to
>>>>>the LLVM code generator relying on it.  This will be fixed in LLVM 1.8
>>>>>and Function will not be annotatable anymore.
>>>>>
>>>>>If you *really* just want per-pass local data, you should just use an
>>>>>std::map from the Value* to your data.
>>>>
>>>>Why not see Annotations as the means to serialize these maps? Maybe we
>>>>could add an Annotations table that maps Value types to ConstantPool
>>>>entries or something like that. This would make things easier for LLVM
>>>>libraries in other languages too.
>>>
>>>
>>>This is similar to my idea above, but I wouldn't want to restrict it to
>>>any particular data structure. The application can construct the data
>>>however it wishes and simply pass a pointer to a block of memory to the
>>>bytecode writer.
>>>
>>
>>Great that we have a similar view. I would use a public simple type
>>encoding for the annotations, so that annotations are introspectable
>>without knowing much about the details of the annotation data. This helps
>>to keep the bytecode free from language-specific data encoding too.
>>
>>-- Jakob
>>
>>
>>>------------------------------------------------------------------------
>>>
>>>_______________________________________________
>>>LLVM Developers mailing list
>>>LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
> 
>