Fwd: [LLVMdev] Re: gcc like attributes and annotations

Mike Emmel mike.emmel at gmail.com
Sun Feb 26 17:42:35 PST 2006


On 2/26/06, Jakob Praher <jp at hapra.at> wrote:
> Hi Mike,
>
> hope you are doing well with the llvm gcjx backend. I am currently
> writing an llvm backend for a C like language for tracing (like D in
> dtrace). I am very interested in this area. Do you currently put your
> work in a repository? (maybe as Tom suggested gcjx.sf.net would be an
> easy start - since it would not require gcc committer status). I am keen
> on getting LLVM support for gcj. Maybe we could also wrap the LLVM
> infrastructure in CNI so that the ecj compiler could be targetable to
> LLVM more easily.
>
Hmm, I'll have to look at the data structures more; I've not really
thought about a mapping to Java. There is an LLVM backend as part of
PyPy, the Python compiler, and I'm looking right now at what they do.

> Mike Emmel schrieb:
> > This is an interesting thread.
> Thank you.
>
> >
> > I think this would also help with compiling scripting languages such
> > as JavaScript/Python etc. We could keep the high level meta data and
> > runtime binding info as language specific bytecode in the file and
> > just have the parts that are easy to represent as compileable in the
> > main object sections. There is no intrinsic reason for all the runtime
> > type information to get compiled into the core object module.  Also I
> > could bypass code that's difficult to compile and just stuff its
> > bytecode into this section. So I think this really helps with partial
> > compilation and supporting languages that have complex runtimes.
> >  The LLVM bytecode section would just get a stub runtime upcall for code
> > that's not compiled.
> >
>
> Hmm. Not sure I understand you 100 % here. I think the most interesting
> use for annotations is if you want to augment information at some point
> in the bytecode. For instance if you want to say, you label exactly this
> Value.
>
> If on the other hand you are developing higher level constructs like a
> symbolic dispatch facility for dynamic languages, you could as well put
> the information in C-like data structures. Even *if* you want to add raw
> bytecode into the modules, which I think is better to associate more
> externally like the gcj-dbtool is doing it, you just need some kind of
> blob, but not really annotations.
>
> Btw: I am very interested in the dynamic languages project you are
> mentioning. Do you have any dynamic language frontends in use. The PyPy
> project I think is targeting LLVM too. I could imagine that this meta
> information is just stored in plain C structures.
>
Yep, I just found it.
That was the route I was taking initially, but the more I thought about
it, the more I wondered why we hide metadata in C structs. It's
certainly the common practice, but it's not clear it's the best. My
first thought was the Java class file format; now I've even moved
beyond that: why not XML?
If at some point size is a problem, you can gzip it.
It's a format that is easy to use from a huge variety of tools, and it
actually parses fairly nicely.
If performance becomes a problem, it's a good starting point for
translation to a faster format, plus you can consider global
optimizations: a unified string table for a package, for example.

I can't really come up with a good reason not to do it in XML. There
are plenty of traditional compilers; it's worth exploring this approach.
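
A minimal sketch of the "unified string table for a package" idea,
assuming a converter has already read the XML metadata and wants each
distinct class or method name stored once, with records referring to it
by index. StringTable, strtab_intern, strtab_get and strtab_free are
hypothetical names invented for this illustration, not part of LLVM or
gcj.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical package-wide string table: every distinct name is stored
   once and metadata records refer to it by index. */
typedef struct {
    char **items;
    size_t count;
    size_t cap;
} StringTable;

/* Return the index of s, adding a private copy if it is not present yet.
   Linear search keeps the sketch short; a real table would likely hash.
   Returns (size_t)-1 if memory allocation fails. */
size_t strtab_intern(StringTable *t, const char *s)
{
    for (size_t i = 0; i < t->count; i++)
        if (strcmp(t->items[i], s) == 0)
            return i;

    if (t->count == t->cap) {
        size_t ncap = t->cap ? t->cap * 2 : 8;
        char **grown = realloc(t->items, ncap * sizeof *grown);
        if (!grown)
            return (size_t)-1;
        t->items = grown;
        t->cap = ncap;
    }

    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (!copy)
        return (size_t)-1;
    memcpy(copy, s, len);
    t->items[t->count] = copy;
    return t->count++;
}

/* Look up a string by the index returned from strtab_intern. */
const char *strtab_get(const StringTable *t, size_t index)
{
    assert(index < t->count);
    return t->items[index];
}

/* Release all copies and reset the table to its empty state. */
void strtab_free(StringTable *t)
{
    for (size_t i = 0; i < t->count; i++)
        free(t->items[i]);
    free(t->items);
    t->items = NULL;
    t->count = t->cap = 0;
}
```

Interning "java/lang/Object" twice yields the same index both times, so
two classes in the same package that mention it share a single entry.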


> > For java for example this would probably be the compiled parts with
> > stubs and a regular classfile for the runtime data with compiled
> > functions converted to native.
>
> Hmm. Maybe we should follow the gcj approach. Or at least use an
> interchangeable metadata spec. At last year's FOSDEM there was a short
> discussion about a more general meta-data format, which would make gcj
> generated object code self-contained and not needing the classfiles when
> compiling. Currently AFAIK gcj uses Class structures to represent the
> runtime meta information (for getClass( ) and reflection stuff as well
> as the indirect dispatch). You could model this approach. I think
> class-level metadata in special ELF sections for instance could provide
> a good way to make the gcj generated code more abstract in a way that
> external tools like linkers could understand the format.  Since LLVM
> supports special sections now, we could use a similar approach here.
>

Yep, this is true, but again an initial XML format in my opinion makes
sense to facilitate these types of translations. I think there may be
several that are useful in different circumstances.

1.) XML format for development, debugging, and wrapper generation.
2.) Classfile format, which may work better with "traditional" JVMs.
3.) ELF-based format for ELF systems.
4.) Weird formats for embedded systems, especially ones where the total
amount of code is fixed or well understood; this includes stripping out
unused code, etc.
5.) LLVM/JIT-friendly format ??

My point is that there are a lot of formats that may be optimal, but by
starting with XML you can easily convert to the best format when the
code is deployed.


> >
> > In the short term I think I'll simply use the class file format in my
> > native compiled classes
> > and wait and see how this turns out. I've been stuck thinking about
> > this for two months.
> So you are currently compiling class files to LLVM modules and you place
> the Java .class file information in the LLVM bytecode too? Or am I
> missing something here?
> >
I've not got that far. I'm walking the tree converting methods; I just
started working on the class definition.

Also, I've got another task of upgrading the WebKit GTK port that I
need to do right now. I also just got the DirectFB backend into the
mainline GTK CVS.
But as soon as the browser upgrade is done I'll get back to work. I was
also stalled conceptually till now.
Oh, and I moved from Boston to Chicago to LA in less than four months :(

Also, I've been thinking that instead of using the traditional approach
of calling the compiled methods with a pointer to the object, I want to
do it differently.

Generally a method is converted to native code like this.

class foo {
  void bar(){}
}

becomes in C

void  bar( objptr *foo );

But why not this...

void bar( clazz *foocls, instanceptr *fooinstance, void *vtable );

So instead of defining a fixed native struct for a class such as

struct foo {
   clazz *cls;
   vtable *my_vtable;
   int instancedata1;
};

Or something like that, we break out the three pieces of info needed in
native code: the class pointer for class static variables, the instance
variable struct pointer, and the vtable for virtual methods.

The cool thing is this works for any language that has the concepts of
class objects, instance objects, and methods.
Even if it does not have class objects, it can still call the native
method with two objects of the right type, since they're just plain C
structs.
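
The split described above can be sketched in C roughly as follows. All
type and function names (FooClass, FooInstance, FooVTable, foo_bar,
foo_get_step) are hypothetical, chosen only to illustrate passing the
class pointer, the instance pointer and the vtable as three separate
arguments; the method returns a value only so the effect is easy to
observe.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: the three pieces live in separate plain C structs. */
typedef struct FooClass    { int instances_touched; } FooClass;   /* class statics   */
typedef struct FooInstance { int counter; }           FooInstance; /* instance fields */

struct FooVTable;
typedef int (*FooGetStep)(FooClass *, FooInstance *, const struct FooVTable *);
typedef struct FooVTable { FooGetStep get_step; } FooVTable;       /* virtual methods */

/* A virtual method; callers reach it only through the vtable. */
int foo_get_step(FooClass *cls, FooInstance *self, const FooVTable *vt)
{
    (void)cls; (void)self; (void)vt;   /* this body needs none of them */
    return 2;
}

/* class foo { ... bar() ... } compiled with three explicit pointers
   instead of a single object pointer. */
int foo_bar(FooClass *cls, FooInstance *self, const FooVTable *vt)
{
    assert(cls && self && vt);
    cls->instances_touched++;                      /* static variable via class pointer  */
    self->counter += vt->get_step(cls, self, vt);  /* virtual call via the passed vtable */
    return self->counter;
}
```

A caller from any language only has to produce three pointers of the
right shape; nothing in foo_bar assumes they were carved out of a single
object header.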

The XML metadata fits in nicely, since it would, say, allow both Java
and Python to use the same native library. The price is you're filling a
lot more registers with args, but generally you have either machines
with a few registers or a bunch, so I'm not sure that's a huge deal.
ARM is the only thing that comes to mind where this may cause problems.

I'm sure there is a chance to optimize out any or all of the args
depending on them being used in the call. This could easily be
reflected in the meta data.
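
One way this could look, as a sketch only: a per-method metadata record
carries a bitmask saying which of the three pointers the body really
uses, and the caller picks a narrower entry point when it can. The
MethodInfo record, the USES_* flags and the sample methods are all
assumptions made for illustration; nothing here is an existing format.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flags recorded in the metadata for each compiled method. */
enum {
    USES_CLASS    = 1u << 0,
    USES_INSTANCE = 1u << 1,
    USES_VTABLE   = 1u << 2
};

/* Hypothetical metadata record: which arguments the body reads, plus the
   entry point whose signature matches that set. */
typedef struct {
    const char *name;
    unsigned    uses;
    union {
        int (*full)(void *cls, void *self, const void *vt);
        int (*self_only)(void *self);
    } entry;
} MethodInfo;

/* Dispatch through the record, dropping arguments the method never reads. */
int call_method(const MethodInfo *m, void *cls, void *self, const void *vt)
{
    assert(m != NULL);
    if (m->uses == USES_INSTANCE)
        return m->entry.self_only(self);
    return m->entry.full(cls, self, vt);
}

/* Sample method that reads only an instance field. */
int get_counter(void *self)
{
    return *(int *)self;
}

/* Sample method that reads a class static and an instance field. */
int add_static(void *cls, void *self, const void *vt)
{
    (void)vt;
    return *(int *)cls + *(int *)self;
}

const MethodInfo get_counter_info = {
    "getCounter", USES_INSTANCE, { .self_only = get_counter }
};
const MethodInfo add_static_info = {
    "addStatic", USES_CLASS | USES_INSTANCE, { .full = add_static }
};
```

Only the instance-only case is narrowed here; a fuller version would
need one entry-point signature per combination of flags it wants to
support.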


So the idea now is to break out the class object and vtable pointers
for the native methods and put a ton of info in an XML metadata
section.

Finally, I'm wondering if introducing an interpreter-style stack may
make sense, with so many args and the fact that we may want to eliminate
some based on use.

But even here I'm thinking of two entry points: a pure stack one and a
register one.

So for interpreters calling we would have

void bar( StackPtr *stack ) {

   public barInner :
       ( cls *stack[0], objptr *stack[1], vtable *stack[2], ... ) {

    }

}
So I can publish two entry points: one takes args from memory and
converts to the native calling convention, and the second takes the
native convention directly.

I'm trying to write nested functions in a C like language with the
inner function also public.

Again, these entry points can be published in the metadata and, if
wanted, the outer one, for example, could be stripped off.
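
Standard C has no nested functions, so a rough approximation of the two
entry points is a pair of ordinary functions: a stack-style wrapper that
unpacks its arguments from memory and a register-style function that
takes them in the native calling convention. Slot, bar_stack, bar_native
and the struct names are hypothetical, and the slot order (class,
object, vtable) is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Clazz  { int static_total; } Clazz;   /* class statics       */
typedef struct Object { int value; }        Object;  /* instance fields     */
typedef struct VTable { int unused; }       VTable;  /* placeholder vtable  */

/* One interpreter stack slot; a real interpreter would likely tag values. */
typedef union { void *ptr; intptr_t num; } Slot;

/* Register-style entry point: arguments arrive in the native convention. */
int bar_native(Clazz *cls, Object *self, const VTable *vt)
{
    (void)vt;                          /* not needed by this body */
    cls->static_total += self->value;
    return cls->static_total;
}

/* Stack-style entry point: unpack the arguments from memory, then hand
   over to the native-convention function. */
int bar_stack(Slot *stack)
{
    assert(stack != NULL);
    return bar_native(stack[0].ptr, stack[1].ptr, stack[2].ptr);
}
```

An interpreter would call bar_stack with its operand stack, while
compiled callers go straight to bar_native; if no interpreter is
present, bar_stack is the piece that could be stripped.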

Mike





> --Jakob
> >
> >
> > On 2/25/06, Jakob Praher <jp at hapra.at> wrote:
> >
> >>Hi Reid,
> >>
snipped Older conversation that I've now drifted far away from.



