[LLVMdev] OT: new here, dynamic/runtime compilation (in general)

BGB cr88192 at hotmail.com
Sun Oct 21 17:27:37 PDT 2007


well, sadly, I am not sure how people are on this list...
in any case, LLVM is an interesting project, and may well continue being interesting.


but, in my case, I have done my own compilation framework...

ok, I didn't really hear about the really interesting bits of LLVM until after I had (more or less) written mine...

ok, my point is maybe to have something interesting to talk about, not so much to spam my own effort (so, I hope that people here do not take offense).


it is a more or less completely custom-written compiler framework (I am the only programmer in my case, a lone hobbyist). this project originally started about march or so (though some parts are older, and were reused as such).

it is far from being all that complete or featureful, and currently there is only a working x86 version and an incomplete x86-64 version...

as of yet, it is still often broken, and sometimes buggy (when I find bugs, I fix them, but I have hardly been doing all that comprehensive testing, and as of yet, I am paranoid of the horrible levels of breakage such testing would reveal...).


now, what I do with it is this:
I use C as a scripting language...

ok, C is also my main programming language, as I very rarely use C++ for much of anything...
(something like embedded-C++ may also be possible at some point, but full C++ is unlikely as it just looks too painful to write a compiler for...).


so, I load the C modules dynamically, and have them link directly to the host app (on windows, this requires manually opening and processing the exe's contents, but on linux, I have libdl...).

the reason is this:
what is better than C at interfacing with an otherwise C codebase?...
if I have no need of wrappers or FFI crap (needed for nearly any other non-C language), all the better (this was another major design goal).

for all this, I also needed a compiler (not an interpreter...), and also something that is if possible purely dynamic.

my idea was that we go from good old static compilation and linking, to something far more dynamic (imagine, for example, if we could use the compiler in much the same way we use 'malloc', and piece together code much like we currently piece together memory objects, though in a more metaphoric than literal sense...).


for example, it was a design goal that it be possible to incrementally replace parts of the running app image (at least, partly, aka, in VM controlled parts of the app), for example, by redefining some already running function at runtime, ... and have other code automatically go over to using the new version (thus requiring dynamic re-linking on the part of the framework...).
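
as a rough illustration of the "redefine a running function and have callers pick up the new version" idea, here is a minimal sketch. note the assumptions: it routes calls through a table of function pointers rather than patching or re-linking call sites, which may well not be how the framework does it, and every name in it (`fn_slot`, `define_fn`, `call_fn`, `scale_v1`, `scale_v2`) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: callers hold a slot, never a raw function address. */
typedef int (*fn_int_int)(int);

struct fn_slot {
    const char *name;   /* symbol name the slot is registered under */
    fn_int_int  impl;   /* current implementation, replaceable at runtime */
};

#define FN_TABLE_MAX 16
static struct fn_slot fn_table[FN_TABLE_MAX];
static size_t fn_count;

/* Define a function, or redefine it if the name is already registered.
 * Returns the slot that callers should keep and call through. */
struct fn_slot *define_fn(const char *name, fn_int_int impl)
{
    for (size_t i = 0; i < fn_count; i++) {
        if (strcmp(fn_table[i].name, name) == 0) {
            fn_table[i].impl = impl;   /* existing callers now see the new code */
            return &fn_table[i];
        }
    }
    assert(fn_count < FN_TABLE_MAX);   /* fixed-size table in this sketch */
    fn_table[fn_count].name = name;
    fn_table[fn_count].impl = impl;
    return &fn_table[fn_count++];
}

/* Every call goes through the slot, so it always reaches the latest version. */
int call_fn(const struct fn_slot *slot, int x)
{
    return slot->impl(x);
}

/* Two versions of the same (made-up) function. */
int scale_v1(int x) { return x * 2; }
int scale_v2(int x) { return x * 3; }
```

a caller that obtained its slot from `define_fn("scale", scale_v1)` keeps the same slot after `define_fn("scale", scale_v2)`, and its next `call_fn` reaches the new code. the cost is one extra indirection per call, which patching call sites directly would avoid.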

another minor goal:
fully dynamic piecewise compilation (like for example, a lisp style 'eval') has not yet been achieved (a major hurdle here is C's semantics, meaning that nearly any language which could work in an eval-style manner is necessarily no longer really C, which is sad...).

so, at present, it only handles source modules (with include files and a toplevel and so on), but oh well...


C conformance:
well, mostly it implements an as of yet incomplete version of C. some features are still lacking, but they are features that are sufficiently infrequent that I can make do without them for the time being (namely: 'static', initialized structs, multi-prototypes, C99 style dynamically-sized arrays, ...).

to 'add insult to injury' though, I implemented a few compiler extension features:
some partly derived from gcc (__real, __imag, ...);
some custom: builtin geometric vectors and quaternions (I do a lot of 3D stuff...);
..

I also have Garbage Collection, Dynamic Typing, and Prototype OO available as library features (note, I mean 'Prototype OO', in reference to the object system used in Self, and crudely imitated in JavaScript, and not in reference to "some crude mockup of OOP in C"...).

well, one can use these and pretend they were using a real script language...
note that these library features do not depend on compiler extensions, and so they also still work in parts of my project compiled with gcc...
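
for readers unfamiliar with Self-style prototype OO, the core mechanism (slot lookup that delegates to a parent object) can be done as plain C library code with no compiler extensions. the following is only a minimal sketch under assumptions: slot values are plain `int` rather than dynamically typed, there is no garbage collection, and the names (`proto_obj`, `obj_set`, `obj_get`, `obj_clone`) are hypothetical, not the actual library's API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PROTO_MAX_SLOTS 8

/* Hypothetical object: a few named slots plus a parent to delegate to. */
struct proto_obj {
    struct proto_obj *parent;               /* prototype, or NULL at the root */
    const char *keys[PROTO_MAX_SLOTS];
    int         vals[PROTO_MAX_SLOTS];
    size_t      count;
};

/* Set a slot on this object only; the prototype is never modified. */
void obj_set(struct proto_obj *o, const char *key, int val)
{
    for (size_t i = 0; i < o->count; i++) {
        if (strcmp(o->keys[i], key) == 0) {
            o->vals[i] = val;
            return;
        }
    }
    assert(o->count < PROTO_MAX_SLOTS);     /* fixed capacity in this sketch */
    o->keys[o->count] = key;
    o->vals[o->count] = val;
    o->count++;
}

/* Look a slot up, walking the parent chain; NULL if nobody defines it. */
const int *obj_get(const struct proto_obj *o, const char *key)
{
    for (; o != NULL; o = o->parent) {
        for (size_t i = 0; i < o->count; i++) {
            if (strcmp(o->keys[i], key) == 0)
                return &o->vals[i];
        }
    }
    return NULL;
}

/* "Cloning" here makes an empty object that delegates to the prototype. */
struct proto_obj obj_clone(struct proto_obj *proto)
{
    struct proto_obj o = {0};
    o.parent = proto;
    return o;
}
```

a clone initially sees every slot of its prototype through delegation, and setting a slot on the clone shadows the prototype's value without changing it.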

though technically possible, I am a little more uncertain about adding compiler extensions for stuff this much unlike C...

well, all this works for me...


performance:
well, this was never really a major goal of mine (as long as it was tolerable I guess), but it seems oddly enough to generally produce better code than GCC, which is probably worth something...

I don't set out to do elaborate tricks to gain performance. instead most of what I have was gained by tweaking code produced in the "common, special case" (looking at assembler, "well, this stupid-looking construction is appearing far too often, may go and fix it").


intermediate language:
at this point, my project and LLVM are somewhat different.

LLVM uses its good old variable and three-address-code based model.

mine works quite a bit differently.
my IL is, technically, a little closer to something like postscript or forth (not very nice looking, but hell, it is produced by a compiler...).

note that it is, oddly enough, passed between the compiler stages in a textual form, which was done in an attempt to resolve some major technical differences between the mechanics of my upper and lower compilers.


it represents an abstract stack machine (though, mind you, this may have little to do with the stack as seen by the HW or at the assembler level).

unlike forth, since this stack is more an abstraction than a representation of the physical stack, it is not valid to attempt to use conditionals in any way that would change the ordering or structure of the stack elements. long arguments came up over these points, that I am dealing with a declarative rather than an imperative notion of a stack...

as a result, in a few cases I have what basically amount to phi operators for the stack, which are used for dealing with these issues (with the restriction that all the values merging into this operator have to have the same type...).
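
the restriction described above (branches may not change the shape of the abstract stack, and merged values must agree in type) could be checked along these lines. this is only a sketch under assumptions: the type set `il_type` and the function `stacks_mergeable` are hypothetical and not taken from the actual IL compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much reduced set of IL value types. */
enum il_type { IL_INT, IL_FLOAT, IL_PTR };

/* Returns 1 if the abstract stacks left by two branches can be merged:
 * same depth, and the same type in every position. Returns 0 otherwise. */
int stacks_mergeable(const enum il_type *a, size_t na,
                     const enum il_type *b, size_t nb)
{
    assert(a != NULL && b != NULL);   /* callers pass valid arrays */
    if (na != nb)
        return 0;                     /* a branch changed the stack depth */
    for (size_t i = 0; i < na; i++) {
        if (a[i] != b[i])
            return 0;                 /* merged values disagree in type */
    }
    return 1;
}
```

a branch pair leaving `{int, ptr}` and `{int, ptr}` merges, while `{int, ptr}` against `{int, float}` or against `{int}` does not.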


likewise, unlike an 'actual' stack, there is no real notion of a stack element size (for example, pointers, integers, long longs, vectors, and structs can all be pushed to the stack and treated as single items, without worrying about type and size).

likewise, literals and symbols (names referring to variables, functions, or types) can also go on the stack.

though also statically typed (at runtime), most operations are fairly type-agnostic (it is the task of the operation itself to deal with whatever was given to it).

as such, a lot of the IL-compiler's internals go to dealing with all the various types (a good deal exists in terms of elaborate type-dispatch code).
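
to make the "single items regardless of size" and "type-agnostic operations" points concrete, here is a minimal sketch of a stack of tagged values with an `add` that dispatches on operand types. assumptions: it evaluates values directly instead of emitting assembler as the real lower compiler does, it only knows two types, and all names (`il_value`, `il_stack`, `push_int`, `push_float`, `op_add`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tagged value: one stack item, whatever its size or type. */
enum il_tag { T_INT, T_FLOAT };

struct il_value {
    enum il_tag tag;
    union { long i; double f; } u;
};

#define IL_STACK_MAX 64
struct il_stack {
    struct il_value items[IL_STACK_MAX];
    size_t depth;
};

static void push(struct il_stack *s, struct il_value v)
{
    assert(s->depth < IL_STACK_MAX);
    s->items[s->depth++] = v;
}

static struct il_value pop(struct il_stack *s)
{
    assert(s->depth > 0);
    return s->items[--s->depth];
}

void push_int(struct il_stack *s, long i)
{
    struct il_value v = { T_INT, { 0 } };
    v.u.i = i;
    push(s, v);
}

void push_float(struct il_stack *s, double f)
{
    struct il_value v = { T_FLOAT, { 0 } };
    v.u.f = f;
    push(s, v);
}

static double as_double(struct il_value v)
{
    return v.tag == T_INT ? (double)v.u.i : v.u.f;
}

/* One 'add' for all operand types: the operation itself does the dispatch.
 * int + int stays int; any float operand promotes the result to float. */
void op_add(struct il_stack *s)
{
    struct il_value b = pop(s);
    struct il_value a = pop(s);
    struct il_value r;
    if (a.tag == T_INT && b.tag == T_INT) {
        r.tag = T_INT;
        r.u.i = a.u.i + b.u.i;
    } else {
        r.tag = T_FLOAT;
        r.u.f = as_double(a) + as_double(b);
    }
    push(s, r);
}
```

in RPN terms, `2 3 add` leaves a single int on the stack, and following it with `0.5 add` leaves a single float. with more types (pointers, vectors, structs) the dispatch inside each operation grows quickly, which matches the remark above about elaborate type-dispatch code.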

I also used a stack because that is what I am most familiar with...


I call my IL RPNIL, or RIL (since it is an RPN-based IL).
the RIL compiler (converts RIL to assembler) is generally referred to as the 'lower compiler', with the 'upper compiler' handling the translation from C to RIL.

as noted, I also have a runtime assembler and linker...


or such...
