[LLVMdev] Announcement: GNAT ported to LLVM
Duncan Sands
baldrick at free.fr
Sun Mar 23 15:00:11 PDT 2008
Hi, this is to let people know that the recently released
LLVM 2.2 compiler toolkit contains experimental support for
Ada through the llvm-gcc-4.2 compiler. Currently the only
platform it works on is Linux running on 32-bit Intel x86.
This is because that's what I run, and I'm the only one who's
been working on this. I would appreciate help from other Ada
guys, both for porting to new platforms and adding support for
missing features, not to mention testing and bug fixing!
LLVM (http://llvm.org/) is a set of compiler libraries and tools
for optimization and static and just-in-time code generation.
Personally I find LLVM a lot of fun, and pleasant to work with
due to the good design and clean implementation. I hope you
will too! llvm-gcc is gcc with the gcc optimizers replaced by
LLVM's; llvm-gcc-4.2 is the version of llvm-gcc based on gcc-4.2.
The way llvm-gcc works (this is transparent to users) is that the
gcc-4.2 GNAT front-end converts Ada into "gimple", gcc's internal
language independent representation. The gimple is then turned
into LLVM's internal form, referred to as IR. This is then run
through LLVM's optimizers, followed by LLVM's code generators
which squirt it out as assembler or object code. In practice
you can use llvm-gcc as a drop in replacement for gcc. However
the use of LLVM opens up other possibilities too.
For example, it is possible to have llvm-gcc squirt out LLVM IR
rather than object code (by using -emit-llvm on the command line).
It is possible to link the LLVM IR for different compilation units
together and reoptimize them. In other words you can do link-time
optimization. This is all language independent, so if part of your
program is written in Ada and other parts in C/C++/Fortran etc, you
can link them all together and mutually optimize them, resulting in
C routines being inlined into Ada etc.
The compiler works quite well, but it is still experimental. All
of the ACATS testsuite passes except for c380004 and c393010. Since
c380004 also fails with gcc-4.2, that makes c393010 the only failure
due to the use of the LLVM infrastructure (the problem comes from
the GNAT front-end which produces a bogus type declaration that the
gimple -> LLVM converter rejects). On the other hand, many of the
tests in the GNAT testsuite fail. The release notes give some more
details of what works and what doesn't:
http://llvm.org/releases/2.2/docs/ReleaseNotes.html
The precompiled llvm-gcc-4.2 shipped with the LLVM 2.2 release was
built without support for Ada, so you will need to build the compiler
yourself. You can find instructions at
http://llvm.org/docs/GCCFEBuildInstrs.html
Please report bugs and problems to the LLVM mailing lists, or using
http://llvm.org/bugs/ One nice thing about LLVM is that people are
responsive and quickly fix bugs (often by the next day).
The LLVM IR is easy to read (with a bit of practice), and since it
contains the entire LLVM state you get to see exactly what has
happened to your program. This might be useful for static analysis, and
it is certainly useful for understanding how the various Ada constructs
are implemented. To give you a taste for what it looks like, here is
an example showing what a simple Ada program gets turned into. Here
is the Ada:
with Ada.Text_IO;
procedure Hello is
begin
   Ada.Text_IO.Put_Line ("Hello world!");
end;
Here's the result of compiling it:
$ gcc -S -O2 -emit-llvm -o - hello.adb
...
%struct.string___XUB = type { i32, i32 }
...
@.str = internal constant [12 x i8] c"Hello world!" ; <[12 x i8]*> [#uses=1]
@C.168.1155 = internal constant %struct.string___XUB { i32 1, i32 12 } ; <%struct.string___XUB*> [#uses=1]
define void @_ada_hello() {
entry:
        tail call void @ada__text_io__put_line__2( i8* getelementptr ([12 x i8]* @.str, i32 0, i32 0), %struct.string___XUB* @C.168.1155 )
        ret void
}
declare void @ada__text_io__put_line__2(i8*, %struct.string___XUB*)
I've dropped the declarations of some uninteresting types and other info,
thus the ... Note that passing -S -emit-llvm results in LLVM assembler
being output (the human readable version of LLVM IR); using -c -emit-llvm
would result in the compact binary form of LLVM IR, known as bitcode.
Passing -o - causes the assembler to be dumped to the terminal.
Here you can see:
(1) The declaration of Ada.Text_IO.Put_Line:
declare void @ada__text_io__put_line__2(i8*, %struct.string___XUB*)
The name ada__text_io__put_line__2 is that generated by GNAT for this routine.
The function returns no value ("void") and has two arguments: a pointer to an
i8 (an i8 is an 8 bit integer, in this case a character) and a pointer to a
%struct.string___XUB, which is a record type. The declaration of the type is
%struct.string___XUB = type { i32, i32 }
which is a record containing two 32 bit integers. These are the lower and
upper bounds for the string. Thus a call to Ada.Text_IO.Put_Line in fact
passes two arguments, a pointer to the string contents and a pointer to the
string bounds.
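For readers more at home in C, the convention just described can be
sketched roughly as follows. This is only an illustration: the names
string_bounds, string_length and copy_ada_string are made up for this
sketch and are not part of GNAT or LLVM, and the callee merely has the
same two-pointer shape as ada__text_io__put_line__2. It also assumes the
usual Ada rule that a string is empty when its upper bound is below its
lower bound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C mirror of %struct.string___XUB: the two string bounds. */
struct string_bounds {
    int32_t first;  /* lower bound (1 in the Hello example) */
    int32_t last;   /* upper bound (12 in the Hello example) */
};

/* Number of characters described by the bounds.
   Assumption: last < first denotes an empty string. */
size_t string_length(const struct string_bounds *b)
{
    assert(b != NULL);
    if (b->last < b->first)
        return 0;
    /* Widen before subtracting so extreme bounds cannot overflow. */
    return (size_t)((int64_t)b->last - (int64_t)b->first + 1);
}

/* Hypothetical callee with the same argument shape as the IR declaration:
   a pointer to the characters plus a pointer to the bounds. The data is
   not NUL-terminated, so the length must come from the bounds. Copies
   into buf, terminates it, and returns the number of characters copied. */
size_t copy_ada_string(const char *data, const struct string_bounds *b,
                       char *buf, size_t buf_size)
{
    size_t n = string_length(b);
    if (buf_size == 0)
        return 0;
    if (n > buf_size - 1)
        n = buf_size - 1;   /* truncate rather than overrun buf */
    memcpy(buf, data, n);
    buf[n] = '\0';
    return n;
}
```

Called with the 12 characters of "Hello world!" and bounds { 1, 12 },
copy_ada_string copies 12 characters. The point is that the callee never
looks for a terminator, it relies entirely on the bounds record.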
(2) The code defining Hello (_ada_hello). There is one basic block,
the entry block marked "entry:". It contains two instructions: a call
and a return instruction. The call
tail call void @ada__text_io__put_line__2( i8* getelementptr ([12 x i8]* @.str, i32 0, i32 0), %struct.string___XUB* @C.168.1155 )
is marked as a "tail call". If you don't know what that means, don't worry
about it. The call is to the function @ada__text_io__put_line__2, see (1) above.
The first parameter is an i8*, a pointer to an 8 bit integer, and has the value
getelementptr ([12 x i8]* @.str, i32 0, i32 0)
What is this? First off, @.str is the string constant
@.str = internal constant [12 x i8] c"Hello world!" ; <[12 x i8]*> [#uses=1]
This is an internal constant, meaning that it is not visible outside this
compilation unit. It has type [12 x i8], which is an array of 12 i8's.
It has the value "Hello world!", which is indeed 12 characters long. There
is a comment on the end of the line (starting with ";") pointing out the type
of @.str, which is [12 x i8]*, a pointer to an array of 12 characters, and the
fact that @.str is only used in one place. The getelementptr instruction is
explained in the LLVM docs, see http://llvm.org/docs/LangRef.html and also
http://llvm.org/docs/GetElementPtr.html
Here it just converts @.str from a [12 x i8]* into an i8* before passing it
to @ada__text_io__put_line__2. In short: a pointer to the H in Hello world!
is passed as the first parameter of the call.
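If the getelementptr with two zero indices looks mysterious, a loose C
analogue may help. This is only a sketch under the assumption that C
pointer-to-array semantics are a fair stand-in for the IR here; the
names hello_str and first_char are invented for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* The same 12 bytes as @.str. As in the IR, there is no terminating NUL. */
const char hello_str[12] = { 'H', 'e', 'l', 'l', 'o', ' ',
                             'w', 'o', 'r', 'l', 'd', '!' };

/* Rough C counterpart of
       getelementptr ([12 x i8]* p, i32 0, i32 0)
   The first 0 steps through the pointer to the array it points at; the
   second 0 selects element 0 of that array. No memory is read: only the
   type changes, from pointer-to-array to pointer-to-char. */
const char *first_char(const char (*p)[12])
{
    assert(p != 0);
    return &(*p)[0];
}
```

Calling first_char(&hello_str) yields a pointer with the same address as
&hello_str, but typed so that it points at the 'H'.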
The second parameter is a pointer to a %struct.string___XUB, a record holding
the lower and upper bounds for the string. The value passed is @C.168.1155,
which is the constant declared as:
@C.168.1155 = internal constant %struct.string___XUB { i32 1, i32 12 } ; <%struct.string___XUB*> [#uses=1]
This is a constant record containing the values 1 (the lower bound) and 12 (the
upper bound).
The return instruction "ret void" completes execution of the function, and
returns control to the caller. The "void" indicates that this routine does
not actually return anything.
I hope you have fun playing with LLVM!
Duncan.