[LLVMdev] LLVM + FORTRAN 95
Jon Harrop
jon at ffconsultancy.com
Tue Nov 3 16:39:21 PST 2009
On Tuesday 03 November 2009 23:24:57 Nilesh Mahajan wrote:
> Hi David/Renato,
>
> By AST I mean Abstract Syntax Tree. We are writing an optimization
> pass for some FORTRAN95 + MPI code that requires us to analyze the
> AST. We thought of 2 ways of doing this:
>
> 1. Compile the code using Clang/llvm-gfortran, get the textual AST
> dump (somehow), analyze the AST dump using Ruby, modify it and then
> feed back the modified AST to LLVM.
>
> 2. Do the analysis as an LLVM module.
>
> From your comments, I get the feeling that the 2nd option is the better
> option.
I was doing something similar last year and tried writing my own Fortran
lexer/parser and reusing some of the existing ones. I found it so hard that I
ended up rewriting the 800kLOC of Fortran code in a more modern language by
hand. Basically, the Fortran-related open source tools are so poorly written
and unreliable that they are not worth using. AFAIK, the llvm-gfortran
compiler is just an LLVM backend on GCC's Fortran front-end. GCC is awful, so
I would not recommend trying to get anything sensible out of it.
One project I did have limited success with was g95-xml, which is a hacked
version of the GCC-based g95 compiler that can output the nearest thing
Fortran has to an AST as XML:
http://g95-xml.sourceforge.net/
The "First attempts" version that I used was a Perl programmer's idea of a
parse tree though. ;-)
For example:
<fortran>
<statement id="0xbdf7b30" type="PROGRAM" loc="[0,6,0,18]"/>
<statement id="0xbdf8420" type="TYPE_DECLARATION" loc="[1,6,1,23]"
decl_type="0x705820" decl_kind="0xbdf7fe0" decl_symbols="0xbdf8290"/>
<statement id="0xbdf8f90" type="ASSIGNMENT" loc="[2,6,2,12]"
expr1="0xbdf8100"
expr2="0xbdf8b00"/>
<expr id="0xbdf8100" type="VARIABLE" loc="[2,6,2,7]" symbol="0xbdf8290"/>
<expr id="0xbdf8b00" type="CONSTANT" loc="[2,10,2,12]" value="1.E+0"/>
<statement id="0xbdf9550" type="END_PROGRAM" loc="[3,6,3,17]"/>
</fortran>
The edges between nodes in the AST are represented by those hexadecimal values
(!). IIRC, after a lot of effort writing OCaml code to decipher that "XML", I
discovered that it did not, in fact, contain all of the information from the
source code and could not be used to perform the automated transformation
that I wanted.
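To make the hexadecimal edges concrete, here is a minimal Python sketch (not the OCaml code mentioned above) that indexes the nodes of the excerpt by id and follows any attribute whose value is another node's id. The helper names (index_nodes, edges, dangling, render) are hypothetical, and the assumption that every 0x... attribute value is meant as a node reference comes only from reading the excerpt, not from any g95-xml documentation.

```python
import re
import xml.etree.ElementTree as ET

# The excerpt from the message, unchanged apart from line wrapping.
SAMPLE = """\
<fortran>
<statement id="0xbdf7b30" type="PROGRAM" loc="[0,6,0,18]"/>
<statement id="0xbdf8420" type="TYPE_DECLARATION" loc="[1,6,1,23]"
 decl_type="0x705820" decl_kind="0xbdf7fe0" decl_symbols="0xbdf8290"/>
<statement id="0xbdf8f90" type="ASSIGNMENT" loc="[2,6,2,12]"
 expr1="0xbdf8100"
 expr2="0xbdf8b00"/>
<expr id="0xbdf8100" type="VARIABLE" loc="[2,6,2,7]" symbol="0xbdf8290"/>
<expr id="0xbdf8b00" type="CONSTANT" loc="[2,10,2,12]" value="1.E+0"/>
<statement id="0xbdf9550" type="END_PROGRAM" loc="[3,6,3,17]"/>
</fortran>
"""

# Assumption: an attribute value of this shape is intended as a node reference.
HEX_REF = re.compile(r"0x[0-9a-fA-F]+")


def index_nodes(root):
    """Map each node's id attribute to its element."""
    return {el.get("id"): el for el in root if el.get("id") is not None}


def edges(el, nodes):
    """Attributes of el whose value is the id of a node present in the file."""
    return {name: nodes[value] for name, value in el.attrib.items()
            if name != "id" and value in nodes}


def dangling(el, nodes):
    """Attributes that look like references but have no matching node."""
    return {name: value for name, value in el.attrib.items()
            if name != "id" and HEX_REF.fullmatch(value) and value not in nodes}


def render(el, nodes, depth=0, seen=frozenset()):
    """Return indented lines for el and everything reachable from it."""
    label = el.get("type", "?")
    if el.get("value") is not None:
        label += " " + el.get("value")
    lines = ["  " * depth + label]
    if el.get("id") in seen:  # guard against reference cycles
        return lines
    seen = seen | {el.get("id")}
    for name, child in edges(el, nodes).items():
        lines.append("  " * (depth + 1) + name + ":")
        lines.extend(render(child, nodes, depth + 2, seen))
    return lines


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    nodes = index_nodes(root)
    for el in root.iter("statement"):
        print("\n".join(render(el, nodes)))
        for name, value in dangling(el, nodes).items():
            print("  (unresolved " + name + " -> " + value + ")")
```

In this excerpt the ASSIGNMENT statement resolves to its VARIABLE and CONSTANT operands, while decl_type, decl_kind, decl_symbols and symbol point at ids that do not appear. That may simply be because the excerpt is abridged, so it should not be read as proof of what the full g95-xml output omits.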
So my advice is certainly to compile your Fortran into LLVM IR because that is
a far more sane and malleable format.
--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e