[LLVMdev] LLVM + FORTRAN 95

Jon Harrop jon at ffconsultancy.com
Tue Nov 3 16:39:21 PST 2009


On Tuesday 03 November 2009 23:24:57 Nilesh Mahajan wrote:
> Hi David/Renato,
>
> By AST I mean Abstract Syntax Tree. We are writing an optimization
> pass for some FORTRAN95 + MPI code that requires us to analyze the
> AST. We thought of 2 ways of doing this:
>
> 1. Compile the code using Clang/llvm-gfortran, get the textual AST
> dump (somehow), analyze the AST dump using Ruby, modify it and then
> feed back the modified AST to LLVM.
>
> 2. Do the analysis as an LLVM module.
>
> From your comments, I get the feeling that 2nd option is the better
> option.

I was doing something similar last year and tried writing my own Fortran 
lexer/parser and reusing some of the existing ones. I found it so hard that I 
ended up rewriting the 800kLOC of Fortran code in a more modern language by 
hand. Basically, the Fortran-related open source tools are so poorly written 
and unreliable that they are not worth using. AFAIK, the llvm-gfortran 
compiler is just an LLVM backend on GCC's Fortran front-end. GCC is awful so 
I would not recommend trying to get anything sensible out of it.

One project I did have limited success with was g95-xml, which is a hacked 
version of the g95 compiler (itself built on GCC) that can output the nearest 
thing Fortran has to an AST as XML:

  http://g95-xml.sourceforge.net/

The "First attempts" version that I used was a Perl programmer's idea of a 
parse tree though. ;-)

For example:

<fortran>
  <statement id="0xbdf7b30" type="PROGRAM" loc="[0,6,0,18]"/>
  <statement id="0xbdf8420" type="TYPE_DECLARATION" loc="[1,6,1,23]" 
    decl_type="0x705820" decl_kind="0xbdf7fe0" decl_symbols="0xbdf8290"/>
  <statement id="0xbdf8f90" type="ASSIGNMENT" loc="[2,6,2,12]"
    expr1="0xbdf8100" expr2="0xbdf8b00"/>
  <expr id="0xbdf8100" type="VARIABLE" loc="[2,6,2,7]" symbol="0xbdf8290"/>
  <expr id="0xbdf8b00" type="CONSTANT" loc="[2,10,2,12]" value="1.E+0"/>
  <statement id="0xbdf9550" type="END_PROGRAM" loc="[3,6,3,17]"/>
</fortran>

The edges between nodes in the AST are represented by those hexadecimal values 
(!). IIRC, after a lot of effort writing OCaml code to decipher that "XML", I 
discovered that it did not, in fact, contain all of the information from the 
source code and could not be used to perform the automated transformation 
that I wanted.
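
To make the "edges as hexadecimal values" point concrete, here is a minimal 
sketch of the deciphering step, written in Python rather than the OCaml 
mentioned above. It assumes that any attribute whose value equals another 
element's id is an edge, which is a guess about the format and not something 
documented by g95-xml. The helper names (index_nodes, children, dangling, 
render) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The dump quoted above, with the wrapped attribute rejoined.
SAMPLE = """<fortran>
  <statement id="0xbdf7b30" type="PROGRAM" loc="[0,6,0,18]"/>
  <statement id="0xbdf8420" type="TYPE_DECLARATION" loc="[1,6,1,23]"
    decl_type="0x705820" decl_kind="0xbdf7fe0" decl_symbols="0xbdf8290"/>
  <statement id="0xbdf8f90" type="ASSIGNMENT" loc="[2,6,2,12]"
    expr1="0xbdf8100" expr2="0xbdf8b00"/>
  <expr id="0xbdf8100" type="VARIABLE" loc="[2,6,2,7]" symbol="0xbdf8290"/>
  <expr id="0xbdf8b00" type="CONSTANT" loc="[2,10,2,12]" value="1.E+0"/>
  <statement id="0xbdf9550" type="END_PROGRAM" loc="[3,6,3,17]"/>
</fortran>"""


def index_nodes(root):
    """Map each element's id attribute to the element itself."""
    return {el.get("id"): el for el in root if el.get("id")}


def children(el, nodes):
    """Attributes (other than id) whose value is the id of a known node.

    Assumption: such attributes are the edges of the tree.
    """
    return [(k, nodes[v]) for k, v in el.attrib.items()
            if k != "id" and v in nodes]


def dangling(el, nodes):
    """Attributes that look like references but resolve to nothing here."""
    return [(k, v) for k, v in el.attrib.items()
            if k != "id" and v.startswith("0x") and v not in nodes]


def render(el, nodes, depth=0, label="", path=()):
    """Return indented lines for el and everything reachable from it."""
    text = "  " * depth + (f"{label}: " if label else "") + el.get("type", el.tag)
    if el.get("id") in path:  # guard against cyclic references
        return [text + " (cycle)"]
    lines = [text]
    for attr, child in children(el, nodes):
        lines += render(child, nodes, depth + 1, attr, path + (el.get("id"),))
    return lines


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    nodes = index_nodes(root)
    for stmt in root.iter("statement"):
        print("\n".join(render(stmt, nodes)))
        for attr, ref in dangling(stmt, nodes):
            print(f"  {attr}: unresolved reference {ref}")
```

On the excerpt above, the ASSIGNMENT statement resolves to a VARIABLE and a 
CONSTANT, while decl_type, decl_kind and decl_symbols point at ids that do not 
appear in the excerpt at all.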

So my advice is certainly to compile your Fortran into LLVM IR because that is 
a far more sane and malleable format.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


