[cfe-users] Syntax Parser using Clang

Harald Servat harald.servat at bsc.es
Fri Sep 20 00:49:20 PDT 2013


On 19/09/13 22:11, david.weber at l-3com.com wrote:
> I'm attempting to write a syntax parser using clang to let me list out
> tokens (variables, types, etc) for another toolset.
>
> I have a file specified as
>
> #include "bogus.h"
>
> const bogus::bogus_type bogus::bogus_val = 0;
>
> int main(){
>       int alpha = 40;
>       return(-1);
> }
>
> and I would expect to get a list of something like:
>
>                  bogus::bogus_type
>                  bogus::bogus_val
>                  alpha
>                  main
>                  ...
>                  (built in values as well)
>
> I'm using the C API, and for simple C files it works fine.  However,
> once I start going into C++ bodies, especially when the header file
> isn't found (again, I only care about syntax, not about making sure
> it's valid), it gives me only:
>
> alpha
> main
>
> and ignores the bogus values.  I'm almost thinking that I am
> using too powerful a tool for what I need, and I should find
> something "dumber".
>
> My main follows (whole thing also attached)
>
> Compile with:
>
> g++ parse.cpp -lclang -L/usr/lib64/llvm -o parse
>
> int main(int argc, char* argv[]){
>      init_filter();
>
>      CXIndex index = clang_createIndex(1, 1);
>      unsigned int options = CXTranslationUnit_None;
>
>      // We don't want to expand any #include statements
>      // so disable the standard include locations
>      const unsigned int num_args = 2;
>      const char* const args[num_args] = {
>          "-nostdlibinc",
>          "-nostdinc"
>      };
>
>      std::cout << "-----------------" << std::endl;
>
>      // Parse the file
>      CXTranslationUnit tu =
>          clang_parseTranslationUnit(
>              index,       // index to associate w/ this translation unit
>              argv[1],     // source file name
>              args,        // command line args
>              num_args,    // number of command line args
>              NULL,        // unsaved files
>              0,           // number of unsaved files
>              options
>          );
>
>      std::cout << "-----------------" << std::endl;
>
>      // Get a cursor into the parsed file
>      CXCursor cursor = clang_getTranslationUnitCursor(tu);
>      if(clang_Cursor_isNull(cursor)){
>          std::cout << "Cursor was NULL!" << std::endl;
>          exit(-1);
>      }
>
>      // Visit the children
>      clang_visitChildren(cursor, visitor, NULL);
>
>      // Print out the unique tokens
>      std::cout << std::endl << "Unique Tokens:" << std::endl;
>      for(tokenSet_t::iterator iter = token_set.begin();
>          iter != token_set.end();
>          ++iter)
>      {
>          std::cout << "\t" << *iter << std::endl;
>      }
>
>      return 0;
> }

Hello David,

   I once tried to do something similar, although I only looked at the
loops and procedures within the code. I was in a hurry because of a
deadline, so I simply relied on the output of

   clang -cc1 -ast-dump <inputfile>

   which was almost complete for my needs. Maybe you can give it a try.
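
   As a rough illustration of how such a dump could be post-processed,
here is a minimal sketch that collects declared names from the dump text.
It rests on an assumption about the dump format: that a declaration line
looks like `<Kind>Decl <address> <location> <name> '<type>'`, so the name
is the last word before the first quoted type. The dump layout is not a
stable interface and differs between clang versions, and the function
names extractDeclNames and isIdentifier are hypothetical, chosen only for
this example.

```cpp
#include <cctype>
#include <set>
#include <sstream>
#include <string>

// Returns true if `s` looks like a plain C/C++ identifier.
static bool isIdentifier(const std::string& s) {
    if (s.empty()) return false;
    if (!(std::isalpha(static_cast<unsigned char>(s[0])) || s[0] == '_')) return false;
    for (char c : s) {
        if (!(std::isalnum(static_cast<unsigned char>(c)) || c == '_')) return false;
    }
    return true;
}

// Hypothetical helper: collect declared names from the text of an AST dump.
// Assumes lines shaped like:  |-VarDecl 0x... <loc> col:7 alpha 'int' cinit
std::set<std::string> extractDeclNames(const std::string& dump) {
    std::set<std::string> names;
    std::istringstream lines(dump);
    std::string line;
    while (std::getline(lines, line)) {
        // Skip the tree-drawing prefix ("|", "`", "-", spaces) to reach the node kind.
        std::size_t kindStart = line.find_first_not_of("|`- ");
        if (kindStart == std::string::npos) continue;
        std::size_t kindEnd = line.find(' ', kindStart);
        if (kindEnd == std::string::npos) continue;

        // Keep only nodes whose kind ends in "Decl" (VarDecl, FunctionDecl, ...).
        std::string kind = line.substr(kindStart, kindEnd - kindStart);
        if (kind.size() < 4 || kind.compare(kind.size() - 4, 4, "Decl") != 0) continue;

        // Assumption: the declared name is the last word before the quoted type.
        std::size_t quote = line.find('\'', kindEnd);
        if (quote == std::string::npos) continue;
        std::istringstream words(line.substr(kindEnd, quote - kindEnd));
        std::string word, last;
        while (words >> word) last = word;

        // Unnamed declarations leave a location such as "col:13" here; drop those.
        if (isIdentifier(last)) names.insert(last);
    }
    return names;
}
```

   Feeding the dump text to extractDeclNames yields a sorted set of unique
names, much like the token_set in your program. Note that it can only
report what clang managed to parse, so declarations that depend on a
missing header may still be absent.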

   BTW, now that I have some more spare time, I may try using clang's
API as you are doing.

   Just my 0.02€.

Best regards.
