[cfe-dev] New clangIndex library

Argyrios Kyrtzidis kyrtzidis at apple.com
Thu Jul 9 09:52:13 PDT 2009


Hi all,

If you are following the cfe-commits mailing list, you may have
noticed that a new 'Index' library (referred to as 'clangIndex' from
now on) recently landed in the clang repository.
I'd like to introduce this new library properly.

clangIndex is meant to provide the basic infrastructure for
cross-translation-unit analysis and is primarily focused on
indexing-related functionality.
It provides an API for clients that need to accurately map the AST
nodes of an ASTContext to locations in the source files.
It also allows them to analyze information across multiple translation
units.

As a "general rule", ASTContexts are considered the primary source of  
information that a client wants about a translation unit.
As a consequence, there will be no such class as an "indexing  
database" that stores, for example, source locations of identifiers  
separately from ASTContext.
All the information that a client needs from a translation unit will  
be extracted from the ASTContext.


Entity
------
To reason about Decls that are semantically the same but are contained
in multiple ASTContexts, the 'Entity' class was introduced.
An Entity is an ASTContext-independent "token" that can be created
from a Decl (and, in the future, from a typename) so that it can later
be "resolved" into a Decl belonging to another ASTContext. Some examples
to make the concept of Entities clearer:

t1.c:
void foo(void);
void bar(void);

t2.c:
void foo(void) {
}

Translation unit 't1.c' contains two Entities, 'foo' and 'bar', while
't2.c' contains one Entity, 'foo'.
Entities are uniqued in such a way that the Entity* pointer for
't1.c/foo' is the same as the Entity* pointer for 't2.c/foo'.
An Entity doesn't convey any information about the declaration. It is
more like an opaque pointer used only to get the associated Decl out of
an ASTContext so that the actual information for the declaration can be
accessed.
Another important aspect of Entities is that they can only be created
for declarations that are visible outside the translation unit. This
means that for:

t3.c:
static void foo(void);

there can be no Entity (if you ask for the Entity* of the static
function 'foo', you'll get a null pointer).
There are two reasons for this:
1) To preserve the invariant that the same Entity* pointers refer to
   the same semantic Decls.
   In the above example, t1.c/foo and t2.c/foo are the same, while
   t3.c/foo is different.
2) The purpose of Entity is to get the same semantic Decl from
   multiple ASTContexts. For a Decl that is not visible outside of
   its own translation unit, you don't need an Entity, since it won't
   appear in another ASTContext.
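
The uniquing and visibility rules above can be modelled with a small
toy sketch. None of this is the clangIndex API: ToyDecl, ToyEntity,
ToyProgram and ToyTranslationUnit are hypothetical names, and keying an
entity on the declaration's name alone is an assumption that only holds
for simple C examples like the ones above.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a declaration inside one translation unit.
struct ToyDecl {
  std::string name;
  bool externallyVisible;  // false for 'static' declarations
};

// Opaque token: it carries no declaration information of its own.
struct ToyEntity {};

// Hypothetical owner of all entities, shared by every translation unit.
class ToyProgram {
  // std::map never moves its elements, so their addresses stay stable.
  std::map<std::string, ToyEntity> entities_;

public:
  // Returns the unique entity for a declaration, or nullptr when the
  // declaration is not visible outside its translation unit.
  const ToyEntity *getEntity(const ToyDecl &d) {
    if (!d.externallyVisible)
      return nullptr;
    return &entities_[d.name];
  }
};

// Hypothetical per-translation-unit table used to turn an entity back
// into the declaration that this translation unit holds for it.
class ToyTranslationUnit {
  std::map<const ToyEntity *, const ToyDecl *> declsByEntity_;

public:
  void add(ToyProgram &prog, const ToyDecl &d) {
    if (const ToyEntity *e = prog.getEntity(d))
      declsByEntity_[e] = &d;
  }

  // Returns this unit's declaration for the entity, or nullptr.
  const ToyDecl *resolve(const ToyEntity *e) const {
    auto it = declsByEntity_.find(e);
    return it == declsByEntity_.end() ? nullptr : it->second;
  }
};
```

In this model, asking for the entity of t1.c's 'foo' and resolving it
in the unit built from t2.c yields t2.c's 'foo', while the static 'foo'
of t3.c never gets an entity at all.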


ASTLocation
-----------
Encapsulates a "point" in the AST of an ASTContext.
It represents either a Decl*, or a Stmt* along with its immediate
Decl* parent.
For example, clangIndex will provide the references of 'foo' in the
form of ASTLocations "pointing" at the expressions that reference 'foo'.

ResolveLocationInAST
-------------------
A function that accepts an ASTContext and a SourceLocation, and
resolves the SourceLocation into an ASTLocation.
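
One way to picture this kind of resolution is a walk down a tree that
keeps the innermost node whose source range contains the requested
position. The sketch below is only an illustration of that idea and not
the actual implementation: ToyPos, ToyNode and resolveLocation are
hypothetical names, and the inclusive line/column ranges are an
assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical line/column position inside a single file.
struct ToyPos {
  unsigned line, column;
};

// Orders positions first by line, then by column.
inline bool operator<=(const ToyPos &a, const ToyPos &b) {
  return a.line < b.line || (a.line == b.line && a.column <= b.column);
}

// Hypothetical AST node: a label, an inclusive source range, children.
struct ToyNode {
  std::string label;
  ToyPos begin, end;
  std::vector<ToyNode> children;
};

// Returns the innermost node whose range contains 'p', or nullptr if
// 'p' lies outside the range of 'n'.
const ToyNode *resolveLocation(const ToyNode &n, ToyPos p) {
  if (!(n.begin <= p && p <= n.end))
    return nullptr;
  for (const ToyNode &child : n.children)
    if (const ToyNode *hit = resolveLocation(child, p))
      return hit;
  return &n;  // no child matched, so this node is the innermost one
}
```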

DeclReferenceMap
---------------
Accepts an ASTContext and creates a mapping from NamedDecls to the  
ASTLocations that reference them (in the same ASTContext).
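
Conceptually this is a one-to-many map from a declaration to the places
that mention it. A minimal sketch of that data structure follows,
assuming the references have already been collected; ToyNamedDecl,
ToyLocation, ToyReference and ToyDeclReferenceMap are hypothetical
names, not the real classes.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a named declaration.
struct ToyNamedDecl {
  std::string name;
};

// Hypothetical reference site: the enclosing declaration plus a
// printable position such as "t2.c:6:3".
struct ToyLocation {
  const ToyNamedDecl *parent;
  std::string where;
};

// One "this location refers to that declaration" fact.
struct ToyReference {
  const ToyNamedDecl *target;
  ToyLocation location;
};

class ToyDeclReferenceMap {
  // A multimap keeps every location recorded for the same declaration.
  std::multimap<const ToyNamedDecl *, ToyLocation> refs_;

public:
  explicit ToyDeclReferenceMap(const std::vector<ToyReference> &all) {
    for (const ToyReference &r : all)
      refs_.emplace(r.target, r.location);
  }

  // Returns the locations that reference 'd', in insertion order.
  std::vector<ToyLocation> referencesTo(const ToyNamedDecl *d) const {
    std::vector<ToyLocation> out;
    auto range = refs_.equal_range(d);
    for (auto it = range.first; it != range.second; ++it)
      out.push_back(it->second);
    return out;
  }
};
```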

AST files
---------
The precompiled headers implementation of clang
(http://clang.llvm.org/docs/PCHInternals.html) is ideal for storing an
ASTContext in a compact form that will be loaded later for AST analysis.
An "AST file" refers to a translation unit that was "compiled" into a
precompiled header file.

index-test
----------
A command-line tool that exercises the clangIndex API, useful for  
testing the clangIndex features.
As input it accepts multiple AST files (representing multiple  
translation units) and a few options:

    -point-at  [file:line:column]
Resolves a [file:line:column] triplet into an ASTLocation from the
first AST file. If no other option is specified, it prints the
ASTLocation.
It also prints a declaration's associated doxygen comment, if one is
available (courtesy of Doug).

    -print-refs
Prints the ASTLocations that reference the declaration that was
resolved from the [file:line:column] triplet.

    -print-defs
Prints the ASTLocations that define the resolved declaration.

    -print-decls
Prints the ASTLocations that declare the resolved declaration.
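
For readers who want to script around the tool, here is a rough sketch
of how a [file:line:column] triplet could be split into its parts. It
is not taken from index-test: ToyPointAt, parseNumber and parsePointAt
are hypothetical names, and the choices to split on the last two colons
and to reject 0 as a line or column are assumptions.

```cpp
#include <cstddef>
#include <string>

// Hypothetical parsed form of a "file:line:column" argument.
struct ToyPointAt {
  std::string file;
  unsigned line = 0;
  unsigned column = 0;
};

// Parses a positive decimal number of at most 9 digits.
static bool parseNumber(const std::string &s, unsigned &out) {
  if (s.empty() || s.size() > 9)
    return false;
  unsigned value = 0;
  for (char c : s) {
    if (c < '0' || c > '9')
      return false;
    value = value * 10 + static_cast<unsigned>(c - '0');
  }
  if (value == 0)
    return false;  // assumption: lines and columns are 1-based
  out = value;
  return true;
}

// Splits on the last two colons, so a file name may itself contain ':'.
// Returns false and leaves 'out' untouched on malformed input.
bool parsePointAt(const std::string &arg, ToyPointAt &out) {
  std::size_t colCut = arg.rfind(':');
  if (colCut == std::string::npos || colCut == 0)
    return false;
  std::size_t lineCut = arg.rfind(':', colCut - 1);
  if (lineCut == std::string::npos || lineCut == 0)
    return false;  // no second colon, or an empty file name
  ToyPointAt result;
  result.file = arg.substr(0, lineCut);
  if (!parseNumber(arg.substr(lineCut + 1, colCut - lineCut - 1),
                   result.line))
    return false;
  if (!parseNumber(arg.substr(colCut + 1), result.column))
    return false;
  out = result;
  return true;
}
```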


Here's an example of using index-test:

Suppose you have three files:

foo.h:
---------------------
extern int global_var;

void foo_func(int param1);
void bar_func(void);
-----------------------

t1.c:
--------------------
#include "foo.h"

void foo_func(int param1) {
  int local_var = global_var;
  for (int for_var = 100; for_var < 500; ++for_var) {
    local_var = param1 + for_var;
  }
  bar_func();
}
--------------------

t2.c:
----------------------
#include "foo.h"

int global_var = 10;

void bar_func(void) {
  global_var += 100;
  foo_func(global_var);
}
-------------------------

First, generate AST files from t1.c and t2.c:

$ clang-cc -emit-pch t1.c -o t1.ast
$ clang-cc -emit-pch t2.c -o t2.ast

Find the ASTLocation under this position of t1.c:
.................
void foo_func(int param1) {
  int local_var = global_var;
                      ^
...................

$ index-test t1.ast -point-at t1.c:4:23
[Decl: Var local_var | Stmt: DeclRefExpr global_var] <t1.c:4:19, t1.c:4:19>

Find the declaration:

$ index-test t1.ast -point-at t1.c:4:23 -print-decls
[Decl: Var global_var] <foo.h:1:12, foo.h:1:12>

Find the references:

$ index-test t1.ast t2.ast -point-at t1.c:4:23 -print-refs
[Decl: Var local_var | Stmt: DeclRefExpr global_var] <t1.c:4:19, t1.c:4:19>
[Decl: Function bar_func | Stmt: DeclRefExpr global_var] <t2.c:6:3, t2.c:6:3>
[Decl: Function bar_func | Stmt: DeclRefExpr global_var] <t2.c:7:12, t2.c:7:12>

Find definitions:

$ index-test t1.ast t2.ast -point-at t1.c:4:23 -print-defs
[Decl: Var global_var] <t2.c:3:5, t2.c:3:18>



This concludes the introduction to the clangIndex library. If you have
any questions or comments, please let me know.

-Argiris