[cfe-dev] source code database
thakis at chromium.org
Tue Feb 28 20:37:21 PST 2012
clang should support much of what you ask for.
DXR ( https://wiki.mozilla.org/DXR ) is an existing attempt to use
clang to build a program database. https://github.com/nico/complete is
some old hack of mine that does the same thing, only worse, but since there's
a lot less code, maybe it's easier for a first look (relevant file:
On Tue, Feb 28, 2012 at 8:29 PM, James K. Lowden
<jklowden at schemamania.org> wrote:
> The "open clang projects" page refers to some potential uses of clang
> for tool-building. A few of them require metadata from the
> lexer or parser.
> I'm interested in creating a framework for searching and reporting on
> large C++ code trees. I wonder what work has already been done, and if
> the information I want is currently available from the clang front
> end. I would begin by capturing the token metadata in SQLite, thereby
> making them accessible to a variety of applications.
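Here is a rough illustration of the SQLite end of the idea: a minimal token table, written in Python with its standard `sqlite3` module. The table and column names (`token`, `qualified_name`, `role`, and so on) and the helper functions are hypothetical. In a real indexer the rows would come from a pass over clang's AST (for example through libclang); here they are inserted by hand.

```python
import sqlite3

# Hypothetical schema: one row per token occurrence that a clang pass reports.
SCHEMA = """
CREATE TABLE IF NOT EXISTS token (
    id             INTEGER PRIMARY KEY,
    spelling       TEXT NOT NULL,    -- text as written, e.g. 'B'
    qualified_name TEXT NOT NULL,    -- resolved name, e.g. 'A::B'
    kind           TEXT NOT NULL,    -- 'class', 'function', 'variable', ...
    role           TEXT NOT NULL CHECK (role IN ('declare', 'define', 'use')),
    file           TEXT NOT NULL,
    line           INTEGER NOT NULL
);
CREATE INDEX IF NOT EXISTS token_by_file ON token (file, line);
"""

def open_db(path=":memory:"):
    """Open (or create) the database and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def record(conn, spelling, qualified_name, kind, role, file, line):
    """Store one occurrence; a real indexer would call this from its AST walk."""
    conn.execute(
        "INSERT INTO token (spelling, qualified_name, kind, role, file, line) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (spelling, qualified_name, kind, role, file, line),
    )

def tokens_in_file(conn, file):
    """(line, qualified_name, role) for every recorded token in one file."""
    return conn.execute(
        "SELECT line, qualified_name, role FROM token WHERE file = ? ORDER BY line, id",
        (file,),
    ).fetchall()

conn = open_db()
record(conn, "B", "A::B", "class", "define", "a.h", 3)
record(conn, "x", "A::B::x", "variable", "declare", "a.h", 4)
record(conn, "B", "A::B", "class", "use", "main.cpp", 10)
print(tokens_in_file(conn, "a.h"))  # [(3, 'A::B', 'define'), (4, 'A::B::x', 'declare')]
```

Keeping the data in an ordinary SQLite file means any language with an SQLite binding can query it, which is the point of the proposal.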
> Back when the VAX dinosaur was knee-high to a mammal, I used DEC's
> Source Code Analyzer (SCA). To this day, I have never seen or heard
> of anything as good. ISTM clang could be used to create something
> What is "as good", and what would be better?
> SCA let the user:
> 1. analyze arbitrary subsets of a source code tree
> 2. dynamically restrict the range of queries on that subset
> 3. distinguish among read, write, invoke, reference, and dereference
> 4. define "interesting" cases for repeated use, including reports
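The four SCA capabilities map fairly directly onto SQL. In the hypothetical sketch below (the `ref` table, the `cfg_writes` view, and the `cfg::` names are all invented for illustration), a `LIKE` prefix on the file path restricts a query to a subset of the tree (1, 2), an `access` column records how each use touches the variable (3), and a saved view stands in for a reusable "interesting" case (4).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ref (
    name   TEXT NOT NULL,   -- qualified name of the entity referred to
    access TEXT NOT NULL CHECK (access IN
           ('read', 'write', 'invoke', 'reference', 'dereference')),
    file   TEXT NOT NULL,
    line   INTEGER NOT NULL
);

-- A saved "interesting case": every write to something in namespace cfg.
CREATE VIEW cfg_writes AS
    SELECT name, file, line FROM ref
    WHERE name LIKE 'cfg::%' AND access = 'write';
""")

# Hand-written rows; a real tool would fill these from clang.
conn.executemany(
    "INSERT INTO ref VALUES (?, ?, ?, ?)",
    [
        ("cfg::verbose", "read",   "src/net/socket.cpp", 12),
        ("cfg::verbose", "write",  "src/main.cpp",       40),
        ("cfg::verbose", "write",  "src/net/opts.cpp",    7),
        ("cfg::handler", "invoke", "src/net/socket.cpp", 30),
    ],
)

def cfg_writes_under(prefix):
    """Reuse the saved view, restricted to files under a directory prefix."""
    return conn.execute(
        "SELECT name, file, line FROM cfg_writes WHERE file LIKE ? ORDER BY file, line",
        (prefix + "%",),
    ).fetchall()

print(cfg_writes_under("src/net/"))
```

Narrowing the subset is then just a different prefix argument; the view itself does not change.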
> Current Tools Fail
> Microsoft's tool lacks all these features. cscope has some of them,
> but only for C. (For example, cscope cannot search for a
> destructor or anything with a scope operator.) VS parses C++, but the
> user cannot search for uses of e.g. operator<<.
> The free tools I've looked at don't really parse C++. They
> parse the nonlanguage "C/C++". Consequently they cannot hope to
> answer #3 above; they can't even distinguish between ::B and A::B.
> They also lack any kind of scripting language, preventing #4 and
> severely restricting the capability of #2.
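To make the ::B versus A::B point concrete: once every occurrence carries the fully qualified name the compiler resolved, queries that a text-based tool cannot express (a particular destructor, one particular `operator<<`) become plain equality tests. A minimal, hypothetical sketch; the `occurrence` table and its rows are invented, not produced by clang.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE occurrence (
    spelling       TEXT NOT NULL,    -- what a text search sees
    qualified_name TEXT NOT NULL,    -- what the compiler resolved it to
    file           TEXT NOT NULL,
    line           INTEGER NOT NULL
)""")
conn.executemany(
    "INSERT INTO occurrence VALUES (?, ?, ?, ?)",
    [
        ("B",  "::B",             "x.cpp", 3),
        ("B",  "A::B",            "x.cpp", 9),
        ("~B", "A::B::~B",        "x.cpp", 15),
        ("<<", "std::operator<<", "x.cpp", 20),
        ("<<", "A::operator<<",   "x.cpp", 21),
    ],
)

def textual(spelling):
    """What a grep-style tool finds: every occurrence spelled the same way."""
    rows = conn.execute(
        "SELECT line FROM occurrence WHERE spelling = ? ORDER BY line", (spelling,))
    return [line for (line,) in rows]

def semantic(qualified_name):
    """What a parser-backed database finds: only the entity actually meant."""
    rows = conn.execute(
        "SELECT line FROM occurrence WHERE qualified_name = ? ORDER BY line",
        (qualified_name,))
    return [line for (line,) in rows]

print(textual("B"))               # both ::B and A::B
print(semantic("A::B"))           # only A::B
print(semantic("A::B::~B"))       # the destructor, scope operator and all
print(semantic("A::operator<<"))  # one specific operator<<
```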
> These problems are all answered by clang+SQL. Or they might be, if clang
> is up to the job.
> Required Metadata
> I'm sure the following is incomplete and that it is more
> comprehensive than what is available from any existing tool at any
> price. Is it covered by clang at present?
> For any token
> 1. namespace
> 2. enclosing class/struct
> 3. const, static
> 4. linkage
> 5. public, protected, or private (or none)
> 6. declare, define, or use
> 7. translation unit (file) and line number
> It should be possible to say in which lines of a file a given token
> is visible.
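One plausible way to answer "in which lines is a token visible" is to store, for each declaration, the line range from the declaration to the end of its enclosing scope. The `decl` table and its columns below are hypothetical, and the sketch deliberately ignores name hiding by inner declarations, using-directives, and class-scope lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE decl (
    name         TEXT NOT NULL,
    file         TEXT NOT NULL,
    visible_from INTEGER NOT NULL,   -- line of the declaration
    visible_to   INTEGER NOT NULL    -- last line of the enclosing scope
)""")
conn.executemany(
    "INSERT INTO decl VALUES (?, ?, ?, ?)",
    [
        ("counter", "f.cpp", 1, 50),    # file-scope variable
        ("i",       "f.cpp", 12, 18),   # loop variable inside a function
        ("tmp",     "f.cpp", 30, 34),   # block-scope variable
    ],
)

def visible_at(file, line):
    """Names whose recorded scope covers the given line of the given file."""
    rows = conn.execute(
        "SELECT name FROM decl "
        "WHERE file = ? AND ? BETWEEN visible_from AND visible_to ORDER BY name",
        (file, line),
    )
    return [name for (name,) in rows]

def visible_lines(name, file):
    """The (first, last) line range(s) in which a name is visible."""
    return conn.execute(
        "SELECT visible_from, visible_to FROM decl WHERE name = ? AND file = ?",
        (name, file),
    ).fetchall()

print(visible_at("f.cpp", 15))
print(visible_lines("tmp", "f.cpp"))
```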
> For types
> 1. class, struct, or enum
> 2. derived from
> 3. derived how (public/protected/private)
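Items 1 to 3 suggest a type table plus a separate edge table with one row per base-class specifier; a recursive query then yields every direct and indirect base with the access of each edge. The table names (`type_decl`, `base_spec`) are hypothetical; SQLite has supported `WITH RECURSIVE` since version 3.8.3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE type_decl (
    name TEXT PRIMARY KEY,
    tag  TEXT NOT NULL CHECK (tag IN ('class', 'struct', 'enum'))   -- item 1
);
CREATE TABLE base_spec (
    derived TEXT NOT NULL REFERENCES type_decl(name),               -- item 2
    base    TEXT NOT NULL REFERENCES type_decl(name),
    access  TEXT NOT NULL CHECK (access IN ('public', 'protected', 'private'))  -- item 3
);
INSERT INTO type_decl VALUES ('Shape', 'class'), ('Polygon', 'class'),
                             ('Square', 'struct'), ('Counted', 'class');
INSERT INTO base_spec VALUES ('Polygon', 'Shape',   'public'),
                             ('Square',  'Polygon', 'public'),
                             ('Square',  'Counted', 'private');
""")

def ancestors(name):
    """All bases of a type: (base, tag, access of that edge, depth).

    One row per inheritance path, so a base reached twice appears twice.
    """
    return conn.execute("""
        WITH RECURSIVE up(derived, base, access, depth) AS (
            SELECT derived, base, access, 1 FROM base_spec WHERE derived = ?
            UNION ALL
            SELECT b.derived, b.base, b.access, up.depth + 1
            FROM base_spec AS b JOIN up ON b.derived = up.base
        )
        SELECT up.base, t.tag, up.access, up.depth
        FROM up JOIN type_decl AS t ON t.name = up.base
        ORDER BY up.depth, up.base
    """, (name,)).fetchall()

print(ancestors("Square"))
```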
> For typedefs, the above must be available for all components of the
> typedef.
> For variables
> 1. read, write, invoke, reference, and dereference
> (A variable may be invoked if it holds a pointer to a function.)
> 2. type: class, struct, typedef, or builtin
> 3. const, static, or automatic
> 4. (overrides can be derived)
> 5. for uses, discarded Koenig lookups
> For functions
> 1. for each parameter and return type, cf. "for variables", above
> 2. invoke or reference
> 3. (overrides can be derived)
> 4. for invocations, discarded Koenig lookups
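The note "(overrides can be derived)" holds if the database records each member function's class, name, signature, and virtualness alongside the base-class edges: an override is a member whose name and signature match a virtual member of some ancestor. This is a deliberately simplified, hypothetical sketch; it compares signatures as normalized strings, assumes `is_virtual` already reflects implicit virtualness, and ignores covariant returns, `final`, and hiding.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE base_spec (derived TEXT NOT NULL, base TEXT NOT NULL);
CREATE TABLE method (
    class      TEXT NOT NULL,
    name       TEXT NOT NULL,
    signature  TEXT NOT NULL,     -- parameter list as a normalized string
    is_virtual INTEGER NOT NULL   -- 1 if virtual, declared or inherited
);
INSERT INTO base_spec VALUES ('Polygon', 'Shape'), ('Square', 'Polygon');
INSERT INTO method VALUES
    ('Shape',   'area', '() const',    1),
    ('Shape',   'name', '() const',    0),
    ('Polygon', 'area', '() const',    1),
    ('Square',  'area', '() const',    1),
    ('Square',  'name', '() const',    0),
    ('Square',  'area', '(int) const', 0);
""")

def overridden_by(cls, name):
    """Ancestor classes whose virtual member `name` is overridden in `cls`."""
    rows = conn.execute("""
        WITH RECURSIVE anc(cls) AS (
            SELECT base FROM base_spec WHERE derived = :cls
            UNION
            SELECT b.base FROM base_spec AS b JOIN anc ON b.derived = anc.cls
        )
        SELECT DISTINCT m_base.class
        FROM method AS m
        JOIN method AS m_base
          ON m_base.name = m.name AND m_base.signature = m.signature
        JOIN anc ON anc.cls = m_base.class
        WHERE m.class = :cls AND m.name = :name AND m_base.is_virtual = 1
        ORDER BY m_base.class
    """, {"cls": cls, "name": name})
    return [c for (c,) in rows]

print(overridden_by("Square", "area"))  # area() overrides both ancestors
print(overridden_by("Square", "name"))  # Shape::name is not virtual
```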
> For operators
> 1. declare, define, reference, or invoke
> 2. friendship (1 : many)
> 3. for invocations, discarded Koenig lookups
> For the preprocessor
> 1. define or use
> 2. scope
> 3. post-processing interpretation, as above
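For macros, a table of preprocessor events (define, undef, use) per file and line would cover items 1 and 2; clang's preprocessor reports such events to tools through its `PPCallbacks` interface, though the sketch below simply inserts rows by hand. The `macro_event` table and helper names are hypothetical, and "scope" is reduced to "the most recent `#define` or `#undef` earlier in the same file".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE macro_event (
    name TEXT NOT NULL,
    kind TEXT NOT NULL CHECK (kind IN ('define', 'undef', 'use')),
    file TEXT NOT NULL,
    line INTEGER NOT NULL,
    body TEXT                 -- replacement text, for 'define' rows only
)""")
conn.executemany(
    "INSERT INTO macro_event VALUES (?, ?, ?, ?, ?)",
    [
        ("MAX_LEN", "define", "cfg.h", 3,  "64"),
        ("MAX_LEN", "use",    "cfg.h", 10, None),
        ("MAX_LEN", "undef",  "cfg.h", 20, None),
        ("MAX_LEN", "define", "cfg.h", 21, "128"),
        ("MAX_LEN", "use",    "cfg.h", 30, None),
    ],
)

def definition_in_effect(name, file, line):
    """Body of the #define active at `line`, or None if the macro is undefined.

    Simplification: one file only; #include order and #if blocks are ignored.
    """
    row = conn.execute(
        "SELECT kind, body FROM macro_event "
        "WHERE name = ? AND file = ? AND kind IN ('define', 'undef') AND line < ? "
        "ORDER BY line DESC LIMIT 1",
        (name, file, line),
    ).fetchone()
    if row is None or row[0] == "undef":
        return None
    return row[1]

def uses(name):
    """(file, line, body seen) for every expansion of the macro."""
    rows = conn.execute(
        "SELECT file, line FROM macro_event WHERE name = ? AND kind = 'use' "
        "ORDER BY file, line",
        (name,),
    ).fetchall()
    return [(f, ln, definition_in_effect(name, f, ln)) for f, ln in rows]

print(uses("MAX_LEN"))
```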
> As I said, I would like to know if the above information is accessible
> from the clang "kit" and what, if anything, has been undertaken in this
> vicinity heretofore. If clang can provide the information, the project
> I have in mind -- of writing a tool to collect it and keep it in a
> database -- is both useful and feasible.
> It's a big question, I know. You can appreciate I'd want to know the
> feasibility first, before diving in.
> Thank you for your time.
> P.S. Prior to posting, I tried to read the mailing list archives. I
> must not be the first to notice they're almost impossible to read
> because the text doesn't wrap in the browser.
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu