[cfe-dev] CompilationDatabase support added to libclang and python binding

Arnaud de Grandmaison arnaud.allarddegrandmaison at parrot.com
Mon Jun 11 05:39:16 PDT 2012

Thanks for taking time for this lengthy review :)

See my comments inline.

On 06/11/2012 01:06 PM, Gregory Szorc wrote:
> On Mon, Jun 11, 2012 at 10:32 AM, Arnaud de Grandmaison
> <arnaud.allarddegrandmaison at parrot.com> wrote:
>> Hi Manuel,
>> I am done adding the CompilationDatabase functionality to libclang
>> (first patch)
>> and the python binding (second patch).
>> With respect to our last email exchange :
>>  - comments have been modified to be futureproof with respect to
>> database actual format.
>>  - the python binding is using exceptions for operations which have failed
>>  - tests have been added for the libclang part as well as the python binding
>> The tests are making an assumption I am not totally happy with : in case
>> a file has several CompileCommands, the tests expect to get the
>> CompileCommands in the same order they appear in the database.
>> In c-index-test, I have added a 'dirname' function for use on Windows
>> platforms, assuming that if 'basename' is not available, than 'dirname'
>> would not. I do not know if this is necessary, and I have not tested it
>> as I have no windows platform available.
> Yay - new functionality! Overall a great patch. I can't wait to see
> this integrated.
> I do have a number of comments though.
> I am lacking clout on the C front, so take that feedback with a grain
> of salt. I'm a maintainer of the Python bindings though, so that must
> mean something.
> I've tried to fight this battle before and I'll say it again: I don't
> like the existing naming convention for the libclang APIs.
> Specifically, there isn't a consistent one. And, this patch continues
> that trend (not your fault - you were basing it on existing code,
> which is the right thing to do).
> Let's look at the new API (in source order):
> CXString clang_tooling_CompileCommand_getDirectory(CXCompileCommand);
> unsigned clang_tooling_CompileCommand_getNumArgs(CXCompileCommand);
> CXString clang_tooling_CompileCommand_getArg(CXCompileCommand, unsigned i);
> void clang_tooling_disposeCompilationDb(CXCompilationDb);
> CXCompilationDb clang_tooling_getCompilationDb_fromDirectory(const
> char *buildDir);
> void clang_tooling_disposeCompileCommands(CXCompileCommands);
> unsigned clang_tooling_CompileCommands_getSize(CXCompileCommands);
> CXCompileCommand
> clang_tooling_CompileCommands_getCommand(CXCompileCommands, unsigned
> i);
> CXCompileCommands clang_tooling_getCompileCommands(CXCompilationDb
> CDb, const char *completeFileName);
> First, some nits. I would change the source order of the declarations
> so related APIs are grouped together then sorted from highest to
> lowest level. Start with all the functions operating on
> CXCompilationDb. Then define everything operating on
> CXCompileCommands. Finally CXCompileCommand. This will increase
> readability, IMO. Also, always place the "constructors" and
> "destructors" next to each other and reference the other in the
> doxygen blocks. This increases the chances that the destructors are
> called appropriately.
Yes. Things are in a bottom up order because I wanted each 'type' and
its related functions to be grouped together. Doing things top down will
force to split the type definition (usually void * for now) from its
related functions, which I, as a developer, do not really like.

> The use of "Db" rather than "DB" looks unpleasant to me. I think you
> should capitalize both letters.
I did hesitate between both. Let's see how others react.

> Everything is prefixed with "clang_tooling_." The "tooling" bit is new
> to libclang. I personally like prefixing this way as it draws a clear
> line between underlying modules. But, I know others feel differently
> and may argue for its elimination. If we decide to keep it, perhaps a
> new header, Tooling.h, would be the appropriate place to declare these
> APIs since they are clearly separate from "index." This would extend
> to the Python API as well.
Also though about that. Although the tooling part is clearly a separate
module, it will most probably be used together with the "index". So
having a separate CTooling.h makes sense to me, but it should be made
available to the users thru the Index.h. The same holds for the python
binding. The Index acts more as an umbrella from which you can draw
different submodules. Beside, some functionality from Index is reused :
CXString for example.

> I really like your general naming convention of
> clang_tooling_<thing>_<action>. IMO, this is a proper naming
> convention. Related functions are naturally grouped together and the
> name communicates what they operate on and the action being performed.
> Unfortunately, the rest of the libclang API isn't like this.
> Furthermore, your patch isn't consistent in this style. e.g.
> "clang_tooling_disposeCompilationDb" should be
> "clang_tooling_CompilationDb_dispose" and
> "clang_tooling_getCompilationDb_fromDirectory" should be
> "clang_tooling_CompilationDb_getFromDirectory" or even
> "clang_tooling_CompilationDb_fromDirectory," etc. Since the existing
> API doesn't use my preferred naming convention, I guess this means you
> don't have to adopt it. Whatever you end up doing, please make it
> consistent because an inconsistent convention is worse than a
> consistently "bad" one.
I agree. I did not adhere to the clang_tooling_<thing>_<action>
convention for the dispose functions to stick with the actual convention
used in the rest of index.h for those functions.
> I wonder if a clang_tooling_CompileCommand_getArgs(CXCompileCommand,
> CXString **, size_t *size) (or similar) would be helpful. This API
> would allocate a new array of CXString or CXString* in one go instead
> of requiring a function call for each argument. Or, you could have the
> caller pre-allocate the array and the function would fill. Semantics.
> I think the majority of users are interested in most or all arguments
> and this API would save some function call overhead (which could be
> noticeable in places like the Python bindings). I can make the same
> argument for having a
> clang_tooling_CompileCommands_getCommands(CXCompileCommands,
> CXCompileCommand **, size_t *).
I also though about this. My belief is that we should provide a "low
level access" for the few people needing this power --- this is the
current function, plus a "high level" function which will be used in 99%
of the case and has no performance penalty.

Taking a CompileCommand as an example, I am thinking of :
 - CXString clang_tooling_CompileCommand_getArgs(CXCompileCommand);
 - CXString clang_tooling_CompileCommand_getExe(CXCompileCommand);
to get all arguments concatenated (without the executable part) and the
executable separately. This beside expresses an invariant from

Or even :
 - CXString clang_tooling_CompileCommand_getCmdLine(CXCompileCommand);
to get the full command line, ready for execution.

> I really don't like the magic of compilation databases being
> constructed from a directory with the auto-discovery of
> "compile_commands.json." I know this is how the underlying API works,
> so this isn't the fault of this patch. I think the underlying API and
> libclang should be supplemented with the ability to construct the
> database from an arbitrary file or possibly even a buffer with an
> associated compilation database type (CXCompilationDBType_JSON).
Manuel already answered this one.
We can make the other possibilities available in Index.h and the python
biding later on.

> While we're on the subject of limitations in the underlying API, the
> lack of the ability to extract *all* compilation commands is a serious
> downer. I can think of a number of use cases where one would crawl all
> compilation databases and operate on all commands. Are those users
> expected to implement their own parsers for the compilation database
> and not go through the tooling/libclang API?
> Whew, that was a lot. Again, I don't have clout in C land, so don't go
> integrating my feedback before others have time to rebuke.
> On to the Python!
> Nit: Include space after comma between function arguments. This is
> missing throughout.
I will fix that.
> Nit: Again, I like "CompilationDB" not "CompilationDb." I may like
> "CompilationDatabase" even more.
Again, I have no preference there. I like the principle that the same
concept spells the same way thru Index.h, the python binding and the C++
original library. However, CompilationDatabase created really long names
(e.g. clang_tooling_getCompilationDatabase_fromDirectory).
> Nit: I like wrapping the method summary on the same line as """. i.e.
> don't waste lines.
I will fix that.
>> @property
>> def getDirectory(self):
> This produces code that looks like:
>> dir = command.getDirectory
> This looks weird. To most Python programmers, it looks like you are
> retrieving the underlying method. Rename your properties so they are
> nouns, not verbs. e.g. "directory," "commandLine."
I will fix that.
> For the CompileCommand arguments, I would make it a "arguments"
> property or a "getArguments" method. To me, "command line" implies the
> full string, not a list of strings.
> On a related note, there is no need to create inner iterator classes.
> Instead, just use yield, which does all the magic for you. e.g.
>> def getArguments(self):
>>     length = CompileCommand_getNumArgs(self.cl)
>>     for i in xrange(length):
>>         yield CompileCommand_getArg(self.cl, i)
Will fix that too
>> Of course, you lose the ability to do efficient len() and subscript
>> access. Callers can gain this back by doing list(foo.getArguments()),
>> etc. But, that buffers everything, which could have performance
>> side-effects. That could be a problem for cursor iteration. I don't
>> think it is a problem for compilation databases. Although, it /might/
>> be if we ever expose an API to retrieve all CompileCommands. The good
>> news is you can swap in a custom iterator later and the API is
>> backwards-compatible (yield returns an iterator magically under the
>> hood). My vote is to optimize for fewer lines of code now. Complexity
>> can always be added later if needed for performance.
>> The current Python API is currently inconsistent with its use of
>> iterators in terms of using properties or methods. I /think/ I like
>> methods more. But, I'm fine either way as long as things are
>> documented.
>> In CompilationDb.fromDirectory, the ptr validation can be done in a
>> from_result function registered on the ctypes library. I would prefer
>> this, as that ensures all call sites have the same behavior (nevermind
>> there will only likely be 1 call site).
I will add the from_result check.
>> +    try:
>> +        cmds = cdb.getCompileCommands('file_do_not_exist.cpp')
>> +    except KeyError:
>> +        assert True
>> +    else:
>> +        assert False
>> You can do this?! I guess you do learn something new every day!
What is surprising you there ? I am definitely not a python expert :)
>> +    cmdline = ' '.join(cmds[0].getCommandLine)
>> +    assert cmdline == 'clang++ -o project.o -c
>> /home/john.doe/MyProject/project.cpp'
>> Please compare lists instead. Every time you compare derived entities,
>> you introduce a point for subtle failures.
Will fix that too.
>> OK. That's a lot of feedback. I think I'm done :)
>> Again, great patch and thank you for putting the time into it. Despite
>> the length of this reply, things are looking good. Most of my comments
>> are cosmetic in nature, so they should be trivial to address. Again,
>> expect conflicting feedback on my C comments.
>> Gregory
Thanks again for your review !

Arnaud de Grandmaison
Senior CPU engineer
Business Unit Digital Tuner

Parrot S.A.
174, quai de Jemmapes
75010 Paris - France
Phone: +33 1 48 03 84 59

More information about the cfe-dev mailing list