[cfe-dev] CompilationDatabase support added to libclang and python binding

Mon Jun 11 07:23:33 PDT 2012

Bikeshedding! /me jumps right in

On Mon, Jun 11, 2012 at 2:39 PM, Arnaud de Grandmaison <
arnaud.allarddegrandmaison at parrot.com> wrote:

> Thanks for taking time for this lengthy review :)
>
> See my comments inline.
>
> On 06/11/2012 01:06 PM, Gregory Szorc wrote:
> > On Mon, Jun 11, 2012 at 10:32 AM, Arnaud de Grandmaison
> > <arnaud.allarddegrandmaison at parrot.com> wrote:
> >> Hi Manuel,
> >>
> >> I am done adding the CompilationDatabase functionality to libclang
> >> (first patch)
> >> and the python binding (second patch).
> >>
> >> With respect to our last email exchange :
> >>  - comments have been modified to be futureproof with respect to
> >> database actual format.
> >>  - the python binding is using exceptions for operations which have
> failed
> >>  - tests have been added for the libclang part as well as the python
> binding
> >>
> >> The tests are making an assumption I am not totally happy with : in case
> >> a file has several CompileCommands, the tests expect to get the
> >> CompileCommands in the same order they appear in the database.
> >>
> >> In c-index-test, I have added a 'dirname' function for use on Windows
> >> platforms, assuming that if 'basename' is not available, than 'dirname'
> >> would not. I do not know if this is necessary, and I have not tested it
> >> as I have no windows platform available.
> > Yay - new functionality! Overall a great patch. I can't wait to see
> > this integrated.
> >
> > I do have a number of comments though.
> >
> > I am lacking clout on the C front, so take that feedback with a grain
> > of salt. I'm a maintainer of the Python bindings though, so that must
> > mean something.
> >
> > I've tried to fight this battle before and I'll say it again: I don't
> > like the existing naming convention for the libclang APIs.
> > Specifically, there isn't a consistent one. And, this patch continues
> > that trend (not your fault - you were basing it on existing code,
> > which is the right thing to do).
> >
> > Let's look at the new API (in source order):
> >
> > CXString clang_tooling_CompileCommand_getDirectory(CXCompileCommand);
> > unsigned clang_tooling_CompileCommand_getNumArgs(CXCompileCommand);
> > CXString clang_tooling_CompileCommand_getArg(CXCompileCommand, unsigned
> i);
> > void clang_tooling_disposeCompilationDb(CXCompilationDb);
> > CXCompilationDb clang_tooling_getCompilationDb_fromDirectory(const
> > char *buildDir);
> > void clang_tooling_disposeCompileCommands(CXCompileCommands);
> > unsigned clang_tooling_CompileCommands_getSize(CXCompileCommands);
> > CXCompileCommand
> > clang_tooling_CompileCommands_getCommand(CXCompileCommands, unsigned
> > i);
> > CXCompileCommands clang_tooling_getCompileCommands(CXCompilationDb
> > CDb, const char *completeFileName);
> >
> > First, some nits. I would change the source order of the declarations
> > so related APIs are grouped together then sorted from highest to
> > lowest level. Start with all the functions operating on
> > CXCompilationDb. Then define everything operating on
> > CXCompileCommands. Finally CXCompileCommand. This will increase
> > readability, IMO. Also, always place the "constructors" and
> > "destructors" next to each other and reference the other in the
> > doxygen blocks. This increases the chances that the destructors are
> > called appropriately.
> Yes. Things are in a bottom up order because I wanted each 'type' and
> its related functions to be grouped together. Doing things top down will
> force to split the type definition (usually void * for now) from its
> related functions, which I, as a developer, do not really like.

>
> > The use of "Db" rather than "DB" looks unpleasant to me. I think you
> > should capitalize both letters.
> I did hesitate between both. Let's see how others react.
>

Can we spell out Database? /me ducks. Otherwise I'd vote for DB, too. Db
looks strange.

> Everything is prefixed with "clang_tooling_." The "tooling" bit is new
> > to libclang. I personally like prefixing this way as it draws a clear
> > line between underlying modules. But, I know others feel differently
> > and may argue for its elimination. If we decide to keep it, perhaps a
> > new header, Tooling.h, would be the appropriate place to declare these
> > APIs since they are clearly separate from "index." This would extend
> > to the Python API as well.
> Also though about that. Although the tooling part is clearly a separate
> module, it will most probably be used together with the "index". So
> having a separate CTooling.h makes sense to me, but it should be made
> available to the users thru the Index.h. The same holds for the python
> binding. The Index acts more as an umbrella from which you can draw
> different submodules. Beside, some functionality from Index is reused :
> CXString for example.
>
> > I really like your general naming convention of
> > clang_tooling_<thing>_<action>. IMO, this is a proper naming
> > convention. Related functions are naturally grouped together and the
> > name communicates what they operate on and the action being performed.
> > Unfortunately, the rest of the libclang API isn't like this.
> > Furthermore, your patch isn't consistent in this style. e.g.
> > "clang_tooling_disposeCompilationDb" should be
> > "clang_tooling_CompilationDb_dispose" and
> > "clang_tooling_getCompilationDb_fromDirectory" should be
> > "clang_tooling_CompilationDb_getFromDirectory" or even
> > "clang_tooling_CompilationDb_fromDirectory," etc. Since the existing
> > API doesn't use my preferred naming convention, I guess this means you
> > don't have to adopt it. Whatever you end up doing, please make it
> > consistent because an inconsistent convention is worse than a
> > consistently "bad" one.
> I agree. I did not adhere to the clang_tooling_<thing>_<action>
> convention for the dispose functions to stick with the actual convention
> used in the rest of index.h for those functions.
>
> > I wonder if a clang_tooling_CompileCommand_getArgs(CXCompileCommand,
> > CXString **, size_t *size) (or similar) would be helpful. This API
> > would allocate a new array of CXString or CXString* in one go instead
> > of requiring a function call for each argument. Or, you could have the
> > caller pre-allocate the array and the function would fill. Semantics.
> > I think the majority of users are interested in most or all arguments
> > and this API would save some function call overhead (which could be
> > noticeable in places like the Python bindings). I can make the same
> > argument for having a
> > clang_tooling_CompileCommands_getCommands(CXCompileCommands,
> > CXCompileCommand **, size_t *).
> I also though about this. My belief is that we should provide a "low
> level access" for the few people needing this power --- this is the
> current function, plus a "high level" function which will be used in 99%
> of the case and has no performance penalty.
>
> Taking a CompileCommand as an example, I am thinking of :
>  - CXString clang_tooling_CompileCommand_getArgs(CXCompileCommand);
>  - CXString clang_tooling_CompileCommand_getExe(CXCompileCommand);
> to get all arguments concatenated (without the executable part) and the
> executable separately. This beside expresses an invariant from
> CompileCommand.
>
> Or even :
>  - CXString clang_tooling_CompileCommand_getCmdLine(CXCompileCommand);
> to get the full command line, ready for execution.
>
> > I really don't like the magic of compilation databases being
> > constructed from a directory with the auto-discovery of
> > "compile_commands.json." I know this is how the underlying API works,
> > so this isn't the fault of this patch. I think the underlying API and
> > libclang should be supplemented with the ability to construct the
> > database from an arbitrary file or possibly even a buffer with an
> > associated compilation database type (CXCompilationDBType_JSON).
> Manuel already answered this one.
> We can make the other possibilities available in Index.h and the python
> biding later on.
>
> >
> > While we're on the subject of limitations in the underlying API, the
> > lack of the ability to extract *all* compilation commands is a serious
> > downer. I can think of a number of use cases where one would crawl all
> > compilation databases and operate on all commands. Are those users
> > expected to implement their own parsers for the compilation database
> > and not go through the tooling/libclang API?
> >
> > Whew, that was a lot. Again, I don't have clout in C land, so don't go
> > integrating my feedback before others have time to rebuke.
> >
> > On to the Python!
> >
> > Nit: Include space after comma between function arguments. This is
> > missing throughout.
> I will fix that.
> >
> > Nit: Again, I like "CompilationDB" not "CompilationDb." I may like
> > "CompilationDatabase" even more.
> Again, I have no preference there. I like the principle that the same
> concept spells the same way thru Index.h, the python binding and the C++
> original library. However, CompilationDatabase created really long names
> (e.g. clang_tooling_getCompilationDatabase_fromDirectory).
> >
> > Nit: I like wrapping the method summary on the same line as """. i.e.
> > don't waste lines.
> I will fix that.
> >> @property
> >> def getDirectory(self):
> > This produces code that looks like:
> >
> >> dir = command.getDirectory
> > This looks weird. To most Python programmers, it looks like you are
> > retrieving the underlying method. Rename your properties so they are
> > nouns, not verbs. e.g. "directory," "commandLine."
> I will fix that.
> > For the CompileCommand arguments, I would make it a "arguments"
> > property or a "getArguments" method. To me, "command line" implies the
> > full string, not a list of strings.
> OK
> > On a related note, there is no need to create inner iterator classes.
> > Instead, just use yield, which does all the magic for you. e.g.
> >
> >> def getArguments(self):
> >>     length = CompileCommand_getNumArgs(self.cl)
> >>     for i in xrange(length):
> >>         yield CompileCommand_getArg(self.cl, i)
> Will fix that too
> >> Of course, you lose the ability to do efficient len() and subscript
> >> access. Callers can gain this back by doing list(foo.getArguments()),
> >> etc. But, that buffers everything, which could have performance
> >> side-effects. That could be a problem for cursor iteration. I don't
> >> think it is a problem for compilation databases. Although, it /might/
> >> be if we ever expose an API to retrieve all CompileCommands. The good
> >> news is you can swap in a custom iterator later and the API is
> >> backwards-compatible (yield returns an iterator magically under the
> >> hood). My vote is to optimize for fewer lines of code now. Complexity
> >> can always be added later if needed for performance.
> >>
> >> The current Python API is currently inconsistent with its use of
> >> iterators in terms of using properties or methods. I /think/ I like
> >> methods more. But, I'm fine either way as long as things are
> >> documented.
> >>
> >> In CompilationDb.fromDirectory, the ptr validation can be done in a
> >> from_result function registered on the ctypes library. I would prefer
> >> this, as that ensures all call sites have the same behavior (nevermind
> >> there will only likely be 1 call site).
> I will add the from_result check.
> >>
> >> +    try:
> >> +        cmds = cdb.getCompileCommands('file_do_not_exist.cpp')
> >> +    except KeyError:
> >> +        assert True
> >> +    else:
> >> +        assert False
> >>
> >> You can do this?! I guess you do learn something new every day!
> What is surprising you there ? I am definitely not a python expert :)
> >>
> >> +    cmdline = ' '.join(cmds[0].getCommandLine)
> >> +    assert cmdline == 'clang++ -o project.o -c
> >> /home/john.doe/MyProject/project.cpp'
> >>
> >> Please compare lists instead. Every time you compare derived entities,
> >> you introduce a point for subtle failures.
> Will fix that too.
> >>
> >> OK. That's a lot of feedback. I think I'm done :)
> >>
> >> Again, great patch and thank you for putting the time into it. Despite
> >> the length of this reply, things are looking good. Most of my comments
> >> are cosmetic in nature, so they should be trivial to address. Again,
> >> expect conflicting feedback on my C comments.
> >>
> >> Gregory
> Thanks again for your review !
>
> --
> Arnaud de Grandmaison
> Senior CPU engineer
> Business Unit Digital Tuner
>
> Parrot S.A.
> 174, quai de Jemmapes
> 75010 Paris - France
> Phone: +33 1 48 03 84 59
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20120611/bcec6bc4/attachment.html>