[cfe-dev] CompilationDatabase support added to libclang and python binding

Mon Jun 11 04:42:34 PDT 2012

On Mon, Jun 11, 2012 at 1:06 PM, Gregory Szorc <gregory.szorc at gmail.com>wrote:

> On Mon, Jun 11, 2012 at 10:32 AM, Arnaud de Grandmaison
> <arnaud.allarddegrandmaison at parrot.com> wrote:
> > Hi Manuel,
> >
> > I am done adding the CompilationDatabase functionality to libclang
> > (first patch)
> > and the python binding (second patch).
> >
> > With respect to our last email exchange :
> >  - comments have been modified to be futureproof with respect to
> > database actual format.
> >  - the python binding is using exceptions for operations which have
> failed
> >  - tests have been added for the libclang part as well as the python
> binding
> >
> > The tests are making an assumption I am not totally happy with : in case
> > a file has several CompileCommands, the tests expect to get the
> > CompileCommands in the same order they appear in the database.
> >
> > In c-index-test, I have added a 'dirname' function for use on Windows
> > platforms, assuming that if 'basename' is not available, than 'dirname'
> > would not. I do not know if this is necessary, and I have not tested it
> > as I have no windows platform available.
>
> Yay - new functionality! Overall a great patch. I can't wait to see
> this integrated.
>
> I do have a number of comments though.
>
> I am lacking clout on the C front, so take that feedback with a grain
> of salt. I'm a maintainer of the Python bindings though, so that must
> mean something.
>
> I've tried to fight this battle before and I'll say it again: I don't
> like the existing naming convention for the libclang APIs.
> Specifically, there isn't a consistent one. And, this patch continues
> that trend (not your fault - you were basing it on existing code,
> which is the right thing to do).
>
> Let's look at the new API (in source order):
>
> CXString clang_tooling_CompileCommand_getDirectory(CXCompileCommand);
> unsigned clang_tooling_CompileCommand_getNumArgs(CXCompileCommand);
> CXString clang_tooling_CompileCommand_getArg(CXCompileCommand, unsigned i);
> void clang_tooling_disposeCompilationDb(CXCompilationDb);
> CXCompilationDb clang_tooling_getCompilationDb_fromDirectory(const
> char *buildDir);
> void clang_tooling_disposeCompileCommands(CXCompileCommands);
> unsigned clang_tooling_CompileCommands_getSize(CXCompileCommands);
> CXCompileCommand
> clang_tooling_CompileCommands_getCommand(CXCompileCommands, unsigned
> i);
> CXCompileCommands clang_tooling_getCompileCommands(CXCompilationDb
> CDb, const char *completeFileName);
>
> First, some nits. I would change the source order of the declarations
> so related APIs are grouped together then sorted from highest to
> lowest level. Start with all the functions operating on
> CXCompilationDb. Then define everything operating on
> CXCompileCommands. Finally CXCompileCommand. This will increase
> readability, IMO. Also, always place the "constructors" and
> "destructors" next to each other and reference the other in the
> doxygen blocks. This increases the chances that the destructors are
> called appropriately.
>
> The use of "Db" rather than "DB" looks unpleasant to me. I think you
> should capitalize both letters.
>
> Everything is prefixed with "clang_tooling_." The "tooling" bit is new
> to libclang. I personally like prefixing this way as it draws a clear
> line between underlying modules. But, I know others feel differently
> and may argue for its elimination. If we decide to keep it, perhaps a
> new header, Tooling.h, would be the appropriate place to declare these
> APIs since they are clearly separate from "index." This would extend
> to the Python API as well.
>
> I really like your general naming convention of
> clang_tooling_<thing>_<action>. IMO, this is a proper naming
> convention. Related functions are naturally grouped together and the
> name communicates what they operate on and the action being performed.
> Unfortunately, the rest of the libclang API isn't like this.
> Furthermore, your patch isn't consistent in this style. e.g.
> "clang_tooling_disposeCompilationDb" should be
> "clang_tooling_CompilationDb_dispose" and
> "clang_tooling_getCompilationDb_fromDirectory" should be
> "clang_tooling_CompilationDb_getFromDirectory" or even
> "clang_tooling_CompilationDb_fromDirectory," etc. Since the existing
> API doesn't use my preferred naming convention, I guess this means you
> don't have to adopt it. Whatever you end up doing, please make it
> consistent because an inconsistent convention is worse than a
> consistently "bad" one.
>
> I wonder if a clang_tooling_CompileCommand_getArgs(CXCompileCommand,
> CXString **, size_t *size) (or similar) would be helpful. This API
> would allocate a new array of CXString or CXString* in one go instead
> of requiring a function call for each argument. Or, you could have the
> caller pre-allocate the array and the function would fill. Semantics.
> I think the majority of users are interested in most or all arguments
> and this API would save some function call overhead (which could be
> noticeable in places like the Python bindings). I can make the same
> argument for having a
> clang_tooling_CompileCommands_getCommands(CXCompileCommands,
> CXCompileCommand **, size_t *).
>
> I really don't like the magic of compilation databases being
> constructed from a directory with the auto-discovery of
> "compile_commands.json." I know this is how the underlying API works,
> so this isn't the fault of this patch. I think the underlying API and
>

What don't you like about this? As a tool author, I don't want to care
about what underlying system somebody uses, I want my tool to just work. I
wonder what's different in your use case.

> libclang should be supplemented with the ability to construct the
> database from an arbitrary file or possibly even a buffer with an
> associated compilation database type (CXCompilationDBType_JSON).
>

This is already possible with the underlying API:
JSONCompilationDatabase::loadFromFile
JSONCompilationDatabase::loadFromBuffer
FixedCompilationDatabase::loadFromCommandLine

> While we're on the subject of limitations in the underlying API, the
> lack of the ability to extract *all* compilation commands is a serious
> downer. I can think of a number of use cases where one would crawl all
> compilation databases and operate on all commands. Are those users
> expected to implement their own parsers for the compilation database
> and not go through the tooling/libclang API?
>

Great idea. Even better, combine that with a
ClangTool::runOnAll(ASTFrontendAction*) which we can parallelize once clang
is sufficiently thread safe for this. Patch welcome :) (or, open a bug)

> Whew, that was a lot. Again, I don't have clout in C land, so don't go
> integrating my feedback before others have time to rebuke.
>
> On to the Python!
>
> Nit: Include space after comma between function arguments. This is
> missing throughout.
>
> Nit: Again, I like "CompilationDB" not "CompilationDb." I may like
> "CompilationDatabase" even more.
>
> Nit: I like wrapping the method summary on the same line as """. i.e.
> don't waste lines.
>
> > @property
> > def getDirectory(self):
>
> This produces code that looks like:
>
> > dir = command.getDirectory
>
> This looks weird. To most Python programmers, it looks like you are
> retrieving the underlying method. Rename your properties so they are
> nouns, not verbs. e.g. "directory," "commandLine."
>
> For the CompileCommand arguments, I would make it a "arguments"
> property or a "getArguments" method. To me, "command line" implies the
> full string, not a list of strings.
>
> On a related note, there is no need to create inner iterator classes.
> Instead, just use yield, which does all the magic for you. e.g.
>
> > def getArguments(self):
> >     length = CompileCommand_getNumArgs(self.cl)
> >     for i in xrange(length):
> >         yield CompileCommand_getArg(self.cl, i)
>
> Of course, you lose the ability to do efficient len() and subscript
> access. Callers can gain this back by doing list(foo.getArguments()),
> etc. But, that buffers everything, which could have performance
> side-effects. That could be a problem for cursor iteration. I don't
> think it is a problem for compilation databases. Although, it /might/
> be if we ever expose an API to retrieve all CompileCommands. The good
> news is you can swap in a custom iterator later and the API is
> backwards-compatible (yield returns an iterator magically under the
> hood). My vote is to optimize for fewer lines of code now. Complexity
> can always be added later if needed for performance.
>
> The current Python API is currently inconsistent with its use of
> iterators in terms of using properties or methods. I /think/ I like
> methods more. But, I'm fine either way as long as things are
> documented.
>
> In CompilationDb.fromDirectory, the ptr validation can be done in a
> from_result function registered on the ctypes library. I would prefer
> this, as that ensures all call sites have the same behavior (nevermind
> there will only likely be 1 call site).
>
> +    try:
> +        cmds = cdb.getCompileCommands('file_do_not_exist.cpp')
> +    except KeyError:
> +        assert True
> +    else:
> +        assert False
>
> You can do this?! I guess you do learn something new every day!
>
> +    cmdline = ' '.join(cmds[0].getCommandLine)
> +    assert cmdline == 'clang++ -o project.o -c
> /home/john.doe/MyProject/project.cpp'
>
> Please compare lists instead. Every time you compare derived entities,
> you introduce a point for subtle failures.
>
> OK. That's a lot of feedback. I think I'm done :)
>
> Again, great patch and thank you for putting the time into it. Despite
> the length of this reply, things are looking good. Most of my comments
> are cosmetic in nature, so they should be trivial to address. Again,
> expect conflicting feedback on my C comments.
>
> Gregory
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20120611/1a7eefe8/attachment.html>