<div class="gmail_quote">Bikeshedding! /me jumps right in</div><div class="gmail_quote"><br></div><div class="gmail_quote">On Mon, Jun 11, 2012 at 2:39 PM, Arnaud de Grandmaison <span dir="ltr"><<a href="mailto:arnaud.allarddegrandmaison@parrot.com" target="_blank">arnaud.allarddegrandmaison@parrot.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Thanks for taking time for this lengthy review :)<br>
<br>
See my comments inline.<br>
<div><div class="h5"><br>
On 06/11/2012 01:06 PM, Gregory Szorc wrote:<br>
> On Mon, Jun 11, 2012 at 10:32 AM, Arnaud de Grandmaison<br>
> <<a href="mailto:arnaud.allarddegrandmaison@parrot.com">arnaud.allarddegrandmaison@parrot.com</a>> wrote:<br>
>> Hi Manuel,<br>
>><br>
>> I am done adding the CompilationDatabase functionality to libclang<br>
>> (first patch)<br>
>> and the python binding (second patch).<br>
>><br>
>> With respect to our last email exchange :<br>
>> - comments have been modified to be futureproof with respect to<br>
>> database actual format.<br>
>> - the python binding is using exceptions for operations which have failed<br>
>> - tests have been added for the libclang part as well as the python binding<br>
>><br>
>> The tests are making an assumption I am not totally happy with : in case<br>
>> a file has several CompileCommands, the tests expect to get the<br>
>> CompileCommands in the same order they appear in the database.<br>
>><br>
>> In c-index-test, I have added a 'dirname' function for use on Windows<br>
>> platforms, assuming that if 'basename' is not available, than 'dirname'<br>
>> would not. I do not know if this is necessary, and I have not tested it<br>
>> as I have no windows platform available.<br>
> Yay - new functionality! Overall a great patch. I can't wait to see<br>
> this integrated.<br>
><br>
> I do have a number of comments though.<br>
><br>
> I am lacking clout on the C front, so take that feedback with a grain<br>
> of salt. I'm a maintainer of the Python bindings though, so that must<br>
> mean something.<br>
><br>
> I've tried to fight this battle before and I'll say it again: I don't<br>
> like the existing naming convention for the libclang APIs.<br>
> Specifically, there isn't a consistent one. And, this patch continues<br>
> that trend (not your fault - you were basing it on existing code,<br>
> which is the right thing to do).<br>
><br>
> Let's look at the new API (in source order):<br>
><br>
> CXString clang_tooling_CompileCommand_getDirectory(CXCompileCommand);<br>
> unsigned clang_tooling_CompileCommand_getNumArgs(CXCompileCommand);<br>
> CXString clang_tooling_CompileCommand_getArg(CXCompileCommand, unsigned i);<br>
> void clang_tooling_disposeCompilationDb(CXCompilationDb);<br>
> CXCompilationDb clang_tooling_getCompilationDb_fromDirectory(const<br>
> char *buildDir);<br>
> void clang_tooling_disposeCompileCommands(CXCompileCommands);<br>
> unsigned clang_tooling_CompileCommands_getSize(CXCompileCommands);<br>
> CXCompileCommand<br>
> clang_tooling_CompileCommands_getCommand(CXCompileCommands, unsigned<br>
> i);<br>
> CXCompileCommands clang_tooling_getCompileCommands(CXCompilationDb<br>
> CDb, const char *completeFileName);<br>
><br>
> First, some nits. I would change the source order of the declarations<br>
> so related APIs are grouped together then sorted from highest to<br>
> lowest level. Start with all the functions operating on<br>
> CXCompilationDb. Then define everything operating on<br>
> CXCompileCommands. Finally CXCompileCommand. This will increase<br>
> readability, IMO. Also, always place the "constructors" and<br>
> "destructors" next to each other and reference the other in the<br>
> doxygen blocks. This increases the chances that the destructors are<br>
> called appropriately.<br>
</div></div>Yes. Things are in a bottom up order because I wanted each 'type' and<br>
its related functions to be grouped together. Doing things top down will<br>
force to split the type definition (usually void * for now) from its<br>
related functions, which I, as a developer, do not really like.</blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im"><br>
<br>
> The use of "Db" rather than "DB" looks unpleasant to me. I think you<br>
> should capitalize both letters.<br>
</div>I did hesitate between both. Let's see how others react.<br></blockquote><div><br></div><div>Can we spell out Database? /me ducks. Otherwise I'd vote for DB, too. Db looks strange.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im">
> Everything is prefixed with "clang_tooling_." The "tooling" bit is new<br>
> to libclang. I personally like prefixing this way as it draws a clear<br>
> line between underlying modules. But, I know others feel differently<br>
> and may argue for its elimination. If we decide to keep it, perhaps a<br>
> new header, Tooling.h, would be the appropriate place to declare these<br>
> APIs since they are clearly separate from "index." This would extend<br>
> to the Python API as well.<br>
</div>Also though about that. Although the tooling part is clearly a separate<br>
module, it will most probably be used together with the "index". So<br>
having a separate CTooling.h makes sense to me, but it should be made<br>
available to the users thru the Index.h. The same holds for the python<br>
binding. The Index acts more as an umbrella from which you can draw<br>
different submodules. Beside, some functionality from Index is reused :<br>
CXString for example.<br>
<div class="im"><br>
> I really like your general naming convention of<br>
> clang_tooling_<thing>_<action>. IMO, this is a proper naming<br>
> convention. Related functions are naturally grouped together and the<br>
> name communicates what they operate on and the action being performed.<br>
> Unfortunately, the rest of the libclang API isn't like this.<br>
> Furthermore, your patch isn't consistent in this style. e.g.<br>
> "clang_tooling_disposeCompilationDb" should be<br>
> "clang_tooling_CompilationDb_dispose" and<br>
> "clang_tooling_getCompilationDb_fromDirectory" should be<br>
> "clang_tooling_CompilationDb_getFromDirectory" or even<br>
> "clang_tooling_CompilationDb_fromDirectory," etc. Since the existing<br>
> API doesn't use my preferred naming convention, I guess this means you<br>
> don't have to adopt it. Whatever you end up doing, please make it<br>
> consistent because an inconsistent convention is worse than a<br>
> consistently "bad" one.<br>
</div>I agree. I did not adhere to the clang_tooling_<thing>_<action><br>
convention for the dispose functions to stick with the actual convention<br>
used in the rest of index.h for those functions.<br>
<div class="im"><br>
> I wonder if a clang_tooling_CompileCommand_getArgs(CXCompileCommand,<br>
> CXString **, size_t *size) (or similar) would be helpful. This API<br>
> would allocate a new array of CXString or CXString* in one go instead<br>
> of requiring a function call for each argument. Or, you could have the<br>
> caller pre-allocate the array and the function would fill. Semantics.<br>
> I think the majority of users are interested in most or all arguments<br>
> and this API would save some function call overhead (which could be<br>
> noticeable in places like the Python bindings). I can make the same<br>
> argument for having a<br>
> clang_tooling_CompileCommands_getCommands(CXCompileCommands,<br>
> CXCompileCommand **, size_t *).<br>
</div>I also though about this. My belief is that we should provide a "low<br>
level access" for the few people needing this power --- this is the<br>
current function, plus a "high level" function which will be used in 99%<br>
of the case and has no performance penalty.<br>
<br>
Taking a CompileCommand as an example, I am thinking of :<br>
- CXString clang_tooling_CompileCommand_getArgs(CXCompileCommand);<br>
- CXString clang_tooling_CompileCommand_getExe(CXCompileCommand);<br>
to get all arguments concatenated (without the executable part) and the<br>
executable separately. This beside expresses an invariant from<br>
CompileCommand.<br>
<br>
Or even :<br>
- CXString clang_tooling_CompileCommand_getCmdLine(CXCompileCommand);<br>
to get the full command line, ready for execution.<br>
<div class="im"><br>
> I really don't like the magic of compilation databases being<br>
> constructed from a directory with the auto-discovery of<br>
> "compile_commands.json." I know this is how the underlying API works,<br>
> so this isn't the fault of this patch. I think the underlying API and<br>
> libclang should be supplemented with the ability to construct the<br>
> database from an arbitrary file or possibly even a buffer with an<br>
> associated compilation database type (CXCompilationDBType_JSON).<br>
</div>Manuel already answered this one.<br>
We can make the other possibilities available in Index.h and the python<br>
biding later on.<br>
<div class="im"><br>
><br>
> While we're on the subject of limitations in the underlying API, the<br>
> lack of the ability to extract *all* compilation commands is a serious<br>
> downer. I can think of a number of use cases where one would crawl all<br>
> compilation databases and operate on all commands. Are those users<br>
> expected to implement their own parsers for the compilation database<br>
> and not go through the tooling/libclang API?<br>
><br>
> Whew, that was a lot. Again, I don't have clout in C land, so don't go<br>
> integrating my feedback before others have time to rebuke.<br>
><br>
> On to the Python!<br>
><br>
> Nit: Include space after comma between function arguments. This is<br>
> missing throughout.<br>
</div>I will fix that.<br>
<div class="im">><br>
> Nit: Again, I like "CompilationDB" not "CompilationDb." I may like<br>
> "CompilationDatabase" even more.<br>
</div>Again, I have no preference there. I like the principle that the same<br>
concept spells the same way thru Index.h, the python binding and the C++<br>
original library. However, CompilationDatabase created really long names<br>
(e.g. clang_tooling_getCompilationDatabase_fromDirectory).<br>
<div class="im">><br>
> Nit: I like wrapping the method summary on the same line as """. i.e.<br>
> don't waste lines.<br>
</div>I will fix that.<br>
<div class="im">>> @property<br>
>> def getDirectory(self):<br>
> This produces code that looks like:<br>
><br>
>> dir = command.getDirectory<br>
> This looks weird. To most Python programmers, it looks like you are<br>
> retrieving the underlying method. Rename your properties so they are<br>
> nouns, not verbs. e.g. "directory," "commandLine."<br>
</div>I will fix that.<br>
<div class="im">> For the CompileCommand arguments, I would make it a "arguments"<br>
> property or a "getArguments" method. To me, "command line" implies the<br>
> full string, not a list of strings.<br>
</div>OK<br>
<div class="im">> On a related note, there is no need to create inner iterator classes.<br>
> Instead, just use yield, which does all the magic for you. e.g.<br>
><br>
>> def getArguments(self):<br>
>> length = CompileCommand_getNumArgs(<a href="http://self.cl" target="_blank">self.cl</a>)<br>
>> for i in xrange(length):<br>
>> yield CompileCommand_getArg(<a href="http://self.cl" target="_blank">self.cl</a>, i)<br>
</div>Will fix that too<br>
<div class="im">>> Of course, you lose the ability to do efficient len() and subscript<br>
>> access. Callers can gain this back by doing list(foo.getArguments()),<br>
>> etc. But, that buffers everything, which could have performance<br>
>> side-effects. That could be a problem for cursor iteration. I don't<br>
>> think it is a problem for compilation databases. Although, it /might/<br>
>> be if we ever expose an API to retrieve all CompileCommands. The good<br>
>> news is you can swap in a custom iterator later and the API is<br>
>> backwards-compatible (yield returns an iterator magically under the<br>
>> hood). My vote is to optimize for fewer lines of code now. Complexity<br>
>> can always be added later if needed for performance.<br>
>><br>
>> The current Python API is currently inconsistent with its use of<br>
>> iterators in terms of using properties or methods. I /think/ I like<br>
>> methods more. But, I'm fine either way as long as things are<br>
>> documented.<br>
>><br>
>> In CompilationDb.fromDirectory, the ptr validation can be done in a<br>
>> from_result function registered on the ctypes library. I would prefer<br>
>> this, as that ensures all call sites have the same behavior (nevermind<br>
>> there will only likely be 1 call site).<br>
</div>I will add the from_result check.<br>
<div class="im">>><br>
>> + try:<br>
>> + cmds = cdb.getCompileCommands('file_do_not_exist.cpp')<br>
>> + except KeyError:<br>
>> + assert True<br>
>> + else:<br>
>> + assert False<br>
>><br>
>> You can do this?! I guess you do learn something new every day!<br>
</div>What is surprising you there ? I am definitely not a python expert :)<br>
<div class="im">>><br>
>> + cmdline = ' '.join(cmds[0].getCommandLine)<br>
>> + assert cmdline == 'clang++ -o project.o -c<br>
>> /home/john.doe/MyProject/project.cpp'<br>
>><br>
>> Please compare lists instead. Every time you compare derived entities,<br>
>> you introduce a point for subtle failures.<br>
</div>Will fix that too.<br>
<div class="im">>><br>
>> OK. That's a lot of feedback. I think I'm done :)<br>
>><br>
>> Again, great patch and thank you for putting the time into it. Despite<br>
>> the length of this reply, things are looking good. Most of my comments<br>
>> are cosmetic in nature, so they should be trivial to address. Again,<br>
>> expect conflicting feedback on my C comments.<br>
>><br>
>> Gregory<br>
</div>Thanks again for your review !<br>
<span class="HOEnZb"><font color="#888888"><br>
--<br>
Arnaud de Grandmaison<br>
Senior CPU engineer<br>
Business Unit Digital Tuner<br>
<br>
Parrot S.A.<br>
174, quai de Jemmapes<br>
75010 Paris - France<br>
Phone: <a href="tel:%2B33%201%2048%2003%2084%2059" value="+33148038459">+33 1 48 03 84 59</a><br>
<br>
</font></span></blockquote></div><br>