[PATCH] D39050: Add index-while-building support to Clang

Marc-Andre Laperle via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Feb 27 13:38:07 PST 2018


malaperle added a comment.

In https://reviews.llvm.org/D39050#949185, @malaperle wrote:

> In https://reviews.llvm.org/D39050#948500, @akyrtzi wrote:
>
> > @malaperle, to clarify we are not suggesting that you write your own parser, the suggestion is to use clang in 'fast-scan' mode to get the structure of the declarations of a single file, see `CXTranslationUnit_SingleFileParse` (along with enabling skipping of bodies). We have found clang is super fast when you only try to get the structure of a file like this.
>
>
> Thank you, that sounds very useful. I will try that and get some measurements.
>
> > We can make convenient APIs to provide the syntactic structure of declarations based on their location.
>
> Perhaps just for the end-loc since it's pretty much guaranteed to be needed by everyone. But if it's very straightforward, perhaps that's not needed. I'll try and see.
>
> > But let's say we added the end-loc, is it enough ? If you want to implement the 'peek the definition' like Eclipse, then it is not enough, you also need to figure out if there are documentation comments associated with the declaration and also show those. Also what if you want to highlight the type signature of a function, then just storing the location of the closing brace of its body is not enough. There can be any arbitrary things you may want to get from the structure of the declaration (e.g. the parameter ranges), but we could provide an API to gather any syntactic structure info you may want.
>
> That's a very good point. I guess in the back of my mind, I have the worry that one cannot extend what is stored, either for a different performance trade-off or for additional things. The fact that both clang and clangd have to agree on the format so that index-while-building can be used seems to make it inherently not possible to extend. But perhaps it's better to not overthink this for now.


I did a bit more experimenting. For the end-loc, I changed my prototype so that the end-loc is not stored in the index but rather computed "on the fly" using only SourceManager and Lexer. For my little benchmark, I used the LLVM/Clang/Clangd code base and queried it for all references to "std" (the namespace), which amounts to around 46K references in the index.

With end-loc in index: 3.45s on average    (20 samples)
With end-loc computed on the fly: 11.33s on average    (20 samples)
I also tried with Xcode but without too much success: it took about 30 secs to reach 45K results and then carried on for a long time and hung (although I didn't try to leave it for hours to see if it finished).

From my perspective, the extra time is quite substantial (about 3.3x slower in this benchmark), and it doesn't seem worth paying it just to save one integer per occurrence in this case.
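
As a rough illustration of the two variants being compared, here is a minimal, self-contained sketch. It does not use Clang: in the real prototype the on-the-fly variant would presumably rely on something like Lexer::MeasureTokenLength or Lexer::getLocForEndOfToken, whereas here a plain identifier scan stands in for the lexer. The names Occurrence, StoredOccurrence and computeEndOffset are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical index record that stores only where the occurrence starts.
struct Occurrence {
  std::size_t startOffset;
};

// Hypothetical index record that also stores the end (one extra integer).
struct StoredOccurrence {
  std::size_t startOffset;
  std::size_t endOffset;  // one past the last character of the token
};

// Stand-in for "re-lex the token at this location": scans an identifier
// starting at `start` and returns the offset one past its last character.
// A real implementation would ask the Clang lexer instead of doing this.
std::size_t computeEndOffset(std::string_view buffer, std::size_t start) {
  assert(start <= buffer.size());
  std::size_t end = start;
  while (end < buffer.size() &&
         (std::isalnum(static_cast<unsigned char>(buffer[end])) ||
          buffer[end] == '_')) {
    ++end;
  }
  return end;
}

// On-the-fly variant: needs the file contents for every query result.
StoredOccurrence resolve(std::string_view buffer, Occurrence occ) {
  return {occ.startOffset, computeEndOffset(buffer, occ.startOffset)};
}
```

The on-the-fly variant has to load and scan the source buffer for every result, which is the kind of per-occurrence work the stored end-loc avoids.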

For computing the start/end-loc of function bodies, I tried SingleFileParseMode and SkipFunctionBodies separately (as a start). The source I used this on looks like this:

  #include "MyClass.h"
  
  MyClass::MyClass() {
  }
  
  void MyClass::doOperation() {
  }

With SingleFileParseMode, I get several errors:

> MyClass.cpp:5:1: error: use of undeclared identifier 'MyClass'
>  MyClass.cpp:8:6: error: use of undeclared identifier 'MyClass'

Then I cannot obtain any Decl* at the position of doOperation. With SingleFileParseMode, I'm also a bit wary that not processing headers will result in many inaccuracies. From our perspective, we are more willing to sacrifice disk space in order to have more accuracy and speed. For comparison, the index I worked with, containing all end-locs for occurrences as well as function start/end locations, is 201M for LLVM/Clang/Clangd, which is small to us.

With SkipFunctionBodies alone, I can get the Decl*, but FunctionDecl::getSourceRange() doesn't include the body; rather, it stops after the arguments.
It would be very nice if we could do this cheaply, but it doesn't seem possible with those two flags alone. What did you have in mind for implementing an "API to gather any syntactic structure info"?
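
To make the gap concrete: if the range stops after the arguments, the body's closing brace still has to be found some other way. The sketch below is purely illustrative and is not something proposed in this review. It is a hypothetical helper (findBodyEnd, not a Clang API) that does naive brace matching on the raw buffer, starting from the offset where the reported range ends. It skips comments and ordinary string/character literals, but it does not handle raw string literals, digit separators, preprocessor tricks, or brace-initializers in a constructor's member initializer list, which is exactly why a lexer-based API would be preferable.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper, not a Clang API. `from` is the offset just past the
// parameter list. Returns the offset of the closing brace of the function
// body, or nullopt if no body is found (e.g. a plain declaration).
std::optional<std::size_t> findBodyEnd(std::string_view src, std::size_t from) {
  assert(from <= src.size());
  int depth = 0;
  for (std::size_t i = from; i < src.size(); ++i) {
    const char c = src[i];
    if (c == '/' && i + 1 < src.size() && src[i + 1] == '/') {
      // Line comment: skip to the end of the line.
      while (i < src.size() && src[i] != '\n') ++i;
    } else if (c == '/' && i + 1 < src.size() && src[i + 1] == '*') {
      // Block comment: skip to the closing "*/".
      const std::size_t close = src.find("*/", i + 2);
      if (close == std::string_view::npos) return std::nullopt;
      i = close + 1;
    } else if (c == '"' || c == '\'') {
      // Literal: skip to the matching quote, honoring backslash escapes.
      for (++i; i < src.size() && src[i] != c; ++i) {
        if (src[i] == '\\') ++i;
      }
    } else if (c == '{') {
      ++depth;
    } else if (c == '}') {
      --depth;
      if (depth == 0) return i;           // matched the body's opening brace
      if (depth < 0) return std::nullopt; // stray brace before any body
    } else if (c == ';' && depth == 0) {
      return std::nullopt;                // declaration without a body
    }
  }
  return std::nullopt;
}
```

Even this small scanner needs the file contents and several special cases, so it only shifts the cost rather than removing it.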


https://reviews.llvm.org/D39050




