[cfe-dev] [RFC] A C++ pseudo parser for tooling

Demi Marie Obenour via cfe-dev cfe-dev at lists.llvm.org
Tue Nov 9 07:31:13 PST 2021


On 11/9/21 9:34 AM, Sam McCall via cfe-dev wrote:
> On Tue, Nov 9, 2021 at 3:40 AM Andrew Tomazos <andrewtomazos at gmail.com>
> wrote:
> 
>> On Mon, Nov 8, 2021 at 11:18 AM Haojian Wu <hokein at google.com> wrote:
>>
>>> IDE use cases (for clangd)
>>> -  provide code-folding, outline, syntax highlighting, selection features
>>> without a long "warmup" time;
>>> -  a fast index that provides approximate results;
>>>
>>> Other use cases we aim to support:
>>> - smart diff and merge tool for C++ code;
>>> - a fast linter, a cpplint replacement, with clang-tidy-like
>>> extensibility;
>>> - syntactic grep/sed tools;
>>>
>>
>> * I don't know what "fast index to provide approximate results" means.
>> Results of what?  Do you mean generating an index?  What will the index be
>> used for?
>>
> clangd has a symbol index to enable codebase-wide operations. (see
> SymbolIndex
> <https://github.com/llvm/llvm-project/blob/main/clang-tools-extra/clangd/index/Index.h>
> and
> some documentation <https://clangd.llvm.org/design/indexing>)
> These include:
> 
>    - go-to-definition: finding a definition associated with a declaration
>    visible in the AST
>    - code completion: for contexts where the AST cannot provide all results
>    efficiently, such as namespace scopes (including results from non-included
>    headers)
>    - cross-references: finding references from files that are not part of
>    the current AST
> 
> Today this index is built from ASTs in various ways (see docs), which takes
> many hours for large codebases (on machines too slow to build).
> Most results are missing for a long time. Many users turn off indexing
> (e.g. to avoid battery drain) and the results stay missing. If compile flag
> metadata is missing for the project, these features don't work at all.
> 
> The idea for clangd is to augment (not replace) this index with a
> pseudo-parser based index that processes each file once. It would be
> halfway between the AST index and grep. This index would provide the same
> operations with lower fidelity, and would be replaced by the AST-based
> index as it completes.
> 
>> * Syntax highlighting is the only use case of those listed that can
>> tolerate inaccuracy.  For the rest, a correct parse will be more
>> productive.  The trouble is that if people start depending on these
>> features in their workflow, when they fail (and they often will) it will be
>> very disruptive.  The cost of the disruption outweighs the time saved
>> waiting for a correct parse.
>>
> Our experience with clangd is that people very often value latency over
> correctness when editing C++ code, and this is a situational, quantitative
> question.
> As examples, we've failed to replace cpplint and our heuristic outline with
> clang-tidy and our AST-based outline. Despite being inaccurate and
> incomplete, users find them useful and are not willing to wait.
> 
> 
>> * I think you are better off spending your time on optimizing the correct
>> parser infrastructure.  I'm sure more can be done - particularly in terms
>> of caching, persisting and reusing state (think like PCH and modules etc).
>>
> We have worked on projects over several years to improve these things (and
> other aspects such as error-resilience). We agree there's more that can be
> done, and will continue to work on this. We don't believe this approach
> will get anywhere near a 100x latency improvement, which is what we're
> looking for.

What about pushing the state to a server?  Have a server that has the entire
index, and keeps it up to date whenever a VCS commit is made to the main
branch.  Users then only need to compute deltas, which should be much smaller.
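
To make the suggestion concrete, here is a rough sketch of the client side,
assuming the server hands out a snapshot index built at the last main-branch
commit. Every name in it (SymbolLocation, SymbolTable, OverlayIndex,
updateFile, lookup) is made up for illustration and is not part of clangd's
API. Local results for files that differ from the snapshot shadow the
snapshot's results for those same files:

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; not clangd's real index API.
struct SymbolLocation {
  std::string file;
  int line;
};

// Maps a symbol name to every location where it is declared or defined.
using SymbolTable = std::multimap<std::string, SymbolLocation>;

class OverlayIndex {
public:
  // `base` is the snapshot fetched from the (assumed) server.
  explicit OverlayIndex(SymbolTable base) : base_(std::move(base)) {}

  // Record the re-indexed symbols (name, line) of one locally changed file.
  // Earlier local results for the same file are dropped first.
  void updateFile(const std::string &file,
                  const std::vector<std::pair<std::string, int>> &symbols) {
    dirty_.insert(file);
    for (auto it = delta_.begin(); it != delta_.end();) {
      if (it->second.file == file)
        it = delta_.erase(it);
      else
        ++it;
    }
    for (const auto &s : symbols)
      delta_.emplace(s.first, SymbolLocation{file, s.second});
  }

  // Snapshot results are kept only for files without local changes;
  // results from the local delta are appended after them.
  std::vector<SymbolLocation> lookup(const std::string &name) const {
    std::vector<SymbolLocation> out;
    auto fromBase = base_.equal_range(name);
    for (auto it = fromBase.first; it != fromBase.second; ++it)
      if (dirty_.count(it->second.file) == 0)
        out.push_back(it->second);
    auto fromDelta = delta_.equal_range(name);
    for (auto it = fromDelta.first; it != fromDelta.second; ++it)
      out.push_back(it->second);
    return out;
  }

private:
  SymbolTable base_;             // server snapshot, never modified locally
  SymbolTable delta_;            // symbols from locally changed files
  std::set<std::string> dirty_;  // files whose snapshot entries are stale
};
```

The client would call updateFile() once per file that differs from the
snapshot commit, so the local work scales with the size of the diff rather
than the size of the codebase. How the delta itself gets computed (AST or
pseudo-parser) is left open here.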

Sincerely,
Demi Marie Obenour (she/her/hers)