[cfe-dev] [RFC] A C++ pseudo parser for tooling

David Blaikie via cfe-dev cfe-dev at lists.llvm.org
Fri Nov 5 19:36:23 PDT 2021


Yeah, FWIW I'd +1 Andrew's comments here - a major premise of designing
clang as a reusable library was that C++ is just too complicated to
reimplement separately and repeatedly in various tools.

For something that's going to change significant code - how slow is a
clang-based solution? What's the tradeoff being made?

On Fri, Nov 5, 2021 at 6:00 PM Andrew Tomazos via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> Unfortunately it's not possible to parse C++ even close to accurately
> without preprocessing (and so build-system integration).  There are
> predefined macros that determine what code is conditionally included,
> conditionally included code can change basically anything, redefine
> anything.  Macros can expand to an arbitrary token sequence (or even create
> new tokens through stringization or concatenation).  It means that any
> identifier can become any token sequence.  That's even before we mention
> how name lookup is needed for disambiguation.  To parse C++ you in fact
> need to do full preprocessing and a large chunk of semantic analysis.
>
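As a minimal sketch of the name-lookup point above (the names T, x, and
the two namespaces are hypothetical, chosen only for illustration), the
same token sequence `T * x` is a declaration in one context and a
multiplication in the other:

```cpp
#include <cassert>

// Hypothetical illustration: none of these names come from the thread.
namespace t_is_a_type {
struct T {};
inline bool declares_pointer() {
  T * x = nullptr;      // `T * x` is a declaration: x has type T*
  return x == nullptr;
}
}  // namespace t_is_a_type

namespace t_is_a_variable {
inline int multiplies() {
  int T = 6, x = 7;
  return T * x;         // `T * x` is an expression: 6 * 7
}
}  // namespace t_is_a_variable
```

A parser that sees only the tokens `T * x` cannot choose between these
two readings without knowing what T names, and a header or macro outside
the file could change that answer.
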
> Given how inaccurate the parse from the best possible "single source file"
> parser is - it's not clear what the use case is for it.  clang-format
> (largely) only makes whitespace changes, so there is limited opportunity
> for inaccuracies in its parse to lead to errors.
>
> To generate file outlines and do refactoring I suspect you're better off
> waiting for a proper parse than using a completely inaccurate one.  In the
> dev environment I use, past versions of the indexer had tried to do such an
> approximate parse, and current versions do a full correct C++ parse, so
> I've experienced the difference first-hand.  It's night and day.
>
> Just my 2c.  -Andrew
>
> On Fri, Nov 5, 2021 at 1:37 PM Haojian Wu via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>> Hello everyone,
>>
>> We’d like to propose a pseudo-parser which can approximately parse C++
>> (including broken code). It parses a file in isolation, without needing
>> headers, compile flags etc. Ambiguities are resolved heuristically, like
>> clang-format. Its output is a clang::syntax tree, which maps the token
>> sequence onto the C++ grammar.
>> Our motivation comes from wanting to add some low latency features (file
>> outline, refactorings etc) in clangd, but we think this is a useful
>> building block for other tools too.
>>
>> Design is discussed in detail here:
>> https://docs.google.com/document/d/1eGkTOsFja63wsv8v0vd5JdoTonj-NlN3ujGF0T7xDbM/edit?usp=sharing
>>
>> The proposal is based on the experience with a working prototype.
>> Initially, we will focus on building the foundation. We consider the first
>> version as experimental, and plan to use and validate it with applications
>> in clangd (the detailed plan is described here
>> <https://docs.google.com/document/d/1eGkTOsFja63wsv8v0vd5JdoTonj-NlN3ujGF0T7xDbM/edit#heading=h.mawgmexy688j>
>> ).
>>
>> As soon as we have consensus on the proposal, we plan to start this work
>> in the clang repository (code would be under clang/Tooling/Syntax). We hope
>> we can start sending out patches for review at the end of November.
>>
>> Eager to hear your thoughts. Comments and suggestions are much
>> appreciated.
>>
>> Thanks,
>> Haojian
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>