[cfe-dev] [RFC] A C++ pseudo parser for tooling

David Blaikie via cfe-dev cfe-dev at lists.llvm.org
Sat Nov 6 22:14:16 PDT 2021


On Sat, Nov 6, 2021 at 8:50 AM Sam McCall <sammccall at google.com> wrote:

> On Sat, 6 Nov 2021, 03:36 David Blaikie via cfe-dev, <
> cfe-dev at lists.llvm.org> wrote:
>
>> Yeah, FWIW I'd +1 Andrew's comments here - it was sort of one major
>> premise of clang being designed as a reusable library, that C++ is just too
>> complicated to reimplement separately/repeatedly in various tools.
>>
> Yes. This is a good argument for reusable implementations, but I'm not
> sure one is enough.
> c.f. clang-format not using clang beyond the lexer, and the success
> attributable to that.
> Ideally we'd share an impl there, in practice its maturity as a product
> and concrete design choices in its parser combine to make that hard.
>

Given the long time scale of these things - any chance of a plan to
converge clang-format and this new thing eventually? (so we have 2 rather
than 3 versions of C++ understanding in the LLVM project)


>> For something that's going to change significant code - how slow is a
>> clang-based solution? What's the tradeoff being made?
>>
> Basically it's the difference between interactive latency and not.
> For our internal clangd deployment (because those are the numbers I have)
> 90%ile is most of a minute to parse headers, and several minutes in the
> build system to get ready (generated headers, flags...).
>

How much of this work is equivalent/shared/cached by the build system? (eg:
if I just did a build, then I wanted to refactor a function - how long are
we talking there?)


> Secondarily, it's the difference between just using the tool and having to
> "set it up". We do a lot of user support for clangd and I can tell you this
> is a nontrivial concern. (For people who build with something that's not
> recent mainline clang/gcc, target weird platforms, don't build on the
> machine they edit on, use non-cmake build systems, ...)
>

The second one I have less concern for, I'll admit.


>
> You can see this tradeoff play out in the recent discussions about whether
> an "east const" conversion belongs in clang-format vs clang-tidy: one of
> the arguments for putting it in clang-format is it's the only way to make
> it fast and easy enough that people want to use it.
>
>
>> On Fri, Nov 5, 2021 at 6:00 PM Andrew Tomazos via cfe-dev <
>> cfe-dev at lists.llvm.org> wrote:
>>
>>> Unfortunately it's not possible to parse C++ even close to accurately
>>> without preprocessing (and so build-system integration).  There are
>>> predefined macros that determine what code is conditionally included,
>>> conditionally included code can change basically anything, redefine
>>> anything.  Macros can expand to an arbitrary token sequence (or even create
>>> new tokens through stringization or concatenation).  It means that any
>>> identifier can become any token sequence.  That's even before we mention
>>> how name lookup is needed for disambiguation.  To parse C++ you in fact
>>> need to do full preprocessing and a large chunk of semantic analysis.
>>>
>>> Given how inaccurate the parse from the best possible "single source
>>> file" parser is - it's not clear what the use case is for it.  clang-format
>>> (largely) only makes whitespace changes, so there is limited opportunity
>>> for inaccuracies in its parse to lead to errors.
>>>
>>> To generate file outlines and do refactoring I suspect you're better off
>>> waiting for a proper parse than using a completely inaccurate one.  In the
>>> dev environment I use, past versions of the indexer had tried to do such an
>>> approximate parse, and current versions do a full correct C++ parse, so
>>> I've experienced the difference first-hand.  It's night and day.
>>>
>>> Just my 2c.  -Andrew
>>>
>>> On Fri, Nov 5, 2021 at 1:37 PM Haojian Wu via cfe-dev <
>>> cfe-dev at lists.llvm.org> wrote:
>>>
>>>> Hello everyone,
>>>>
>>>> We’d like to propose a pseudo-parser which can approximately parse C++
>>>> (including broken code). It parses a file in isolation, without needing
>>>> headers, compile flags etc. Ambiguities are resolved heuristically, like
>>>> clang-format. Its output is a clang::syntax tree, which maps the token
>>>> sequence onto the C++ grammar.
>>>> Our motivation comes from wanting to add some low latency features
>>>> (file outline, refactorings etc) in clangd, but we think this is a useful
>>>> building block for other tools too.
>>>>
>>>> Design is discussed in detail here:
>>>> https://docs.google.com/document/d/1eGkTOsFja63wsv8v0vd5JdoTonj-NlN3ujGF0T7xDbM/edit?usp=sharing
>>>>
>>>> The proposal is based on the experience with a working prototype.
>>>> Initially, we will focus on building the foundation. We consider the first
>>>> version as experimental, and plan to use and validate it with applications
>>>> in clangd (the detailed plan is described here
>>>> <https://docs.google.com/document/d/1eGkTOsFja63wsv8v0vd5JdoTonj-NlN3ujGF0T7xDbM/edit#heading=h.mawgmexy688j>
>>>> ).
>>>>
>>>> As soon as we have consensus on the proposal, we plan to start this
>>>> work in the clang repository (code would be under clang/Tooling/Syntax). We
>>>> hope we can start sending out patches for review at the end of November.
>>>>
>>>> Eager to hear your thoughts. Comments and suggestions are much
>>>> appreciated.
>>>>
>>>> Thanks,
>>>> Haojian
>>>> _______________________________________________
>>>> cfe-dev mailing list
>>>> cfe-dev at lists.llvm.org
>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>
>>>
>>
>