[cfe-dev] [RFC] A C++ pseudo parser for tooling

Tue Nov 9 07:25:07 PST 2021

On Sun, Nov 7, 2021 at 6:14 AM David Blaikie <dblaikie at gmail.com> wrote:

> On Sat, Nov 6, 2021 at 8:50 AM Sam McCall <sammccall at google.com> wrote:
>
>> On Sat, 6 Nov 2021, 03:36 David Blaikie via cfe-dev, <
>> cfe-dev at lists.llvm.org> wrote:
>>
>>> Yeah, FWIW I'd +1 Andrew's comments here - it was sort of one major
>>> premise of clang being designed as a reusable library, that C++ is just too
>>> complicated to reimplement separately/repeatedly in various tools.
>>>
>> Yes. This is a good argument for reusable implementations, but I'm not
>> sure one is enough.
>> cf. clang-format not using clang beyond the lexer, and the success
>> attributable to that.
>> Ideally we'd share an impl there; in practice, its maturity as a product
>> and concrete design choices in its parser combine to make that hard.
>>
>
> Given the long time scale of these things - any chance of a plan to
> converge clang-format and this new thing eventually? (so we have 2 rather
> than 3 versions of C++ understanding in the LLVM project)
>
>
>>> For something that's going to change significant code - how slow is a
>>> clang-based solution? What's the tradeoff being made?
>>>
>> Basically it's the difference between interactive latency and not.
>> For our internal clangd deployment (because those are the numbers I have)
>> 90%ile is most of a minute to parse headers, and several minutes in the
>> build system to get ready (generated headers, flags...).
>>
>
> How much of this work is equivalent/shared/cached by the build system?
> (eg: if I just did a build, then I wanted to refactor a function - how long
> are we talking there?)
>
The build system stuff is cacheable[1], so once you've done that, a tool
might take 30 seconds (per file) each time you run it.

For single-file operations (think go-to-definition), this is enough to make
people avoid the tool. See the (lack of) popularity of clang-rename :-).
This can be mitigated with PCH/preamble as in clangd, which still takes 30
seconds to prepare, but now you can perform subsequent operations quickly.
This startup delay is the #1 user complaint about clangd. (We have several
significant optimizations here that trade off accuracy, and it still is.)
The PCH is typically hundreds of megabytes per source file, so caching it
silently/indefinitely makes people unhappy - ask me how I know! In clangd
we retain it while the user has the file open, which works OK for a
stateful program.

For codebase-wide operations (find-refs) the parsing time easily gets into
hours.
You can mitigate this by building an index, but *that* takes hours and it's
a significant barrier.

Bottom line: users want tools that are predictably fast (<100ms, including
the first run).

[1] In practice, because build system caches are mutable user-controlled
state, cache sharing isn't transparent. Either tools don't share the cache
with the 'real' build, or the user is *required* to do a real build to get
accurate results - I've seen both.

>
>> Secondarily, it's the difference between just using the tool and having
>> to "set it up". We do a lot of user support for clangd and I can tell you
>> this is a nontrivial concern. (For people who build with something that's
>> not recent mainline clang/gcc, target weird platforms, don't build on the
>> machine they edit on, use non-cmake build systems, ...)
>>
>
> The second one I have less concern for, I'll admit.
>