[cfe-dev] [RFC] A C++ pseudo parser for tooling

Tue Nov 9 08:20:08 PST 2021

On Tue, Nov 9, 2021 at 7:25 AM Sam McCall <sammccall at google.com> wrote:

> On Sun, Nov 7, 2021 at 6:14 AM David Blaikie <dblaikie at gmail.com> wrote:
>
>> On Sat, Nov 6, 2021 at 8:50 AM Sam McCall <sammccall at google.com> wrote:
>>
>>> On Sat, 6 Nov 2021, 03:36 David Blaikie via cfe-dev, <
>>> cfe-dev at lists.llvm.org> wrote:
>>>
>>>> Yeah, FWIW I'd +1 Andrew's comments here - it was sort of one major
>>>> premise of clang being designed as a reusable library, that C++ is just too
>>>> complicated to reimplement separately/repeatedly in various tools.
>>>>
>>> Yes. This is a good argument for reusable implementations, but I'm not
>>> sure one is enough.
>>> cf. clang-format not using clang beyond the lexer, and the success
>>> attributable to that.
>>> Ideally we'd share an impl there, in practice its maturity as a product
>>> and concrete design choices in its parser combine to make that hard.
>>>
>>
>> Given the long time scale of these things - any chance of a plan to
>> converge clang-format and this new thing eventually? (so we have 2 rather
>> than 3 versions of C++ understanding in the LLVM project)
>>
>>
>>>> For something that's going to change significant code - how slow is a
>>>> clang-based solution? What's the tradeoff being made?
>>>>
>>> Basically it's the difference between interactive latency and not.
>>> For our internal clangd deployment (because those are the numbers I
>>> have), the 90th percentile is most of a minute to parse headers, plus
>>> several minutes in the build system to get ready (generated headers,
>>> flags...).
>>>
>>
>> How much of this work is equivalent/shared/cached by the build system?
>> (eg: if I just did a build, then I wanted to refactor a function - how long
>> are we talking there?)
>>
> The build system stuff is cacheable[1], so once you've done that, a tool
> might take 30 seconds (per file) each time you run it.
>
> For single-file operations (think go-to-definition), this is enough to
> avoid the tool. See the (lack of) popularity of clang-rename :-).
> This can be mitigated with PCH/preamble as in clangd, which still takes 30
> seconds to prepare, but now you can perform subsequent operations quickly.
> This startup delay is the #1 user complaint about clangd. (We have several
> significant optimizations here that trade off accuracy, and it is still
> the #1 complaint.)
> The PCH is typically hundreds of megabytes per source file, so caching it
> silently/indefinitely makes people unhappy - ask me how I know! In clangd
> we retain it while the user has the file open, which works OK for a
> stateful program.
>
> For codebase-wide operations (find-refs) the parsing time easily gets into
> hours.
> You can mitigate this by building an index, but *that* takes hours and
> it's a significant barrier.
>
> Bottom line: users want tools that are predictably fast (<100ms, including
> the first run).
>
> [1] In practice, because build system caches are mutable user-controlled
> state, cache sharing isn't transparent. Either tools don't share cache with
> the 'real' build, or the user is *required* to do a real build to get
> accurate results - I've seen both.
>

Fair enough - thanks for the context!

Minor implementation question/request (probably already covered): hopefully
this will be fuzz tested from the start, both ASan fuzzing for crashes and,
if practical, general behavioral-correctness fuzzing?
Is it expected to fail gracefully? That is, if it can't correctly perform
the transformation, will it be able to notify the user explicitly? I guess
not: presumably it won't know about some of the things it doesn't
understand and may mangle them, so the user has to look at the diff to
judge whether the tool got it right?