[cfe-dev] [RFC] A C++ pseudo parser for tooling

Sam McCall via cfe-dev cfe-dev at lists.llvm.org
Tue Nov 9 09:09:58 PST 2021


On Tue, Nov 9, 2021 at 4:38 PM Demi Marie Obenour via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> >> * I think you are better off spending your time on optimizing the correct
> >> parser infrastructure.  I'm sure more can be done - particularly in terms
> >> of caching, persisting and reusing state (think like PCH and modules etc).
> >>
> > We have worked on projects over several years to improve these things (and
> > other aspects such as error-resilience). We agree there's more that can be
> > done, and will continue to work on this. We don't believe this approach
> > will get anywhere near a 100x latency improvement, which is what we're
> > looking for.
>
> What about pushing the state to a server?  Have a server that has the entire
> index, and keeps it up to date whenever a VCS commit is made to the main
> branch.

We have this for clangd's index: https://clangd.llvm.org/guides/remote-index
It works great (try it out with LLVM!) but needing to deploy a server means
90% of users won't ever touch it.

(A shared repository of serialized ASTs *is* something we're considering in
a tightly controlled corp environment but the barriers are pretty huge:
size, security and it only works if everyone uses the same precise version
of the tool. And it only makes sense at all if you're sure you can download
300MB in less than 30 seconds!)

David Blaikie wrote:
> Minor implementation question/request/probably already covered: Hopefully
> this'll be fuzz tested (both asan fuzzing for crashes, but also general
> behavioral correctness fuzzing too, if practical) from the start?

Thanks, we should definitely do crash fuzzing at least; I'll add it to the
doc. I don't really understand how behavioral fuzzing would work. At Google
we have access to a firehose of incomplete code snapshots that we'd also
use for evaluation.
(Regarding stability: C++ isn't the ideal implementation language here. But
we'd like to depend on the clang lexer, be depended on by clangd, and be
accessible to the C++ community).

> Is it expected that this will fail gracefully? (if it can't correctly
> perform the transformation, it'll be able to explicitly notify the user?

Two parts here: what does the parser produce, and what does the tool on top
of it do?

The parser's tree output will include places where it knows there was an
error (and maybe how the error was corrected).
But where the code is ambiguous it won't know that it guessed wrong, e.g. if
it parsed "Foo(42);" as a function call when it was actually a constructor
call.
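The ambiguity is easy to reproduce. In this hypothetical illustration, the statement `Foo(42);` is token-for-token identical in both namespaces; only the declaration of `Foo` decides whether it is a function call or the construction of a temporary. A parser that doesn't see that declaration has to guess.

```cpp
// Hypothetical illustration: the same statement with two meanings (C++17).

namespace as_function {
inline int calls = 0;
// Foo names a function, so `Foo(42);` below is a function call.
inline void Foo(int) { ++calls; }
inline void run() { Foo(42); }
}  // namespace as_function

namespace as_type {
// Foo names a class, so `Foo(42);` below constructs (and immediately
// destroys) a temporary Foo.
struct Foo {
  static inline int constructed = 0;
  explicit Foo(int) { ++constructed; }
};
inline void run() { Foo(42); }
}  // namespace as_type
```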

The behavior on top is up to the tool, e.g. an indexer needs to make a
decision because it's not useful to display a warning each time you query
the index.
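How a tool reacts to the parser's error markers is its own policy. A minimal sketch, assuming a hypothetical node type with a single `recoveredError` flag (the thread doesn't specify the tree's shape): an indexer uses the best-effort tree anyway, while a refactoring tool could leave repaired code alone and report it.

```cpp
#include <string>

// Hypothetical node shape: the thread only says the tree marks where the
// parser found an error. A wrong guess on an ambiguity is not marked.
struct ToyNode {
  std::string text;             // source text of the statement
  bool recoveredError = false;  // parser repaired malformed input here
};

enum class ToolKind { Indexer, Refactoring };
enum class Action { Use, SkipAndReport };

// Hypothetical per-tool policy for consuming a best-effort tree.
inline Action decide(const ToyNode &node, ToolKind tool) {
  if (!node.recoveredError)
    return Action::Use;
  // An indexer must commit to something: warning on every query isn't useful.
  if (tool == ToolKind::Indexer)
    return Action::Use;
  // A refactoring tool can leave repaired code alone and tell the user.
  return Action::SkipAndReport;
}
```

A node without errors can still be a wrong guess (the `Foo(42);` case), so a policy like this doesn't replace review and tests.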

> and the user has to look at the diff to understand whether the tool got
> it right?

Ultimately yes.
If you don't have both a human reviewer and automated tests for every
change, you're on shaky ground.
This is true for human-authored changes and clang-based tools like
clang-tidy too.
(Of course the accuracy level is still important!)