[cfe-dev] parsing without building an AST tree

Wed Jul 25 10:31:13 PDT 2012

On Wed, Jul 25, 2012 at 6:01 PM, Anders Bakken <agbakken at gmail.com> wrote:
>
>
> --
>
> Anders
>
> On Jul 22, 2012, at 10:17 PM, Manuel Klimek <klimek at google.com> wrote:
>
>> On Sat, Jul 21, 2012 at 4:09 AM, Anders Bakken <agbakken at gmail.com> wrote:
>>> Hi Manuel
>>>
>>> Thanks for the info. We didn't know about libtooling. It seems like
>>> there will be a lot of overlap between what we do and what clangd will
>>> do. Our approach has been not to persist translation units but rather
>>> tear out the information we need when we parse it and reparse when we
>>> need to. The main reasons we've found for this:
>>>
>>> 1) clang_reparse doesn't seem much (any) faster than parsing the whole
>>> thing over again.
>>
>> As far as I understand it reparsing only gets faster if you store a
>> precompiled preamble of the source files in between runs.
>>
>
> We did pass those flags to the initial clang_parseTranslationUnit call I believe. I could take another look.

I don't have first-hand knowledge here, but I've heard from multiple
people who tried this that it doesn't seem to be easy ;)

>
>>> 2) clang apis do not seem to give us a way to find references for a
>>> given cursor across translation units.
>>
>> USRs are made for that; I assume you've seen:
>> http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__XREF.html
>>
>
> I've seen those APIs. We use them for certain things, but I am not sure how that would help here. E.g., suppose I want all calls to printf across all my source files.

The idea would be that you point at the printf you want in one of your
TUs, and create the USR from that - then you can look up the USR in
your database. Obviously you might want to store the qualified name,
too, so users can do more inclusive textual queries without needing to
go through an existing TU. But for the queries where you already are
in a source file, the additional precision of the USR would seem
important to me.
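
To make the lookup concrete, here is a minimal sketch of such a database,
assuming a plain in-memory map keyed by USR string. ReferenceIndex,
SymbolEntry, Location and buildExampleIndex are hypothetical names made up
for this illustration, and the USR strings and file offsets are made-up
sample data; in a real indexer the key would come from clang_getCursorUSR()
on the referenced cursor while visiting each TU.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// A source position as an editor would send it: file plus byte offset.
struct Location {
    std::string file;
    std::size_t offset;
};

// One indexed symbol: a qualified name for textual queries, plus every
// reference recorded so far, from any translation unit.
struct SymbolEntry {
    std::string qualifiedName;
    std::vector<Location> references;
};

// Hypothetical in-memory "database", keyed by USR string.
class ReferenceIndex {
public:
    // Called while visiting a TU: record that `where` refers to `usr`.
    void addReference(const std::string& usr,
                      const std::string& qualifiedName,
                      const Location& where) {
        SymbolEntry& entry = symbols_[usr];
        entry.qualifiedName = qualifiedName;
        entry.references.push_back(where);
    }

    // Precise query: all references to exactly this symbol, across TUs.
    std::vector<Location> referencesByUsr(const std::string& usr) const {
        std::map<std::string, SymbolEntry>::const_iterator it = symbols_.find(usr);
        if (it == symbols_.end())
            return std::vector<Location>();
        return it->second.references;
    }

    // Inclusive textual query: every USR stored under this qualified name.
    std::vector<std::string> usrsByName(const std::string& qualifiedName) const {
        std::vector<std::string> result;
        for (std::map<std::string, SymbolEntry>::const_iterator it = symbols_.begin();
             it != symbols_.end(); ++it) {
            if (it->second.qualifiedName == qualifiedName)
                result.push_back(it->first);
        }
        return result;
    }

private:
    std::map<std::string, SymbolEntry> symbols_;
};

// Sample data only: two calls to the C printf in different files, plus an
// unrelated symbol with an invented USR.
ReferenceIndex buildExampleIndex() {
    ReferenceIndex index;
    index.addReference("c:@F@printf", "printf", Location{"a.c", 120});
    index.addReference("c:@F@printf", "printf", Location{"b.c", 48});
    index.addReference("c:@N@mylog@F@write", "mylog::write", Location{"b.cpp", 300});
    return index;
}
```

With the cursor on a printf call in one TU, you would compute its USR and
call referencesByUsr() to get the hits from every indexed file without
reparsing any of them; usrsByName() covers the case where the user only
types a name.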

Cheers,
/Manuel

>>> We'll be watching the project when code starts appearing, though. I
>>> imagine APIs will pop up in Index.h as they are needed by clangd.
>>
>> I don't think Python APIs will appear first - the clangd project has
>> C++ clients as a first goal. (Python clients are a core goal, too, but
>> not as high prio I think).
>>
> C++ is what we want so that's good.
>
>> Cheers,
>> /Manuel
>>
>
> Thanks
>>
>>>
>>> regards
>>>
>>> Anders
>>>
>>> On Thu, Jul 19, 2012 at 1:18 AM, Manuel Klimek <klimek at google.com> wrote:
>>>> On Thu, Jul 19, 2012 at 5:14 AM, Anders Bakken <agbakken at gmail.com> wrote:
>>>>> Hi Manuel
>>>>>
>>>>> Well. We essentially provide a client/server setup where an editor can
>>>>> pass a location (file,offset) to the server and some options and the
>>>>> server can respond with various information. Most importantly
>>>>> references to this location (from all the files we've indexed) and
>>>>> whatever it refers to. This is to be able to do 21st century things
>>>>> like "follow symbol" and "find references" in Emacs since I'll never
>>>>> switch to an IDE. We need to be able to visit cursors and ask them
>>>>> what they reference, I guess. Not sure if this would be possible with
>>>>> the preprocess-only option. Likely not, I guess. If you could point me
>>>>> at an example on how to do the preprocessing only I'd love to have a
>>>>> look.
>>>>
>>>> I think for your use case you really need the fully type-resolved AST.
>>>> This also means that there is no faster way to do it than to parse the
>>>> C++ code. The way you can save time is by doing aggressive in-memory
>>>> caching of processed parts of the file, which is one thing Chandler is
>>>> planning to work on (we call that "clangd" for Clang daemon).
>>>>
>>>> You can take a look at:
>>>> http://clang.llvm.org/docs/Tooling.html
>>>> to see the various possibilities you currently have to integrate with
>>>> clang here.
>>>>
>>>> Cheers,
>>>> /Manuel
>>>>
>>>>>
>>>>> If you want to take a look at the project it can be found here:
>>>>>
>>>>> https://github.com/Andersbakken/rtags
>>>>>
>>>>> thanks
>>>>>
>>>>> On Tue, Jul 17, 2012 at 2:11 AM, Manuel Klimek <klimek at google.com> wrote:
>>>>>> On Tue, Jul 17, 2012 at 10:07 AM, Anders Bakken <agbakken at gmail.com> wrote:
>>>>>>> Hi
>>>>>>>
>>>>>>> We're writing a clang-based tagger and while trying to improve the
>>>>>>> performance of our solution we came upon this paragraph:
>>>>>>
>>>>>> Not sure what your requirements for a "tagger" are, would be curious :)
>>>>>>
>>>>>>> "Elsa is not built as a stack of reusable libraries like clang is. It
>>>>>>> is very difficult to use part of Elsa without the whole front-end. For
>>>>>>> example, you cannot use Elsa to parse C/ObjC code without building an
>>>>>>> AST. You can do this in Clang and it is much faster than building an
>>>>>>> AST."
>>>>>>>
>>>>>>> from here: http://clang.llvm.org/comparison.html
>>>>>>>
>>>>>>> We've been using the C API in clang-c/Index.h, but if we could get
>>>>>>> better performance by using the C++ APIs directly we'd gladly do so
>>>>>>> (even if they might change or be harder to use).
>>>>>>>
>>>>>>> Is there an example or some documentation on how to do this somewhere?
>>>>>>
>>>>>> You can use the clang preprocessor to tokenize if that's all you need.
>>>>>> Currently there aren't really good docs around that, and I don't think
>>>>>> I have a really good example. I can get you some more ideas on how to
>>>>>> go about this if you say that preprocessor-only is what you need.
>>>>>>
>>>>>> Cheers,
>>>>>> /Manuel