[cfe-dev] parsing without building an AST tree

Anders Bakken agbakken at gmail.com
Wed Jul 25 09:01:21 PDT 2012



--

Anders

On Jul 22, 2012, at 10:17 PM, Manuel Klimek <klimek at google.com> wrote:

> On Sat, Jul 21, 2012 at 4:09 AM, Anders Bakken <agbakken at gmail.com> wrote:
>> Hi Manuel
>> 
>> Thanks for the info. We didn't know about libtooling. It seems like
>> there will be a lot of overlap between what we do and what clangd will
>> do. Our approach has been not to persist translation units but rather
>> tear out the information we need when we parse it and reparse when we
>> need to. The main reasons we've found for this:
>> 
>> 1) clang_reparse doesn't seem much (any) faster than parsing the whole
>> thing over again.
> 
> As far as I understand it reparsing only gets faster if you store a
> precompiled preamble of the source files in between runs.
> 

I believe we did pass those flags to the initial clang_parseTranslationUnit call, but I can take another look.

>> 2) clang apis do not seem to give us a way to find references for a
>> given cursor across translation units.
> 
> USRs are made for that; I assume you've seen:
> http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__XREF.html
> 

I've seen those APIs. We use them for certain things, but I am not sure how they would help here. For example, suppose I want all calls to printf across all my source files.
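To make the printf case concrete, here is a rough sketch of the kind of index we end up keeping ourselves. It assumes the indexer visits every cursor in a translation unit, asks clang_getCursorReferenced for the target, and uses the string from clang_getCursorUSR as the key. ReferenceIndex, Location, addReference, forgetFile and references are names made up for this illustration, and none of this is a libclang API.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// One place in the source where a symbol is mentioned.
struct Location {
    std::string file;
    unsigned line;
    unsigned column;
};

// Hypothetical cross-translation-unit index: USR string -> all references seen.
// The USR strings are assumed to come from clang_getCursorUSR on the
// referenced cursor; this class only stores and queries them.
class ReferenceIndex {
public:
    // Record one reference found while visiting a parsed translation unit.
    void addReference(const std::string& usr, Location loc) {
        refs_[usr].push_back(std::move(loc));
    }

    // Drop everything recorded for a file, e.g. right before reparsing it.
    void forgetFile(const std::string& file) {
        for (auto it = refs_.begin(); it != refs_.end();) {
            auto& locs = it->second;
            locs.erase(std::remove_if(locs.begin(), locs.end(),
                                      [&](const Location& l) { return l.file == file; }),
                       locs.end());
            if (locs.empty()) {
                it = refs_.erase(it);  // no references left for this USR
            } else {
                ++it;
            }
        }
    }

    // All known references to a USR, across every indexed file.
    const std::vector<Location>& references(const std::string& usr) const {
        static const std::vector<Location> empty;
        auto it = refs_.find(usr);
        return it == refs_.end() ? empty : it->second;
    }

private:
    std::unordered_map<std::string, std::vector<Location>> refs_;
};
```

With something like this, "all calls to printf" becomes a lookup on whatever USR clang reports for printf (I would expect something of the form "c:@F@printf", but treat that as an assumption). The catch is that every source file has to be parsed and visited first to fill the table, which is exactly the cost we are trying to reduce.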
>> We'll be watching the project when code starts appearing,
>> though. I imagine APIs will pop up in Index.h as they are needed by
>> clangd.
> 
> I don't think python APIs will appear first - the clangd project has
> C++ clients as a first goal. (Python clients are a core goal, too, but
> not as high prio I think).
> 
C++ is what we want so that's good. 

> Cheers,
> /Manuel
> 

Thanks
> 
>> 
>> regards
>> 
>> Anders
>> 
>> On Thu, Jul 19, 2012 at 1:18 AM, Manuel Klimek <klimek at google.com> wrote:
>>> On Thu, Jul 19, 2012 at 5:14 AM, Anders Bakken <agbakken at gmail.com> wrote:
>>>> Hi Manuel
>>>> 
>>>> Well. We essentially provide a client/server setup where an editor can
>>>> pass a location (file,offset) to the server and some options and the
>>>> server can respond with various information. Most importantly
>>>> references to this location (from all the files we've indexed) and
>>>> whatever it refers to. This is to be able to do 21st century things
>>>> like "follow symbol" and "find references" in Emacs since I'll never
>>>> switch to an IDE. We need to be able to visit cursors and ask them
>>>> what they reference I guess. Not sure if this would be possible with
>>>> the preprocess-only option. Likely not I guess. If you could point me
>>>> at an example on how to do the preprocessing only I'd love to have a
>>>> look.
>>> 
>>> I think for your use case you really need the fully type-resolved AST.
>>> This also means that there is no faster way to do it than to parse the
>>> C++ code. The way you can save time is by doing aggressive in-memory
>>> caching of processed parts of the file, which is one thing Chandler is
>>> planning to work on (we call that "clangd" for Clang daemon).
>>> 
>>> You can take a look at:
>>> http://clang.llvm.org/docs/Tooling.html
>>> to see the various possibilities you currently have to integrate with
>>> clang here.
>>> 
>>> Cheers,
>>> /Manuel
>>> 
>>>> 
>>>> If you want to take a look at the project it can be found here:
>>>> 
>>>> https://github.com/Andersbakken/rtags
>>>> 
>>>> thanks
>>>> 
>>>> On Tue, Jul 17, 2012 at 2:11 AM, Manuel Klimek <klimek at google.com> wrote:
>>>>> On Tue, Jul 17, 2012 at 10:07 AM, Anders Bakken <agbakken at gmail.com> wrote:
>>>>>> Hi
>>>>>> 
>>>>>> We're writing a clang-based tagger and while trying to improve the
>>>>>> performance of our solution we came upon this paragraph:
>>>>> 
>>>>> Not sure what your requirements for a "tagger" are, would be curious :)
>>>>> 
>>>>>> "Elsa is not built as a stack of reusable libraries like clang is. It
>>>>>> is very difficult to use part of Elsa without the whole front-end. For
>>>>>> example, you cannot use Elsa to parse C/ObjC code without building an
>>>>>> AST. You can do this in Clang and it is much faster than building an
>>>>>> AST."
>>>>>> 
>>>>>> from here: http://clang.llvm.org/comparison.html
>>>>>> 
>>>>>> We've been using the C-api in clang-c/Index.h but if we could get
>>>>>> better performance by using the C++ APIs directly we'd gladly do so
>>>>>> (even if it might change or be harder to use).
>>>>>> 
>>>>>> Is there an example or some documentation on how to do this somewhere possibly?
>>>>> 
>>>>> You can use the clang preprocessor to tokenize if that's all you need.
>>>>> Currently there's not really good docs around that, and I don't think
>>>>> I have a really good example. I can get you some more ideas on how to
>>>>> go about this if you say that preprocessor-only is what you need.
>>>>> 
>>>>> Cheers,
>>>>> /Manuel



