[cfe-dev] Auto-generation of ASTMatchers predicates from source code, proof-of-concept

Tue Jun 19 06:19:46 PDT 2012

19.06.2012 16:57, Manuel Klimek wrote:
> On Tue, Jun 19, 2012 at 2:53 PM, Evgeny Panasyuk 
> <evgeny.panasyuk at gmail.com> wrote:
>
>     19.06.2012 16:35, Manuel Klimek wrote:
>>
>>>>                 Or maybe about some interactive (maybe gui) tool
>>>>                 for building predicates? I remember that Chandler
>>>>                 mentioned about something similar at
>>>>                 http://www.youtube.com/watch?v=yuIOGfcOH0k&t=27m56s
>>>>
>>>>
>>>>             Now we're talking the next step :) Yea, having a GUI
>>>>             would be *great* (and just so we're clear: with GUI I
>>>>             mean a web page :P)
>>>
>>>             And maybe AST database optimized for fast predicate
>>>             matches :)
>>>
>>>
>>>         For small projects this might be interesting - for us the
>>>         question is how that would scale - we've found parsing the
>>>         C++ code to be actually an interesting way to scale the AST,
>>>         for the small price of needing up to 3-4 seconds per TU (on
>>>         average). Denormalizing the AST itself produces a huge
>>>         amount of data, and denormalizing even more seems like a
>>>         non-starter.
>>>
>>>         Thoughts?
>>
>>         It depends on how much you would like to scale. And yes, it
>>         also depends on project sizes.
>>         For instance, if required scaling is task per TU - it is one
>>         case.
>>
>>
>>     Perhaps I need to expand on what I mean here:
>>     Imagine you have on the order of 100MLOC.
>>     If you want an "AST database" for predicate matches, the question
>>     is what indexes you create. If you basically want to create an
>>     extra index per "matcher", the denormalization takes too much
>>     data. If you don't create an index per matcher, how do you
>>     efficiently evaluate matchers?
>
>     I understood that part of the previous message.
>     My point was that if you have 1k translation units and need to
>     scale up to 100k parallel tasks, then it is obvious that "task per
>     TU" is not sufficient, and another approach is needed (maybe
>     pre-parsing and splitting the AST).
>
>
> I don't understand the point you're trying to make here yet :)
> Are you talking about having the same (parametrized) task done 100k 
> times in parallel (like: find all references to X done by many 
> engineers), or something else? How would a pre-parsed AST help? 
> Perhaps you can expand on the "obvious" part ;)
>

For instance, suppose we have the following code:

__x.h__
struct X
{
    int a, b, c;
};
_______
__some.cpp__
#include <x.h>

void f()
{
    X t;
    // ...
}

void g()
{
    X u;
    // ...
}
_____________

In that case, we can literally split some.cpp into two files, some1.cpp 
(with the include and the definition of f) and some2.cpp (with the 
include and the definition of g), and search for references to X in 
each file separately.
In a similar manner, we can split a pre-parsed AST into parts that do 
not "overlap" in terms of the implemented predicates.
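
To make the idea a bit more concrete, here is a rough sketch of such a split. It does not use the Clang API: TopLevelDecl, WorkUnit, splitTranslationUnit and countReferences are hypothetical names invented for this illustration, and a "declaration" is reduced to a name plus the list of type names its body mentions. It only shows that each definition, together with the shared includes, can become an independent task.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one top-level definition of a pre-parsed AST.
// A real tool would hold AST nodes here, not strings.
struct TopLevelDecl {
    std::string name;                    // e.g. "f" or "g"
    std::vector<std::string> usedTypes;  // type names referenced in the body
};

// One independent unit of work: the shared includes plus one definition,
// like some1.cpp and some2.cpp in the example above.
struct WorkUnit {
    std::vector<std::string> includes;
    TopLevelDecl decl;
};

// Split one translation unit into one work unit per top-level definition.
std::vector<WorkUnit> splitTranslationUnit(
    const std::vector<std::string>& includes,
    const std::vector<TopLevelDecl>& decls) {
    std::vector<WorkUnit> units;
    units.reserve(decls.size());
    for (const TopLevelDecl& d : decls) {
        units.push_back(WorkUnit{includes, d});
    }
    return units;
}

// Count references to typeName inside a single work unit. Each unit can be
// examined on its own, so the calls can run as separate parallel tasks.
std::size_t countReferences(const WorkUnit& unit, const std::string& typeName) {
    std::size_t count = 0;
    for (const std::string& t : unit.decl.usedTypes) {
        if (t == typeName) {
            ++count;
        }
    }
    return count;
}
```

With the some.cpp example, the split yields two work units (one for f, one for g), and the search for X can be run on each of them independently.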

Best Regards,
Evgeny