[cfe-dev] C++ algorithm analysis tool

Wed Apr 5 03:11:41 PDT 2017

On 5 April 2017 at 11:05, Artem Dergachev via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> On second thought, maybe it's quite neat.
>
> I mean, a person who doesn't know anything about the AST, matchers, and
> so on could probably do something like this:
>
> $ cat > pattern.c
> void bubblesort(int *array, int length) {
>       for (int i = 0; i < length - 1; ++i)
>          for (int j = 0; j < length - i - 1; ++j)
>              if (array[j] > array[j + 1]) {
>                  int tmp = array[j];
>                  array[j] = array[j + 1];
>                  array[j + 1] = tmp;
>              }
> }
> ^D
>
> and then
>
> $ clang-finder --pattern pattern.c my_project.c
>
> and the tool would find all clones of the pattern in the project, by
> constructing the AST for the pattern and hashing it and looking for clones.
> The pattern would be in a different AST context, but i guess for the clone
> checker it shouldn't be a problem, so we wouldn't need to deal with
> ASTImporter for that.

Pardon my ignorance, but will the clone checker match patterns that have
different variable names (e.g. values instead of array and size instead of
length)?
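
To make the question concrete, here is a minimal sketch of the kind of variant I mean. The names `bubblesort_renamed`, `values`, `size`, and the helper `same_result` are hypothetical and exist only for this illustration; the sketch shows two functions that differ in nothing but identifier names, and says nothing about what the clone checker actually reports for them.

```cpp
#include <vector>

// The pattern from pattern.c, unchanged.
void bubblesort(int *array, int length) {
  for (int i = 0; i < length - 1; ++i)
    for (int j = 0; j < length - i - 1; ++j)
      if (array[j] > array[j + 1]) {
        int tmp = array[j];
        array[j] = array[j + 1];
        array[j + 1] = tmp;
      }
}

// Hypothetical variant: identical structure, only the identifiers differ
// (values instead of array, size instead of length).
void bubblesort_renamed(int *values, int size) {
  for (int i = 0; i < size - 1; ++i)
    for (int j = 0; j < size - i - 1; ++j)
      if (values[j] > values[j + 1]) {
        int tmp = values[j];
        values[j] = values[j + 1];
        values[j + 1] = tmp;
      }
}

// Hypothetical helper: runs both versions on copies of the same input and
// reports whether they produce the same result.
bool same_result(std::vector<int> input) {
  std::vector<int> a = input, b = input;
  bubblesort(a.data(), static_cast<int>(a.size()));
  bubblesort_renamed(b.data(), static_cast<int>(b.size()));
  return a == b;
}
```

Both functions behave identically, so ideally the second would be reported as a clone of the first.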

Also, is it possible to use patterns recursively? E.g., say I want to detect
both a naive swap and std::swap in the bubble sort:

swap1.c:
                 int tmp = array[j];
                 array[j] = array[j + 1];
                 array[j + 1] = tmp;
swap2.c:
                 std::swap(array[j], array[j + 1]);

pattern.c:
void bubblesort(int *array, int length) {
      for (int i = 0; i < length - 1; ++i)
         for (int j = 0; j < length - i - 1; ++j)
             if (array[j] > array[j + 1]) {
                 SWAP(array, j)
             }
}
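
For illustration, this is roughly what pattern.c would expand to if the `SWAP(array, j)` placeholder were replaced by the body of swap2.c. The function name `bubblesort_std_swap` is hypothetical, and `SWAP` is just a placeholder in my example; this is only a sketch of the code the combined pattern would be expected to match.

```cpp
#include <utility>  // std::swap

// Hypothetical expansion of pattern.c with the swap2.c body in place of
// the SWAP(array, j) placeholder.
void bubblesort_std_swap(int *array, int length) {
  for (int i = 0; i < length - 1; ++i)
    for (int j = 0; j < length - i - 1; ++j)
      if (array[j] > array[j + 1]) {
        std::swap(array[j], array[j + 1]);  // body of swap2.c
      }
}
```

Substituting swap1.c instead gives back the original bubblesort pattern, so a single outer pattern would cover both spellings of the swap.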

>
> On 4/4/17 12:25 PM, Vassil Vassilev wrote:
>
>> On 04/04/17 10:59, Raphael Isemann wrote:
>>
>>> I feel the implementation of an algorithm in the STL and the
>>> equivalent implementation written by the user are usually so different
>>> in terms of syntax that the current clone detection won't work that
>>> well. If we had a constraint for estimating semantic equality, then it
>>> would be a different situation, but I don't expect an implementation
>>> of this anytime soon :).
>>>
>>> However, if we are satisfied with syntactic equality, then this can be
>>> done in a few minutes with the clone detector infrastructure with
>>> something like this pseudocode:
>>>
>>>     detector.findClones(RecursiveTypeIIConstraint(),
>>>                         MinFunctionBodyCount(1), MinComplexity(20));
>>>
>>> This would detect code like this:
>>>
>>> void bubblesort(int *array, int length) {
>>>       for (int i = 0; i < length - 1; ++i)
>>>          for (int j = 0; j < length - i - 1; ++j)
>>>              if (array[j] > array[j + 1]) {
>>>                  int tmp = array[j];
>>>                  array[j] = array[j + 1];
>>>                  array[j + 1] = tmp;
>>>              }
>>> }
>>>
>>> int main() {
>>>       for (int i = 0; i < length - 1; ++i) // expect-warning{You could call the function 'bubblesort' instead}
>>>          for (int j = 0; j < length - i - 1; ++j)
>>>              if (array[j] > array[j + 1]) {
>>>                  int tmp = array[j];
>>>                  array[j] = array[j + 1];
>>>                  array[j + 1] = tmp;
>>>              }
>>> }
>>>
>>> - Raphael
>>>
>>
>>   That being said, patches are welcome. The current infrastructure is
>> friendly to implementing semantic clone detection constraints (type III
>> clones). We'd be interested in helping if necessary.
>>
>> -- Vassil
>>
>>>
>>> 2017-04-04 10:35 GMT+02:00 Artem Dergachev <noqnoqneo at gmail.com>:
>>>
>>>> +CC CloneDetector guys.
>>>>
>>>> Hmm, the idea of making body-farms and then using CloneChecker to find
>>>> clones of synthesized bodies in the actual code looks curious and funny,
>>>> though i'm not immediately seeing how it is superior to ASTMatchers.
>>>>
>>>>
>>>>
>>>> On 4/1/17 1:49 PM, Kirill Bobyrev via cfe-dev wrote:
>>>>
>>>>>
>>>>> Hi Chris,
>>>>>
>>>>> To my knowledge, there isn't.
>>>>>
>>>>> I don't recall where I got the idea, but I gave it a try last summer,
>>>>> trying to implement a clang-tidy check doing what you proposed. I didn't
>>>>> have enough time to complete it, though, and I only managed to detect
>>>>> one or two very simple patterns.
>>>>>
>>>>> After thinking about this idea for some time, I found that clang-tidy
>>>>> might be a perfect place for that; I'm not sure whether a separate tool
>>>>> would be beneficial. The task of detecting a specific pattern is very
>>>>> similar to what clang-tidy checks do in a wide range of tasks. Also,
>>>>> there'd be a separate heuristic set for each standard algorithm, which
>>>>> makes the partitioning into different checks (one for each popular
>>>>> standard library algorithm) natural.
>>>>>
>>>>> In my opinion, such checks would be useful, and I'd be interested in
>>>>> seeing a proof-of-concept of some sort.
>>>>>
>>>>> One more idea I have in mind: it might be interesting to try using
>>>>> CloneChecker (a check of the Clang Static Analyzer) to detect similar
>>>>> patterns in a generic way, but I'm not sure how beneficial that would
>>>>> be in practice. Still, it might be worth a try.
>>>>>
>>>>> +CC Alex, he might have some thoughts about this.
>>>>>
>>>>> Kind regards,
>>>>> Kirill
>>>>>
>>>>>
>>>>> On 01/04/17 02:42, Christopher Di Bella via cfe-dev wrote:
>>>>>
>>>>>> Hey everyone,
>>>>>>
>>>>>> Just wondering if there's a clang tool that can analyse an algorithm
>>>>>> to suggest where standard algorithms can replace handwritten ones?
>>>>>> E.g.
>>>>>>
>>>>>> int i = 0;
>>>>>> for (; i < v.size(); ++i)
>>>>>>     if (v[i] == expected)
>>>>>>        break;
>>>>>> if (i != v.size())
>>>>>>     some_function(v[i]);
>>>>>>
>>>>>> Can be rewritten to
>>>>>>
>>>>>> auto i = find(v.begin(), v.end(), expected);
>>>>>> if (i != v.end())
>>>>>>     some_function(*i);
>>>>>>
>>>>>> or in C++17:
>>>>>>
>>>>>> if (auto i = find(v.begin(), v.end(), expected); i != v.end())
>>>>>>     some_function(*i);
>>>>>>
>>>>>> If not, how difficult a task is it to write such a tool? Is there
>>>>>> anything that one should take into special consideration while
>>>>>> writing this tool? Do you think it would have a lot of use-cases? (I
>>>>>> do, based on my company's code base, and code I have seen while
>>>>>> marking assignments).
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Chris
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> cfe-dev mailing list
>>>>>> cfe-dev at lists.llvm.org
>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>
>