[cfe-dev] C++ algorithm analysis tool
Artem Dergachev via cfe-dev
cfe-dev at lists.llvm.org
Wed Apr 5 03:18:07 PDT 2017
Clone checker currently does discard variable names, as long as they
follow the same pattern, eg. `a = a + b` is a clone of `c = c + d` but
not of `a = b + b`. It also has a mode in which it warns about patterns
that are almost-clones (broken in one variable) as copy-paste errors.
Function names matter though, so if we expect to find both SWAP and
std::swap, we'd need two runs for now.
As soon as https://reviews.llvm.org/D23418 lands, which is right about
now, the clone checker would also be *highly* configurable and modular,
and all sorts of per-use-case tweaks would be possible. Fuzzy-matching
function names is probably not impossible.
On 4/5/17 1:11 PM, Alex L wrote:
>
>
> On 5 April 2017 at 11:05, Artem Dergachev via cfe-dev
> <cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>> wrote:
>
> On second thought, maybe it's quite neat.
>
> I mean, like, a person who doesn't know anything about AST and
> about matchers and whatever, could probably do something like that
>
> $ cat < pattern.c
> void bubblesort(int *array, int length) {
> for (int i = 0; i < length - 1; ++i)
> for (int j = 0; j < length - i - 1; ++j)
> if (array[j] > array[j + 1]) {
> int tmp = array[j];
> array[j] = array[j + 1];
> array[j + 1] = tmp;
> }
> }
> ^D
>
> and then
>
> $ clang-finder --pattern pattern.c my_project.c
>
> and the tool would find all clones of the pattern in the project,
> by constructing the AST for the pattern and hashing it and looking
> for clones. The pattern would be in a different AST context, but i
> guess for the clone checker it shouldn't be a problem, so we
> wouldn't need to deal with ASTImporter for that.
>
>
> Pardon my ignorance, but will the clone checker match patterns that
> have different variable names (e.g. values instead of array and size
> instead of length)?
>
> Also, is it possible to use patterns recursively? E.g. Say I want to
> detect naive swap and std::swap in the bubble sort:
>
> swap1.c:
> int tmp = array[j];
> array[j] = array[j + 1];
> array[j + 1] = tmp;
> swap2.c:
> std::swap(array[j], array[j + 1]);
>
> pattern.c:
> void bubblesort(int *array, int length) {
> for (int i = 0; i < length - 1; ++i)
> for (int j = 0; j < length - i - 1; ++j)
> if (array[j] > array[j + 1]) {
> SWAP(array, j)
> }
> }
>
>
>
>
> On 4/4/17 12:25 PM, Vassil Vassilev wrote:
>
> On 04/04/17 10:59, Raphael Isemann wrote:
>
> I feel the implementation of an algorithm in the STL and the
> equivalent implementation written by the user are usually
> so different
> in terms of syntax that the current clone detection won't
> work that
> well. If we had a constraint for estimating semantic
> equality, then it
> would be a different situation, but I don't expect an
> implementation
> of this anytime soon :).
>
> However, if we are satisfied with syntactic equality, then
> this can be
> done in a few minutes with the clone detector
> infrastructure with
> something like this pseudocode:
>
> detector.findClones(RecursiveTypeIIConstraint(),
> MinFunctionBodyCount(1), MinComplexity(20));
>
> This would detect code like this:
>
> void bubblesort(int *array, int length) {
> for (int i = 0; i < length - 1; ++i)
> for (int j = 0; j < length - i - 1; ++j)
> if (array[j] > array[j + 1]) {
> int tmp = array[j];
> array[j] = array[j + 1];
> array[j + 1] = tmp;
> }
> }
>
> int main() {
> for (int i = 0; i < length - 1; ++i) //
> expect-warning{You could
> call the function 'bubblesort' instead
> for (int j = 0; j < length - i - 1; ++j)
> if (array[j] > array[j + 1]) {
> int tmp = array[j];
> array[j] = array[j + 1];
> array[j + 1] = tmp;
> }
> }
>
> - Raphael
>
>
> That being said, patches are welcome. The current
> infrastructure is friendly for implementing semantic clone
> detection constraints (clones type III). We'd be interested to
> help if necessary.
>
> -- Vassil
>
>
> 2017-04-04 10:35 GMT+02:00 Artem Dergachev
> <noqnoqneo at gmail.com <mailto:noqnoqneo at gmail.com>>:
>
> +CC CloneDetector guys.
>
> Hmm, the idea of making body-farms and then using
> CloneChecker to find
> clones of synthesized bodies in the actual code looks
> curious and funny,
> though i'm not immediately seeing how is it superior
> to ASTMatchers.
>
>
>
> On 4/1/17 1:49 PM, Kirill Bobyrev via cfe-dev wrote:
>
>
> Hi Chris,
>
> To my knowledge, there isn't.
>
> I don't recall where I got the idea, but I gave it
> a try last summer
> trying to implement a clang-tidy check doing what
> you proposed. I didn't
> have enough time to complete it, though, and I
> only managed to detect one or
> two very simple patterns.
>
> After thinking about this idea for some time I
> found that clang-tidy might
> be a perfect place for that, not sure whether a
> separate tool would be
> beneficial. The task of detecting a specific
> pattern is very similar to what
> clang-tidy checks do in a wide range of tasks.
> Also, there'd be a separate
> heuristic set for each standard algorithm, which
> makes the partitioning into
> different checks (for each popular standard
> library algorithm) natural.
>
> In my opinion, such checks would be useful, I'd be
> interested in seeing a
> proof-of-concept of some sort.
>
> One more idea I have in mind: it might be
> interesting to try using
> CloneChecker (a check of Clang Static Analyzer) to
> detect similar patterns
> in a generic way, but I'm not sure how beneficial
> that would be in practice.
> Still, might worth a try.
>
> +CC Alex, he might have some thoughts about this.
>
> Kind regards,
> Kirill
>
>
> On 01/04/17 02:42, Christopher Di Bella via
> cfe-dev wrote:
>
> Hey everyone,
>
> Just wondering if there's a clang tool that
> can analyse an algorithm to
> suggest where standard algorithms can be
> replace handwritten algos?
> E.g.
>
> int i = 0;
> for (; i < v.size(); ++i)
> if (v[i] == expected)
> break;
> if (i != v.size())
> some_function(v[i]);
>
> Can be rewritten to
>
> auto i = find(v.begin(), v.end(), expected);
> if (i != v.end())
> some_function(*i);
>
> or in C++17:
>
> if (auto i = find(v.begin(), v.end(),
> expected); i != v.end())
> some_function(*i);
>
> If not, how difficult a task is it to write
> such a tool? Is there
> anything that one should take into special
> consideration while writing this
> tool? Do you think it would have a lot of
> use-cases? (I do, based on my
> company's code base, and code I have seen
> while marking assignments).
>
> Cheers,
>
> Chris
>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> <mailto:cfe-dev at lists.llvm.org>
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
>
>
>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
>
>
>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
>
>
More information about the cfe-dev
mailing list