[cfe-dev] Hash for Clang-Tidy findings

Lőrinc Balog via cfe-dev cfe-dev at lists.llvm.org
Mon Jun 11 06:33:23 PDT 2018


Hi,

Would anyone be interested in having a unique identifier for Clang-Tidy
findings? Hashing the bug and its context, similarly to what was
introduced in the Clang Static Analyzer in https://reviews.llvm.org/D10305,
would be useful, e.g., to list new defects compared to a baseline and to
recognize that a finding is the same even if it was shifted in the source
code. The hash would ensure that the issues in the following two examples
are considered the same finding (which they are, since the only
difference between the two files is an unrelated change that shifted the
second finding by 2 lines):

// test.cpp (version 1)
// 'clang-tidy test.cpp -checks=bugprone-string-constructor' output now is:
// test.cpp:4:15: warning: string constructor parameters are probably swapped; expecting string(count, character) [bugprone-string-constructor]
//   std::string str('x', 50);
//               ^   ~~~~ ~~~
//                   50   'x'
// planned string to be hashed: bugprone-string-constructor$void bugproneStringConstruct()$15$std::stringstr('x',50);
#include <string>

void bugproneStringConstruct() {
  std::string str('x', 50);
}

// test.cpp (version 2)
// 'clang-tidy test.cpp -checks=bugprone-string-constructor' output now is:
// test.cpp:6:15: warning: string constructor parameters are probably swapped; expecting string(count, character) [bugprone-string-constructor]
//   std::string str('x', 50);
//               ^   ~~~~ ~~~
//                   50   'x'
// planned string to be hashed: bugprone-string-constructor$void bugproneStringConstruct()$15$std::stringstr('x',50);
#include <string>

void bugproneStringConstruct() {
  // This function does nothing, but raises clang-tidy's
  // bugprone-string-constructor checker warning.
  std::string str('x', 50);
}

If such a hash would serve the community well, are there any thoughts on
the implementation? Following the Clang Static Analyzer, the hash would be
md5('checker name$enclosing context$column of the finding$source code line text'),
with whitespace removed from the source line text as in the examples above.
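To make the hash input described above concrete, here is a minimal sketch
of how the string could be assembled before hashing. The helper names
stripWhitespace and buildFindingKey are hypothetical and not part of
Clang-Tidy; the MD5 digest step itself is left out because the C++ standard
library has no MD5 (inside LLVM, llvm::MD5 from llvm/Support/MD5.h could
presumably be used for that).

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: drop every whitespace character from a source line,
// so that re-indentation or spacing changes do not alter the key.
std::string stripWhitespace(const std::string &Line) {
  std::string Out;
  for (unsigned char C : Line)
    if (!std::isspace(C))
      Out.push_back(static_cast<char>(C));
  return Out;
}

// Hypothetical helper: assemble the '$'-separated string that would be fed
// to MD5. The line number is deliberately not part of the key.
std::string buildFindingKey(const std::string &CheckName,
                            const std::string &EnclosingContext,
                            unsigned Column,
                            const std::string &SourceLine) {
  return CheckName + '$' + EnclosingContext + '$' + std::to_string(Column) +
         '$' + stripWhitespace(SourceLine);
}
```

For both versions of test.cpp above, calling buildFindingKey with
"bugprone-string-constructor", "void bugproneStringConstruct()", 15 and the
text of the flagged line gives the same key, because the key depends only
on the check name, the enclosing context, the column and the line text.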
I thought about two possible solutions:
  1) The hash could be part of the diagnostic message, e.g.:
       test.cpp:4:15: warning: string constructor parameters are probably
       swapped; expecting string(count, character)
       [bugprone-string-constructor] #0679de2a8c11b9a0e88c7e517b7301fd#
  2) The hash could be generated by an external Clang tool after the
analysis, based on the location and checker information provided by the
original diagnostic message. The major drawback of this approach is that
Clang-Tidy's output has to be parsed, and the source file has to be reparsed
in order to find the enclosing context.
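
As a rough illustration of the parsing that option 2 would need, the sketch
below extracts the file, line, column and check name from a single
diagnostic line of the form shown in the examples. ParsedDiagnostic and
parseDiagnosticLine are hypothetical names, and the pattern only assumes
the "file:line:column: warning: message [check-name]" layout seen above.
The enclosing context cannot be recovered this way, which is why the source
file would still have to be reparsed.

```cpp
#include <regex>
#include <string>

// Hypothetical container for the fields recoverable from the text output.
struct ParsedDiagnostic {
  std::string File;
  unsigned Line = 0;
  unsigned Column = 0;
  std::string CheckName;
};

// Returns true and fills Out if Text looks like
// "file:line:column: warning: message [check-name]"; returns false otherwise.
bool parseDiagnosticLine(const std::string &Text, ParsedDiagnostic &Out) {
  static const std::regex Pattern(
      R"((.+):(\d+):(\d+): (?:warning|error): .* \[([^\]]+)\])");
  std::smatch Match;
  if (!std::regex_match(Text, Match, Pattern))
    return false;
  Out.File = Match[1].str();
  Out.Line = static_cast<unsigned>(std::stoul(Match[2].str()));
  Out.Column = static_cast<unsigned>(std::stoul(Match[3].str()));
  Out.CheckName = Match[4].str();
  return true;
}
```

Lines that do not carry a location and a bracketed check name, such as the
source snippet and caret lines that follow the warning, are simply rejected.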

Thanks for your comments,
Lorinc