[PATCH] D10305: [Clang Static Analyzer] Bug identification

Honggyu Kim via cfe-commits cfe-commits at lists.llvm.org
Mon Sep 7 06:46:29 PDT 2015


honggyu.kim added a comment.

I would also like to comment on bug identification methods.
Looking at the current CmpRuns.py script, the issue identifier is defined as follows:

  def getIssueIdentifier(self) :
      id = self.getFileName() + "+"
      if 'issue_context' in self._data :
        id += self._data['issue_context'] + "+"
      if 'issue_hash' in self._data :
        id += str(self._data['issue_hash'])
      return id

https://github.com/llvm-mirror/clang/blob/master/utils/analyzer/CmpRuns.py#L69-L75

It uses 3 items to build a bug identifier.
(1) file name
(2) function name - issue_context
(3) line offset from the beginning of function - issue_hash
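
To make the three-part identifier concrete, here is a minimal standalone sketch that mirrors getIssueIdentifier() on a plain dictionary. The function name issue_identifier and the sample report data are hypothetical; only the keys issue_context and issue_hash come from the script quoted above.

```python
def issue_identifier(file_name, data):
    """Mirror of getIssueIdentifier(): file name + context + hash, joined by '+'."""
    ident = file_name + "+"
    if "issue_context" in data:
        ident += data["issue_context"] + "+"
    if "issue_hash" in data:
        ident += str(data["issue_hash"])
    return ident


# Hypothetical report data: a warning two lines below the start of main().
report = {"issue_context": "main", "issue_hash": 2}
print(issue_identifier("test.c", report))  # test.c+main+2
```

Nothing in this identifier depends on which checker fired or on what the flagged line contains.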

As of now, issue_hash is simply the line offset of the bug location from the first line of the enclosing function's body.

  FullSourceLoc UL(SM->getExpansionLoc(UPDLoc.asLocation()),
                   *SM);
  FullSourceLoc UFunL(SM->getExpansionLoc(
    D->getUniqueingDecl()->getBody()->getLocStart()), *SM);
  o << "  <key>issue_hash</key><string>"
    << UL.getExpansionLineNumber() - UFunL.getExpansionLineNumber()
    << "</string>\n";

https://github.com/llvm-mirror/clang/blob/master/lib/StaticAnalyzer/Core/PlistDiagnostics.cpp#L423-L452

On the other hand, this patch generates BugID as follows:

  llvm::SmallString<32> clang::GetIssueHash(const SourceManager *SM,
                                            FullSourceLoc &L,
                                            StringRef CheckerName,
                                            StringRef HashField, const Decl *D) {
    static llvm::StringRef Delimiter = "$";
    
    return GetHashOfContent(
        (llvm::Twine(CheckerName) + Delimiter + GetEnclosingDeclContextSignature(D) +
         Delimiter + std::to_string(L.getExpansionColumnNumber()) + Delimiter +
         NormalizeLine(SM, L, D) + 
         Delimiter + HashField.str()).str());
  }

It uses 6 items to build a bug identifier.
(1) file name (removed now)
(2) checker name
(3) function name - GetEnclosingDeclContextSignature(D)
(4) column number
(5) source line string after removing whitespace - NormalizeLine(SM, L, D)
(6) bug type - D->getBugType()
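
The combination of fields above can be sketched in Python as follows. This is only an illustration of the idea, not the patch's implementation: the helper names, the choice of MD5 as the hash, and the sample checker name and bug type are all assumptions, and normalize_line only approximates NormalizeLine by dropping whitespace.

```python
import hashlib


def normalize_line(line):
    """Drop all whitespace, as the description of NormalizeLine suggests."""
    return "".join(line.split())


def issue_hash(checker_name, decl_signature, column, source_line, bug_type):
    """Join the fields with '$' and hash them (MD5 is an assumption here)."""
    content = "$".join(
        [checker_name, decl_signature, str(column),
         normalize_line(source_line), bug_type]
    )
    return hashlib.md5(content.encode("utf-8")).hexdigest()


# Hypothetical field values for the 'return a;' warning.
h1 = issue_hash("core.uninitialized.UndefReturn", "int main()", 3,
                "  return a;", "Garbage return value")
# Same statement with different spacing inside the line.
h2 = issue_hash("core.uninitialized.UndefReturn", "int main()", 3,
                "  return   a;", "Garbage return value")
print(h1 == h2)  # True
```

Because no line number goes into the hashed string, moving the statement up or down inside the function leaves the hash unchanged.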

Even if this patch is not accepted, I think we need to adopt some of the methods it suggests.
The current CmpRuns.py cannot distinguish the following 2 different bugs.

BUG 1. garbage return value

  1 int main()
  2 {
  3   int a;
  4   return a;
  5 }
  
  test.c:4:3: warning: Undefined or garbage value returned to caller
    return a;
    ^~~~~~~~

BUG 2. garbage assignment

  1 int main()
  2 {
  3   int a;
  4   int b = a;
  5   return b;
  6 }
  
  test.c:4:3: warning: Assigned value is garbage or undefined
    int b = a;
    ^~~~~   ~

Here getIssueIdentifier() returns the same ID for both bugs, as shown below:
<filename> + <function name> + <line offset from function>
test.c + main + 2

We cannot distinguish those cases with the current CmpRuns.py, so we need to at least add the checker information from <check_name>.
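
The collision between BUG 1 and BUG 2 can be reproduced with a short sketch. The function names are hypothetical, and the checker names are illustrative assumptions about which checkers emit the two warnings.

```python
def current_id(file_name, context, line_offset):
    """Identifier in the current scheme: file + function + line offset."""
    return "+".join([file_name, context, str(line_offset)])


def id_with_checker(file_name, context, line_offset, check_name):
    """Same identifier with the checker name appended."""
    return current_id(file_name, context, line_offset) + "+" + check_name


# Both warnings are on line 4; the body of main() starts on line 2.
bug1 = ("test.c", "main", 4 - 2)
bug2 = ("test.c", "main", 4 - 2)

print(current_id(*bug1) == current_id(*bug2))  # True: the IDs collide

# Checker names below are illustrative assumptions.
print(id_with_checker(*bug1, "core.uninitialized.UndefReturn")
      == id_with_checker(*bug2, "core.uninitialized.Assign"))  # False
```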

BUG 3. a single comment line is added to the BUG 1 code.

  1 int main()
  2 {
  3   // main function
  4   int a;
  5   return a;
  6 }
  
  test.c:5:3: warning: Undefined or garbage value returned to caller
    return a;
    ^~~~~~~~

If we compare BUG 3 with BUG 1, CmpRuns.py reports them as different bugs even though only a single comment line was added and the code itself is unchanged.

  REMOVED: 'test.c:4:3, Logic error: Undefined or garbage value returned to caller'
  ADDED: 'test.c:5:3, Logic error: Undefined or garbage value returned to caller'
  TOTAL REPORTS: 1
  TOTAL DIFFERENCES: 2
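
The effect of the added comment line can be checked with a small sketch that compares a line-offset key against a content-based key for the BUG 1 and BUG 3 sources. The helper names are hypothetical, and content_key only approximates the whitespace-stripping idea described for NormalizeLine.

```python
BUG1 = ["int main()", "{", "  int a;", "  return a;", "}"]
BUG3 = ["int main()", "{", "  // main function", "  int a;", "  return a;", "}"]


def offset_key(lines, statement):
    """Line offset of the statement from the line holding the opening brace."""
    body_start = lines.index("{")
    return lines.index(statement) - body_start


def content_key(lines, statement):
    """Whitespace-free text of the statement's line."""
    return "".join(lines[lines.index(statement)].split())


stmt = "  return a;"
print(offset_key(BUG1, stmt), offset_key(BUG3, stmt))      # 2 3
print(content_key(BUG1, stmt) == content_key(BUG3, stmt))  # True
```

The offset key changes from 2 to 3 when the comment is inserted, while the content key stays the same.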

I think we need to improve the issue_hash generation method in order to avoid those cases.


http://reviews.llvm.org/D10305




