[PATCH] D10305: [Clang Static Analyzer] Bug identification
Honggyu Kim via cfe-commits
cfe-commits at lists.llvm.org
Mon Sep 7 06:46:29 PDT 2015
honggyu.kim added a comment.
I would also like to write about bug identification methods.
Looking at the current CmpRuns.py script, the issue identifier is defined as follows:
def getIssueIdentifier(self) :
    id = self.getFileName() + "+"
    if 'issue_context' in self._data :
        id += self._data['issue_context'] + "+"
    if 'issue_hash' in self._data :
        id += str(self._data['issue_hash'])
    return id
https://github.com/llvm-mirror/clang/blob/master/utils/analyzer/CmpRuns.py#L69-L75
It uses 3 items to build the bug identifier:
(1) file name
(2) function name - issue_context
(3) line offset from the start of the function body - issue_hash
Currently, issue_hash is simply the line offset of the bug location from the line where the function body starts:
FullSourceLoc UL(SM->getExpansionLoc(UPDLoc.asLocation()),
                 *SM);
FullSourceLoc UFunL(SM->getExpansionLoc(
    D->getUniqueingDecl()->getBody()->getLocStart()), *SM);
o << " <key>issue_hash</key><string>"
  << UL.getExpansionLineNumber() - UFunL.getExpansionLineNumber()
  << "</string>\n";
https://github.com/llvm-mirror/clang/blob/master/lib/StaticAnalyzer/Core/PlistDiagnostics.cpp#L423-L452
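To make the current scheme concrete, here is a minimal Python sketch that rebuilds the CmpRuns.py identifier from the three items above. The function name current_issue_id and its parameters are hypothetical; the line numbers are passed in by hand instead of being read from the plist.

```python
def current_issue_id(file_name, issue_context, bug_line, body_start_line):
    """Mimic getIssueIdentifier(): <file>+<function>+<line offset>.

    issue_hash is modeled as the bug's line number minus the line on
    which the function body starts, as PlistDiagnostics.cpp computes it.
    """
    issue_hash = bug_line - body_start_line
    return file_name + "+" + issue_context + "+" + str(issue_hash)


# BUG 1 below: the body '{' is on line 2 and the warning is on line 4.
print(current_issue_id("test.c", "main", 4, 2))  # test.c+main+2
```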
This patch, on the other hand, generates the BugID as follows:
llvm::SmallString<32> clang::GetIssueHash(const SourceManager *SM,
                                          FullSourceLoc &L,
                                          StringRef CheckerName,
                                          StringRef HashField, const Decl *D) {
  static llvm::StringRef Delimiter = "$";
  return GetHashOfContent(
      (llvm::Twine(CheckerName) + Delimiter + GetEnclosingDeclContextSignature(D) +
       Delimiter + std::to_string(L.getExpansionColumnNumber()) + Delimiter +
       NormalizeLine(SM, L, D) +
       Delimiter + HashField.str()).str());
}
It uses 6 items to build the bug identifier:
(1) file name (removed now)
(2) checker name
(3) function name - GetEnclosingDeclContextSignature(D)
(4) column number
(5) source line string after removing whitespace - NormalizeLine(SM, L, D)
(6) bug type - D->getBugType()
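For comparison, here is a Python sketch of the same recipe. It follows the field order of GetIssueHash() above and the description of NormalizeLine() in the list, but it is not the Clang implementation: the MD5 digest, the "int main()" form of the enclosing declaration signature, the bug type string and all function names are assumptions made for illustration.

```python
import hashlib


def normalize_line(source_line):
    """Drop all whitespace, as the list above describes NormalizeLine()."""
    return "".join(source_line.split())


def proposed_issue_hash(checker_name, decl_signature, column, source_line, bug_type):
    """Join the fields with '$' and hash the result (MD5 is an assumption)."""
    content = "$".join([
        checker_name,
        decl_signature,
        str(column),
        normalize_line(source_line),
        bug_type,
    ])
    return hashlib.md5(content.encode("utf-8")).hexdigest()


# Same checker, function, column and bug type; only inner whitespace differs.
h1 = proposed_issue_hash("core.uninitialized.Assign", "int main()", 3,
                         "  int b = a;", "Assigned value is garbage or undefined")
h2 = proposed_issue_hash("core.uninitialized.Assign", "int main()", 3,
                         "  int b=a;", "Assigned value is garbage or undefined")
print(h1 == h2)  # True
```

Because the line text is normalized, whitespace inside the line does not affect the hash. The column is still part of the key, though, so re-indenting the line would change it.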
Even if this patch is not accepted, I think we should adopt some of the methods it suggests.
The current CmpRuns.py cannot distinguish between the following 2 different bugs.
BUG 1. garbage return value
1 int main()
2 {
3   int a;
4   return a;
5 }
test.c:4:3: warning: Undefined or garbage value returned to caller
  return a;
  ^~~~~~~~
BUG 2. garbage assignment
1 int main()
2 {
3   int a;
4   int b = a;
5   return b;
6 }
test.c:4:3: warning: Assigned value is garbage or undefined
  int b = a;
  ^~~~~   ~
In this case, getIssueIdentifier() returns the same ID for both bugs:
<filename> + <function name> + <line offset from function>
test.c + main + 2
Since the current CmpRuns.py cannot tell these cases apart, we need at least to add the checker information from <check_name>.
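A minimal Python sketch of that extension, assuming the check_name values for BUG 1 and BUG 2 are core.uninitialized.UndefReturn and core.uninitialized.Assign. The helper issue_id_with_checker is hypothetical, and its fields are passed in directly rather than read from a plist.

```python
def issue_id_with_checker(file_name, issue_context, issue_hash, check_name):
    """Current <file>+<function>+<offset> identifier, extended with check_name."""
    return "+".join([file_name, issue_context, str(issue_hash), check_name])


# Both bugs sit 2 lines below the '{' of main(), so only the checker differs.
bug1 = issue_id_with_checker("test.c", "main", 2, "core.uninitialized.UndefReturn")
bug2 = issue_id_with_checker("test.c", "main", 2, "core.uninitialized.Assign")
print(bug1)          # test.c+main+2+core.uninitialized.UndefReturn
print(bug1 != bug2)  # True
```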
BUG 3. the BUG 1 code with a single comment line added
1 int main()
2 {
3   // main function
4   int a;
5   return a;
6 }
test.c:5:3: warning: Undefined or garbage value returned to caller
  return a;
  ^~~~~~~~
If we compare BUG 3 with BUG 1, CmpRuns.py reports them as different bugs, even though the only change is an added comment line: the offset of "return a;" from the start of the function body grows from 2 to 3, so issue_hash changes.
REMOVED: 'test.c:4:3, Logic error: Undefined or garbage value returned to caller'
ADDED: 'test.c:5:3, Logic error: Undefined or garbage value returned to caller'
TOTAL REPORTS: 1
TOTAL DIFFERENCES: 2
I think we need to improve the issue_hash generation method to avoid such cases.
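The following Python sketch contrasts the two approaches on BUG 1 and BUG 3. The report dictionaries, the check_name value and both helper functions are hypothetical stand-ins for data that would come from the plist; the content key skips the hashing step so the compared fields stay visible.

```python
# Hypothetical records for the two reports; only the line number differs.
BUG1 = {"check_name": "core.uninitialized.UndefReturn", "line": 4,
        "body_start": 2, "column": 3, "text": "  return a;"}
BUG3 = dict(BUG1, line=5)  # the added comment pushes 'return a;' down by one


def line_offset_hash(report):
    """Current issue_hash: offset from the line where the function body starts."""
    return report["line"] - report["body_start"]


def content_key(report):
    """Content-based key in the spirit of this patch (hypothetical helper)."""
    normalized = "".join(report["text"].split())
    return "$".join([report["check_name"], str(report["column"]), normalized])


print(line_offset_hash(BUG1), line_offset_hash(BUG3))  # 2 3
print(content_key(BUG1) == content_key(BUG3))          # True
```

The line-offset hash moves whenever unrelated lines are added above the bug inside the function, while a key built from the checker, column and normalized line text stays the same as long as the offending line itself is unchanged.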
http://reviews.llvm.org/D10305