[PATCH] D10305: [Clang Static Analyzer] Bug identification
Babati Bence
bence.babati at ericsson.com
Tue Jul 28 05:45:15 PDT 2015
babati added a comment.
> please consider stripping comments from the line and normalizing whitespace before hashing and calculating column position.
> This will make the hash more robust, as a re-indentation or adding comments to a line will not spoil the hashes.
I changed the patch so that the normalized line is used instead of the raw content of the line: whitespace and comments are removed from the line before hashing.
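
To illustrate the idea (this is not the code from the patch), here is a minimal Python sketch of such a normalization. The names normalize_line and line_hash are hypothetical, MD5 is chosen only for illustration, and the comment stripping is deliberately naive: it does not understand string literals or comments that span several lines.

```python
import hashlib
import re

# Matches a trailing "//" comment or a "/* ... */" comment that starts and
# ends on the same line. Assumption: naive matching, no string-literal handling.
_COMMENT_RE = re.compile(r"//.*$|/\*.*?\*/")


def normalize_line(line: str) -> str:
    """Remove comments, then remove all whitespace from a source line."""
    without_comments = _COMMENT_RE.sub("", line)
    return re.sub(r"\s+", "", without_comments)


def line_hash(line: str) -> str:
    """Hash the normalized line (hypothetical helper, MD5 for illustration)."""
    return hashlib.md5(normalize_line(line).encode("utf-8")).hexdigest()
```

With this sketch, `int *p = 0;` and a re-indented copy with an added comment produce the same hash, which is the robustness the reviewer asked for.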
> By redundant, I mean that this information is already encoded in the report; even if it's not part of the issue id. I can see this argument go either way. However, if we do decide to include the filename, we would need to change clang/utils/analyzer/CmpRuns.py and the current issue_hash so that it's all consistent.
The hash should be a unique identifier of a concrete defect. If a hash identifies multiple defects in different files at the same time, that must be considered a fault (from the user's perspective).
If the user suppresses one fault, they may later discover that the suppression also made two or more other bugs "disappear".
The filename should be part of the hash, because otherwise the hashes will clash if, for example:
- there are multiple main() functions in the codebase with the same signature (this is likely),
- and each of them contains an identical line with a defect.
In that case the same bug hash would be generated for each of them.
Including the filename in the hash would decrease the likelihood of such cases.
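
The clash can be sketched in a few lines of Python. This is only an illustration under assumptions: issue_hash, its parameters, the set of hashed fields, the file names, and the use of MD5 are all hypothetical and do not reflect the actual hash layout in the patch.

```python
import hashlib


def issue_hash(function_signature, normalized_line, checker_name, file_name=None):
    """Combine the identifying fields of a report into one hash.

    Hypothetical field set; the file name is included only when given.
    """
    parts = [function_signature, normalized_line, checker_name]
    if file_name is not None:
        parts.append(file_name)
    # Join with a separator that cannot occur in the fields themselves.
    return hashlib.md5("\x00".join(parts).encode("utf-8")).hexdigest()


# Two different files, each with a main() containing the same defective line.
report = ("int main()", "*p=1;", "core.NullDereference")

# Without the file name, both defects get the same hash.
clash_a = issue_hash(*report)
clash_b = issue_hash(*report)

# With the file name, the two defects are told apart.
unique_a = issue_hash(*report, file_name="tools/a/main.c")
unique_b = issue_hash(*report, file_name="tools/b/main.c")
```

Here clash_a equals clash_b, so suppressing one report would hide the other, while unique_a and unique_b differ.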
http://reviews.llvm.org/D10305