[PATCH] D10305: [Clang Static Analyzer] Bug identification

Tue Jul 28 05:45:15 PDT 2015

babati added a comment.

> Please consider stripping comments from the line and normalizing whitespace before hashing and calculating the column position.

> This will make the hash more robust, as re-indentation or adding comments to a line will not spoil the hashes.

I changed the patch so that the normalized line is used instead of the raw content of the line. Whitespace and comments are now removed from the line before hashing.
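To illustrate the idea, here is a minimal Python sketch of such a normalization. It is not the code from the patch: the helper names `normalize_line` and `line_hash`, the use of MD5, and the naive regex-based comment stripping (which ignores comment markers inside string literals and comments spanning several lines) are all assumptions made for the example.

```python
import hashlib
import re


def normalize_line(line: str) -> str:
    """Return the line without comments and without any whitespace.

    Hypothetical helper: naive on purpose, it does not understand
    string literals or comments that span several lines.
    """
    # Drop /* ... */ comments that open and close on this line.
    line = re.sub(r"/\*.*?\*/", "", line)
    # Drop a trailing // comment.
    line = re.sub(r"//.*", "", line)
    # Remove all whitespace so re-indentation does not change the result.
    return re.sub(r"\s+", "", line)


def line_hash(line: str) -> str:
    """Hash the normalized line (MD5 is an assumption of this sketch)."""
    return hashlib.md5(normalize_line(line).encode("utf-8")).hexdigest()


# Re-indenting the line or adding a comment leaves the hash unchanged.
original = "int *p = 0;"
reformatted = "        int *p = 0;   // initialize the pointer"
print(line_hash(original) == line_hash(reformatted))
```

With this kind of normalization, only a change to the code on the line itself alters the hash.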

> By redundant, I mean that this information is already encoded in the report; even if it's not part of the issue id. I can see this argument go either way. However, if we do decide to include the filename, we would need to change clang/utils/analyzer/CmpRuns.py and the current issue_hash so that it's all consistent.

The hash should be a unique identifier of a concrete defect. If one hash identifies multiple defects in different files at the same time, that must be considered a fault (from the user's perspective).
If the user suppresses one fault, they may later discover that the same suppression also made two or more other bugs "disappear".

The filename should be part of the hash because otherwise there will be a hash clash if, for example:
- there are multiple main() functions in the codebase with the same signature (this is likely)
- and each of them contains an identical line with a defect
In that case the same bug hash would be generated for all of them.

Including the filename in the hash would decrease the likelihood of such cases.
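The clash can be reproduced with a small Python sketch. The function `issue_hash`, the fields it combines, the MD5 digest, and the file paths are hypothetical and chosen only for illustration; they are not the format used by the patch or by CmpRuns.py.

```python
import hashlib
from typing import Optional


def issue_hash(checker: str, signature: str, normalized_line: str,
               filename: Optional[str] = None) -> str:
    """Combine the given fields into one hash (hypothetical field set)."""
    parts = [checker, signature, normalized_line]
    if filename is not None:
        parts.append(filename)
    # Join with a separator that cannot appear in the fields themselves.
    return hashlib.md5("\x00".join(parts).encode("utf-8")).hexdigest()


checker = "core.NullDereference"
signature = "int main(int, char **)"
line = "*p=0;"

# Two different files, each with a main() containing the same faulty line.
without_a = issue_hash(checker, signature, line)
without_b = issue_hash(checker, signature, line)
with_a = issue_hash(checker, signature, line, "tools/a/main.c")
with_b = issue_hash(checker, signature, line, "tools/b/main.c")

print(without_a == without_b)  # the two defects are indistinguishable
print(with_a == with_b)        # the filename separates them
```

In this sketch, suppressing the hash computed without the filename would hide the defect in both files, which is the situation described above.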

http://reviews.llvm.org/D10305