[PATCH] D10305: [Clang Static Analyzer] Bug identification

Babati Bence bence.babati at ericsson.com
Tue Jul 28 05:45:15 PDT 2015

babati added a comment.

> please consider stripping the line from comments and normalize white spaces before hashing and calculating column position.

>  This will make the hash more robust, as a re-indentation or adding comments to a line will not spoil the hashes.

I changed the patch so that the normalized line is used instead of the raw line content: whitespace and comments are removed from the line before hashing.
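
To illustrate the idea (this is not the code from the patch), here is a minimal Python sketch of such a normalization. The names `normalize_line` and `line_hash` are hypothetical, MD5 is used only as a convenient stand-in digest, and the regex is deliberately naive: it handles only comments that start and end on the same line and would misfire on comment markers inside string literals.

```python
import hashlib
import re

# Hypothetical pattern, for illustration only: matches /* ... */ comments that
# open and close on the same line, and // comments running to the end of line.
_COMMENT_RE = re.compile(r"/\*.*?\*/|//.*$")


def normalize_line(line: str) -> str:
    """Remove single-line comments, then drop all whitespace."""
    without_comments = _COMMENT_RE.sub("", line)
    return re.sub(r"\s+", "", without_comments)


def line_hash(line: str) -> str:
    """Hash the normalized line rather than the raw line content."""
    return hashlib.md5(normalize_line(line).encode("utf-8")).hexdigest()
```

With this kind of normalization, `int x = *p;` and a re-indented `    int  x = *p;   // may be null` produce the same hash, which is the robustness the reviewer asked for.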

> By redundant, I mean that this information is already encoded in the report; even if it's not part of the issue id. I can see this argument go either way. However, if we do decide to include the filename, we would need to change clang/utils/analyzer/CmpRuns.py and the current issue_hash so that it's all consistent.

The hash should be a unique identifier of a concrete defect. If a hash identifies multiple defects in different files at the same time, that must be considered a fault (from the user's perspective).
If the user suppresses a fault, they may later discover that the suppression also made two or more other bugs "disappear".

The filename should be part of the hash because otherwise there will be a hash clash if, for example:
- there are multiple main() functions in the codebase with the same signature (this is likely)
- and the same line with a defect appears in each of them
Then the same bug hash would be generated.

Including the filename in the hash would decrease the likelihood of such cases.
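
The clash can be shown with a small Python sketch. It is an illustration under assumptions, not the hashing scheme of the patch: the function `issue_hash`, its fields, the separator, the checker name, and the file names are all hypothetical, and whether the filename would be a full path or a basename is left open here.

```python
import hashlib
from typing import Optional


def issue_hash(checker: str, signature: str, normalized_line: str,
               filename: Optional[str] = None) -> str:
    """Hypothetical issue hash; the filename is mixed in only when given."""
    parts = [checker, signature, normalized_line]
    if filename is not None:
        parts.append(filename)
    # Join with a separator unlikely to occur in source text, so that the
    # field boundaries stay unambiguous.
    return hashlib.md5("\x1f".join(parts).encode("utf-8")).hexdigest()


# Two different files, each with a main() containing the same defective line.
common = ("NullDereference", "int main(int, char **)", "intx=*p;")

# Without the filename, both defects collapse into one hash.
clash = issue_hash(*common) == issue_hash(*common)

# With the filename, the two defects get distinct hashes.
distinct = issue_hash(*common, filename="tools/a.c") != issue_hash(*common, filename="tools/b.c")
```

Here `clash` is true because nothing in the input distinguishes the two files, while `distinct` is true once the filename is part of the hashed data.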

