[PATCH] D20795: Added basic capabilities to detect source code clones.

Raphael Isemann via cfe-commits cfe-commits at lists.llvm.org
Wed Jul 13 13:45:41 PDT 2016

teemperor added inline comments.

Comment at: include/clang/AST/CloneDetection.h:131
@@ +130,3 @@
+  bool operator<(const StmtSequence &Other) const {
+    return std::tie(S, StartIndex, EndIndex) <
+           std::tie(Other.S, Other.StartIndex, Other.EndIndex);
zaks.anna wrote:
> We are comparing the Stmt pointers here. Does this make results non-deterministic?
Good point. The updated patch only sorts by deterministic hash values which should fix this.

Comment at: lib/AST/CMakeLists.txt:10
@@ -9,3 +9,3 @@
zaks.anna wrote:
> Again, I do not think this should be in the AST. Who the users of this will be? If you envision users outside of Clang Static Analyzer, maybe we could add this to lib/Analysis/.
Oh, sorry! I wanted to bring this up in a meeting but it seems I've forgot to do so.

The point with this being in AST is that we might want to merge the Stmt::Profile implementation with the implementation of CloneDtection. And Stmt::Profile is a part of AST AFAIK, so I think we need to stay in the same library? I don't have a good overview over the clang build infrastructure yet, so I would appreciate opinions on that.

If that's something we will handle later, what would be a better place for the code?

Comment at: lib/AST/CloneDetection.cpp:103
@@ +102,3 @@
+/// \endcode
+class HashValue {
+  unsigned Value = 0;
zaks.anna wrote:
> This should probably be a utility instead of something specific to this checker. Do we have something similar to this in LLVM already?
Yes, there is similar builtin hashing available in llvm::hashing, but the API is designed to work with iterator pairs. So we either have to record all the data in a vector and hash that (which is what FoldingSetNodeID does) or use code that isn't exposed in the API (llvm::hashing::detail::hash_state). The first option creates unnecessary overhead and the second option would require either patching LLVM or using code that isn't part of the public API.

I updated the patch with something that uses FoldingSetNodeID for now. I'll update it later if we can land a patch in LLVM that adds an API for hashing without recording all necessary data beforehand.


More information about the cfe-commits mailing list