[PATCH] D67737: [clang-tidy] Add check for classes missing -hash ⚠️

Aaron Ballman via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Sep 20 05:41:04 PDT 2019


aaron.ballman accepted this revision.
aaron.ballman added a comment.
This revision is now accepted and ready to land.

LGTM!



================
Comment at: clang-tools-extra/clang-tidy/objc/MissingHashCheck.cpp:56
+  const auto *ID = Result.Nodes.getNodeAs<ObjCImplementationDecl>("impl");
+  diag(ID->getLocation(), "%0 implements -isEqual: without implementing -hash")
+      << ID;
----------------
stephanemoore wrote:
> aaron.ballman wrote:
> > Do you think we could generate a fixit to add the `hash` method? Do you think we could even add a default implementation that returns the pointer to the object (assuming that's the correct default behavior)?
> > Do you think we could generate a fixit to add the hash method?
> 
> I think it would be pretty tough to generate a reasonable hash method without knowing the equality and hashing semantics that the scenario calls for.
> 
> Here is an analogous situation presented in C++ (please excuse the hastily assembled sample code):
> ```
> namespace {
> 
> class NSObject {
>   public:
>     NSObject() {}
>     virtual ~NSObject() {}
> 
>     virtual bool isEqual(const NSObject *other) const {
>       return this == other;
>     }
>     virtual unsigned long long hash() const {
>       return (unsigned long long)this;
>     }
> };
> 
> }
> 
> #include <stdio.h>
> #include <string>
> 
> namespace {
> 
> class Movie : public virtual NSObject {
>   private:
>     std::string name;
>     std::string language;
> 
>   public:
>     Movie(std::string name, std::string language) : name(name), language(language) {}
>     ~Movie() override {}
>     bool isEqual(const NSObject *other) const override {
>       if (auto otherMovie = dynamic_cast<const Movie *>(other)) {
>         // Movies with the same name are considered equal
>         // regardless of the language of the screening.
>         return name == otherMovie->name;
>       }
>       return false;
>     }
>     unsigned long long hash() const override {
>       return name.length();
>     }
> };
> 
> }
> ```
> 
> As before, the base class uses pointer equality and the pointer as a hash. A subclass may arbitrarily add additional state but only the developer knows which added state factors into equality operations and consequently should be considered—but not necessarily required—in the hash operation. The matter can technically get even more complicated if an object stores state externally. I would hope that externally stored state would not factor into the equality operation of an object but I hesitate to make an assumption.
> 
> The developer is also in the best position to prioritize different properties of the hash function including performance, collision resistance, uniformity, and non-invertibility.
> 
> Writing effective hash functions is probably difficult independent of the programming language but it might help to consider some specific examples in Objective-C. [GPBMessage](https://github.com/protocolbuffers/protobuf/blob/ffa6bfc/objectivec/GPBMessage.m), the Objective-C base class for Google Protocol Buffer message classes, implements `-hash` but has an [extensive comment](https://github.com/protocolbuffers/protobuf/blob/ffa6bfc/objectivec/GPBMessage.m#L2749) explaining that its complex but generic implementation is not generally optimal and recommends that developers override `-hash` and `-isEqual:` to optimize for runtime performance. In contrast, the basic collection classes in Apple's Foundation framework have [surprisingly simple hash behavior](https://github.com/stephanemoore/archives/blob/master/objc/tips/hashing-basic-collections.md) that clearly indicate priority to runtime performance over uniformity and collision resistance. The former is a conservatively expensive hash function and the latter is a conservatively inexpensive hash function.
> 
> > Do you think we could even add a default implementation that returns the pointer to the object (assuming that's the correct default behavior)?
> 
> A hash returning the object pointer is already inherited from the superclass (i..e, `-[NSObject hash]`). Defining an override that returns the object pointer would be a functional no-op for classes directly derived from `NSObject` (although the explicit override could be useful as a signal of intended behavior).
> A hash returning the object pointer is already inherited from the superclass (i..e, -[NSObject hash]). Defining an override that returns the object pointer would be a functional no-op for classes directly derived from NSObject (although the explicit override could be useful as a signal of intended behavior).

Ah, my ObjC knowledge is weak and I was thinking that the one inherited from `NSObject` would be hidden. Thank you for the detailed explanation, that makes a lot of sense to me.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67737/new/

https://reviews.llvm.org/D67737





More information about the cfe-commits mailing list