[PATCH] D158250: [IR] Add more details to StructuralHash

Mon Aug 21 14:59:41 PDT 2023

aidengrossman added inline comments.

================
Comment at: llvm/lib/IR/StructuralHash.cpp:83-91
+          if (const IntrinsicInst *InstrinsicInstruction =
+                  dyn_cast<IntrinsicInst>(&Inst))
+            hash(InstrinsicInstruction->getIntrinsicID());
+          if (const CallInst *CallInstruction = dyn_cast<CallInst>(&Inst))
+            hash(CallInstruction->getCalledFunction()->getName());
+
+          for (unsigned I = 0; I < Inst.getNumOperands(); ++I) {
----------------
aeubanks wrote:
> aidengrossman wrote:
> > nikic wrote:
> > > aidengrossman wrote:
> > > > nikic wrote:
> > > > > This seems like a very random collection of things to add to the hash. Why isn't this just hashing all the operands? That should cover the operand types, the called function and the intrinsic ID.
> > > > I was under the impression that it wasn't possible to just hash a value. I can hash the pointer, but I'm not sure that would be correct in all cases (unless everything is uniqued appropriately).
> > > > 
> > > > https://github.com/llvm/llvm-project/blob/d9cb76bc4d5e903fe045c58a42fc791d0c70172b/llvm/include/llvm/Analysis/IRSimilarityIdentifier.h#L261 implements logic that seems to follow those assumptions (and is a similar implementation to what is here).
> > > > 
> > > > Definitely could be that my assumptions are incorrect here though.
> > > You are right that we can't "just" hash the operand pointers, but I'd still use that as the general approach. If the operand is a `Constant` you should be able to hash the pointer as those are uniqued, for Arguments you can take the argument number, and for Instructions you could use only the type for now (to handle those we'd have to number instructions).
> > Also, re: pointers, my intention with this is to have a hash that is stable across different modules (interested in looking at function deduplication across modules) given the same function which makes just hashing pointers provide incorrect results.
> > 
> > I'm planning on adding support for other constant types and for instruction operands (through numbering) in follow-up patches. My intention with the `Detailed` flag currently isn't to make every semantically meaningful difference produce a different hash, but to capture the common cases and be "good enough" most of the time.
> we should split out a `update(Value*)` (which doesn't recursively look at operands) and hash the operands of the current instruction as well as the current instruction itself
I've split this out into an `updateInstruction` and an `updateValue` function. `updateInstruction` handles everything related to instructions and then calls `updateValue` when handling operands. I think this makes things a little clearer, as it avoids having to reason about recursion based on whether the value is an instruction.

I've added a couple of additional cases when handling operands (rather than just hashing the type). They aren't meant to be exhaustive.
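
To make the `updateInstruction`/`updateValue` split concrete, here is a minimal sketch. It uses toy stand-in types (`ToyValue`, `ToyKind`, `ToyStructuralHasher`, all hypothetical) rather than the actual LLVM classes, and the mixing step is a placeholder rather than the hash combiner used in the patch. It follows the approach suggested earlier in the thread: constants are hashed by content, arguments by number, and instruction operands by type only.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-ins for IR values. These are hypothetical types for
// illustration, not the LLVM Value/Instruction classes.
enum class ToyKind : uint64_t { ConstantInt = 1, Argument = 2, Instruction = 3 };

struct ToyValue {
  ToyKind Kind;
  uint64_t TypeID;  // stand-in for the value's type
  uint64_t Payload; // constant value, argument number, or opcode
  std::vector<const ToyValue *> Operands; // used by instructions only
};

class ToyStructuralHasher {
  uint64_t Hash = 0xcbf29ce484222325ULL;

  // Simple xor-multiply mixing step, chosen only for illustration.
  void hash(uint64_t V) {
    Hash ^= V;
    Hash *= 0x100000001b3ULL;
  }

public:
  // Hashes one value without looking at its operands, so there is no
  // recursion to reason about. Pointers are never hashed, which keeps
  // the result independent of where the values live in memory.
  void updateValue(const ToyValue &V) {
    hash(static_cast<uint64_t>(V.Kind));
    hash(V.TypeID);
    switch (V.Kind) {
    case ToyKind::ConstantInt: // hash the constant's contents
    case ToyKind::Argument:    // hash the argument number
      hash(V.Payload);
      break;
    case ToyKind::Instruction: // type only; numbering is future work
      break;
    }
  }

  // Hashes the instruction itself (opcode and type), then each operand.
  void updateInstruction(const ToyValue &Inst) {
    hash(Inst.Payload);
    hash(Inst.TypeID);
    for (const ToyValue *Op : Inst.Operands)
      updateValue(*Op);
  }

  uint64_t getHash() const { return Hash; }
};

// Builds "add <arg 0>, <constant C>" from scratch and returns its hash.
uint64_t hashAddOfArgAndConstant(uint64_t C) {
  ToyValue Arg{ToyKind::Argument, 32, 0, {}};
  ToyValue Const{ToyKind::ConstantInt, 32, C, {}};
  ToyValue Add{ToyKind::Instruction, 32, /*opcode=*/13, {&Arg, &Const}};
  ToyStructuralHasher H;
  H.updateInstruction(Add);
  return H.getHash();
}
```

In this sketch, two separately built copies of the same instruction hash identically because only contents are hashed, while changing the constant operand changes the hash.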

================
Comment at: llvm/lib/IR/StructuralHash.cpp:114

-  void update(const Module &M) {
+  void update(const Module &M, bool DetailedHash) {
     for (const GlobalVariable &GV : M.globals())
----------------
aeubanks wrote:
> is there anywhere we don't want this new functionality?
Yes. The `MergeFunctions` pass now uses this hash after https://reviews.llvm.org/D158217. The original hashing implementation did not include these additional details (I believe the intention was to avoid hashing every detail), with the pass then falling back on the `FunctionComparator` class where necessary.
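
As a rough illustration of why a less detailed hash is still useful there, the sketch below buckets functions by a coarse hash and only runs a full comparison within each bucket. Everything in it (`ToyFunction`, `coarseHash`, `fullyEqual`, `findDuplicates`, `exampleFunctions`) is a hypothetical stand-in written for this comment, not the `MergeFunctions` or `FunctionComparator` code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical toy model of a function: its opcodes and its constants.
struct ToyFunction {
  std::vector<unsigned> Opcodes;
  std::vector<int64_t> Constants;
};

// Coarse hash: looks at opcodes only and deliberately ignores constants.
uint64_t coarseHash(const ToyFunction &F) {
  uint64_t H = 0xcbf29ce484222325ULL;
  for (unsigned Op : F.Opcodes) {
    H ^= Op;
    H *= 0x100000001b3ULL;
  }
  return H;
}

// Stand-in for the expensive, exact comparison.
bool fullyEqual(const ToyFunction &A, const ToyFunction &B) {
  return A.Opcodes == B.Opcodes && A.Constants == B.Constants;
}

// Buckets functions by coarse hash, then compares exactly within buckets.
std::vector<std::pair<size_t, size_t>>
findDuplicates(const std::vector<ToyFunction> &Fns) {
  std::map<uint64_t, std::vector<size_t>> Buckets;
  for (size_t I = 0; I < Fns.size(); ++I)
    Buckets[coarseHash(Fns[I])].push_back(I);

  std::vector<std::pair<size_t, size_t>> Result;
  for (const auto &Entry : Buckets) {
    const std::vector<size_t> &Idx = Entry.second;
    for (size_t A = 0; A < Idx.size(); ++A)
      for (size_t B = A + 1; B < Idx.size(); ++B)
        if (fullyEqual(Fns[Idx[A]], Fns[Idx[B]]))
          Result.emplace_back(Idx[A], Idx[B]);
  }
  return Result;
}

// Example input: functions 0 and 2 are identical, function 1 differs only
// in a constant, and function 3 has different opcodes.
std::vector<ToyFunction> exampleFunctions() {
  return {ToyFunction{{1, 2}, {7}}, ToyFunction{{1, 2}, {8}},
          ToyFunction{{1, 2}, {7}}, ToyFunction{{3}, {}}};
}
```

Here the coarse hash puts functions 0, 1, and 2 in the same bucket, and the exact comparison is what separates function 1 from the other two. A more detailed hash would reduce the number of exact comparisons, at the cost of more work per function when hashing.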

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158250/new/

https://reviews.llvm.org/D158250