[PATCH] D158250: [IR] Add more details to StructuralHash

Arthur Eubanks via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Aug 21 10:46:24 PDT 2023


aeubanks added inline comments.


================
Comment at: llvm/lib/IR/StructuralHash.cpp:83-91
+          if (const IntrinsicInst *InstrinsicInstruction =
+                  dyn_cast<IntrinsicInst>(&Inst))
+            hash(InstrinsicInstruction->getIntrinsicID());
+          if (const CallInst *CallInstruction = dyn_cast<CallInst>(&Inst))
+            hash(CallInstruction->getCalledFunction()->getName());
+
+          for (unsigned I = 0; I < Inst.getNumOperands(); ++I) {
----------------
aidengrossman wrote:
> nikic wrote:
> > aidengrossman wrote:
> > > nikic wrote:
> > > > This seems like a very random collection of things to add to the hash. Why isn't this just hashing all the operands? That should cover the operand types, the called function and the intrinsic ID.
> > > I was under the impression that it wasn't possible to just hash a value. I can hash the pointer, but I'm not sure that would be correct in all cases (unless everything is uniqued appropriately).
> > > 
> > > https://github.com/llvm/llvm-project/blob/d9cb76bc4d5e903fe045c58a42fc791d0c70172b/llvm/include/llvm/Analysis/IRSimilarityIdentifier.h#L261 implements logic that seems to follow those assumptions (and is a similar implementation to what is here).
> > > 
> > > Definitely could be that my assumptions are incorrect here though.
> > You are right that we can't "just" hash the operand pointers, but I'd still use that as the general approach. If the operand is a `Constant` you should be able to hash the pointer as those are uniqued, for Arguments you can take the argument number, and for Instructions you could use only the type for now (to handle those we'd have to number instructions).
> Also, re: pointers, my intention with this is to have a hash that is stable across different modules (I'm interested in looking at function deduplication across modules) given the same function, so just hashing pointers would give incorrect results.
> 
> I'm planning on adding support for other constant types and for instruction operands (through numbering) in follow-up patches. My intention with the `Detailed` flag currently isn't to make every semantically meaningful difference produce a different hash, but to capture the common cases and be "good enough" in most of them.
We should split out an `update(Value*)` (which doesn't recursively look at operands) and use it to hash the current instruction itself as well as each of its operands.
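
To make the suggestion concrete, here is a rough, standalone sketch of that shape. It assumes a toy value model rather than the real `llvm::Value` hierarchy: `ToyValue`, `ToyStructuralHash`, `updateInstruction`, `hashAddArgConst`, the mixing function, and the opcode number are all hypothetical names and choices made for illustration, not what the patch does. Constants are hashed by content rather than by pointer so the result does not depend on addresses, arguments by their argument number, and instruction operands by type only, as proposed earlier in the thread.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-ins for IR values. These are NOT the llvm:: classes.
enum class ValueKind : uint64_t { Constant, Argument, Instruction };

struct ToyValue {
  ValueKind Kind;
  uint64_t TypeID;    // stand-in for the value's type
  uint64_t ConstBits; // constant payload, hashed by content
  uint64_t ArgNo;     // argument position within the function
  uint64_t Opcode;    // only meaningful for instructions
  std::vector<const ToyValue *> Operands;
};

class ToyStructuralHash {
  uint64_t Hash = 0x6acaa36bef8325c5ULL;

  // Simple order-sensitive mixing step (hypothetical, not LLVM's).
  void hash(uint64_t V) {
    Hash ^= V + 0x9e3779b97f4a7c15ULL + (Hash << 6) + (Hash >> 2);
  }

public:
  // Hash a single value without recursing into its operands.
  void update(const ToyValue &V) {
    hash(static_cast<uint64_t>(V.Kind));
    hash(V.TypeID);
    if (V.Kind == ValueKind::Constant)
      hash(V.ConstBits);
    else if (V.Kind == ValueKind::Argument)
      hash(V.ArgNo);
    // Instructions contribute only their type until they are numbered.
  }

  // Hash the instruction itself, then each operand non-recursively.
  void updateInstruction(const ToyValue &I) {
    hash(I.Opcode);
    update(I);
    for (const ToyValue *Op : I.Operands)
      update(*Op);
  }

  uint64_t getHash() const { return Hash; }
};

// Builds "add %arg0, C" from scratch and hashes it.
uint64_t hashAddArgConst(uint64_t C) {
  ToyValue Arg{ValueKind::Argument, 1, 0, 0, 0, {}};
  ToyValue Const{ValueKind::Constant, 1, C, 0, 0, {}};
  ToyValue Add{ValueKind::Instruction, 1, 0, 0, 13, {&Arg, &Const}}; // 13: made-up opcode
  ToyStructuralHash H;
  H.updateInstruction(Add);
  return H.getHash();
}
```

In this toy model, `hashAddArgConst(42)` returns the same value on every call because no pointer value enters the hash, while `hashAddArgConst(43)` returns a different one. Whether hashing constants by content is practical for every `Constant` subclass in LLVM is left open here.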


================
Comment at: llvm/lib/IR/StructuralHash.cpp:114
 
-  void update(const Module &M) {
+  void update(const Module &M, bool DetailedHash) {
     for (const GlobalVariable &GV : M.globals())
----------------
Is there anywhere we don't want this new functionality?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158250/new/

https://reviews.llvm.org/D158250


