[llvm] Add a pass to calculate machine function's cfg hash to detect whether… (PR #84145)

Sriraman Tallam via llvm-commits llvm-commits at lists.llvm.org
Tue Apr 23 11:56:58 PDT 2024


================
@@ -0,0 +1,73 @@
+//===-- MachineFunctionHashBuilder.cpp ----------------------------------*-
+// C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// This file contains the implementation of pass about calculating machine
+/// function hash.
+//===----------------------------------------------------------------------===//
+#include "llvm/CodeGen/MachineFunctionHashBuilder.h"
+#include "llvm/ADT/StringMap.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/CommandLine.h"
+#include "llvm/Support/MD5.h"
+#include "llvm/Support/raw_ostream.h"
+#include <queue>
+#include <unordered_map>
+#include <unordered_set>
+
+using namespace llvm;
+
+// Calculate machine function hash in level order traversal.
+// The control flow graph is uniquely represented by its level-order traversal.
+static uint64_t calculateMBBCFGHash(const MachineFunction &MF) {
+  if (MF.empty())
+    return 0;
+  std::unordered_set<const MachineBasicBlock *> Visited;
+  MD5 Hash;
+  std::queue<const MachineBasicBlock *> WorkList;
+  WorkList.push(&*MF.begin());
+  while (!WorkList.empty()) {
+    const MachineBasicBlock *CurrentBB = WorkList.front();
+    WorkList.pop();
+    uint32_t Value = support::endian::byte_swap<uint32_t, endianness::little>(
+        CurrentBB->getBBID()->BaseID);
+    uint32_t Size = support::endian::byte_swap<uint32_t, endianness::little>(
+        CurrentBB->succ_size());
+    Hash.update(ArrayRef((uint8_t *)&Value, sizeof(Value)));
----------------
tmsri wrote:

I don't fully understand which cases you are trying to tolerate with this hashing. You are only hashing the ID of the block and its size (the successor count). What cases are you trying to capture here? Why not hash the contents of the block?

With this mechanism, the CFG hash will not tolerate a compiler change where the blocks keep the same CFG but somehow get renumbered, right?
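
To make the renumbering concern concrete, here is a minimal standalone sketch. It is not the patch's code: `Block`, `mix`, `cfgHash`, and `diamond` are hypothetical names, FNV-1a is swapped in for MD5 so the example has no LLVM dependency, and it assumes the hash covers only the block ID and the successor count, as described above. With `UseIds` set it hashes the block ID; otherwise it hashes the breadth-first visitation index, which depends only on the shape of the graph.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical stand-in for a machine basic block: an ID plus successor
// indices into the owning vector. Not an LLVM type.
struct Block {
  uint32_t Id;
  std::vector<size_t> Succs;
};

// FNV-1a over the four bytes of V, low byte first. Swapped in for MD5 to keep
// the sketch dependency-free.
static uint64_t mix(uint64_t H, uint32_t V) {
  for (int I = 0; I < 4; ++I) {
    H ^= (V >> (8 * I)) & 0xffu;
    H *= 1099511628211ull;
  }
  return H;
}

// Level-order traversal from block 0. For each visited block, hash either its
// ID (UseIds == true) or its visitation index (UseIds == false), followed by
// its successor count.
static uint64_t cfgHash(const std::vector<Block> &Blocks, bool UseIds) {
  if (Blocks.empty())
    return 0;
  std::vector<uint32_t> Order(Blocks.size(), UINT32_MAX); // UINT32_MAX = unvisited
  std::queue<size_t> WorkList;
  uint32_t Next = 0;
  Order[0] = Next++;
  WorkList.push(0);
  uint64_t H = 14695981039346656037ull; // FNV-1a offset basis
  while (!WorkList.empty()) {
    size_t Cur = WorkList.front();
    WorkList.pop();
    const Block &B = Blocks[Cur];
    H = mix(H, UseIds ? B.Id : Order[Cur]);
    H = mix(H, static_cast<uint32_t>(B.Succs.size()));
    for (size_t S : B.Succs) {
      assert(S < Blocks.size() && "successor index out of range");
      if (Order[S] == UINT32_MAX) {
        Order[S] = Next++;
        WorkList.push(S);
      }
    }
  }
  return H;
}

// Diamond CFG: entry -> {left, right} -> exit, with caller-chosen IDs.
static std::vector<Block> diamond(uint32_t Entry, uint32_t Left,
                                  uint32_t Right, uint32_t Exit) {
  return {{Entry, {1, 2}}, {Left, {3}}, {Right, {3}}, {Exit, {}}};
}
```

Comparing `cfgHash(diamond(0, 1, 2, 3), true)` with `cfgHash(diamond(0, 2, 1, 3), true)` shows the ID-based hash reacting to a pure renumbering of the two middle blocks, while the same comparison with `UseIds` set to `false` yields equal values because the graph shape is unchanged. Neither variant looks at block contents, so two functions with the same shape but different instructions would still hash the same.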


https://github.com/llvm/llvm-project/pull/84145

