[llvm] Add a pass to calculate machine function's cfg hash to detect whether… (PR #84145)
via llvm-commits
llvm-commits at lists.llvm.org
Tue Apr 23 20:18:18 PDT 2024
================
@@ -0,0 +1,73 @@
+//===-- MachineFunctionHashBuilder.cpp ----------------------------------*-
+// C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// This file contains the implementation of pass about calculating machine
+/// function hash.
+//===----------------------------------------------------------------------===//
+#include "llvm/CodeGen/MachineFunctionHashBuilder.h"
+#include "llvm/ADT/StringMap.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/CommandLine.h"
+#include "llvm/Support/MD5.h"
+#include "llvm/Support/raw_ostream.h"
+#include <queue>
+#include <unordered_map>
+#include <unordered_set>
+
+using namespace llvm;
+
+// Calculate machine function hash in level order traversal.
+// The control flow graph is uniquely represented by its level-order traversal.
+static uint64_t calculateMBBCFGHash(const MachineFunction &MF) {
+ if (MF.empty())
+ return 0;
+ std::unordered_set<const MachineBasicBlock *> Visited;
+ MD5 Hash;
+ std::queue<const MachineBasicBlock *> WorkList;
+ WorkList.push(&*MF.begin());
+ while (!WorkList.empty()) {
+ const MachineBasicBlock *CurrentBB = WorkList.front();
+ WorkList.pop();
+ uint32_t Value = support::endian::byte_swap<uint32_t, endianness::little>(
+ CurrentBB->getBBID()->BaseID);
+ uint32_t Size = support::endian::byte_swap<uint32_t, endianness::little>(
+ CurrentBB->succ_size());
+ Hash.update(ArrayRef((uint8_t *)&Value, sizeof(Value)));
----------------
lifengxiang1025 wrote:
> In this mechanism, the cfg hashing will not tolerate a change to the compiler where the blocks had the same CFG but somehow got re-numbered, right?
Thanks for the reply. Yes, you are right.
I think we can divide CFG hashing into three versions: a loose hash, a strict hash, and a full hash, as BOLT does. For example, the strict hash could be calculated from the size and the ID of each block, and the full hash from the contents of the function. What do you think? A stricter hash will cost more compile time.
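
To make the trade-off concrete, here is a rough standalone sketch of the first two levels. It is not the code in this patch: `ToyBlock`, `HashLevel`, `mix`, and `hashCFG` are hypothetical names, FNV-1a stands in for MD5, "size" is assumed to mean the instruction count, and the loose hash is assumed to cover only successor counts in level order. A full hash would additionally mix in each block's contents.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a machine basic block; not an LLVM type.
struct ToyBlock {
  uint32_t ID;                 // block ID, which the compiler may renumber
  uint32_t Size;               // assumed meaning of "size": instruction count
  std::vector<uint32_t> Succs; // indices of successor blocks in the function
};

enum class HashLevel { Loose, Strict };

// One FNV-1a step per byte of V, low byte first (stands in for MD5 here).
static uint64_t mix(uint64_t H, uint32_t V) {
  for (int I = 0; I < 4; ++I) {
    H ^= (V >> (8 * I)) & 0xff;
    H *= 1099511628211ULL;
  }
  return H;
}

// Level-order (BFS) traversal from the entry block at index 0.
// Loose:  hashes only the successor count of each block.
// Strict: additionally hashes each block's ID and size.
static uint64_t hashCFG(const std::vector<ToyBlock> &F, HashLevel Level) {
  if (F.empty())
    return 0;
  uint64_t H = 14695981039346656037ULL; // FNV-1a offset basis
  std::unordered_set<uint32_t> Visited{0};
  std::queue<uint32_t> WorkList;
  WorkList.push(0);
  while (!WorkList.empty()) {
    const ToyBlock &BB = F[WorkList.front()];
    WorkList.pop();
    if (Level == HashLevel::Strict) {
      H = mix(H, BB.ID);
      H = mix(H, BB.Size);
    }
    H = mix(H, static_cast<uint32_t>(BB.Succs.size()));
    for (uint32_t S : BB.Succs) {
      assert(S < F.size() && "successor index out of range");
      if (Visited.insert(S).second)
        WorkList.push(S);
    }
  }
  return H;
}
```

Under these assumptions, two functions with the same CFG shape but different block IDs get the same loose hash, while the strict hash sees the ID change. That is the renumbering case raised in the quoted question.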
https://github.com/llvm/llvm-project/pull/84145