[llvm] [llvm-profdata] Do not create numerical strings for MD5 function names read from a Sample Profile. (PR #66164)

William Junda Huang via llvm-commits llvm-commits at lists.llvm.org
Tue Sep 19 13:04:03 PDT 2023


================
@@ -0,0 +1,225 @@
+//===--- ProfileFuncRef.h - Sample profile function name ---*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file defines the StringRefOrHashCode class. It is to represent function
+// names in a sample profile, which can be in one of two forms - either a
+// regular string, or a 64-bit hash code.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_PROFILEDATA_PROFILEFUNCREF_H
+#define LLVM_PROFILEDATA_PROFILEFUNCREF_H
+
+#include "llvm/ADT/DenseMapInfo.h"
+#include "llvm/ADT/Hashing.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/MD5.h"
+#include "llvm/Support/raw_ostream.h"
+#include <cstdint>
+
+namespace llvm {
+namespace sampleprof {
+
+/// This class represents a function name that is read from a sample profile. It
+/// comes with two forms: a string or a hash code. For efficient storage, a
+/// sample profile may store function names as 64-bit MD5 values, so when
+/// reading the profile, this class can represent them without converting them
+/// to strings first.
+/// When representing a hash code, we use the Length field to store it, and
+/// Data is set to null. When representing a string, it is the same as
+/// StringRef, and can be pointer-cast to one.
+/// We disallow implicit casts to StringRef because there are too many places
+/// where they could break the code, such as using it in a StringMap.
+class ProfileFuncRef {
+
+  const char *Data = nullptr;
+
+  /// Use uint64_t instead of size_t so that it can also hold a MD5 value.
+  uint64_t Length = 0;
+
+  /// Extension to memcmp to handle hash code representation. If both are hash
+  /// values, Lhs and Rhs are both null, function returns 0 (and needs an extra
+  /// comparison using getIntValue). If only one is hash code, it is considered
+  /// less than the StringRef one. Otherwise perform normal string comparison.
+  static int compareMemory(const char *Lhs, const char *Rhs, uint64_t Length) {
+    if (Lhs == Rhs)
+      return 0;
+    if (!Lhs)
+      return -1;
+    if (!Rhs)
+      return 1;
+    return ::memcmp(Lhs, Rhs, (size_t)Length);
+  }
+
+public:
+  ProfileFuncRef() = default;
+
+  /// Constructor from a StringRef.
+  explicit ProfileFuncRef(StringRef Str)
+      : Data(Str.data()), Length(Str.size()) {}
+
+  /// Constructor from a hash code.
+  explicit ProfileFuncRef(uint64_t HashCode)
+      : Data(nullptr), Length(HashCode) {
+    assert(HashCode != 0);
+  }
+
+  /// Constructor from a string. Check if Str is a number, which is generated by
+  /// converting a MD5 sample profile to a format that does not support MD5, and
+  /// if so, convert the numerical string to a hash code first. We assume that
+  /// no function name (from a profile) can be a pure number.
+  explicit ProfileFuncRef(const std::string &Str)
+      : Data(Str.data()), Length(Str.size()) {
+    // Only need to check for base 10 digits, fail faster if otherwise.
+    if (Str.length() > 0 && isdigit(Str[0]) &&
+        !StringRef(Str).getAsInteger(10, Length))
+      Data = nullptr;
+  }
+
+  /// Check for equality. Similar to StringRef::equals, but also covers the
+  /// case where one or both are hash codes, for which comparing their int
+  /// values is sufficient. A hash code ProfileFuncName is considered not equal
+  /// to a StringRef ProfileFuncName regardless of actual contents.
+  bool equals(const ProfileFuncRef &Other) const {
+    return Length == Other.Length &&
+           compareMemory(Data, Other.Data, Length) == 0;
+  }
+
+  /// Total order comparison. If both ProfileFuncName are StringRef, this is the
+  /// same as StringRef::compare. If one of them is StringRef, it is considered
+  /// greater than the hash code ProfileFuncName. Otherwise this is the
----------------
huangjd wrote:

Yes, this class makes sure these two types can be compared without issue. Normally the two types are not supposed to be mixed, because a profile uses either StringRef or MD5, and the reader sets a static flag to indicate which type it is. Many functions in the sample profile matching pass will break if the user intentionally mixes these two types, but there isn't a normal way to do so from the front end.
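
To illustrate the mixed-type comparison described above, here is a minimal standalone sketch. It is not the code in the patch: `FuncRefSketch` is a hypothetical name, it uses `std::string_view` in place of `llvm::StringRef` so that it builds without LLVM, and ordering two hash codes by their integer value is an assumption (the quoted comment is cut off before it states that case). Only the "hash code sorts before string" rule and the "hash code never equals string" rule come from the comments in the hunk.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string_view>

// Hypothetical stand-in for the class under review; not the LLVM implementation.
class FuncRefSketch {
  const char *Data = nullptr; // null means "this object holds a hash code"
  uint64_t Length = 0;        // string length, or the hash code if Data is null

public:
  explicit FuncRefSketch(std::string_view Str)
      : Data(Str.data()), Length(Str.size()) {}
  explicit FuncRefSketch(uint64_t HashCode) : Length(HashCode) {}

  bool isHash() const { return Data == nullptr; }

  // A hash code is never equal to a string, whatever the contents.
  bool equals(const FuncRefSketch &Other) const {
    if (isHash() != Other.isHash())
      return false;
    if (isHash())
      return Length == Other.Length;
    return Length == Other.Length &&
           std::memcmp(Data, Other.Data, static_cast<size_t>(Length)) == 0;
  }

  // Total order: every hash code sorts before every string.
  // Assumption: two hash codes are ordered by their integer value.
  int compare(const FuncRefSketch &Other) const {
    if (isHash() != Other.isHash())
      return isHash() ? -1 : 1;
    if (isHash())
      return Length < Other.Length ? -1 : (Length > Other.Length ? 1 : 0);
    std::string_view A(Data, static_cast<size_t>(Length));
    std::string_view B(Other.Data, static_cast<size_t>(Other.Length));
    int R = A.compare(B);
    return R < 0 ? -1 : (R > 0 ? 1 : 0);
  }
};
```

With this model, `FuncRefSketch(uint64_t{7}).compare(FuncRefSketch("foo"))` is negative and `equals` between the two is false, so a container keyed on such objects stays well-ordered even if the two forms were mixed.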

https://github.com/llvm/llvm-project/pull/66164

