[llvm] [llvm-profdata] Do not create numerical strings for MD5 function names read from a Sample Profile. (PR #66164)

William Junda Huang via llvm-commits llvm-commits at lists.llvm.org
Tue Sep 19 13:00:01 PDT 2023


================
@@ -0,0 +1,225 @@
+//===--- ProfileFuncRef.h - Sample profile function name ---*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file defines the StringRefOrHashCode class. It is to represent function
+// names in a sample profile, which can be in one of two forms - either a
+// regular string, or a 64-bit hash code.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_PROFILEDATA_PROFILEFUNCREF_H
+#define LLVM_PROFILEDATA_PROFILEFUNCREF_H
+
+#include "llvm/ADT/DenseMapInfo.h"
+#include "llvm/ADT/Hashing.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/MD5.h"
+#include "llvm/Support/raw_ostream.h"
+#include <cstdint>
+
+namespace llvm {
+namespace sampleprof {
+
+/// This class represents a function name that is read from a sample profile. It
+/// comes with two forms: a string or a hash code. For efficient storage, a
+/// sample profile may store function names as 64-bit MD5 values, so when
+/// reading the profile, this class can represnet them without converting it to
+/// a string first.
+/// When representing a hash code, we utilize the Length field to store it, and
+/// Data is set to null. When representing a string, it is same as StringRef,
+/// and can be pointer-casted as one.
+/// We disallow implicit cast to StringRef because there are too many instances
+/// that it may cause break the code, such as using it in a StringMap.
+class ProfileFuncRef {
+
+  const char *Data = nullptr;
+
+  /// Use uint64_t instead of size_t so that it can also hold a MD5 value.
+  uint64_t Length = 0;
+
+  /// Extension to memcmp to handle hash code representation. If both are hash
+  /// values, Lhs and Rhs are both null, function returns 0 (and needs an extra
+  /// comparison using getIntValue). If only one is hash code, it is considered
+  /// less than the StringRef one. Otherwise perform normal string comparison.
+  static int compareMemory(const char *Lhs, const char *Rhs, uint64_t Length) {
+    if (Lhs == Rhs)
+      return 0;
+    if (!Lhs)
+      return -1;
+    if (!Rhs)
+      return 1;
+    return ::memcmp(Lhs, Rhs, (size_t)Length);
+  }
+
+public:
+  ProfileFuncRef() = default;
+
+  /// Constructor from a StringRef.
+  explicit ProfileFuncRef(StringRef Str)
+      : Data(Str.data()), Length(Str.size()) {}
+
+  /// Constructor from a hash code.
+  explicit ProfileFuncRef(uint64_t HashCode)
+      : Data(nullptr), Length(HashCode) {
+    assert(HashCode != 0);
+  }
+
+  /// Constructor from a string. Check if Str is a number, which is generated by
+  /// converting a MD5 sample profile to a format that does not support MD5, and
+  /// if so, convert the numerical string to a hash code first. We assume that
+  /// no function name (from a profile) can be a pure number.
+  explicit ProfileFuncRef(const std::string &Str)
----------------
huangjd wrote:

The point of this class is to allow representation of both String name and MD5. This function is here just in case. It's only used in auxiliary tool or test case. When reading from an MD5 profile, ProfileFuncRef is directly created over the integer MD5 values. 


https://github.com/llvm/llvm-project/pull/66164


More information about the llvm-commits mailing list