[clang] [llvm-profdata] Do not create numerical strings for MD5 function names read from a Sample Profile. (PR #66164)

Matthias Braun via cfe-commits cfe-commits at lists.llvm.org
Thu Oct 5 11:19:28 PDT 2023


================
@@ -0,0 +1,215 @@
+//===--- ProfileFuncRef.h - Sample profile function reference ---*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+///
+/// Defines ProfileFuncRef class.
+///
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_PROFILEDATA_PROFILEFUNCREF_H
+#define LLVM_PROFILEDATA_PROFILEFUNCREF_H
+
+#include "llvm/ADT/DenseMapInfo.h"
+#include "llvm/ADT/Hashing.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/MD5.h"
+#include "llvm/Support/raw_ostream.h"
+#include <cstdint>
+
+namespace llvm {
+namespace sampleprof {
+
+/// This class represents a function that is read from a sample profile. It
+/// comes with two forms: a string or a hash code. The latter form is the 64-bit
+/// MD5 of the function name for efficient storage supported by ExtBinary
+/// profile format, and when reading the profile, this class can represent it
+/// without converting it to a string first.
+/// When representing a hash code, we utilize the LengthOrHashCode field to
+/// store it, and Name is set to null. When representing a string, it is same as
+/// StringRef.
+class ProfileFuncRef {
+
+  const char *Data = nullptr;
+
+  /// Use uint64_t instead of size_t so that it can also hold a MD5 value.
+  uint64_t LengthOrHashCode = 0;
+
+  /// Extension to memcmp to handle hash code representation. If both are hash
+  /// values, Lhs and Rhs are both null, function returns 0 (and needs an extra
+  /// comparison using getIntValue). If only one is hash code, it is considered
+  /// less than the StringRef one. Otherwise perform normal string comparison.
+  static int compareMemory(const char *Lhs, const char *Rhs, uint64_t Length) {
+    if (Lhs == Rhs)
+      return 0;
+    if (!Lhs)
+      return -1;
+    if (!Rhs)
+      return 1;
+    return ::memcmp(Lhs, Rhs, (size_t)Length);
+  }
+
+public:
+  ProfileFuncRef() = default;
+
+  /// Constructor from a StringRef.
+  explicit ProfileFuncRef(StringRef Str)
+      : Data(Str.data()), LengthOrHashCode(Str.size()) {
+    if (!Str.getAsInteger(10, LengthOrHashCode))
+      Data = nullptr;
----------------
MatzeB wrote:

Could we keep the constructor simple and just initialize the `ProfileFuncRef` with a string, leaving the "try parsing as an integer" logic to the outside callers that need it?

This also seems to rely on function names never being a string of digits, which is not true at the object-file level (so maybe some language or language feature I am not aware of may produce such names). I can also create such functions with the asm label extension:

```
$ cat x.cpp
void james_bond() __asm__("007");
void james_bond() {}
$ clang++ -S -o - -emit-llvm x.cpp
...
define dso_local void @"007"() #0 {
...
```
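To make the suggestion concrete, here is a minimal standalone sketch of the split between "construct from a string" and "construct from a hash". It is not the PR's code: `FuncRefSketch`, `parseDecimalHash`, and `demoAsmLabelName` are hypothetical names, and `std::string_view` stands in for `StringRef` so the example builds without LLVM headers (C++17 assumed).

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical, simplified stand-in for ProfileFuncRef (not the PR's class).
class FuncRefSketch {
  const char *Data = nullptr;
  uint64_t LengthOrHashCode = 0;

public:
  FuncRefSketch() = default;

  // String form: never reinterprets an all-digit name as a hash code.
  explicit FuncRefSketch(std::string_view Str)
      : Data(Str.data()), LengthOrHashCode(Str.size()) {}

  // Hash form: only used when the caller asks for it explicitly.
  explicit FuncRefSketch(uint64_t HashCode) : LengthOrHashCode(HashCode) {}

  bool isStringType() const { return Data != nullptr; }

  // Returns the name for the string form, or an empty view for the hash form.
  std::string_view str() const {
    return Data ? std::string_view(Data, static_cast<std::size_t>(
                                             LengthOrHashCode))
                : std::string_view();
  }

  // Only meaningful for the hash form.
  uint64_t hashCode() const { return LengthOrHashCode; }
};

// Hypothetical caller-side helper: parse a decimal hash in the places where
// the caller already knows the field holds an MD5 value.
inline std::optional<uint64_t> parseDecimalHash(std::string_view Text) {
  if (Text.empty())
    return std::nullopt;
  uint64_t Value = 0;
  const char *End = Text.data() + Text.size();
  auto [Ptr, Ec] = std::from_chars(Text.data(), End, Value, 10);
  if (Ec != std::errc() || Ptr != End)
    return std::nullopt; // not a pure decimal number, or out of range
  return Value;
}

// The asm-label name "007" stays a string unless the caller opts in to
// treating it as a hash.
inline void demoAsmLabelName() {
  FuncRefSketch Name(std::string_view("007"));
  assert(Name.isStringType());
  assert(Name.str() == "007");

  // A reader that knows it is looking at an MD5 field converts explicitly.
  if (std::optional<uint64_t> Hash = parseDecimalHash("007")) {
    FuncRefSketch FromHash(*Hash);
    assert(!FromHash.isStringType());
    assert(FromHash.hashCode() == 7);
  }
}
```

With this shape, a function really named `007` and a function whose MD5 happens to be 7 produce distinct references, and only the profile readers that expect MD5 values pay for the integer parse.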



https://github.com/llvm/llvm-project/pull/66164

