[clang] [llvm-profdata] Do not create numerical strings for MD5 function names read from a Sample Profile. (PR #66164)

via cfe-commits cfe-commits at lists.llvm.org
Wed Sep 27 15:53:57 PDT 2023


================
@@ -0,0 +1,222 @@
+//===--- ProfileFuncRef.h - Sample profile function name ---*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file defines the ProfileFuncRef class, which represents function
+// names in a sample profile. A name can take one of two forms: either a
+// regular string or a 64-bit hash code.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_PROFILEDATA_PROFILEFUNCREF_H
+#define LLVM_PROFILEDATA_PROFILEFUNCREF_H
+
+#include "llvm/ADT/DenseMapInfo.h"
+#include "llvm/ADT/Hashing.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/MD5.h"
+#include "llvm/Support/raw_ostream.h"
+#include <cstdint>
+
+namespace llvm {
+namespace sampleprof {
+
+/// This class represents a function name that is read from a sample profile. It
+/// comes in two forms: a string or a hash code. For efficient storage, a
+/// sample profile may store function names as 64-bit MD5 values, so when
+/// reading the profile, this class can represent them without first converting
+/// them to strings.
+/// When representing a hash code, we use the LengthOrHashCode field to store
+/// it, and Data is set to null. When representing a string, it is the same as
+/// StringRef, and can be pointer-cast to one.
+/// We disallow implicit casts to StringRef because there are too many places
+/// where that could break code, such as using it in a StringMap.
+class ProfileFuncRef {
+
+  const char *Data = nullptr;
+
+  /// Use uint64_t instead of size_t so that it can also hold an MD5 value.
+  uint64_t LengthOrHashCode = 0;
----------------
WenleiHe wrote:

Currently the indicator for string vs. hash is implicit: Data == nullptr means it holds a hash. This isn't good for readability.
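The implicit encoding described above can be sketched as follows. This is a hypothetical, self-contained reduction rather than the code under review: the class name `ImplicitFuncRef` and its accessors are illustrative, and `std::string_view` stands in for `llvm::StringRef` so the sketch builds without LLVM.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Hypothetical reduction of the pointer/length layout in the diff.
// std::string_view is used in place of llvm::StringRef (assumption).
class ImplicitFuncRef {
  const char *Data = nullptr;    // nullptr => LengthOrHashCode holds a hash
  uint64_t LengthOrHashCode = 0; // string length, or a 64-bit MD5 value

public:
  explicit ImplicitFuncRef(std::string_view Name)
      : Data(Name.data()), LengthOrHashCode(Name.size()) {}
  explicit ImplicitFuncRef(uint64_t Hash) : LengthOrHashCode(Hash) {}

  // The string-vs-hash decision is inferred from the pointer, not stored.
  bool isHash() const { return Data == nullptr; }

  std::string_view name() const {
    assert(!isHash() && "not a string name");
    return std::string_view(Data, LengthOrHashCode);
  }

  uint64_t hash() const {
    assert(isHash() && "not a hash code");
    return LengthOrHashCode;
  }
};
```

One consequence of inferring the kind from the pointer: a default-constructed `std::string_view` (like a default-constructed `llvm::StringRef`) has a null data pointer, so an empty name built from one is indistinguishable from the hash value 0 under this encoding.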

Can we represent them with a union and an explicit flag?

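```
union {
    StringRef Name;  // valid when UseStringName is true
    uint64_t Hash;   // valid when UseStringName is false
};
bool UseStringName;  // explicit string-vs-hash discriminator
```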
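A compilable sketch of the suggested tagged-union layout is below. It is illustrative only: `TaggedFuncRef`, its accessors, and the `str()` helper are hypothetical names, and `std::string_view` again replaces `llvm::StringRef`. `str()` shows the idea behind the PR: a decimal string for an MD5 name is produced only when a caller explicitly asks for one, not when the profile is read.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical tagged-union variant of the function-name reference.
// std::string_view is used in place of llvm::StringRef (assumption).
class TaggedFuncRef {
  union {
    std::string_view Name; // active when UseStringName is true
    uint64_t Hash;         // active when UseStringName is false
  };
  bool UseStringName;      // explicit discriminator, no pointer tricks

public:
  explicit TaggedFuncRef(std::string_view N) : Name(N), UseStringName(true) {}
  explicit TaggedFuncRef(uint64_t H) : Hash(H), UseStringName(false) {}

  bool isHash() const { return !UseStringName; }

  std::string_view name() const {
    assert(UseStringName && "not a string name");
    return Name;
  }

  uint64_t hash() const {
    assert(!UseStringName && "not a hash code");
    return Hash;
  }

  // Builds a numeric string for a hash only on demand (hypothetical helper).
  std::string str() const {
    return UseStringName ? std::string(Name) : std::to_string(Hash);
  }
};
```

Here an empty name stays a string because the kind is stored rather than inferred. The trade-off is size: on a typical 64-bit target this layout is likely 24 bytes (a 16-byte union plus a padded `bool`) rather than the 16 bytes of the pointer/length pair.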

https://github.com/llvm/llvm-project/pull/66164

