[llvm] [CAS] Add LLVMCAS library with InMemoryCAS implementation (PR #114096)
Steven Wu via llvm-commits
llvm-commits at lists.llvm.org
Wed Oct 30 14:54:54 PDT 2024
https://github.com/cachemeifyoucan updated https://github.com/llvm/llvm-project/pull/114096
From 9bf0f3079c410eb096ad3c2cefb89679bd34282b Mon Sep 17 00:00:00 2001
From: Steven Wu <stevenwu at apple.com>
Date: Tue, 29 Oct 2024 10:36:55 -0700
Subject: [PATCH 1/2] [𝘀𝗽𝗿] initial version
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Created using spr 1.3.5
---
llvm/docs/ContentAddressableStorage.md | 120 +++++++
llvm/docs/Reference.rst | 4 +
llvm/include/llvm/CAS/BuiltinCASContext.h | 88 +++++
llvm/include/llvm/CAS/BuiltinObjectHasher.h | 81 +++++
llvm/include/llvm/CAS/CASID.h | 156 +++++++++
llvm/include/llvm/CAS/CASReference.h | 207 +++++++++++
llvm/include/llvm/CAS/ObjectStore.h | 302 ++++++++++++++++
llvm/include/module.modulemap | 6 +
llvm/lib/CAS/BuiltinCAS.cpp | 94 +++++
llvm/lib/CAS/BuiltinCAS.h | 74 ++++
llvm/lib/CAS/CMakeLists.txt | 8 +
llvm/lib/CAS/InMemoryCAS.cpp | 320 +++++++++++++++++
llvm/lib/CAS/ObjectStore.cpp | 168 +++++++++
llvm/lib/CMakeLists.txt | 1 +
llvm/unittests/CAS/CASTestConfig.cpp | 22 ++
llvm/unittests/CAS/CASTestConfig.h | 32 ++
llvm/unittests/CAS/CMakeLists.txt | 12 +
llvm/unittests/CAS/ObjectStoreTest.cpp | 360 ++++++++++++++++++++
llvm/unittests/CMakeLists.txt | 1 +
19 files changed, 2056 insertions(+)
create mode 100644 llvm/docs/ContentAddressableStorage.md
create mode 100644 llvm/include/llvm/CAS/BuiltinCASContext.h
create mode 100644 llvm/include/llvm/CAS/BuiltinObjectHasher.h
create mode 100644 llvm/include/llvm/CAS/CASID.h
create mode 100644 llvm/include/llvm/CAS/CASReference.h
create mode 100644 llvm/include/llvm/CAS/ObjectStore.h
create mode 100644 llvm/lib/CAS/BuiltinCAS.cpp
create mode 100644 llvm/lib/CAS/BuiltinCAS.h
create mode 100644 llvm/lib/CAS/CMakeLists.txt
create mode 100644 llvm/lib/CAS/InMemoryCAS.cpp
create mode 100644 llvm/lib/CAS/ObjectStore.cpp
create mode 100644 llvm/unittests/CAS/CASTestConfig.cpp
create mode 100644 llvm/unittests/CAS/CASTestConfig.h
create mode 100644 llvm/unittests/CAS/CMakeLists.txt
create mode 100644 llvm/unittests/CAS/ObjectStoreTest.cpp
diff --git a/llvm/docs/ContentAddressableStorage.md b/llvm/docs/ContentAddressableStorage.md
new file mode 100644
index 00000000000000..4f2d9a6a3a9185
--- /dev/null
+++ b/llvm/docs/ContentAddressableStorage.md
@@ -0,0 +1,120 @@
+# Content Addressable Storage
+
+## Introduction to CAS
+
+Content Addressable Storage, or `CAS`, is a storage system that assigns
+unique addresses to the data stored. It is very useful for data deduplication
+and creating unique identifiers.
+
+Unlike other kinds of storage systems, such as file systems, CAS is immutable.
+It is more reliable to model a computation when its inputs and outputs are
+represented as objects stored in CAS.
+
+The basic unit of the CAS library is a CASObject, which contains:
+
+* Data: arbitrary data
+* References: references to other CASObjects
+
+It can be conceptually modeled as something like:
+
+```
+struct CASObject {
+ ArrayRef<char> Data;
+ ArrayRef<CASObject*> Refs;
+};
+```
+
+This abstraction allows simple composition of CASObjects into a DAG to
+represent complicated data structures while still allowing data deduplication.
+Note that you can compare two DAGs by just comparing the CASObject hashes of
+their root nodes.
+
+
+
+## LLVM CAS Library User Guide
+
+The CAS-like storage provided in LLVM is `llvm::cas::ObjectStore`.
+To reference a CASObject, there are a few different abstractions provided
+with different trade-offs:
+
+### ObjectRef
+
+`ObjectRef` is a lightweight reference to a CASObject stored in the CAS.
+This is the most commonly used abstraction and it is cheap to copy/pass
+along. It has the following properties:
+
+* `ObjectRef` is only meaningful within the `ObjectStore` that created the ref.
+`ObjectRef`s created by different `ObjectStore`s cannot be cross-referenced or
+compared.
+* `ObjectRef` doesn't guarantee the existence of the CASObject it points to. An
+explicit load is required before accessing the data stored in the CASObject.
+This load can also fail, for reasons including but not limited to: the object
+does not exist, corrupted CAS storage, operation timeout, etc.
+* If two `ObjectRef`s are equal, it is guaranteed that the objects they point to
+(if they exist) are identical. If they are not equal, the underlying objects are
+guaranteed to be different.
+
+### ObjectProxy
+
+`ObjectProxy` represents a loaded CASObject. With an `ObjectProxy`, the
+underlying stored data and references can be accessed without the need
+of error handling. The class APIs also provide convenient methods to
+access underlying data. The lifetime of the underlying data is equal to
+the lifetime of the instance of `ObjectStore` unless explicitly copied.
+
+### CASID
+
+`CASID` is the hash identifier for CASObjects. It owns the underlying
+storage for the hash value, so it can be expensive to copy and compare depending
+on the hash algorithm. `CASID` is generally only useful in rare situations
+like printing a raw hash value or exchanging hash values between different
+CAS instances with the same hashing schema.
+
+### ObjectStore
+
+`ObjectStore` is the CAS-like object storage. It provides APIs to save
+and load CASObjects, for example:
+
+```
+ObjectRef A, B, C;
+Expected<ObjectRef> Stored = ObjectStore.storeFromString({A, B}, "data");
+Expected<ObjectProxy> Loaded = ObjectStore.getProxy(C);
+```
+
+It also provides APIs to convert between `ObjectRef`, `ObjectProxy` and
+`CASID`.
+
+
+
+## CAS Library Implementation Guide
+
+The LLVM ObjectStore APIs are designed so that it is easy to add
+custom CAS implementations that are interchangeable with the builtin
+CAS implementations.
+
+To add your own implementation, you just need to add a subclass of
+`llvm::cas::ObjectStore` and implement all its pure virtual methods.
+To be interchangeable with LLVM ObjectStore, the new CAS implementation
+needs to conform to the following contracts:
+
+* Different CASObjects stored in the ObjectStore need to have different hashes
+and result in different `ObjectRef`s. Conversely, the same CASObject should have
+the same hash and the same `ObjectRef`. Note that two CASObjects with identical
+data but different references are considered different objects.
+* `ObjectRef`s are comparable within the same `ObjectStore` instance, and can
+be used to determine the equality of the underlying CASObjects.
+* The loaded objects from the ObjectStore need to have a lifetime at
+least as long as the ObjectStore itself.
+
+If not specified, the behavior can be implementation defined. For example,
+`ObjectRef` can be used to point to a loaded CASObject so that
+`ObjectStore` never fails to load. It is also legal to use a stricter model
+than required. For example, an `ObjectRef` that can be used to compare
+objects between different `ObjectStore` instances is legal, but users
+of the ObjectStore should not depend on this behavior.
+
+For CAS library implementers, there is also an `ObjectHandle` class that
+is an internal representation of a loaded CASObject reference.
+`ObjectProxy` is just a pair of `ObjectHandle` and `ObjectStore`, because
+just like `ObjectRef`, `ObjectHandle` is only useful when paired with
+the ObjectStore that knows about the loaded CASObject.
diff --git a/llvm/docs/Reference.rst b/llvm/docs/Reference.rst
index df61628b06c7db..ae03a3a7bfa9aa 100644
--- a/llvm/docs/Reference.rst
+++ b/llvm/docs/Reference.rst
@@ -15,6 +15,7 @@ LLVM and API reference documentation.
BranchWeightMetadata
Bugpoint
CommandGuide/index
+ ContentAddressableStorage
ConvergenceAndUniformity
ConvergentOperations
Coroutines
@@ -232,3 +233,6 @@ Additional Topics
:doc:`ConvergenceAndUniformity`
A description of uniformity analysis in the presence of irreducible
control flow, and its implementation.
+
+:doc:`ContentAddressableStorage`
+ A reference guide for using LLVM's CAS library.
diff --git a/llvm/include/llvm/CAS/BuiltinCASContext.h b/llvm/include/llvm/CAS/BuiltinCASContext.h
new file mode 100644
index 00000000000000..ebc4ca8bd1f2e9
--- /dev/null
+++ b/llvm/include/llvm/CAS/BuiltinCASContext.h
@@ -0,0 +1,88 @@
+//===- BuiltinCASContext.h --------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CAS_BUILTINCASCONTEXT_H
+#define LLVM_CAS_BUILTINCASCONTEXT_H
+
+#include "llvm/CAS/CASID.h"
+#include "llvm/Support/BLAKE3.h"
+#include "llvm/Support/Error.h"
+
+namespace llvm::cas::builtin {
+
+/// Current hash type for the builtin CAS.
+///
+/// FIXME: This should be configurable via an enum to allow configuring the hash
+/// function. The enum should be sent into \a createInMemoryCAS() and \a
+/// createOnDiskCAS().
+///
+/// This is important (at least) for future-proofing, when we want to make new
+/// CAS instances use BLAKE7, but still know how to read/write BLAKE3.
+///
+/// Even just for BLAKE3, it would be useful to have these values:
+///
+/// BLAKE3 => 32B hash from BLAKE3
+/// BLAKE3_16B => 16B hash from BLAKE3 (truncated)
+///
+/// ... where BLAKE3_16B uses \a TruncatedBLAKE3<16>.
+///
+/// Motivation for a truncated hash is that it's cheaper to store. It's not
+/// clear if we always (or ever) need the full 32B, and for an ephemeral
+/// in-memory CAS, we almost certainly don't need it.
+///
+/// Note that the cost is linear in the number of objects for the builtin CAS,
+/// since we're using internal offsets and/or pointers as an optimization.
+///
+/// However, it's possible we'll want to hook up a local builtin CAS to, e.g.,
+/// a distributed generic hash map to use as an ActionCache. In that scenario,
+/// the transitive closure of the structured objects that are the results of
+/// the cached actions would need to be serialized into the map, something
+/// like:
+///
+/// "action:<schema>:<key>" -> "0123"
+/// "object:<schema>:0123" -> "3,4567,89AB,CDEF,9,some data"
+/// "object:<schema>:4567" -> ...
+/// "object:<schema>:89AB" -> ...
+/// "object:<schema>:CDEF" -> ...
+///
+/// These references would be full cost.
+using HasherT = BLAKE3;
+using HashType = decltype(HasherT::hash(std::declval<ArrayRef<uint8_t> &>()));
+
+class BuiltinCASContext : public CASContext {
+ void printIDImpl(raw_ostream &OS, const CASID &ID) const final;
+ void anchor() override;
+
+public:
+ /// Get the name of the hash for any table identifiers.
+ ///
+ /// FIXME: This should be configurable via an enum, with at least the following
+ /// values:
+ ///
+ /// "BLAKE3" => 32B hash from BLAKE3
+ /// "BLAKE3.16" => 16B hash from BLAKE3 (truncated)
+ ///
+ /// Enum can be sent into \a createInMemoryCAS() and \a createOnDiskCAS().
+ static StringRef getHashName() { return "BLAKE3"; }
+ StringRef getHashSchemaIdentifier() const final {
+ static const std::string ID =
+ ("llvm.cas.builtin.v2[" + getHashName() + "]").str();
+ return ID;
+ }
+
+ static const BuiltinCASContext &getDefaultContext();
+
+ BuiltinCASContext() = default;
+
+ static Expected<HashType> parseID(StringRef PrintedDigest);
+ static void printID(ArrayRef<uint8_t> Digest, raw_ostream &OS);
+};
+
+} // namespace llvm::cas::builtin
+
+#endif // LLVM_CAS_BUILTINCASCONTEXT_H
diff --git a/llvm/include/llvm/CAS/BuiltinObjectHasher.h b/llvm/include/llvm/CAS/BuiltinObjectHasher.h
new file mode 100644
index 00000000000000..22e556c5669b55
--- /dev/null
+++ b/llvm/include/llvm/CAS/BuiltinObjectHasher.h
@@ -0,0 +1,81 @@
+//===- BuiltinObjectHasher.h ------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CAS_BUILTINOBJECTHASHER_H
+#define LLVM_CAS_BUILTINOBJECTHASHER_H
+
+#include "llvm/CAS/ObjectStore.h"
+#include "llvm/Support/Endian.h"
+
+namespace llvm::cas {
+
+template <class HasherT> class BuiltinObjectHasher {
+public:
+ using HashT = decltype(HasherT::hash(std::declval<ArrayRef<uint8_t> &>()));
+
+ static HashT hashObject(const ObjectStore &CAS, ArrayRef<ObjectRef> Refs,
+ ArrayRef<char> Data) {
+ BuiltinObjectHasher H;
+ H.updateSize(Refs.size());
+ for (const ObjectRef &Ref : Refs)
+ H.updateRef(CAS, Ref);
+ H.updateArray(Data);
+ return H.finish();
+ }
+
+ static HashT hashObject(ArrayRef<ArrayRef<uint8_t>> Refs,
+ ArrayRef<char> Data) {
+ BuiltinObjectHasher H;
+ H.updateSize(Refs.size());
+ for (const ArrayRef<uint8_t> &Ref : Refs)
+ H.updateID(Ref);
+ H.updateArray(Data);
+ return H.finish();
+ }
+
+private:
+ HashT finish() { return Hasher.final(); }
+
+ void updateRef(const ObjectStore &CAS, ObjectRef Ref) {
+ updateID(CAS.getID(Ref));
+ }
+
+ void updateID(const CASID &ID) { updateID(ID.getHash()); }
+
+ void updateID(ArrayRef<uint8_t> Hash) {
+ // NOTE: Does not hash the size of the hash. That's a CAS implementation
+ // detail that shouldn't leak into the UUID for an object.
+ assert(Hash.size() == sizeof(HashT) &&
+ "Expected object ref to match the hash size");
+ Hasher.update(Hash);
+ }
+
+ void updateArray(ArrayRef<uint8_t> Bytes) {
+ updateSize(Bytes.size());
+ Hasher.update(Bytes);
+ }
+
+ void updateArray(ArrayRef<char> Bytes) {
+ updateArray(ArrayRef(reinterpret_cast<const uint8_t *>(Bytes.data()),
+ Bytes.size()));
+ }
+
+ void updateSize(uint64_t Size) {
+ Size = support::endian::byte_swap(Size, endianness::little);
+ Hasher.update(
+ ArrayRef(reinterpret_cast<const uint8_t *>(&Size), sizeof(Size)));
+ }
+
+ BuiltinObjectHasher() = default;
+ ~BuiltinObjectHasher() = default;
+ HasherT Hasher;
+};
+
+} // namespace llvm::cas
+
+#endif // LLVM_CAS_BUILTINOBJECTHASHER_H
diff --git a/llvm/include/llvm/CAS/CASID.h b/llvm/include/llvm/CAS/CASID.h
new file mode 100644
index 00000000000000..5f9110a15819ad
--- /dev/null
+++ b/llvm/include/llvm/CAS/CASID.h
@@ -0,0 +1,156 @@
+//===- llvm/CAS/CASID.h -----------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CAS_CASID_H
+#define LLVM_CAS_CASID_H
+
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/DenseMapInfo.h"
+#include "llvm/ADT/SmallString.h"
+#include "llvm/ADT/StringExtras.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/Error.h"
+
+namespace llvm {
+
+class raw_ostream;
+
+namespace cas {
+
+class CASID;
+
+/// Context for CAS identifiers.
+class CASContext {
+ virtual void anchor();
+
+public:
+ virtual ~CASContext() = default;
+
+ /// Get an identifier for the schema used by this CAS context. Two CAS
+ /// instances should return the same identifier if and only if their
+ /// CASIDs are safe to compare by hash. This is used by the equality
+ /// operator of \a CASID.
+ virtual StringRef getHashSchemaIdentifier() const = 0;
+
+protected:
+ /// Print \p ID to \p OS.
+ virtual void printIDImpl(raw_ostream &OS, const CASID &ID) const = 0;
+
+ friend class CASID;
+};
+
+/// Unique identifier for a CAS object.
+///
+/// Locally, stores an internal CAS identifier that's specific to a single CAS
+/// instance. It's guaranteed not to change across the view of that CAS, but
+/// might change between runs.
+///
+/// It also has a \a CASContext pointer to allow comparison of these
+/// identifiers. If two CASIDs are from the same CASContext, they can be
+/// compared directly. If they are not, then \a
+/// CASContext::getHashSchemaIdentifier() is compared to see if they can be
+/// compared by hash, in which case the result of \a getHash() is compared.
+class CASID {
+public:
+ void dump() const;
+ void print(raw_ostream &OS) const {
+ return getContext().printIDImpl(OS, *this);
+ }
+ friend raw_ostream &operator<<(raw_ostream &OS, const CASID &ID) {
+ ID.print(OS);
+ return OS;
+ }
+ std::string toString() const;
+
+ ArrayRef<uint8_t> getHash() const {
+ return arrayRefFromStringRef<uint8_t>(Hash);
+ }
+
+ friend bool operator==(const CASID &LHS, const CASID &RHS) {
+ if (LHS.Context == RHS.Context)
+ return LHS.Hash == RHS.Hash;
+
+ // EmptyKey or TombstoneKey.
+ if (!LHS.Context || !RHS.Context)
+ return false;
+
+ // CASIDs are equal when they have the same hash schema and same hash value.
+ return LHS.Context->getHashSchemaIdentifier() ==
+ RHS.Context->getHashSchemaIdentifier() &&
+ LHS.Hash == RHS.Hash;
+ }
+
+ friend bool operator!=(const CASID &LHS, const CASID &RHS) {
+ return !(LHS == RHS);
+ }
+
+ friend hash_code hash_value(const CASID &ID) {
+ ArrayRef<uint8_t> Hash = ID.getHash();
+ return hash_combine_range(Hash.begin(), Hash.end());
+ }
+
+ const CASContext &getContext() const {
+ assert(Context && "Tombstone or empty key for DenseMap?");
+ return *Context;
+ }
+
+ static CASID getDenseMapEmptyKey() {
+ return CASID(nullptr, DenseMapInfo<StringRef>::getEmptyKey());
+ }
+ static CASID getDenseMapTombstoneKey() {
+ return CASID(nullptr, DenseMapInfo<StringRef>::getTombstoneKey());
+ }
+
+ CASID() = delete;
+
+ static CASID create(const CASContext *Context, StringRef Hash) {
+ return CASID(Context, Hash);
+ }
+
+private:
+ CASID(const CASContext *Context, StringRef Hash)
+ : Context(Context), Hash(Hash) {}
+
+ const CASContext *Context;
+ SmallString<32> Hash;
+};
+
+/// This is used to work around the issue of MSVC needing default-constructible
+/// types for \c std::promise/future.
+template <typename T> struct AsyncValue {
+ Expected<std::optional<T>> take() { return std::move(Value); }
+
+ AsyncValue() : Value(std::nullopt) {}
+ AsyncValue(Error &&E) : Value(std::move(E)) {}
+ AsyncValue(T &&V) : Value(std::move(V)) {}
+ AsyncValue(std::nullopt_t) : Value(std::nullopt) {}
+ AsyncValue(Expected<std::optional<T>> &&Obj) : Value(std::move(Obj)) {}
+
+private:
+ Expected<std::optional<T>> Value;
+};
+
+} // namespace cas
+
+template <> struct DenseMapInfo<cas::CASID> {
+ static cas::CASID getEmptyKey() { return cas::CASID::getDenseMapEmptyKey(); }
+
+ static cas::CASID getTombstoneKey() {
+ return cas::CASID::getDenseMapTombstoneKey();
+ }
+
+ static unsigned getHashValue(cas::CASID ID) {
+ return (unsigned)hash_value(ID);
+ }
+
+ static bool isEqual(cas::CASID LHS, cas::CASID RHS) { return LHS == RHS; }
+};
+
+} // namespace llvm
+
+#endif // LLVM_CAS_CASID_H
diff --git a/llvm/include/llvm/CAS/CASReference.h b/llvm/include/llvm/CAS/CASReference.h
new file mode 100644
index 00000000000000..1f435cf306c4ca
--- /dev/null
+++ b/llvm/include/llvm/CAS/CASReference.h
@@ -0,0 +1,207 @@
+//===- llvm/CAS/CASReference.h ----------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CAS_CASREFERENCE_H
+#define LLVM_CAS_CASREFERENCE_H
+
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/DenseMapInfo.h"
+#include "llvm/ADT/StringRef.h"
+
+namespace llvm {
+
+class raw_ostream;
+
+namespace cas {
+
+class ObjectStore;
+
+class ObjectHandle;
+class ObjectRef;
+
+/// Base class for references to things in \a ObjectStore.
+class ReferenceBase {
+protected:
+ struct DenseMapEmptyTag {};
+ struct DenseMapTombstoneTag {};
+ static constexpr uint64_t getDenseMapEmptyRef() { return -1ULL; }
+ static constexpr uint64_t getDenseMapTombstoneRef() { return -2ULL; }
+
+public:
+ /// Get an internal reference.
+ uint64_t getInternalRef(const ObjectStore &ExpectedCAS) const {
+#if LLVM_ENABLE_ABI_BREAKING_CHECKS
+ assert(CAS == &ExpectedCAS && "Extracting reference for the wrong CAS");
+#endif
+ return InternalRef;
+ }
+
+ unsigned getDenseMapHash() const {
+ return (unsigned)llvm::hash_value(InternalRef);
+ }
+ bool isDenseMapEmpty() const { return InternalRef == getDenseMapEmptyRef(); }
+ bool isDenseMapTombstone() const {
+ return InternalRef == getDenseMapTombstoneRef();
+ }
+ bool isDenseMapSentinel() const {
+ return isDenseMapEmpty() || isDenseMapTombstone();
+ }
+
+protected:
+ void print(raw_ostream &OS, const ObjectHandle &This) const;
+ void print(raw_ostream &OS, const ObjectRef &This) const;
+
+ bool hasSameInternalRef(const ReferenceBase &RHS) const {
+#if LLVM_ENABLE_ABI_BREAKING_CHECKS
+ assert(
+ (isDenseMapSentinel() || RHS.isDenseMapSentinel() || CAS == RHS.CAS) &&
+ "Cannot compare across CAS instances");
+#endif
+ return InternalRef == RHS.InternalRef;
+ }
+
+protected:
+ friend class ObjectStore;
+ ReferenceBase(const ObjectStore *CAS, uint64_t InternalRef, bool IsHandle)
+ : InternalRef(InternalRef) {
+#if LLVM_ENABLE_ABI_BREAKING_CHECKS
+ this->CAS = CAS;
+#endif
+ assert(InternalRef != getDenseMapEmptyRef() && "Reserved for DenseMapInfo");
+ assert(InternalRef != getDenseMapTombstoneRef() &&
+ "Reserved for DenseMapInfo");
+ }
+ explicit ReferenceBase(DenseMapEmptyTag)
+ : InternalRef(getDenseMapEmptyRef()) {}
+ explicit ReferenceBase(DenseMapTombstoneTag)
+ : InternalRef(getDenseMapTombstoneRef()) {}
+
+private:
+ uint64_t InternalRef;
+
+#if LLVM_ENABLE_ABI_BREAKING_CHECKS
+ const ObjectStore *CAS = nullptr;
+#endif
+};
+
+/// Reference to an object in a \a ObjectStore instance.
+///
+/// If you have an ObjectRef, you know the object exists, and you can point at
+/// it from new nodes with \a ObjectStore::store(), but you don't know anything
+/// about it. "Loading" the object is a separate step that may not have
+/// happened yet, and which can fail (due to filesystem corruption) or
+/// introduce latency (if downloading from a remote store).
+///
+/// \a ObjectStore::store() takes a list of these, and these are returned by \a
+/// ObjectStore::forEachRef() and \a ObjectStore::readRef(), which are accessors
+/// for nodes, and \a ObjectStore::getReference().
+///
+/// \a ObjectStore::load() will load the referenced object, and returns \a
+/// ObjectHandle, a variant that knows what kind of entity it is. \a
+/// ObjectStore::getReferenceKind() can inspect the type of reference without
+/// asking for unloaded objects to be loaded.
+///
+/// This is a wrapper around a \c uint64_t (and a \a ObjectStore instance when
+/// assertions are on). If necessary, it can be deconstructed and reconstructed
+/// using \a ReferenceBase::getInternalRef() and \a
+/// ObjectRef::getFromInternalRef(), but clients aren't expected to need to do
+/// this. These both require the right \a ObjectStore instance.
+class ObjectRef : public ReferenceBase {
+ struct DenseMapTag {};
+
+public:
+ friend bool operator==(const ObjectRef &LHS, const ObjectRef &RHS) {
+ return LHS.hasSameInternalRef(RHS);
+ }
+ friend bool operator!=(const ObjectRef &LHS, const ObjectRef &RHS) {
+ return !(LHS == RHS);
+ }
+
+ /// Allow a reference to be recreated after it's deconstructed.
+ static ObjectRef getFromInternalRef(const ObjectStore &CAS,
+ uint64_t InternalRef) {
+ return ObjectRef(CAS, InternalRef);
+ }
+
+ static ObjectRef getDenseMapEmptyKey() {
+ return ObjectRef(DenseMapEmptyTag{});
+ }
+ static ObjectRef getDenseMapTombstoneKey() {
+ return ObjectRef(DenseMapTombstoneTag{});
+ }
+
+ /// Print internal ref and/or CASID. Only suitable for debugging.
+ void print(raw_ostream &OS) const { return ReferenceBase::print(OS, *this); }
+
+ LLVM_DUMP_METHOD void dump() const;
+
+private:
+ friend class ObjectStore;
+ friend class ReferenceBase;
+ using ReferenceBase::ReferenceBase;
+ ObjectRef(const ObjectStore &CAS, uint64_t InternalRef)
+ : ReferenceBase(&CAS, InternalRef, /*IsHandle=*/false) {
+ assert(InternalRef != -1ULL && "Reserved for DenseMapInfo");
+ assert(InternalRef != -2ULL && "Reserved for DenseMapInfo");
+ }
+ explicit ObjectRef(DenseMapEmptyTag T) : ReferenceBase(T) {}
+ explicit ObjectRef(DenseMapTombstoneTag T) : ReferenceBase(T) {}
+ explicit ObjectRef(ReferenceBase) = delete;
+};
+
+/// Handle to a loaded object in a \a ObjectStore instance.
+///
+/// ObjectHandle encapsulates a *loaded* object in the CAS. You need one
+/// of these to inspect the content of an object: to look at its stored
+/// data and references.
+class ObjectHandle : public ReferenceBase {
+public:
+ friend bool operator==(const ObjectHandle &LHS, const ObjectHandle &RHS) {
+ return LHS.hasSameInternalRef(RHS);
+ }
+ friend bool operator!=(const ObjectHandle &LHS, const ObjectHandle &RHS) {
+ return !(LHS == RHS);
+ }
+
+ /// Print internal ref and/or CASID. Only suitable for debugging.
+ void print(raw_ostream &OS) const { return ReferenceBase::print(OS, *this); }
+
+ LLVM_DUMP_METHOD void dump() const;
+
+private:
+ friend class ObjectStore;
+ friend class ReferenceBase;
+ using ReferenceBase::ReferenceBase;
+ explicit ObjectHandle(ReferenceBase) = delete;
+ ObjectHandle(const ObjectStore &CAS, uint64_t InternalRef)
+ : ReferenceBase(&CAS, InternalRef, /*IsHandle=*/true) {}
+};
+
+} // namespace cas
+
+template <> struct DenseMapInfo<cas::ObjectRef> {
+ static cas::ObjectRef getEmptyKey() {
+ return cas::ObjectRef::getDenseMapEmptyKey();
+ }
+
+ static cas::ObjectRef getTombstoneKey() {
+ return cas::ObjectRef::getDenseMapTombstoneKey();
+ }
+
+ static unsigned getHashValue(cas::ObjectRef Ref) {
+ return Ref.getDenseMapHash();
+ }
+
+ static bool isEqual(cas::ObjectRef LHS, cas::ObjectRef RHS) {
+ return LHS == RHS;
+ }
+};
+
+} // namespace llvm
+
+#endif // LLVM_CAS_CASREFERENCE_H
diff --git a/llvm/include/llvm/CAS/ObjectStore.h b/llvm/include/llvm/CAS/ObjectStore.h
new file mode 100644
index 00000000000000..b4720c7edc1543
--- /dev/null
+++ b/llvm/include/llvm/CAS/ObjectStore.h
@@ -0,0 +1,302 @@
+//===- llvm/CAS/ObjectStore.h -----------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CAS_OBJECTSTORE_H
+#define LLVM_CAS_OBJECTSTORE_H
+
+#include "llvm/ADT/StringRef.h"
+#include "llvm/CAS/CASID.h"
+#include "llvm/CAS/CASReference.h"
+#include "llvm/Support/Error.h"
+#include "llvm/Support/FileSystem.h"
+#include <cstddef>
+
+namespace llvm {
+
+class MemoryBuffer;
+template <typename T> class unique_function;
+
+namespace cas {
+
+class ObjectStore;
+class ObjectProxy;
+
+/// Content-addressable storage for objects.
+///
+/// Conceptually, objects are stored in a "unique set".
+///
+/// - Objects are immutable ("value objects") that are defined by their
+/// content. They are implicitly deduplicated by content.
+/// - Each object has a unique identifier (UID) that's derived from its content,
+/// called a \a CASID.
+/// - This UID is a fixed-size (strong) hash of the transitive content of a
+/// CAS object.
+/// - It's comparable between any two CAS instances that have the same \a
+/// CASContext::getHashSchemaIdentifier().
+/// - The UID can be printed (e.g., \a CASID::toString()) and it can be parsed
+/// by the same or a different CAS instance with \a
+/// ObjectStore::parseID().
+/// - An object can be looked up by content or by UID.
+/// - \a store() is a "get-or-create" method, writing an object if it
+/// doesn't exist yet, and returning a ref to it in any case.
+/// - \a loadObject(const CASID&) looks up an object by its UID.
+/// - Objects can reference other objects, forming an arbitrary DAG.
+///
+/// The \a ObjectStore interface has a few ways of referencing objects:
+///
+/// - \a ObjectRef encapsulates a reference to something in the CAS. It is an
+/// opaque type that references an object inside a specific CAS. It is
+/// implementation defined if the underlying object exists or not for an
+/// ObjectRef, and it can be used to speed up CAS lookup as an implementation
+/// detail. However, you don't know anything about the underlying objects.
+/// "Loading" the object is a separate step that may not have happened
+/// yet, and which can fail (e.g. due to filesystem corruption) or introduce
+/// latency (if downloading from a remote store).
+/// - \a ObjectHandle encapsulates a *loaded* object in the CAS. You need one of
+/// these to inspect the content of an object: to look at its stored
+/// data and references. This is internal to the CAS implementation and not
+/// available from CAS public APIs.
+/// - \a CASID: the UID for an object in the CAS, obtained through \a
+/// ObjectStore::getID() or \a ObjectStore::parseID(). This is a valid CAS
+/// identifier, but may reference an object that is unknown to this CAS
+/// instance.
+/// - \a ObjectProxy pairs an ObjectHandle (subclass) with an ObjectStore, and
+/// wraps access APIs to avoid having to pass extra parameters. It is the
+/// object used for accessing underlying data and refs by CAS users.
+///
+/// Both ObjectRef and ObjectHandle are lightweight, wrapping a `uint64_t` and
+/// are only valid with the associated ObjectStore instance.
+///
+/// There are a few options for accessing content of objects, with different
+/// lifetime tradeoffs:
+///
+/// - \a getData() accesses data without exposing lifetime at all.
+/// - \a getMemoryBuffer() returns a \a MemoryBuffer whose lifetime
+/// is independent of the CAS (it can live longer).
+/// - \a getDataString() returns a StringRef whose lifetime is guaranteed to
+/// last as long as the \a ObjectStore.
+/// - \a readRef() and \a forEachRef() iterate through the references in an
+/// object. There is no lifetime assumption.
+class ObjectStore {
+ friend class ObjectProxy;
+ void anchor();
+
+public:
+ /// Get a \p CASID from a \p ID, which should have been generated by \a
+ /// CASID::print(). This succeeds as long as \a validateID() would pass. The
+ /// object may be unknown to this CAS instance.
+ ///
+ /// TODO: Remove, and update callers to use \a validateID() or \a
+ /// extractHashFromID().
+ virtual Expected<CASID> parseID(StringRef ID) = 0;
+
+ /// Store object into ObjectStore.
+ virtual Expected<ObjectRef> store(ArrayRef<ObjectRef> Refs,
+ ArrayRef<char> Data) = 0;
+ /// Get an ID for \p Ref.
+ virtual CASID getID(ObjectRef Ref) const = 0;
+
+ /// Get an existing reference to the object called \p ID.
+ ///
+ /// Returns \c std::nullopt if the object is not stored in this CAS.
+ virtual std::optional<ObjectRef> getReference(const CASID &ID) const = 0;
+
+ /// \returns true if the object is directly available from the local CAS, for
+ /// implementations that have this kind of distinction.
+ virtual Expected<bool> isMaterialized(ObjectRef Ref) const = 0;
+
+ /// Validate the underlying object referred to by \p ID.
+ virtual Error validate(const CASID &ID) = 0;
+
+protected:
+ /// Load the object referenced by \p Ref.
+ ///
+ /// Errors if the object cannot be loaded.
+ /// \returns \c std::nullopt if the object is missing from the CAS.
+ virtual Expected<std::optional<ObjectHandle>> loadIfExists(ObjectRef Ref) = 0;
+
+ /// Like \c loadIfExists but returns an error if the object is missing.
+ Expected<ObjectHandle> load(ObjectRef Ref);
+
+ /// Get the size of some data.
+ virtual uint64_t getDataSize(ObjectHandle Node) const = 0;
+
+ /// Methods for handling objects.
+ virtual Error forEachRef(ObjectHandle Node,
+ function_ref<Error(ObjectRef)> Callback) const = 0;
+ virtual ObjectRef readRef(ObjectHandle Node, size_t I) const = 0;
+ virtual size_t getNumRefs(ObjectHandle Node) const = 0;
+ virtual ArrayRef<char> getData(ObjectHandle Node,
+ bool RequiresNullTerminator = false) const = 0;
+
+ /// Get ObjectRef from open file.
+ virtual Expected<ObjectRef>
+ storeFromOpenFileImpl(sys::fs::file_t FD,
+ std::optional<sys::fs::file_status> Status);
+
+ /// Get a lifetime-extended StringRef pointing at the data of \p Node.
+ ///
+ /// Depending on the CAS implementation, this may involve in-memory storage
+ /// overhead.
+ StringRef getDataString(ObjectHandle Node) {
+ return toStringRef(getData(Node));
+ }
+
+ /// Get a lifetime-extended MemoryBuffer pointing at the data of \p Node.
+ ///
+ /// Depending on the CAS implementation, this may involve in-memory storage
+ /// overhead.
+ std::unique_ptr<MemoryBuffer>
+ getMemoryBuffer(ObjectHandle Node, StringRef Name = "",
+ bool RequiresNullTerminator = true);
+
+ /// Read all the refs from an object into a SmallVector.
+ virtual void readRefs(ObjectHandle Node,
+ SmallVectorImpl<ObjectRef> &Refs) const;
+
+ /// Allow ObjectStore implementations to create internal handles.
+#define MAKE_CAS_HANDLE_CONSTRUCTOR(HandleKind) \
+ HandleKind make##HandleKind(uint64_t InternalRef) const { \
+ return HandleKind(*this, InternalRef); \
+ }
+ MAKE_CAS_HANDLE_CONSTRUCTOR(ObjectHandle)
+ MAKE_CAS_HANDLE_CONSTRUCTOR(ObjectRef)
+#undef MAKE_CAS_HANDLE_CONSTRUCTOR
+
+public:
+ /// Helper function to store an object and return an ObjectProxy.
+ Expected<ObjectProxy> createProxy(ArrayRef<ObjectRef> Refs, StringRef Data);
+
+ /// Store object from StringRef.
+ Expected<ObjectRef> storeFromString(ArrayRef<ObjectRef> Refs,
+ StringRef String) {
+ return store(Refs, arrayRefFromStringRef<char>(String));
+ }
+
+ /// Default implementation reads \p FD and calls \a store(). Does not
+ /// take ownership of \p FD; the caller is responsible for closing it.
+ ///
+ /// If \p Status is sent in it is to be treated as a hint. Implementations
+ /// must protect against the file size potentially growing after the status
+ /// was taken (i.e., they cannot assume that an mmap will be null-terminated
+ /// where \p Status implies).
+ ///
+ /// Returns an \a ObjectRef for the stored file contents.
+ Expected<ObjectRef>
+ storeFromOpenFile(sys::fs::file_t FD,
+ std::optional<sys::fs::file_status> Status = std::nullopt) {
+ return storeFromOpenFileImpl(FD, Status);
+ }
+
+ static Error createUnknownObjectError(const CASID &ID);
+
+ /// Create ObjectProxy from CASID. If the object doesn't exist, get an error.
+ Expected<ObjectProxy> getProxy(const CASID &ID);
+ /// Create ObjectProxy from ObjectRef. If the object can't be loaded, get an
+ /// error.
+ Expected<ObjectProxy> getProxy(ObjectRef Ref);
+
+ /// \returns \c std::nullopt if the object is missing from the CAS.
+ Expected<std::optional<ObjectProxy>> getProxyIfExists(ObjectRef Ref);
+
+ /// Read the data from \p Node into \p OS.
+ uint64_t readData(ObjectHandle Node, raw_ostream &OS, uint64_t Offset = 0,
+ uint64_t MaxBytes = -1ULL) const {
+ ArrayRef<char> Data = getData(Node);
+ assert(Offset < Data.size() && "Expected valid offset");
+ Data = Data.drop_front(Offset).take_front(MaxBytes);
+ OS << toStringRef(Data);
+ return Data.size();
+ }
+
+ /// Validate the whole node tree.
+ Error validateTree(ObjectRef Ref);
+
+ /// Print the ObjectStore internals for debugging purposes.
+ virtual void print(raw_ostream &) const {}
+ void dump() const;
+
+ /// Get CASContext
+ const CASContext &getContext() const { return Context; }
+
+ virtual ~ObjectStore() = default;
+
+protected:
+ ObjectStore(const CASContext &Context) : Context(Context) {}
+
+private:
+ const CASContext &Context;
+};
+
+/// Reference to an abstract hierarchical node, with data and references.
+/// Reference is passed by value and is expected to be valid as long as the \a
+/// ObjectStore is.
+class ObjectProxy {
+public:
+ const ObjectStore &getCAS() const { return *CAS; }
+ ObjectStore &getCAS() { return *CAS; }
+ CASID getID() const { return CAS->getID(Ref); }
+ ObjectRef getRef() const { return Ref; }
+ size_t getNumReferences() const { return CAS->getNumRefs(H); }
+ ObjectRef getReference(size_t I) const { return CAS->readRef(H, I); }
+
+ operator CASID() const { return getID(); }
+ CASID getReferenceID(size_t I) const {
+ std::optional<CASID> ID = getCAS().getID(getReference(I));
+ assert(ID && "Expected reference to be first-class object");
+ return *ID;
+ }
+
+ /// Visit each reference in order, returning an error from \p Callback to
+ /// stop early.
+ Error forEachReference(function_ref<Error(ObjectRef)> Callback) const {
+ return CAS->forEachRef(H, Callback);
+ }
+
+ std::unique_ptr<MemoryBuffer>
+ getMemoryBuffer(StringRef Name = "",
+ bool RequiresNullTerminator = true) const;
+
+ /// Get the content of the node. Valid as long as the CAS is valid.
+ StringRef getData() const { return CAS->getDataString(H); }
+
+ friend bool operator==(const ObjectProxy &Proxy, ObjectRef Ref) {
+ return Proxy.getRef() == Ref;
+ }
+ friend bool operator==(ObjectRef Ref, const ObjectProxy &Proxy) {
+ return Proxy.getRef() == Ref;
+ }
+ friend bool operator!=(const ObjectProxy &Proxy, ObjectRef Ref) {
+ return !(Proxy.getRef() == Ref);
+ }
+ friend bool operator!=(ObjectRef Ref, const ObjectProxy &Proxy) {
+ return !(Proxy.getRef() == Ref);
+ }
+
+public:
+ ObjectProxy() = delete;
+
+ static ObjectProxy load(ObjectStore &CAS, ObjectRef Ref, ObjectHandle Node) {
+ return ObjectProxy(CAS, Ref, Node);
+ }
+
+private:
+ ObjectProxy(ObjectStore &CAS, ObjectRef Ref, ObjectHandle H)
+ : CAS(&CAS), Ref(Ref), H(H) {}
+
+ ObjectStore *CAS;
+ ObjectRef Ref;
+ ObjectHandle H;
+};
+
+std::unique_ptr<ObjectStore> createInMemoryCAS();
+
+} // namespace cas
+} // namespace llvm
+
+#endif // LLVM_CAS_OBJECTSTORE_H
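
Note on ObjectStore::readData() above: the Offset/MaxBytes handling is a
drop-front-then-take-front slice. The following is a minimal standalone sketch
of that slicing using only the standard library. The name sliceData and the use
of std::string_view are assumptions made for illustration and are not part of
the patch. Unlike the assert in the patch, this sketch returns an empty view for
an out-of-range offset.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: return at most MaxBytes of Data, starting at Offset.
// An Offset past the end yields an empty view instead of asserting.
std::string_view sliceData(std::string_view Data, std::size_t Offset,
                           std::size_t MaxBytes = std::string_view::npos) {
  if (Offset > Data.size())
    return {};
  // substr clamps the count to the number of bytes remaining after Offset.
  return Data.substr(Offset, MaxBytes);
}
```

The number of bytes written by readData() corresponds to the size of the
returned slice in this sketch.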
diff --git a/llvm/include/module.modulemap b/llvm/include/module.modulemap
index b00da6d7cd28c7..d44d395fa8ef46 100644
--- a/llvm/include/module.modulemap
+++ b/llvm/include/module.modulemap
@@ -105,6 +105,12 @@ module LLVM_BinaryFormat {
textual header "llvm/BinaryFormat/MsgPack.def"
}
+module LLVM_CAS {
+ requires cplusplus
+ umbrella "llvm/CAS"
+ module * { export * }
+}
+
module LLVM_Config {
requires cplusplus
umbrella "llvm/Config"
diff --git a/llvm/lib/CAS/BuiltinCAS.cpp b/llvm/lib/CAS/BuiltinCAS.cpp
new file mode 100644
index 00000000000000..73646ad2c3528e
--- /dev/null
+++ b/llvm/lib/CAS/BuiltinCAS.cpp
@@ -0,0 +1,94 @@
+//===- BuiltinCAS.cpp -------------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "BuiltinCAS.h"
+#include "llvm/ADT/StringExtras.h"
+#include "llvm/CAS/BuiltinObjectHasher.h"
+#include "llvm/Support/Process.h"
+
+using namespace llvm;
+using namespace llvm::cas;
+using namespace llvm::cas::builtin;
+
+static StringRef getCASIDPrefix() { return "llvmcas://"; }
+void BuiltinCASContext::anchor() {}
+
+Expected<HashType> BuiltinCASContext::parseID(StringRef Reference) {
+ if (!Reference.consume_front(getCASIDPrefix()))
+ return createStringError(std::make_error_code(std::errc::invalid_argument),
+ "invalid cas-id '" + Reference + "'");
+
+ // FIXME: Allow shortened references?
+ if (Reference.size() != 2 * sizeof(HashType))
+ return createStringError(std::make_error_code(std::errc::invalid_argument),
+ "wrong size for cas-id hash '" + Reference + "'");
+
+ std::string Binary;
+ if (!tryGetFromHex(Reference, Binary))
+ return createStringError(std::make_error_code(std::errc::invalid_argument),
+ "invalid hash in cas-id '" + Reference + "'");
+
+ assert(Binary.size() == sizeof(HashType));
+ HashType Digest;
+ llvm::copy(Binary, Digest.data());
+ return Digest;
+}
+
+Expected<CASID> BuiltinCAS::parseID(StringRef Reference) {
+ Expected<HashType> Digest = BuiltinCASContext::parseID(Reference);
+ if (!Digest)
+ return Digest.takeError();
+
+ return CASID::create(&getContext(), toStringRef(*Digest));
+}
+
+void BuiltinCASContext::printID(ArrayRef<uint8_t> Digest, raw_ostream &OS) {
+ SmallString<64> Hash;
+ toHex(Digest, /*LowerCase=*/true, Hash);
+ OS << getCASIDPrefix() << Hash;
+}
+
+void BuiltinCASContext::printIDImpl(raw_ostream &OS, const CASID &ID) const {
+ BuiltinCASContext::printID(ID.getHash(), OS);
+}
+
+const BuiltinCASContext &BuiltinCASContext::getDefaultContext() {
+ static BuiltinCASContext DefaultContext;
+ return DefaultContext;
+}
+
+Expected<ObjectRef> BuiltinCAS::store(ArrayRef<ObjectRef> Refs,
+ ArrayRef<char> Data) {
+ return storeImpl(BuiltinObjectHasher<HasherT>::hashObject(*this, Refs, Data),
+ Refs, Data);
+}
+
+Error BuiltinCAS::validate(const CASID &ID) {
+ auto Ref = getReference(ID);
+ if (!Ref)
+ return createUnknownObjectError(ID);
+
+ auto Handle = load(*Ref);
+ if (!Handle)
+ return Handle.takeError();
+
+ auto Proxy = ObjectProxy::load(*this, *Ref, *Handle);
+ SmallVector<ObjectRef> Refs;
+ if (auto E = Proxy.forEachReference([&](ObjectRef Ref) -> Error {
+ Refs.push_back(Ref);
+ return Error::success();
+ }))
+ return E;
+
+ ArrayRef<char> Data(Proxy.getData().data(), Proxy.getData().size());
+ auto Hash = BuiltinObjectHasher<HasherT>::hashObject(*this, Refs, Data);
+ if (!ID.getHash().equals(Hash))
+ return createCorruptObjectError(ID);
+
+ return Error::success();
+}
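
Note on BuiltinCASContext::parseID()/printID() above: the printed form is the
"llvmcas://" prefix followed by the lowercase hex digest, and parsing rejects a
missing prefix, a wrong length, or non-hex characters. The following is a
minimal standalone sketch of that round trip. The 4-byte Digest type and the
helper names are assumptions chosen for brevity (the real HashType size depends
on the hasher), and std::optional stands in for Expected.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical stand-in for HashType; the size is an assumption.
constexpr std::size_t DigestSize = 4;
using Digest = std::array<std::uint8_t, DigestSize>;

constexpr std::string_view Prefix = "llvmcas://";

// Print the prefix followed by two lowercase hex characters per byte.
std::string printID(const Digest &D) {
  static constexpr char Hex[] = "0123456789abcdef";
  std::string Out(Prefix);
  for (std::uint8_t B : D) {
    Out.push_back(Hex[B >> 4]);
    Out.push_back(Hex[B & 0xF]);
  }
  assert(Out.size() == Prefix.size() + 2 * DigestSize);
  return Out;
}

// Decode one hex character, or return -1 if it is not a hex digit.
static int hexValue(char C) {
  if (C >= '0' && C <= '9')
    return C - '0';
  if (C >= 'a' && C <= 'f')
    return C - 'a' + 10;
  if (C >= 'A' && C <= 'F')
    return C - 'A' + 10;
  return -1;
}

// Parse a printed ID back into a digest; std::nullopt signals any failure.
std::optional<Digest> parseID(std::string_view Ref) {
  if (Ref.substr(0, Prefix.size()) != Prefix)
    return std::nullopt; // missing prefix
  Ref.remove_prefix(Prefix.size());
  if (Ref.size() != 2 * DigestSize)
    return std::nullopt; // wrong hash length
  Digest D{};
  for (std::size_t I = 0; I < DigestSize; ++I) {
    int Hi = hexValue(Ref[2 * I]);
    int Lo = hexValue(Ref[2 * I + 1]);
    if (Hi < 0 || Lo < 0)
      return std::nullopt; // invalid hex character
    D[I] = static_cast<std::uint8_t>((Hi << 4) | Lo);
  }
  return D;
}
```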
diff --git a/llvm/lib/CAS/BuiltinCAS.h b/llvm/lib/CAS/BuiltinCAS.h
new file mode 100644
index 00000000000000..1a4f640e4e2da8
--- /dev/null
+++ b/llvm/lib/CAS/BuiltinCAS.h
@@ -0,0 +1,74 @@
+//===- BuiltinCAS.h ---------------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIB_CAS_BUILTINCAS_H
+#define LLVM_LIB_CAS_BUILTINCAS_H
+
+#include "llvm/ADT/StringRef.h"
+#include "llvm/CAS/BuiltinCASContext.h"
+#include "llvm/CAS/ObjectStore.h"
+
+namespace llvm::cas {
+class ActionCache;
+namespace builtin {
+
+class BuiltinCAS : public ObjectStore {
+public:
+ BuiltinCAS() : ObjectStore(BuiltinCASContext::getDefaultContext()) {}
+
+ Expected<CASID> parseID(StringRef Reference) final;
+
+ Expected<ObjectRef> store(ArrayRef<ObjectRef> Refs,
+ ArrayRef<char> Data) final;
+ virtual Expected<ObjectRef> storeImpl(ArrayRef<uint8_t> ComputedHash,
+ ArrayRef<ObjectRef> Refs,
+ ArrayRef<char> Data) = 0;
+
+ virtual Expected<ObjectRef>
+ storeFromNullTerminatedRegion(ArrayRef<uint8_t> ComputedHash,
+ sys::fs::mapped_file_region Map) {
+ return storeImpl(ComputedHash, std::nullopt,
+ ArrayRef(Map.data(), Map.size()));
+ }
+
+ /// Both builtin CAS implementations provide lifetime for free, so this can
+ /// be const, and readData() and getDataSize() can be implemented on top of
+ /// it.
+ virtual ArrayRef<char> getDataConst(ObjectHandle Node) const = 0;
+
+ ArrayRef<char> getData(ObjectHandle Node,
+ bool RequiresNullTerminator) const final {
+ // BuiltinCAS Objects are always null terminated.
+ return getDataConst(Node);
+ }
+ uint64_t getDataSize(ObjectHandle Node) const final {
+ return getDataConst(Node).size();
+ }
+
+ Error createUnknownObjectError(const CASID &ID) const {
+ return createStringError(std::make_error_code(std::errc::invalid_argument),
+ "unknown object '" + ID.toString() + "'");
+ }
+
+ Error createCorruptObjectError(const CASID &ID) const {
+ return createStringError(std::make_error_code(std::errc::invalid_argument),
+ "corrupt object '" + ID.toString() + "'");
+ }
+
+ Error createCorruptStorageError() const {
+ return createStringError(std::make_error_code(std::errc::invalid_argument),
+ "corrupt storage");
+ }
+
+ Error validate(const CASID &ID) final;
+};
+
+} // end namespace builtin
+} // end namespace llvm::cas
+
+#endif // LLVM_LIB_CAS_BUILTINCAS_H
diff --git a/llvm/lib/CAS/CMakeLists.txt b/llvm/lib/CAS/CMakeLists.txt
new file mode 100644
index 00000000000000..a486ab66ae4266
--- /dev/null
+++ b/llvm/lib/CAS/CMakeLists.txt
@@ -0,0 +1,8 @@
+add_llvm_component_library(LLVMCAS
+ BuiltinCAS.cpp
+ InMemoryCAS.cpp
+ ObjectStore.cpp
+
+ ADDITIONAL_HEADER_DIRS
+ ${LLVM_MAIN_INCLUDE_DIR}/llvm/CAS
+)
diff --git a/llvm/lib/CAS/InMemoryCAS.cpp b/llvm/lib/CAS/InMemoryCAS.cpp
new file mode 100644
index 00000000000000..abdd7ed3ef8051
--- /dev/null
+++ b/llvm/lib/CAS/InMemoryCAS.cpp
@@ -0,0 +1,320 @@
+//===- InMemoryCAS.cpp ------------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "BuiltinCAS.h"
+#include "llvm/ADT/LazyAtomicPointer.h"
+#include "llvm/ADT/PointerIntPair.h"
+#include "llvm/ADT/TrieRawHashMap.h"
+#include "llvm/Support/Allocator.h"
+#include "llvm/Support/Casting.h"
+#include "llvm/Support/ThreadSafeAllocator.h"
+
+using namespace llvm;
+using namespace llvm::cas;
+using namespace llvm::cas::builtin;
+
+namespace {
+
+class InMemoryObject;
+
+/// Index of referenced IDs (map: Hash -> InMemoryObject*). Uses
+/// LazyAtomicPointer to coordinate creation of objects.
+using InMemoryIndexT =
+ ThreadSafeTrieRawHashMap<LazyAtomicPointer<const InMemoryObject>,
+ sizeof(HashType)>;
+
+/// Values in \a InMemoryIndexT. Each \a InMemoryObject points at one of these
+/// to access its hash.
+using InMemoryIndexValueT = InMemoryIndexT::value_type;
+
+class InMemoryObject {
+public:
+ enum class Kind {
+ /// Node with refs and data.
+ RefNode,
+
+ /// Node with refs and data co-allocated.
+ InlineNode,
+
+ Max = InlineNode,
+ };
+
+ Kind getKind() const { return IndexAndKind.getInt(); }
+ const InMemoryIndexValueT &getIndex() const {
+ assert(IndexAndKind.getPointer());
+ return *IndexAndKind.getPointer();
+ }
+
+ ArrayRef<uint8_t> getHash() const { return getIndex().Hash; }
+
+ InMemoryObject() = delete;
+ InMemoryObject(InMemoryObject &&) = delete;
+ InMemoryObject(const InMemoryObject &) = delete;
+
+protected:
+ InMemoryObject(Kind K, const InMemoryIndexValueT &I) : IndexAndKind(&I, K) {}
+
+private:
+ enum Counts : int {
+ NumKindBits = 2,
+ };
+ PointerIntPair<const InMemoryIndexValueT *, NumKindBits, Kind> IndexAndKind;
+ static_assert((1U << NumKindBits) <= alignof(InMemoryIndexValueT),
+ "Kind will clobber pointer");
+ static_assert(((int)Kind::Max >> NumKindBits) == 0, "Kind will be truncated");
+
+public:
+ inline ArrayRef<char> getData() const;
+
+ inline ArrayRef<const InMemoryObject *> getRefs() const;
+};
+
+class InMemoryRefObject : public InMemoryObject {
+public:
+ static constexpr Kind KindValue = Kind::RefNode;
+ static bool classof(const InMemoryObject *O) {
+ return O->getKind() == KindValue;
+ }
+
+ ArrayRef<const InMemoryObject *> getRefsImpl() const { return Refs; }
+ ArrayRef<const InMemoryObject *> getRefs() const { return Refs; }
+ ArrayRef<char> getDataImpl() const { return Data; }
+ ArrayRef<char> getData() const { return Data; }
+
+ static InMemoryRefObject &create(function_ref<void *(size_t Size)> Allocate,
+ const InMemoryIndexValueT &I,
+ ArrayRef<const InMemoryObject *> Refs,
+ ArrayRef<char> Data) {
+ void *Mem = Allocate(sizeof(InMemoryRefObject));
+ return *new (Mem) InMemoryRefObject(I, Refs, Data);
+ }
+
+private:
+ InMemoryRefObject(const InMemoryIndexValueT &I,
+ ArrayRef<const InMemoryObject *> Refs, ArrayRef<char> Data)
+ : InMemoryObject(KindValue, I), Refs(Refs), Data(Data) {
+ assert(isAddrAligned(Align(8), this) && "Expected 8-byte alignment");
+ assert(isAddrAligned(Align(8), Data.data()) && "Expected 8-byte alignment");
+ assert(*Data.end() == 0 && "Expected null-termination");
+ }
+
+ ArrayRef<const InMemoryObject *> Refs;
+ ArrayRef<char> Data;
+};
+
+class InMemoryInlineObject : public InMemoryObject {
+public:
+ static constexpr Kind KindValue = Kind::InlineNode;
+ static bool classof(const InMemoryObject *O) {
+ return O->getKind() == KindValue;
+ }
+
+ ArrayRef<const InMemoryObject *> getRefs() const { return getRefsImpl(); }
+ ArrayRef<const InMemoryObject *> getRefsImpl() const {
+ return ArrayRef(reinterpret_cast<const InMemoryObject *const *>(this + 1),
+ NumRefs);
+ }
+
+ ArrayRef<char> getData() const { return getDataImpl(); }
+ ArrayRef<char> getDataImpl() const {
+ ArrayRef<const InMemoryObject *> Refs = getRefs();
+ return ArrayRef(reinterpret_cast<const char *>(Refs.data() + Refs.size()),
+ DataSize);
+ }
+
+ static InMemoryInlineObject &
+ create(function_ref<void *(size_t Size)> Allocate,
+ const InMemoryIndexValueT &I, ArrayRef<const InMemoryObject *> Refs,
+ ArrayRef<char> Data) {
+ void *Mem = Allocate(sizeof(InMemoryInlineObject) +
+ sizeof(uintptr_t) * Refs.size() + Data.size() + 1);
+ return *new (Mem) InMemoryInlineObject(I, Refs, Data);
+ }
+
+private:
+ InMemoryInlineObject(const InMemoryIndexValueT &I,
+ ArrayRef<const InMemoryObject *> Refs,
+ ArrayRef<char> Data)
+ : InMemoryObject(KindValue, I), NumRefs(Refs.size()),
+ DataSize(Data.size()) {
+ auto *BeginRefs = reinterpret_cast<const InMemoryObject **>(this + 1);
+ llvm::copy(Refs, BeginRefs);
+ auto *BeginData = reinterpret_cast<char *>(BeginRefs + NumRefs);
+ llvm::copy(Data, BeginData);
+ BeginData[Data.size()] = 0;
+ }
+ uint32_t NumRefs;
+ uint32_t DataSize;
+};
+
+/// In-memory CAS database and action cache (the latter should be separated).
+class InMemoryCAS : public BuiltinCAS {
+public:
+ Expected<ObjectRef> storeImpl(ArrayRef<uint8_t> ComputedHash,
+ ArrayRef<ObjectRef> Refs,
+ ArrayRef<char> Data) final;
+
+ Expected<ObjectRef>
+ storeFromNullTerminatedRegion(ArrayRef<uint8_t> ComputedHash,
+ sys::fs::mapped_file_region Map) override;
+
+ CASID getID(const InMemoryIndexValueT &I) const {
+ StringRef Hash = toStringRef(I.Hash);
+ return CASID::create(&getContext(), Hash);
+ }
+ CASID getID(const InMemoryObject &O) const { return getID(O.getIndex()); }
+
+ ObjectHandle getObjectHandle(const InMemoryObject &Node) const {
+ assert(!(reinterpret_cast<uintptr_t>(&Node) & 0x1ULL));
+ return makeObjectHandle(reinterpret_cast<uintptr_t>(&Node));
+ }
+
+ Expected<std::optional<ObjectHandle>> loadIfExists(ObjectRef Ref) override {
+ return getObjectHandle(asInMemoryObject(Ref));
+ }
+
+ InMemoryIndexValueT &indexHash(ArrayRef<uint8_t> Hash) {
+ return *Index.insertLazy(
+ Hash, [](auto ValueConstructor) { ValueConstructor.emplace(nullptr); });
+ }
+
+ /// TODO: Consider having callers actually do an insert and return a handle
+ /// to the slot in the trie.
+ const InMemoryObject *getInMemoryObject(CASID ID) const {
+ assert(ID.getContext().getHashSchemaIdentifier() ==
+ getContext().getHashSchemaIdentifier() &&
+ "Expected ID from same hash schema");
+ if (InMemoryIndexT::const_pointer P = Index.find(ID.getHash()))
+ return P->Data;
+ return nullptr;
+ }
+
+ const InMemoryObject &getInMemoryObject(ObjectHandle OH) const {
+ return *reinterpret_cast<const InMemoryObject *>(
+ (uintptr_t)OH.getInternalRef(*this));
+ }
+
+ const InMemoryObject &asInMemoryObject(ReferenceBase Ref) const {
+ uintptr_t P = Ref.getInternalRef(*this);
+ return *reinterpret_cast<const InMemoryObject *>(P);
+ }
+ ObjectRef toReference(const InMemoryObject &O) const {
+ return makeObjectRef(reinterpret_cast<uintptr_t>(&O));
+ }
+
+ CASID getID(ObjectRef Ref) const final { return getIDImpl(Ref); }
+ CASID getIDImpl(ReferenceBase Ref) const {
+ return getID(asInMemoryObject(Ref));
+ }
+
+ std::optional<ObjectRef> getReference(const CASID &ID) const final {
+ if (const InMemoryObject *Object = getInMemoryObject(ID))
+ return toReference(*Object);
+ return std::nullopt;
+ }
+
+ Expected<bool> isMaterialized(ObjectRef Ref) const final { return true; }
+
+ ArrayRef<char> getDataConst(ObjectHandle Node) const final {
+ return cast<InMemoryObject>(asInMemoryObject(Node)).getData();
+ }
+
+ InMemoryCAS() = default;
+
+private:
+ size_t getNumRefs(ObjectHandle Node) const final {
+ return getInMemoryObject(Node).getRefs().size();
+ }
+ ObjectRef readRef(ObjectHandle Node, size_t I) const final {
+ return toReference(*getInMemoryObject(Node).getRefs()[I]);
+ }
+ Error forEachRef(ObjectHandle Node,
+ function_ref<Error(ObjectRef)> Callback) const final;
+
+ /// Index of referenced IDs (map: Hash -> InMemoryObject*). Mapped to nullptr
+ /// as a convenient way to store hashes.
+ ///
+ /// - Insert nullptr on lookups.
+ /// - InMemoryObject points back to here.
+ InMemoryIndexT Index;
+
+ ThreadSafeAllocator<BumpPtrAllocator> Objects;
+ ThreadSafeAllocator<SpecificBumpPtrAllocator<sys::fs::mapped_file_region>>
+ MemoryMaps;
+};
+
+} // end anonymous namespace
+
+ArrayRef<char> InMemoryObject::getData() const {
+ if (auto *Derived = dyn_cast<InMemoryRefObject>(this))
+ return Derived->getDataImpl();
+ return cast<InMemoryInlineObject>(this)->getDataImpl();
+}
+
+ArrayRef<const InMemoryObject *> InMemoryObject::getRefs() const {
+ if (auto *Derived = dyn_cast<InMemoryRefObject>(this))
+ return Derived->getRefsImpl();
+ return cast<InMemoryInlineObject>(this)->getRefsImpl();
+}
+
+Expected<ObjectRef>
+InMemoryCAS::storeFromNullTerminatedRegion(ArrayRef<uint8_t> ComputedHash,
+ sys::fs::mapped_file_region Map) {
+ // Look up the hash in the index, initializing to nullptr if it's new.
+ ArrayRef<char> Data(Map.data(), Map.size());
+ auto &I = indexHash(ComputedHash);
+
+ // Load or generate.
+ auto Allocator = [&](size_t Size) -> void * {
+ return Objects.Allocate(Size, alignof(InMemoryObject));
+ };
+ auto Generator = [&]() -> const InMemoryObject * {
+ return &InMemoryRefObject::create(Allocator, I, std::nullopt, Data);
+ };
+ const InMemoryObject &Node =
+ cast<InMemoryObject>(I.Data.loadOrGenerate(Generator));
+
+ // Save Map if the winning node uses it.
+ if (auto *RefNode = dyn_cast<InMemoryRefObject>(&Node))
+ if (RefNode->getData().data() == Map.data())
+ new (MemoryMaps.Allocate(1)) sys::fs::mapped_file_region(std::move(Map));
+
+ return toReference(Node);
+}
+
+Expected<ObjectRef> InMemoryCAS::storeImpl(ArrayRef<uint8_t> ComputedHash,
+ ArrayRef<ObjectRef> Refs,
+ ArrayRef<char> Data) {
+ // Look up the hash in the index, initializing to nullptr if it's new.
+ auto &I = indexHash(ComputedHash);
+
+ // Create the node.
+ SmallVector<const InMemoryObject *> InternalRefs;
+ for (ObjectRef Ref : Refs)
+ InternalRefs.push_back(&asInMemoryObject(Ref));
+ auto Allocator = [&](size_t Size) -> void * {
+ return Objects.Allocate(Size, alignof(InMemoryObject));
+ };
+ auto Generator = [&]() -> const InMemoryObject * {
+ return &InMemoryInlineObject::create(Allocator, I, InternalRefs, Data);
+ };
+ return toReference(cast<InMemoryObject>(I.Data.loadOrGenerate(Generator)));
+}
+
+Error InMemoryCAS::forEachRef(ObjectHandle Handle,
+ function_ref<Error(ObjectRef)> Callback) const {
+ auto &Node = getInMemoryObject(Handle);
+ for (const InMemoryObject *Ref : Node.getRefs())
+ if (Error E = Callback(toReference(*Ref)))
+ return E;
+ return Error::success();
+}
+
+std::unique_ptr<ObjectStore> cas::createInMemoryCAS() {
+ return std::make_unique<InMemoryCAS>();
+}
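
Note on InMemoryCAS::storeImpl() above: storing looks up the hash in the index
and creates the node only if none exists yet, so storing identical content
twice yields the same reference. The following is a minimal single-threaded
sketch of that deduplication. ToyStore and ToyNode are hypothetical names, and
the string key built from the ref addresses and the data is only a stand-in for
a real content digest. The sketch does not model the thread coordination that
the patch gets from LazyAtomicPointer and ThreadSafeTrieRawHashMap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical node type: references to other nodes plus a data blob.
struct ToyNode {
  std::vector<const ToyNode *> Refs;
  std::string Data;
};

class ToyStore {
public:
  // Return the existing node for this content, or create it on first use.
  const ToyNode &store(const std::vector<const ToyNode *> &Refs,
                       const std::string &Data) {
    std::string Key = makeKey(Refs, Data);
    auto It = Index.find(Key);
    if (It == Index.end())
      It = Index.emplace(Key, std::make_unique<ToyNode>(ToyNode{Refs, Data}))
               .first;
    return *It->second;
  }

  std::size_t size() const { return Index.size(); }

private:
  // Stand-in for a digest: ref count, each ref address, then the data.
  static std::string makeKey(const std::vector<const ToyNode *> &Refs,
                             const std::string &Data) {
    std::string Key = std::to_string(Refs.size()) + ":";
    for (const ToyNode *R : Refs)
      Key += std::to_string(reinterpret_cast<std::uintptr_t>(R)) + ",";
    Key += "|" + Data;
    return Key;
  }

  std::map<std::string, std::unique_ptr<ToyNode>> Index;
};
```

Because nodes are held through std::unique_ptr, returned references stay valid
for as long as the store lives, which mirrors the lifetime note on ObjectProxy.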
diff --git a/llvm/lib/CAS/ObjectStore.cpp b/llvm/lib/CAS/ObjectStore.cpp
new file mode 100644
index 00000000000000..a938c4e215382e
--- /dev/null
+++ b/llvm/lib/CAS/ObjectStore.cpp
@@ -0,0 +1,168 @@
+//===- ObjectStore.cpp ------------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/CAS/ObjectStore.h"
+#include "llvm/ADT/DenseSet.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Support/Errc.h"
+#include "llvm/Support/FileSystem.h"
+#include "llvm/Support/MemoryBuffer.h"
+
+using namespace llvm;
+using namespace llvm::cas;
+
+void CASContext::anchor() {}
+void ObjectStore::anchor() {}
+
+LLVM_DUMP_METHOD void CASID::dump() const { print(dbgs()); }
+LLVM_DUMP_METHOD void ObjectStore::dump() const { print(dbgs()); }
+LLVM_DUMP_METHOD void ObjectRef::dump() const { print(dbgs()); }
+LLVM_DUMP_METHOD void ObjectHandle::dump() const { print(dbgs()); }
+
+std::string CASID::toString() const {
+ std::string S;
+ raw_string_ostream(S) << *this;
+ return S;
+}
+
+static void printReferenceBase(raw_ostream &OS, StringRef Kind,
+ uint64_t InternalRef, std::optional<CASID> ID) {
+ OS << Kind << "=" << InternalRef;
+ if (ID)
+ OS << "[" << *ID << "]";
+}
+
+void ReferenceBase::print(raw_ostream &OS, const ObjectHandle &This) const {
+ assert(this == &This);
+ printReferenceBase(OS, "object-handle", InternalRef, std::nullopt);
+}
+
+void ReferenceBase::print(raw_ostream &OS, const ObjectRef &This) const {
+ assert(this == &This);
+
+ std::optional<CASID> ID;
+#if LLVM_ENABLE_ABI_BREAKING_CHECKS
+ if (CAS)
+ ID = CAS->getID(This);
+#endif
+ printReferenceBase(OS, "object-ref", InternalRef, ID);
+}
+
+Expected<ObjectHandle> ObjectStore::load(ObjectRef Ref) {
+ std::optional<ObjectHandle> Handle;
+ if (Error E = loadIfExists(Ref).moveInto(Handle))
+ return std::move(E);
+ if (!Handle)
+ return createStringError(errc::invalid_argument,
+ "missing object '" + getID(Ref).toString() + "'");
+ return *Handle;
+}
+
+std::unique_ptr<MemoryBuffer>
+ObjectStore::getMemoryBuffer(ObjectHandle Node, StringRef Name,
+ bool RequiresNullTerminator) {
+ return MemoryBuffer::getMemBuffer(
+ toStringRef(getData(Node, RequiresNullTerminator)), Name,
+ RequiresNullTerminator);
+}
+
+void ObjectStore::readRefs(ObjectHandle Node,
+ SmallVectorImpl<ObjectRef> &Refs) const {
+ consumeError(forEachRef(Node, [&Refs](ObjectRef Ref) -> Error {
+ Refs.push_back(Ref);
+ return Error::success();
+ }));
+}
+
+Expected<ObjectProxy> ObjectStore::getProxy(const CASID &ID) {
+ std::optional<ObjectRef> Ref = getReference(ID);
+ if (!Ref)
+ return createUnknownObjectError(ID);
+
+ return getProxy(*Ref);
+}
+
+Expected<ObjectProxy> ObjectStore::getProxy(ObjectRef Ref) {
+ std::optional<ObjectHandle> H;
+ if (Error E = load(Ref).moveInto(H))
+ return std::move(E);
+
+ return ObjectProxy::load(*this, Ref, *H);
+}
+
+Expected<std::optional<ObjectProxy>>
+ObjectStore::getProxyIfExists(ObjectRef Ref) {
+ std::optional<ObjectHandle> H;
+ if (Error E = loadIfExists(Ref).moveInto(H))
+ return std::move(E);
+ if (!H)
+ return std::nullopt;
+ return ObjectProxy::load(*this, Ref, *H);
+}
+
+Error ObjectStore::createUnknownObjectError(const CASID &ID) {
+ return createStringError(std::make_error_code(std::errc::invalid_argument),
+ "unknown object '" + ID.toString() + "'");
+}
+
+Expected<ObjectProxy> ObjectStore::createProxy(ArrayRef<ObjectRef> Refs,
+ StringRef Data) {
+ Expected<ObjectRef> Ref = store(Refs, arrayRefFromStringRef<char>(Data));
+ if (!Ref)
+ return Ref.takeError();
+ return getProxy(*Ref);
+}
+
+Expected<ObjectRef>
+ObjectStore::storeFromOpenFileImpl(sys::fs::file_t FD,
+ std::optional<sys::fs::file_status> Status) {
+ // Copy the file into an immutable memory buffer and call \c store on that.
+ // Using \c mmap would be unsafe because there's a race window between when we
+ // get the digest hash for the \c mmap contents and when we store the data; if
+ // the file changes in-between we will create an invalid object.
+
+ // FIXME: For the on-disk CAS implementation use cloning to store it as a
+ // standalone file if the file-system supports it and the file is large.
+
+ constexpr size_t ChunkSize = 4 * 4096;
+ SmallString<0> Data;
+ Data.reserve(ChunkSize * 2);
+ if (Error E = sys::fs::readNativeFileToEOF(FD, Data, ChunkSize))
+ return std::move(E);
+ return store(std::nullopt, ArrayRef(Data.data(), Data.size()));
+}
+
+Error ObjectStore::validateTree(ObjectRef Root) {
+ SmallDenseSet<ObjectRef> ValidatedRefs;
+ SmallVector<ObjectRef, 16> RefsToValidate;
+ RefsToValidate.push_back(Root);
+
+ while (!RefsToValidate.empty()) {
+ ObjectRef Ref = RefsToValidate.pop_back_val();
+ auto [I, Inserted] = ValidatedRefs.insert(Ref);
+ if (!Inserted)
+ continue; // already validated.
+ if (Error E = validate(getID(Ref)))
+ return E;
+ Expected<ObjectHandle> Obj = load(Ref);
+ if (!Obj)
+ return Obj.takeError();
+ if (Error E = forEachRef(*Obj, [&RefsToValidate](ObjectRef R) -> Error {
+ RefsToValidate.push_back(R);
+ return Error::success();
+ }))
+ return E;
+ }
+ return Error::success();
+}
+
+std::unique_ptr<MemoryBuffer>
+ObjectProxy::getMemoryBuffer(StringRef Name,
+ bool RequiresNullTerminator) const {
+ return CAS->getMemoryBuffer(H, Name, RequiresNullTerminator);
+}
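
Note on ObjectStore::validateTree() above: the traversal is an iterative
worklist walk with a visited set, so a node reachable through several parents
is validated only once. The following is a minimal standalone sketch of the
same pattern. The Node type, the validateReachable name, and the bool-returning
callback are assumptions made for illustration; the patch reports failures
through llvm::Error instead.

```cpp
#include <cassert>
#include <functional>
#include <unordered_set>
#include <vector>

// Hypothetical node type: an identifier plus outgoing references.
struct Node {
  int Id;
  std::vector<const Node *> Refs;
};

// Visit every node reachable from Root at most once. Stop and return false as
// soon as Validate rejects a node.
bool validateReachable(const Node &Root,
                       const std::function<bool(const Node &)> &Validate) {
  std::unordered_set<const Node *> Visited;
  std::vector<const Node *> Worklist{&Root};
  while (!Worklist.empty()) {
    const Node *N = Worklist.back();
    Worklist.pop_back();
    if (!Visited.insert(N).second)
      continue; // already validated
    if (!Validate(*N))
      return false;
    for (const Node *R : N->Refs)
      Worklist.push_back(R);
  }
  return true;
}
```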
diff --git a/llvm/lib/CMakeLists.txt b/llvm/lib/CMakeLists.txt
index 503c77cb13bd07..b06f4ffd83ff5a 100644
--- a/llvm/lib/CMakeLists.txt
+++ b/llvm/lib/CMakeLists.txt
@@ -9,6 +9,7 @@ add_subdirectory(FileCheck)
add_subdirectory(InterfaceStub)
add_subdirectory(IRPrinter)
add_subdirectory(IRReader)
+add_subdirectory(CAS)
add_subdirectory(CGData)
add_subdirectory(CodeGen)
add_subdirectory(CodeGenTypes)
diff --git a/llvm/unittests/CAS/CASTestConfig.cpp b/llvm/unittests/CAS/CASTestConfig.cpp
new file mode 100644
index 00000000000000..bb06ee5573134f
--- /dev/null
+++ b/llvm/unittests/CAS/CASTestConfig.cpp
@@ -0,0 +1,22 @@
+//===- CASTestConfig.cpp --------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "CASTestConfig.h"
+#include "llvm/CAS/ObjectStore.h"
+#include "gtest/gtest.h"
+
+using namespace llvm;
+using namespace llvm::cas;
+
+CASTestingEnv createInMemory(int I) {
+ std::unique_ptr<ObjectStore> CAS = createInMemoryCAS();
+ return CASTestingEnv{std::move(CAS)};
+}
+
+INSTANTIATE_TEST_SUITE_P(InMemoryCAS, CASTest,
+ ::testing::Values(createInMemory));
diff --git a/llvm/unittests/CAS/CASTestConfig.h b/llvm/unittests/CAS/CASTestConfig.h
new file mode 100644
index 00000000000000..d9f9e52033c2da
--- /dev/null
+++ b/llvm/unittests/CAS/CASTestConfig.h
@@ -0,0 +1,32 @@
+//===- CASTestConfig.h ----------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/CAS/ObjectStore.h"
+#include "gtest/gtest.h"
+
+#ifndef LLVM_UNITTESTS_CASTESTCONFIG_H
+#define LLVM_UNITTESTS_CASTESTCONFIG_H
+
+struct CASTestingEnv {
+ std::unique_ptr<llvm::cas::ObjectStore> CAS;
+};
+
+class CASTest
+ : public testing::TestWithParam<std::function<CASTestingEnv(int)>> {
+protected:
+ std::optional<int> NextCASIndex;
+
+ std::unique_ptr<llvm::cas::ObjectStore> createObjectStore() {
+ auto TD = GetParam()(++(*NextCASIndex));
+ return std::move(TD.CAS);
+ }
+ void SetUp() { NextCASIndex = 0; }
+ void TearDown() { NextCASIndex = std::nullopt; }
+};
+
+#endif
diff --git a/llvm/unittests/CAS/CMakeLists.txt b/llvm/unittests/CAS/CMakeLists.txt
new file mode 100644
index 00000000000000..39a2100c4909ee
--- /dev/null
+++ b/llvm/unittests/CAS/CMakeLists.txt
@@ -0,0 +1,12 @@
+set(LLVM_LINK_COMPONENTS
+ Support
+ CAS
+ TestingSupport
+ )
+
+add_llvm_unittest(CASTests
+ CASTestConfig.cpp
+ ObjectStoreTest.cpp
+ )
+
+target_link_libraries(CASTests PRIVATE LLVMTestingSupport)
diff --git a/llvm/unittests/CAS/ObjectStoreTest.cpp b/llvm/unittests/CAS/ObjectStoreTest.cpp
new file mode 100644
index 00000000000000..0d94731330b1d3
--- /dev/null
+++ b/llvm/unittests/CAS/ObjectStoreTest.cpp
@@ -0,0 +1,360 @@
+//===- ObjectStoreTest.cpp ------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/CAS/ObjectStore.h"
+#include "llvm/Support/Process.h"
+#include "llvm/Support/ThreadPool.h"
+#include "llvm/Testing/Support/Error.h"
+#include "gtest/gtest.h"
+
+#include "CASTestConfig.h"
+
+using namespace llvm;
+using namespace llvm::cas;
+
+TEST_P(CASTest, PrintIDs) {
+ std::unique_ptr<ObjectStore> CAS = createObjectStore();
+
+ std::optional<CASID> ID1, ID2;
+ ASSERT_THAT_ERROR(CAS->createProxy(std::nullopt, "1").moveInto(ID1),
+ Succeeded());
+ ASSERT_THAT_ERROR(CAS->createProxy(std::nullopt, "2").moveInto(ID2),
+ Succeeded());
+ EXPECT_NE(ID1, ID2);
+ std::string PrintedID1 = ID1->toString();
+ std::string PrintedID2 = ID2->toString();
+ EXPECT_NE(PrintedID1, PrintedID2);
+
+ std::optional<CASID> ParsedID1, ParsedID2;
+ ASSERT_THAT_ERROR(CAS->parseID(PrintedID1).moveInto(ParsedID1), Succeeded());
+ ASSERT_THAT_ERROR(CAS->parseID(PrintedID2).moveInto(ParsedID2), Succeeded());
+ EXPECT_EQ(ID1, ParsedID1);
+ EXPECT_EQ(ID2, ParsedID2);
+}
+
+TEST_P(CASTest, Blobs) {
+ std::unique_ptr<ObjectStore> CAS1 = createObjectStore();
+ StringRef ContentStrings[] = {
+ "word",
+ "some longer text std::string's local memory",
+ R"(multiline text multiline text multiline text multiline text
+multiline text multiline text multiline text multiline text multiline text
+multiline text multiline text multiline text multiline text multiline text
+multiline text multiline text multiline text multiline text multiline text
+multiline text multiline text multiline text multiline text multiline text
+multiline text multiline text multiline text multiline text multiline text)",
+ };
+
+ SmallVector<CASID> IDs;
+ for (StringRef Content : ContentStrings) {
+ // Use StringRef::str() to create a temporary std::string. This could cause
+ // problems if the CAS is storing references to the input string instead of
+ // copying it.
+ std::optional<ObjectProxy> Blob;
+ ASSERT_THAT_ERROR(CAS1->createProxy(std::nullopt, Content).moveInto(Blob),
+ Succeeded());
+ IDs.push_back(Blob->getID());
+
+ // Check basic printing of IDs.
+ EXPECT_EQ(IDs.back().toString(), IDs.back().toString());
+ if (IDs.size() > 2)
+ EXPECT_NE(IDs.front().toString(), IDs.back().toString());
+ }
+
+ // Check that the blobs give the same IDs later.
+ for (int I = 0, E = IDs.size(); I != E; ++I) {
+ std::optional<ObjectProxy> Blob;
+ ASSERT_THAT_ERROR(
+ CAS1->createProxy(std::nullopt, ContentStrings[I]).moveInto(Blob),
+ Succeeded());
+ EXPECT_EQ(IDs[I], Blob->getID());
+ }
+
+ // Run validation on all CASIDs.
+ for (int I = 0, E = IDs.size(); I != E; ++I)
+ ASSERT_THAT_ERROR(CAS1->validate(IDs[I]), Succeeded());
+
+ // Check that the blobs can be retrieved multiple times.
+ for (int I = 0, E = IDs.size(); I != E; ++I) {
+ for (int J = 0, JE = 3; J != JE; ++J) {
+ std::optional<ObjectProxy> Buffer;
+ ASSERT_THAT_ERROR(CAS1->getProxy(IDs[I]).moveInto(Buffer), Succeeded());
+ EXPECT_EQ(ContentStrings[I], Buffer->getData());
+ }
+ }
+
+ // Confirm these blobs don't exist in a fresh CAS instance.
+ std::unique_ptr<ObjectStore> CAS2 = createObjectStore();
+ for (int I = 0, E = IDs.size(); I != E; ++I) {
+ std::optional<ObjectProxy> Proxy;
+ EXPECT_THAT_ERROR(CAS2->getProxy(IDs[I]).moveInto(Proxy), Failed());
+ }
+
+ // Insert into the second CAS and confirm the IDs are stable. Getting them
+ // should work now.
+ for (int I = IDs.size(), E = 0; I != E; --I) {
+ auto &ID = IDs[I - 1];
+ auto &Content = ContentStrings[I - 1];
+ std::optional<ObjectProxy> Blob;
+ ASSERT_THAT_ERROR(CAS2->createProxy(std::nullopt, Content).moveInto(Blob),
+ Succeeded());
+ EXPECT_EQ(ID, Blob->getID());
+
+ std::optional<ObjectProxy> Buffer;
+ ASSERT_THAT_ERROR(CAS2->getProxy(ID).moveInto(Buffer), Succeeded());
+ EXPECT_EQ(Content, Buffer->getData());
+ }
+}
+
+TEST_P(CASTest, BlobsBig) {
+ // A little bit of validation that bigger blobs are okay. Climb up to 1MB.
+ std::unique_ptr<ObjectStore> CAS = createObjectStore();
+ SmallString<256> String1 = StringRef("a few words");
+ SmallString<256> String2 = StringRef("others");
+ while (String1.size() < 1024U * 1024U) {
+ std::optional<CASID> ID1;
+ std::optional<CASID> ID2;
+ ASSERT_THAT_ERROR(CAS->createProxy(std::nullopt, String1).moveInto(ID1),
+ Succeeded());
+ ASSERT_THAT_ERROR(CAS->createProxy(std::nullopt, String1).moveInto(ID2),
+ Succeeded());
+ ASSERT_THAT_ERROR(CAS->validate(*ID1), Succeeded());
+ ASSERT_THAT_ERROR(CAS->validate(*ID2), Succeeded());
+ ASSERT_EQ(ID1, ID2);
+
+ String1.append(String2);
+ ASSERT_THAT_ERROR(CAS->createProxy(std::nullopt, String2).moveInto(ID1),
+ Succeeded());
+ ASSERT_THAT_ERROR(CAS->createProxy(std::nullopt, String2).moveInto(ID2),
+ Succeeded());
+ ASSERT_THAT_ERROR(CAS->validate(*ID1), Succeeded());
+ ASSERT_THAT_ERROR(CAS->validate(*ID2), Succeeded());
+ ASSERT_EQ(ID1, ID2);
+ String2.append(String1);
+ }
+
+ // Specifically check near 1MB for objects large enough they're likely to be
+ // stored externally in an on-disk CAS and will be near a page boundary.
+ SmallString<0> Storage;
+ const size_t InterestingSize = 1024U * 1024ULL;
+ const size_t SizeE = InterestingSize + 2;
+ if (Storage.size() < SizeE)
+ Storage.resize(SizeE, '\01');
+ for (size_t Size = InterestingSize - 2; Size != SizeE; ++Size) {
+ StringRef Data(Storage.data(), Size);
+ std::optional<ObjectProxy> Blob;
+ ASSERT_THAT_ERROR(CAS->createProxy(std::nullopt, Data).moveInto(Blob),
+ Succeeded());
+ ASSERT_EQ(Data, Blob->getData());
+ ASSERT_EQ(0, Blob->getData().end()[0]);
+ }
+}
+
+TEST_P(CASTest, LeafNodes) {
+ std::unique_ptr<ObjectStore> CAS1 = createObjectStore();
+ StringRef ContentStrings[] = {
+ "word",
+ "some longer text std::string's local memory",
+ R"(multiline text multiline text multiline text multiline text
+multiline text multiline text multiline text multiline text multiline text
+multiline text multiline text multiline text multiline text multiline text
+multiline text multiline text multiline text multiline text multiline text
+multiline text multiline text multiline text multiline text multiline text
+multiline text multiline text multiline text multiline text multiline text)",
+ };
+
+ SmallVector<ObjectRef> Nodes;
+ SmallVector<CASID> IDs;
+ for (StringRef Content : ContentStrings) {
+ // Use StringRef::str() to create a temporary std::string. This could cause
+ // problems if the CAS is storing references to the input string instead of
+ // copying it.
+ std::optional<ObjectRef> Node;
+ ASSERT_THAT_ERROR(
+ CAS1->store(std::nullopt, arrayRefFromStringRef<char>(Content))
+ .moveInto(Node),
+ Succeeded());
+ Nodes.push_back(*Node);
+
+ // Check basic printing of IDs.
+ IDs.push_back(CAS1->getID(*Node));
+ EXPECT_EQ(IDs.back().toString(), IDs.back().toString());
+ EXPECT_EQ(Nodes.front(), Nodes.front());
+ EXPECT_EQ(Nodes.back(), Nodes.back());
+ EXPECT_EQ(IDs.front(), IDs.front());
+ EXPECT_EQ(IDs.back(), IDs.back());
+ if (Nodes.size() <= 1)
+ continue;
+ EXPECT_NE(Nodes.front(), Nodes.back());
+ EXPECT_NE(IDs.front(), IDs.back());
+ }
+
+ // Check that the blobs give the same IDs later.
+ for (int I = 0, E = IDs.size(); I != E; ++I) {
+ std::optional<ObjectRef> Node;
+ ASSERT_THAT_ERROR(CAS1->store(std::nullopt, arrayRefFromStringRef<char>(
+ ContentStrings[I]))
+ .moveInto(Node),
+ Succeeded());
+ EXPECT_EQ(IDs[I], CAS1->getID(*Node));
+ }
+
+ // Check that the blobs can be retrieved multiple times.
+ for (int I = 0, E = IDs.size(); I != E; ++I) {
+ for (int J = 0, JE = 3; J != JE; ++J) {
+ std::optional<ObjectProxy> Object;
+ ASSERT_THAT_ERROR(CAS1->getProxy(IDs[I]).moveInto(Object), Succeeded());
+ ASSERT_TRUE(Object);
+ EXPECT_EQ(ContentStrings[I], Object->getData());
+ }
+ }
+
+ // Confirm these blobs don't exist in a fresh CAS instance.
+ std::unique_ptr<ObjectStore> CAS2 = createObjectStore();
+ for (int I = 0, E = IDs.size(); I != E; ++I) {
+ std::optional<ObjectProxy> Object;
+ EXPECT_THAT_ERROR(CAS2->getProxy(IDs[I]).moveInto(Object), Failed());
+ }
+
+ // Insert into the second CAS and confirm the IDs are stable. Getting them
+ // should work now.
+ for (int I = IDs.size(), E = 0; I != E; --I) {
+ auto &ID = IDs[I - 1];
+ auto &Content = ContentStrings[I - 1];
+ std::optional<ObjectRef> Node;
+ ASSERT_THAT_ERROR(
+ CAS2->store(std::nullopt, arrayRefFromStringRef<char>(Content))
+ .moveInto(Node),
+ Succeeded());
+ EXPECT_EQ(ID, CAS2->getID(*Node));
+
+ std::optional<ObjectProxy> Object;
+ ASSERT_THAT_ERROR(CAS2->getProxy(ID).moveInto(Object), Succeeded());
+ ASSERT_TRUE(Object);
+ EXPECT_EQ(Content, Object->getData());
+ }
+}
+
+TEST_P(CASTest, NodesBig) {
+ std::unique_ptr<ObjectStore> CAS = createObjectStore();
+
+ // Specifically check near 1MB for objects large enough they're likely to be
+ // stored externally in an on-disk CAS, and such that one of them will be
+ // near a page boundary.
+ SmallString<0> Storage;
+ constexpr size_t InterestingSize = 1024U * 1024ULL;
+ constexpr size_t WordSize = sizeof(void *);
+
+ // Start much smaller to account for headers.
+ constexpr size_t SizeB = InterestingSize - 8 * WordSize;
+ constexpr size_t SizeE = InterestingSize + 1;
+ if (Storage.size() < SizeE)
+ Storage.resize(SizeE, '\01');
+
+ SmallVector<ObjectRef, 4> CreatedNodes;
+ // Avoid checking every size because this is an expensive test. Just check
+ // for data that is 8B-word-aligned, and one less. Also pass the nodes
+ // created so far as the references of the next node, to check that
+ // references are created correctly.
+ for (size_t Size = SizeB; Size < SizeE; Size += WordSize) {
+ for (bool IsAligned : {false, true}) {
+ StringRef Data(Storage.data(), Size - (IsAligned ? 0 : 1));
+ std::optional<ObjectProxy> Node;
+ ASSERT_THAT_ERROR(CAS->createProxy(CreatedNodes, Data).moveInto(Node),
+ Succeeded());
+ ASSERT_EQ(Data, Node->getData());
+ ASSERT_EQ(0, Node->getData().end()[0]);
+ ASSERT_EQ(Node->getNumReferences(), CreatedNodes.size());
+ CreatedNodes.emplace_back(Node->getRef());
+ }
+ }
+
+ for (auto ID : CreatedNodes)
+ ASSERT_THAT_ERROR(CAS->validate(CAS->getID(ID)), Succeeded());
+}
+
+/// Common test functionality for creating blobs in parallel. You can vary which
+/// cas instances are the same or different, and the size of the created blobs.
+static void testBlobsParallel(ObjectStore &Read1, ObjectStore &Read2,
+ ObjectStore &Write1, ObjectStore &Write2,
+ uint64_t BlobSize) {
+ SCOPED_TRACE("testBlobsParallel");
+ unsigned BlobCount = 100;
+ std::vector<std::string> Blobs;
+ Blobs.reserve(BlobCount);
+ for (unsigned I = 0; I < BlobCount; ++I) {
+ std::string Blob;
+ Blob.reserve(BlobSize);
+ while (Blob.size() < BlobSize) {
+ auto R = sys::Process::GetRandomNumber();
+ Blob.append((char *)&R, sizeof(R));
+ }
+ assert(Blob.size() >= BlobSize);
+ Blob.resize(BlobSize);
+ Blobs.push_back(std::move(Blob));
+ }
+
+ std::mutex NodesMtx;
+ std::vector<std::optional<CASID>> CreatedNodes(BlobCount);
+
+ auto Producer = [&](unsigned I, ObjectStore *CAS) {
+ std::optional<ObjectProxy> Node;
+ EXPECT_THAT_ERROR(CAS->createProxy({}, Blobs[I]).moveInto(Node),
+ Succeeded());
+ {
+ std::lock_guard<std::mutex> L(NodesMtx);
+ CreatedNodes[I] = Node ? Node->getID() : CASID::getDenseMapTombstoneKey();
+ }
+ };
+
+ auto Consumer = [&](unsigned I, ObjectStore *CAS) {
+ std::optional<CASID> ID;
+ while (!ID) {
+ // Busy wait.
+ std::lock_guard<std::mutex> L(NodesMtx);
+ ID = CreatedNodes[I];
+ }
+ if (ID == CASID::getDenseMapTombstoneKey())
+ // Producer failed; already reported.
+ return;
+
+ std::optional<ObjectProxy> Node;
+ ASSERT_THAT_ERROR(CAS->getProxy(*ID).moveInto(Node), Succeeded());
+ EXPECT_EQ(Node->getData(), Blobs[I]);
+ };
+
+ DefaultThreadPool Threads;
+ for (unsigned I = 0; I < BlobCount; ++I) {
+ Threads.async(Consumer, I, &Read1);
+ Threads.async(Consumer, I, &Read2);
+ Threads.async(Producer, I, &Write1);
+ Threads.async(Producer, I, &Write2);
+ }
+
+ Threads.wait();
+}
+
+static void testBlobsParallel1(ObjectStore &CAS, uint64_t BlobSize) {
+ SCOPED_TRACE("testBlobsParallel1");
+ testBlobsParallel(CAS, CAS, CAS, CAS, BlobSize);
+}
+
+TEST_P(CASTest, BlobsParallel) {
+ std::shared_ptr<ObjectStore> CAS = createObjectStore();
+ uint64_t Size = 1ULL * 1024;
+ ASSERT_NO_FATAL_FAILURE(testBlobsParallel1(*CAS, Size));
+}
+
+#ifdef EXPENSIVE_CHECKS
+TEST_P(CASTest, BlobsBigParallel) {
+ std::shared_ptr<ObjectStore> CAS = createObjectStore();
+ // 100k is large enough to be standalone files in our on-disk cas.
+ uint64_t Size = 100ULL * 1024;
+ ASSERT_NO_FATAL_FAILURE(testBlobsParallel1(*CAS, Size));
+}
+#endif
diff --git a/llvm/unittests/CMakeLists.txt b/llvm/unittests/CMakeLists.txt
index 8892f3e75729ab..5ebdc3bb4cac13 100644
--- a/llvm/unittests/CMakeLists.txt
+++ b/llvm/unittests/CMakeLists.txt
@@ -34,6 +34,7 @@ add_subdirectory(AsmParser)
add_subdirectory(BinaryFormat)
add_subdirectory(Bitcode)
add_subdirectory(Bitstream)
+add_subdirectory(CAS)
add_subdirectory(CGData)
add_subdirectory(CodeGen)
add_subdirectory(DebugInfo)
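
The `Blobs` and `LeafNodes` tests in this patch check three properties: storing the same content twice yields the same ID, a fresh store cannot load an ID it has never seen, and IDs are stable across store instances. The documentation in this series adds that identical data with different references is a different object. A minimal standalone sketch of those properties could look like the following. It does not use the `llvm::cas` API: `ToyStore`, `computeID`, `store`, `load`, and `demoToyStore` are hypothetical names, and the ID is simply a length-prefixed serialization of the references and data (a real CAS would hash it), which keeps the sketch collision-free.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical toy model, not the llvm::cas API.
class ToyStore {
  // Maps an ID to the data of the stored object.
  std::unordered_map<std::string, std::string> Objects;

public:
  // Builds the ID from the references and the data. Every field is
  // length-prefixed, so distinct (Refs, Data) pairs never share an ID.
  // A real CAS would hash this serialization instead of using it directly.
  static std::string computeID(const std::vector<std::string> &Refs,
                               const std::string &Data) {
    std::string ID = std::to_string(Refs.size()) + ":";
    for (const std::string &Ref : Refs)
      ID += std::to_string(Ref.size()) + ":" + Ref;
    ID += "|" + Data;
    return ID;
  }

  // Stores a copy of Data; storing the same object again is a no-op.
  std::string store(const std::vector<std::string> &Refs,
                    const std::string &Data) {
    std::string ID = computeID(Refs, Data);
    Objects.emplace(ID, Data);
    return ID;
  }

  // Returns the data for ID, or std::nullopt if this store never saw it.
  std::optional<std::string> load(const std::string &ID) const {
    auto It = Objects.find(ID);
    if (It == Objects.end())
      return std::nullopt;
    return It->second;
  }

  std::size_t size() const { return Objects.size(); }
};

// Walks through the same checks the tests make, in miniature.
inline void demoToyStore() {
  ToyStore CAS1, CAS2;
  std::string ID = CAS1.store({}, "word");
  assert(ID == CAS1.store({}, "word"));   // same content, same ID
  assert(CAS1.size() == 1);               // and it is stored only once
  assert(!CAS2.load(ID));                 // a fresh store does not know the ID
  assert(CAS2.store({}, "word") == ID);   // IDs are stable across stores
  assert(CAS2.load(ID) == "word");        // loading works after the insert
  assert(CAS1.store({ID}, "word") != ID); // same data, different refs
}
```

Because the references take part in the ID, two nodes with the same bytes but different children never compare equal, which is the property that lets two DAGs be compared by their root IDs alone.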
>From ee98c85d7f5274a7e0b86cc839cc9d0ad5a1e05f Mon Sep 17 00:00:00 2001
From: Steven Wu <stevenwu at apple.com>
Date: Wed, 30 Oct 2024 14:54:44 -0700
Subject: [PATCH 2/2] Address review feedback
Created using spr 1.3.5
---
llvm/docs/ContentAddressableStorage.md | 55 +++++++++++++-------------
llvm/include/llvm/CAS/CASReference.h | 14 +------
llvm/lib/CAS/InMemoryCAS.cpp | 23 ++++++-----
llvm/lib/CAS/ObjectStore.cpp | 20 ++++------
4 files changed, 50 insertions(+), 62 deletions(-)
diff --git a/llvm/docs/ContentAddressableStorage.md b/llvm/docs/ContentAddressableStorage.md
index 4f2d9a6a3a9185..1cd788382c653f 100644
--- a/llvm/docs/ContentAddressableStorage.md
+++ b/llvm/docs/ContentAddressableStorage.md
@@ -6,8 +6,8 @@ Content Addressable Storage, or `CAS`, is a storage system where it assigns
unique addresses to the data stored. It is very useful for data deduplicaton
and creating unique identifiers.
-Unlikely other kind of storage system like file system, CAS is immutable. It
-is more reliable to model a computation when representing the inputs and outputs
+Unlike other kinds of storage systems, such as a file system, CAS is immutable. It
+is more reliable to model a computation by representing the inputs and outputs
of the computation using objects stored in CAS.
The basic unit of the CAS library is a CASObject, where it contains:
@@ -24,11 +24,10 @@ struct CASObject {
}
```
-Such abstraction can allow simple composition of CASObjects into a DAG to
-represent complicated data structure while still allowing data deduplication.
-Note you can compare two DAGs by just comparing the CASObject hash of two
-root nodes.
-
+With this abstraction, it is possible to compose CASObjects into a DAG that is
+capable of representing complicated data structures, while still allowing data
+deduplication. Note you can compare two DAGs by just comparing the CASObject
+hash of two root nodes.
## LLVM CAS Library User Guide
@@ -47,11 +46,11 @@ along. It has following properties:
`ObjectRef` created by different `ObjectStore` cannot be cross-referenced or
compared.
* `ObjectRef` doesn't guarantee the existence of the CASObject it points to. An
-explicitly load is required before accessing the data stored in CASObject.
-This load can also fail, for reasons like but not limited to: object does
+explicit load is required before accessing the data stored in CASObject.
+This load can also fail, for reasons like (but not limited to): object does
not exist, corrupted CAS storage, operation timeout, etc.
-* If two `ObjectRef` are equal, it is guarantee that the object they point to
-(if exists) are identical. If they are not equal, the underlying objects are
+* If two `ObjectRef` are equal, it is guaranteed that the objects they point to
+are identical (if they exist). If they are not equal, the underlying objects are
guaranteed to be not the same.
### ObjectProxy
@@ -88,33 +87,33 @@ It also provides APIs to convert between `ObjectRef`, `ObjectProxy` and
## CAS Library Implementation Guide
-The LLVM ObjectStore APIs are designed so that it is easy to add
-customized CAS implementation that are interchangeable with builtin
-CAS implementations.
+The LLVM ObjectStore API was designed so that it is easy to add
+customized CAS implementations that are interchangeable with the builtin
+ones.
To add your own implementation, you just need to add a subclass to
`llvm::cas::ObjectStore` and implement all its pure virtual methods.
To be interchangeable with LLVM ObjectStore, the new CAS implementation
needs to conform to following contracts:
-* Different CASObject stored in the ObjectStore needs to have a different hash
-and result in a different `ObjectRef`. Vice versa, same CASObject should have
-same hash and same `ObjectRef`. Note two different CASObjects with identical
-data but different references are considered different objects.
-* `ObjectRef`s are comparable within the same `ObjectStore` instance, and can
-be used to determine the equality of the underlying CASObjects.
-* The loaded objects from the ObjectStore need to have the lifetime to be at
-least as long as the ObjectStore itself.
+* Different CASObjects stored in the ObjectStore need to have a different hash
+and result in a different `ObjectRef`. Similarly, the same CASObject should have
+the same hash and the same `ObjectRef`. Note: two different CASObjects with
+identical data but different references are considered different objects.
+* `ObjectRef`s are only comparable within the same `ObjectStore` instance, and
+can be used to determine the equality of the underlying CASObjects.
+* The loaded objects from the ObjectStore need to have a lifetime at least as
+long as the ObjectStore itself.
If not specified, the behavior can be implementation defined. For example,
`ObjectRef` can be used to point to a loaded CASObject so
`ObjectStore` never fails to load. It is also legal to use a stricter model
-than required. For example, an `ObjectRef` that can be used to compare
-objects between different `ObjectStore` instances is legal but user
-of the ObjectStore should not depend on this behavior.
+than required. For example, an `ObjectRef` can be a unique identity of
+the objects across multiple `ObjectStore` instances, but users of LLVMCAS
+should not depend on this behavior.
-For CAS library implementer, there is also a `ObjectHandle` class that
+For CAS library implementers, there is also an `ObjectHandle` class that
is an internal representation of a loaded CASObject reference.
-`ObjectProxy` is just a pair of `ObjectHandle` and `ObjectStore`, because
+`ObjectProxy` is just a pair of `ObjectHandle` and `ObjectStore`, and
just like `ObjectRef`, `ObjectHandle` is only useful when paired with
-the ObjectStore that knows about the loaded CASObject.
+the `ObjectStore` that knows about the loaded CASObject.
diff --git a/llvm/include/llvm/CAS/CASReference.h b/llvm/include/llvm/CAS/CASReference.h
index 1f435cf306c4ca..e41c04ca2655d8 100644
--- a/llvm/include/llvm/CAS/CASReference.h
+++ b/llvm/include/llvm/CAS/CASReference.h
@@ -89,7 +89,7 @@ class ReferenceBase {
#endif
};
-/// Reference to an object in a \a ObjectStore instance.
+/// Reference to an object in an \a ObjectStore instance.
///
/// If you have an ObjectRef, you know the object exists, and you can point at
/// it from new nodes with \a ObjectStore::store(), but you don't know anything
@@ -105,12 +105,6 @@ class ReferenceBase {
/// ObjectHandle, a variant that knows what kind of entity it is. \a
/// ObjectStore::getReferenceKind() can expect the type of reference without
/// asking for unloaded objects to be loaded.
-///
-/// This is a wrapper around a \c uint64_t (and a \a ObjectStore instance when
-/// assertions are on). If necessary, it can be deconstructed and reconstructed
-/// using \a Reference::getInternalRef() and \a
-/// Reference::getFromInternalRef(), but clients aren't expected to need to do
-/// this. These both require the right \a ObjectStore instance.
class ObjectRef : public ReferenceBase {
struct DenseMapTag {};
@@ -122,12 +116,6 @@ class ObjectRef : public ReferenceBase {
return !(LHS == RHS);
}
- /// Allow a reference to be recreated after it's deconstructed.
- static ObjectRef getFromInternalRef(const ObjectStore &CAS,
- uint64_t InternalRef) {
- return ObjectRef(CAS, InternalRef);
- }
-
static ObjectRef getDenseMapEmptyKey() {
return ObjectRef(DenseMapEmptyTag{});
}
diff --git a/llvm/lib/CAS/InMemoryCAS.cpp b/llvm/lib/CAS/InMemoryCAS.cpp
index abdd7ed3ef8051..f0305e0d4eafae 100644
--- a/llvm/lib/CAS/InMemoryCAS.cpp
+++ b/llvm/lib/CAS/InMemoryCAS.cpp
@@ -13,6 +13,7 @@
#include "llvm/Support/Allocator.h"
#include "llvm/Support/Casting.h"
#include "llvm/Support/ThreadSafeAllocator.h"
+#include "llvm/Support/TrailingObjects.h"
using namespace llvm;
using namespace llvm::cas;
@@ -69,12 +70,12 @@ class InMemoryObject {
static_assert(((int)Kind::Max >> NumKindBits) == 0, "Kind will be truncated");
public:
- inline ArrayRef<char> getData() const;
+ ArrayRef<char> getData() const;
- inline ArrayRef<const InMemoryObject *> getRefs() const;
+ ArrayRef<const InMemoryObject *> getRefs() const;
};
-class InMemoryRefObject : public InMemoryObject {
+class InMemoryRefObject final : public InMemoryObject {
public:
static constexpr Kind KindValue = Kind::RefNode;
static bool classof(const InMemoryObject *O) {
@@ -107,7 +108,10 @@ class InMemoryRefObject : public InMemoryObject {
ArrayRef<char> Data;
};
-class InMemoryInlineObject : public InMemoryObject {
+class InMemoryInlineObject final
+ : public InMemoryObject,
+ public TrailingObjects<InMemoryInlineObject, const InMemoryObject *,
+ char> {
public:
static constexpr Kind KindValue = Kind::InlineNode;
static bool classof(const InMemoryObject *O) {
@@ -116,15 +120,12 @@ class InMemoryInlineObject : public InMemoryObject {
ArrayRef<const InMemoryObject *> getRefs() const { return getRefsImpl(); }
ArrayRef<const InMemoryObject *> getRefsImpl() const {
- return ArrayRef(reinterpret_cast<const InMemoryObject *const *>(this + 1),
- NumRefs);
+ return ArrayRef(getTrailingObjects<const InMemoryObject *>(), NumRefs);
}
ArrayRef<char> getData() const { return getDataImpl(); }
ArrayRef<char> getDataImpl() const {
- ArrayRef<const InMemoryObject *> Refs = getRefs();
- return ArrayRef(reinterpret_cast<const char *>(Refs.data() + Refs.size()),
- DataSize);
+ return ArrayRef(getTrailingObjects<char>(), DataSize);
}
static InMemoryInlineObject &
@@ -136,6 +137,10 @@ class InMemoryInlineObject : public InMemoryObject {
return *new (Mem) InMemoryInlineObject(I, Refs, Data);
}
+ size_t numTrailingObjects(OverloadToken<const InMemoryObject *>) const {
+ return NumRefs;
+ }
+
private:
InMemoryInlineObject(const InMemoryIndexValueT &I,
ArrayRef<const InMemoryObject *> Refs,
diff --git a/llvm/lib/CAS/ObjectStore.cpp b/llvm/lib/CAS/ObjectStore.cpp
index a938c4e215382e..179621cfa296c3 100644
--- a/llvm/lib/CAS/ObjectStore.cpp
+++ b/llvm/lib/CAS/ObjectStore.cpp
@@ -12,6 +12,7 @@
#include "llvm/Support/Errc.h"
#include "llvm/Support/FileSystem.h"
#include "llvm/Support/MemoryBuffer.h"
+#include <optional>
using namespace llvm;
using namespace llvm::cas;
@@ -121,20 +122,15 @@ Expected<ObjectProxy> ObjectStore::createProxy(ArrayRef<ObjectRef> Refs,
Expected<ObjectRef>
ObjectStore::storeFromOpenFileImpl(sys::fs::file_t FD,
std::optional<sys::fs::file_status> Status) {
- // Copy the file into an immutable memory buffer and call \c store on that.
- // Using \c mmap would be unsafe because there's a race window between when we
- // get the digest hash for the \c mmap contents and when we store the data; if
- // the file changes in-between we will create an invalid object.
-
- // FIXME: For the on-disk CAS implementation use cloning to store it as a
+ // TODO: For the on-disk CAS implementation use cloning to store it as a
// standalone file if the file-system supports it and the file is large.
+ uint64_t Size = Status ? Status->getSize() : -1;
+ auto Buffer = MemoryBuffer::getOpenFile(FD, /*Filename=*/"", Size);
+ if (!Buffer)
+ return errorCodeToError(Buffer.getError());
- constexpr size_t ChunkSize = 4 * 4096;
- SmallString<0> Data;
- Data.reserve(ChunkSize * 2);
- if (Error E = sys::fs::readNativeFileToEOF(FD, Data, ChunkSize))
- return std::move(E);
- return store(std::nullopt, ArrayRef(Data.data(), Data.size()));
+ return store(std::nullopt,
+ arrayRefFromStringRef<char>((*Buffer)->getBuffer()));
}
Error ObjectStore::validateTree(ObjectRef Root) {
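
`MemoryBuffer::getOpenFile` returns an `ErrorOr<std::unique_ptr<MemoryBuffer>>`, which converts to `true` on success, so the early `errorCodeToError` return in `storeFromOpenFileImpl` belongs on the failure branch. A minimal standalone sketch of that control flow follows. It assumes nothing from LLVM: `ToyErrorOr`, `readBuffer`, `storeFromBuffer`, and `demoStoreFromBuffer` are hypothetical stand-ins, and the "store" step just returns the buffer size.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <system_error>
#include <utility>
#include <variant>

// Hypothetical stand-in for llvm::ErrorOr<T>: truthy on success.
template <typename T> class ToyErrorOr {
  std::variant<std::error_code, T> Storage;

public:
  ToyErrorOr(std::error_code EC) : Storage(EC) {}
  ToyErrorOr(T Value) : Storage(std::move(Value)) {}

  // True when a value is held, false when an error is held.
  explicit operator bool() const { return std::holds_alternative<T>(Storage); }

  // Returns the held error, or a default (success) code if a value is held.
  std::error_code getError() const {
    return *this ? std::error_code() : std::get<std::error_code>(Storage);
  }

  const T &operator*() const { return std::get<T>(Storage); }
};

// Pretends to read a file; fails on request.
inline ToyErrorOr<std::string> readBuffer(bool Fail) {
  if (Fail)
    return std::make_error_code(std::errc::no_such_file_or_directory);
  return std::string("contents");
}

// Bails out on failure and only touches the value on success.
inline std::variant<std::error_code, std::size_t> storeFromBuffer(bool Fail) {
  ToyErrorOr<std::string> Buffer = readBuffer(Fail);
  if (!Buffer) // failure branch: propagate the error
    return Buffer.getError();
  return (*Buffer).size(); // success branch: use the buffer
}

inline void demoStoreFromBuffer() {
  assert(std::holds_alternative<std::size_t>(storeFromBuffer(false)));
  assert(std::holds_alternative<std::error_code>(storeFromBuffer(true)));
}
```

With the condition inverted, the success path would return a default-constructed error code and the failure path would dereference a result that holds no value.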