[llvm] [CAS] Add LLVMCAS library with InMemoryCAS implementation (PR #114096)

Steven Wu via llvm-commits llvm-commits at lists.llvm.org
Wed Oct 30 14:54:54 PDT 2024


https://github.com/cachemeifyoucan updated https://github.com/llvm/llvm-project/pull/114096

From 9bf0f3079c410eb096ad3c2cefb89679bd34282b Mon Sep 17 00:00:00 2001
From: Steven Wu <stevenwu at apple.com>
Date: Tue, 29 Oct 2024 10:36:55 -0700
Subject: [PATCH 1/2] [𝘀𝗽𝗿] initial version
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Created using spr 1.3.5
---
 llvm/docs/ContentAddressableStorage.md      | 120 +++++++
 llvm/docs/Reference.rst                     |   4 +
 llvm/include/llvm/CAS/BuiltinCASContext.h   |  88 +++++
 llvm/include/llvm/CAS/BuiltinObjectHasher.h |  81 +++++
 llvm/include/llvm/CAS/CASID.h               | 156 +++++++++
 llvm/include/llvm/CAS/CASReference.h        | 207 +++++++++++
 llvm/include/llvm/CAS/ObjectStore.h         | 302 ++++++++++++++++
 llvm/include/module.modulemap               |   6 +
 llvm/lib/CAS/BuiltinCAS.cpp                 |  94 +++++
 llvm/lib/CAS/BuiltinCAS.h                   |  74 ++++
 llvm/lib/CAS/CMakeLists.txt                 |   8 +
 llvm/lib/CAS/InMemoryCAS.cpp                | 320 +++++++++++++++++
 llvm/lib/CAS/ObjectStore.cpp                | 168 +++++++++
 llvm/lib/CMakeLists.txt                     |   1 +
 llvm/unittests/CAS/CASTestConfig.cpp        |  22 ++
 llvm/unittests/CAS/CASTestConfig.h          |  32 ++
 llvm/unittests/CAS/CMakeLists.txt           |  12 +
 llvm/unittests/CAS/ObjectStoreTest.cpp      | 360 ++++++++++++++++++++
 llvm/unittests/CMakeLists.txt               |   1 +
 19 files changed, 2056 insertions(+)
 create mode 100644 llvm/docs/ContentAddressableStorage.md
 create mode 100644 llvm/include/llvm/CAS/BuiltinCASContext.h
 create mode 100644 llvm/include/llvm/CAS/BuiltinObjectHasher.h
 create mode 100644 llvm/include/llvm/CAS/CASID.h
 create mode 100644 llvm/include/llvm/CAS/CASReference.h
 create mode 100644 llvm/include/llvm/CAS/ObjectStore.h
 create mode 100644 llvm/lib/CAS/BuiltinCAS.cpp
 create mode 100644 llvm/lib/CAS/BuiltinCAS.h
 create mode 100644 llvm/lib/CAS/CMakeLists.txt
 create mode 100644 llvm/lib/CAS/InMemoryCAS.cpp
 create mode 100644 llvm/lib/CAS/ObjectStore.cpp
 create mode 100644 llvm/unittests/CAS/CASTestConfig.cpp
 create mode 100644 llvm/unittests/CAS/CASTestConfig.h
 create mode 100644 llvm/unittests/CAS/CMakeLists.txt
 create mode 100644 llvm/unittests/CAS/ObjectStoreTest.cpp
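
Note: the following is an illustrative sketch only and is not part of the
patch. It mirrors the conceptual CASObject model described in the documentation
added below (data plus references, addressed by a hash of both). All names
(`ToyCAS`, `ToyObject`, `Digest`) are hypothetical, and a 64-bit FNV-1a hash
stands in for BLAKE3 purely to keep the example self-contained; it is not
collision-resistant and is not what the builtin CAS uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical digest type; the real builtin CAS uses a 32-byte BLAKE3 hash.
using Digest = std::uint64_t;

// Conceptual CASObject: arbitrary data plus references to other objects.
struct ToyObject {
  std::vector<Digest> Refs;
  std::string Data;
};

class ToyCAS {
public:
  // "Get-or-create": storing identical content twice yields the same digest
  // and keeps a single copy.
  Digest store(const std::vector<Digest> &Refs, const std::string &Data) {
    Digest D = hashObject(Refs, Data);
    Objects.emplace(D, ToyObject{Refs, Data}); // no-op if already present
    return D;
  }

  // Returns nullptr when the object is unknown to this store.
  const ToyObject *load(Digest D) const {
    auto It = Objects.find(D);
    return It == Objects.end() ? nullptr : &It->second;
  }

  std::size_t size() const { return Objects.size(); }

private:
  // Feed a 64-bit value into the FNV-1a state, least significant byte first.
  static void mix(Digest &H, std::uint64_t V) {
    for (int I = 0; I < 8; ++I) {
      H ^= (V >> (8 * I)) & 0xff;
      H *= 1099511628211ULL;
    }
  }

  // Hash order: ref count, each ref digest, data size, data bytes.
  static Digest hashObject(const std::vector<Digest> &Refs,
                           const std::string &Data) {
    Digest H = 14695981039346656037ULL;
    mix(H, Refs.size());
    for (Digest R : Refs)
      mix(H, R);
    mix(H, Data.size());
    for (unsigned char C : Data) {
      H ^= C;
      H *= 1099511628211ULL;
    }
    return H;
  }

  std::unordered_map<Digest, ToyObject> Objects;
};
```

Storing the same leaf twice returns the same digest, and two roots built from
equal leaves compare equal by digest alone, which is the DAG comparison
property the documentation describes. Because references are part of the
hashed content, a node with the same data but different references gets a
different digest.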

diff --git a/llvm/docs/ContentAddressableStorage.md b/llvm/docs/ContentAddressableStorage.md
new file mode 100644
index 00000000000000..4f2d9a6a3a9185
--- /dev/null
+++ b/llvm/docs/ContentAddressableStorage.md
@@ -0,0 +1,120 @@
+# Content Addressable Storage
+
+## Introduction to CAS
+
+Content Addressable Storage, or `CAS`, is a storage system that assigns
+unique addresses to the data it stores. It is very useful for data
+deduplication and for creating unique identifiers.
+
+Unlike other kinds of storage systems, such as file systems, a CAS is
+immutable. Modeling a computation is more reliable when its inputs and
+outputs are represented as objects stored in a CAS.
+
+The basic unit of the CAS library is a CASObject, which contains:
+
+* Data: arbitrary data
+* References: references to other CASObjects
+
+It can be conceptually modeled as something like:
+
+```
+struct CASObject {
+  ArrayRef<char> Data;
+  ArrayRef<CASObject*> Refs;
+};
+```
+
+This abstraction allows simple composition of CASObjects into a DAG to
+represent complicated data structures while still allowing data deduplication.
+Note that you can compare two DAGs by just comparing the CASObject hashes of
+their root nodes.
+
+
+
+## LLVM CAS Library User Guide
+
+The CAS-like storage provided in LLVM is `llvm::cas::ObjectStore`.
+To reference a CASObject, there are a few different abstractions provided
+with different trade-offs:
+
+### ObjectRef
+
+`ObjectRef` is a lightweight reference to a CASObject stored in the CAS.
+This is the most commonly used abstraction and it is cheap to copy/pass
+along. It has the following properties:
+
+* `ObjectRef` is only meaningful within the `ObjectStore` that created the ref.
+`ObjectRef`s created by different `ObjectStore`s cannot be cross-referenced or
+compared.
+* `ObjectRef` doesn't guarantee the existence of the CASObject it points to. An
+explicit load is required before accessing the data stored in the CASObject.
+This load can also fail, for reasons including but not limited to: the object
+does not exist, corrupted CAS storage, operation timeout, etc.
+* If two `ObjectRef`s are equal, it is guaranteed that the objects they point to
+(if they exist) are identical. If they are not equal, the underlying objects
+are guaranteed to be different.
+
+### ObjectProxy
+
+`ObjectProxy` represents a loaded CASObject. With an `ObjectProxy`, the
+underlying stored data and references can be accessed without the need
+for error handling. The class APIs also provide convenient methods to
+access underlying data. The lifetime of the underlying data is equal to
+the lifetime of the instance of `ObjectStore` unless explicitly copied.
+
+### CASID
+
+`CASID` is the hash identifier for CASObjects. It owns the underlying
+storage for the hash value, so it can be expensive to copy and compare
+depending on the hash algorithm. `CASID` is generally only useful in rare
+situations like printing the raw hash value or exchanging hash values between
+different CAS instances with the same hashing schema.
+
+### ObjectStore
+
+`ObjectStore` is the CAS-like object storage. It provides APIs to save
+and load CASObjects, for example:
+
+```
+ObjectRef A, B, C;
+Expected<ObjectRef> Stored = ObjectStore.store({A, B}, "data");
+Expected<ObjectProxy> Loaded = ObjectStore.getProxy(C);
+```
+
+It also provides APIs to convert between `ObjectRef`, `ObjectProxy` and
+`CASID`.
+
+
+
+## CAS Library Implementation Guide
+
+The LLVM ObjectStore APIs are designed so that it is easy to add
+customized CAS implementations that are interchangeable with the builtin
+CAS implementations.
+
+To add your own implementation, you just need to add a subclass of
+`llvm::cas::ObjectStore` and implement all its pure virtual methods.
+To be interchangeable with LLVM ObjectStore, the new CAS implementation
+needs to conform to the following contracts:
+
+* Different CASObjects stored in the ObjectStore need to have different hashes
+and result in different `ObjectRef`s. Conversely, the same CASObject should have
+the same hash and the same `ObjectRef`. Note that two CASObjects with identical
+data but different references are considered different objects.
+* `ObjectRef`s are comparable within the same `ObjectStore` instance, and can
+be used to determine the equality of the underlying CASObjects.
+* Objects loaded from the ObjectStore need to have a lifetime at least as long
+as the ObjectStore itself.
+
+If not specified, the behavior can be implementation defined. For example,
+`ObjectRef` can be used to point to a loaded CASObject so that
+`ObjectStore` never fails to load. It is also legal to use a stricter model
+than required. For example, an `ObjectRef` that can be used to compare
+objects between different `ObjectStore` instances is legal, but users
+of the ObjectStore should not depend on this behavior.
+
+For CAS library implementers, there is also an `ObjectHandle` class that
+is an internal representation of a loaded CASObject reference.
+`ObjectProxy` is just a pair of `ObjectHandle` and `ObjectStore`, because
+just like `ObjectRef`, `ObjectHandle` is only useful when paired with
+the ObjectStore that knows about the loaded CASObject.
diff --git a/llvm/docs/Reference.rst b/llvm/docs/Reference.rst
index df61628b06c7db..ae03a3a7bfa9aa 100644
--- a/llvm/docs/Reference.rst
+++ b/llvm/docs/Reference.rst
@@ -15,6 +15,7 @@ LLVM and API reference documentation.
    BranchWeightMetadata
    Bugpoint
    CommandGuide/index
+   ContentAddressableStorage
    ConvergenceAndUniformity
    ConvergentOperations
    Coroutines
@@ -232,3 +233,6 @@ Additional Topics
 :doc:`ConvergenceAndUniformity`
    A description of uniformity analysis in the presence of irreducible
    control flow, and its implementation.
+
+:doc:`ContentAddressableStorage`
+   A reference guide for using LLVM's CAS library.
diff --git a/llvm/include/llvm/CAS/BuiltinCASContext.h b/llvm/include/llvm/CAS/BuiltinCASContext.h
new file mode 100644
index 00000000000000..ebc4ca8bd1f2e9
--- /dev/null
+++ b/llvm/include/llvm/CAS/BuiltinCASContext.h
@@ -0,0 +1,88 @@
+//===- BuiltinCASContext.h --------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CAS_BUILTINCASCONTEXT_H
+#define LLVM_CAS_BUILTINCASCONTEXT_H
+
+#include "llvm/CAS/CASID.h"
+#include "llvm/Support/BLAKE3.h"
+#include "llvm/Support/Error.h"
+
+namespace llvm::cas::builtin {
+
+/// Current hash type for the builtin CAS.
+///
+/// FIXME: This should be configurable via an enum to allow configuring the hash
+/// function. The enum should be sent into \a createInMemoryCAS() and \a
+/// createOnDiskCAS().
+///
+/// This is important (at least) for future-proofing, when we want to make new
+/// CAS instances use BLAKE7, but still know how to read/write BLAKE3.
+///
+/// Even just for BLAKE3, it would be useful to have these values:
+///
+///     BLAKE3     => 32B hash from BLAKE3
+///     BLAKE3_16B => 16B hash from BLAKE3 (truncated)
+///
+/// ... where BLAKE3_16B uses \a TruncatedBLAKE3<16>.
+///
+/// Motivation for a truncated hash is that it's cheaper to store. It's not
+/// clear if we always (or ever) need the full 32B, and for an ephemeral
+/// in-memory CAS, we almost certainly don't need it.
+///
+/// Note that the cost is linear in the number of objects for the builtin CAS,
+/// since we're using internal offsets and/or pointers as an optimization.
+///
+/// However, it's possible we'll want to hook up a local builtin CAS to, e.g.,
+/// a distributed generic hash map to use as an ActionCache. In that scenario,
+/// the transitive closure of the structured objects that are the results of
+/// the cached actions would need to be serialized into the map, something
+/// like:
+///
+///     "action:<schema>:<key>" -> "0123"
+///     "object:<schema>:0123"  -> "3,4567,89AB,CDEF,9,some data"
+///     "object:<schema>:4567"  -> ...
+///     "object:<schema>:89AB"  -> ...
+///     "object:<schema>:CDEF"  -> ...
+///
+/// These references would be full cost.
+using HasherT = BLAKE3;
+using HashType = decltype(HasherT::hash(std::declval<ArrayRef<uint8_t> &>()));
+
+class BuiltinCASContext : public CASContext {
+  void printIDImpl(raw_ostream &OS, const CASID &ID) const final;
+  void anchor() override;
+
+public:
+  /// Get the name of the hash for any table identifiers.
+  ///
+  /// FIXME: This should be configurable via an enum, with at least the
+  /// following values:
+  ///
+  ///     "BLAKE3"    => 32B hash from BLAKE3
+  ///     "BLAKE3.16" => 16B hash from BLAKE3 (truncated)
+  ///
+  /// Enum can be sent into \a createInMemoryCAS() and \a createOnDiskCAS().
+  static StringRef getHashName() { return "BLAKE3"; }
+  StringRef getHashSchemaIdentifier() const final {
+    static const std::string ID =
+        ("llvm.cas.builtin.v2[" + getHashName() + "]").str();
+    return ID;
+  }
+
+  static const BuiltinCASContext &getDefaultContext();
+
+  BuiltinCASContext() = default;
+
+  static Expected<HashType> parseID(StringRef PrintedDigest);
+  static void printID(ArrayRef<uint8_t> Digest, raw_ostream &OS);
+};
+
+} // namespace llvm::cas::builtin
+
+#endif // LLVM_CAS_BUILTINCASCONTEXT_H
diff --git a/llvm/include/llvm/CAS/BuiltinObjectHasher.h b/llvm/include/llvm/CAS/BuiltinObjectHasher.h
new file mode 100644
index 00000000000000..22e556c5669b55
--- /dev/null
+++ b/llvm/include/llvm/CAS/BuiltinObjectHasher.h
@@ -0,0 +1,81 @@
+//===- BuiltinObjectHasher.h ------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CAS_BUILTINOBJECTHASHER_H
+#define LLVM_CAS_BUILTINOBJECTHASHER_H
+
+#include "llvm/CAS/ObjectStore.h"
+#include "llvm/Support/Endian.h"
+
+namespace llvm::cas {
+
+template <class HasherT> class BuiltinObjectHasher {
+public:
+  using HashT = decltype(HasherT::hash(std::declval<ArrayRef<uint8_t> &>()));
+
+  static HashT hashObject(const ObjectStore &CAS, ArrayRef<ObjectRef> Refs,
+                          ArrayRef<char> Data) {
+    BuiltinObjectHasher H;
+    H.updateSize(Refs.size());
+    for (const ObjectRef &Ref : Refs)
+      H.updateRef(CAS, Ref);
+    H.updateArray(Data);
+    return H.finish();
+  }
+
+  static HashT hashObject(ArrayRef<ArrayRef<uint8_t>> Refs,
+                          ArrayRef<char> Data) {
+    BuiltinObjectHasher H;
+    H.updateSize(Refs.size());
+    for (const ArrayRef<uint8_t> &Ref : Refs)
+      H.updateID(Ref);
+    H.updateArray(Data);
+    return H.finish();
+  }
+
+private:
+  HashT finish() { return Hasher.final(); }
+
+  void updateRef(const ObjectStore &CAS, ObjectRef Ref) {
+    updateID(CAS.getID(Ref));
+  }
+
+  void updateID(const CASID &ID) { updateID(ID.getHash()); }
+
+  void updateID(ArrayRef<uint8_t> Hash) {
+    // NOTE: Does not hash the size of the hash. That's a CAS implementation
+    // detail that shouldn't leak into the UUID for an object.
+    assert(Hash.size() == sizeof(HashT) &&
+           "Expected object ref to match the hash size");
+    Hasher.update(Hash);
+  }
+
+  void updateArray(ArrayRef<uint8_t> Bytes) {
+    updateSize(Bytes.size());
+    Hasher.update(Bytes);
+  }
+
+  void updateArray(ArrayRef<char> Bytes) {
+    updateArray(ArrayRef(reinterpret_cast<const uint8_t *>(Bytes.data()),
+                         Bytes.size()));
+  }
+
+  void updateSize(uint64_t Size) {
+    Size = support::endian::byte_swap(Size, endianness::little);
+    Hasher.update(
+        ArrayRef(reinterpret_cast<const uint8_t *>(&Size), sizeof(Size)));
+  }
+
+  BuiltinObjectHasher() = default;
+  ~BuiltinObjectHasher() = default;
+  HasherT Hasher;
+};
+
+} // namespace llvm::cas
+
+#endif // LLVM_CAS_BUILTINOBJECTHASHER_H
diff --git a/llvm/include/llvm/CAS/CASID.h b/llvm/include/llvm/CAS/CASID.h
new file mode 100644
index 00000000000000..5f9110a15819ad
--- /dev/null
+++ b/llvm/include/llvm/CAS/CASID.h
@@ -0,0 +1,156 @@
+//===- llvm/CAS/CASID.h -----------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CAS_CASID_H
+#define LLVM_CAS_CASID_H
+
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/DenseMapInfo.h"
+#include "llvm/ADT/SmallString.h"
+#include "llvm/ADT/StringExtras.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/Error.h"
+
+namespace llvm {
+
+class raw_ostream;
+
+namespace cas {
+
+class CASID;
+
+/// Context for CAS identifiers.
+class CASContext {
+  virtual void anchor();
+
+public:
+  virtual ~CASContext() = default;
+
+  /// Get an identifier for the schema used by this CAS context. Two CAS
+  /// instances should return the same identifier if and only if their
+  /// CASIDs are safe to compare by hash. This is used by CASID's
+  /// \c operator==.
+  virtual StringRef getHashSchemaIdentifier() const = 0;
+
+protected:
+  /// Print \p ID to \p OS.
+  virtual void printIDImpl(raw_ostream &OS, const CASID &ID) const = 0;
+
+  friend class CASID;
+};
+
+/// Unique identifier for a CAS object.
+///
+/// Locally, stores an internal CAS identifier that's specific to a single CAS
+/// instance. It's guaranteed not to change across the view of that CAS, but
+/// might change between runs.
+///
+/// It also has a \a CASContext pointer to allow comparison of these
+/// identifiers. If two CASIDs are from the same CASContext, they can be
+/// compared directly. If they are not, then \a
+/// CASContext::getHashSchemaIdentifier() is compared to see if they can be
+/// compared by hash, in which case the result of \a getHash() is compared.
+class CASID {
+public:
+  void dump() const;
+  void print(raw_ostream &OS) const {
+    return getContext().printIDImpl(OS, *this);
+  }
+  friend raw_ostream &operator<<(raw_ostream &OS, const CASID &ID) {
+    ID.print(OS);
+    return OS;
+  }
+  std::string toString() const;
+
+  ArrayRef<uint8_t> getHash() const {
+    return arrayRefFromStringRef<uint8_t>(Hash);
+  }
+
+  friend bool operator==(const CASID &LHS, const CASID &RHS) {
+    if (LHS.Context == RHS.Context)
+      return LHS.Hash == RHS.Hash;
+
+    // EmptyKey or TombstoneKey.
+    if (!LHS.Context || !RHS.Context)
+      return false;
+
+    // CASIDs are equal when they have the same hash schema and same hash value.
+    return LHS.Context->getHashSchemaIdentifier() ==
+               RHS.Context->getHashSchemaIdentifier() &&
+           LHS.Hash == RHS.Hash;
+  }
+
+  friend bool operator!=(const CASID &LHS, const CASID &RHS) {
+    return !(LHS == RHS);
+  }
+
+  friend hash_code hash_value(const CASID &ID) {
+    ArrayRef<uint8_t> Hash = ID.getHash();
+    return hash_combine_range(Hash.begin(), Hash.end());
+  }
+
+  const CASContext &getContext() const {
+    assert(Context && "Tombstone or empty key for DenseMap?");
+    return *Context;
+  }
+
+  static CASID getDenseMapEmptyKey() {
+    return CASID(nullptr, DenseMapInfo<StringRef>::getEmptyKey());
+  }
+  static CASID getDenseMapTombstoneKey() {
+    return CASID(nullptr, DenseMapInfo<StringRef>::getTombstoneKey());
+  }
+
+  CASID() = delete;
+
+  static CASID create(const CASContext *Context, StringRef Hash) {
+    return CASID(Context, Hash);
+  }
+
+private:
+  CASID(const CASContext *Context, StringRef Hash)
+      : Context(Context), Hash(Hash) {}
+
+  const CASContext *Context;
+  SmallString<32> Hash;
+};
+
+/// This is used to work around the issue of MSVC needing default-constructible
+/// types for \c std::promise/future.
+template <typename T> struct AsyncValue {
+  Expected<std::optional<T>> take() { return std::move(Value); }
+
+  AsyncValue() : Value(std::nullopt) {}
+  AsyncValue(Error &&E) : Value(std::move(E)) {}
+  AsyncValue(T &&V) : Value(std::move(V)) {}
+  AsyncValue(std::nullopt_t) : Value(std::nullopt) {}
+  AsyncValue(Expected<std::optional<T>> &&Obj) : Value(std::move(Obj)) {}
+
+private:
+  Expected<std::optional<T>> Value;
+};
+
+} // namespace cas
+
+template <> struct DenseMapInfo<cas::CASID> {
+  static cas::CASID getEmptyKey() { return cas::CASID::getDenseMapEmptyKey(); }
+
+  static cas::CASID getTombstoneKey() {
+    return cas::CASID::getDenseMapTombstoneKey();
+  }
+
+  static unsigned getHashValue(cas::CASID ID) {
+    return (unsigned)hash_value(ID);
+  }
+
+  static bool isEqual(cas::CASID LHS, cas::CASID RHS) { return LHS == RHS; }
+};
+
+} // namespace llvm
+
+#endif // LLVM_CAS_CASID_H
diff --git a/llvm/include/llvm/CAS/CASReference.h b/llvm/include/llvm/CAS/CASReference.h
new file mode 100644
index 00000000000000..1f435cf306c4ca
--- /dev/null
+++ b/llvm/include/llvm/CAS/CASReference.h
@@ -0,0 +1,207 @@
+//===- llvm/CAS/CASReference.h ----------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CAS_CASREFERENCE_H
+#define LLVM_CAS_CASREFERENCE_H
+
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/DenseMapInfo.h"
+#include "llvm/ADT/StringRef.h"
+
+namespace llvm {
+
+class raw_ostream;
+
+namespace cas {
+
+class ObjectStore;
+
+class ObjectHandle;
+class ObjectRef;
+
+/// Base class for references to things in \a ObjectStore.
+class ReferenceBase {
+protected:
+  struct DenseMapEmptyTag {};
+  struct DenseMapTombstoneTag {};
+  static constexpr uint64_t getDenseMapEmptyRef() { return -1ULL; }
+  static constexpr uint64_t getDenseMapTombstoneRef() { return -2ULL; }
+
+public:
+  /// Get an internal reference.
+  uint64_t getInternalRef(const ObjectStore &ExpectedCAS) const {
+#if LLVM_ENABLE_ABI_BREAKING_CHECKS
+    assert(CAS == &ExpectedCAS && "Extracting reference for the wrong CAS");
+#endif
+    return InternalRef;
+  }
+
+  unsigned getDenseMapHash() const {
+    return (unsigned)llvm::hash_value(InternalRef);
+  }
+  bool isDenseMapEmpty() const { return InternalRef == getDenseMapEmptyRef(); }
+  bool isDenseMapTombstone() const {
+    return InternalRef == getDenseMapTombstoneRef();
+  }
+  bool isDenseMapSentinel() const {
+    return isDenseMapEmpty() || isDenseMapTombstone();
+  }
+
+protected:
+  void print(raw_ostream &OS, const ObjectHandle &This) const;
+  void print(raw_ostream &OS, const ObjectRef &This) const;
+
+  bool hasSameInternalRef(const ReferenceBase &RHS) const {
+#if LLVM_ENABLE_ABI_BREAKING_CHECKS
+    assert(
+        (isDenseMapSentinel() || RHS.isDenseMapSentinel() || CAS == RHS.CAS) &&
+        "Cannot compare across CAS instances");
+#endif
+    return InternalRef == RHS.InternalRef;
+  }
+
+protected:
+  friend class ObjectStore;
+  ReferenceBase(const ObjectStore *CAS, uint64_t InternalRef, bool IsHandle)
+      : InternalRef(InternalRef) {
+#if LLVM_ENABLE_ABI_BREAKING_CHECKS
+    this->CAS = CAS;
+#endif
+    assert(InternalRef != getDenseMapEmptyRef() && "Reserved for DenseMapInfo");
+    assert(InternalRef != getDenseMapTombstoneRef() &&
+           "Reserved for DenseMapInfo");
+  }
+  explicit ReferenceBase(DenseMapEmptyTag)
+      : InternalRef(getDenseMapEmptyRef()) {}
+  explicit ReferenceBase(DenseMapTombstoneTag)
+      : InternalRef(getDenseMapTombstoneRef()) {}
+
+private:
+  uint64_t InternalRef;
+
+#if LLVM_ENABLE_ABI_BREAKING_CHECKS
+  const ObjectStore *CAS = nullptr;
+#endif
+};
+
+/// Reference to an object in a \a ObjectStore instance.
+///
+/// If you have an ObjectRef, you know the object exists, and you can point at
+/// it from new nodes with \a ObjectStore::store(), but you don't know anything
+/// about it. "Loading" the object is a separate step that may not have
+/// happened yet, and which can fail (due to filesystem corruption) or
+/// introduce latency (if downloading from a remote store).
+///
+/// \a ObjectStore::store() takes a list of these, and these are returned by \a
+/// ObjectStore::forEachRef() and \a ObjectStore::readRef(), which are accessors
+/// for nodes, and \a ObjectStore::getReference().
+///
+/// \a ObjectStore::load() will load the referenced object, and returns \a
+/// ObjectHandle, a variant that knows what kind of entity it is. \a
+/// ObjectStore::getReferenceKind() can inspect the type of reference without
+/// asking for unloaded objects to be loaded.
+///
+/// This is a wrapper around a \c uint64_t (and a \a ObjectStore instance when
+/// assertions are on). If necessary, it can be deconstructed and reconstructed
+/// using \a ReferenceBase::getInternalRef() and \a
+/// ObjectRef::getFromInternalRef(), but clients aren't expected to need to do
+/// this. These both require the right \a ObjectStore instance.
+class ObjectRef : public ReferenceBase {
+  struct DenseMapTag {};
+
+public:
+  friend bool operator==(const ObjectRef &LHS, const ObjectRef &RHS) {
+    return LHS.hasSameInternalRef(RHS);
+  }
+  friend bool operator!=(const ObjectRef &LHS, const ObjectRef &RHS) {
+    return !(LHS == RHS);
+  }
+
+  /// Allow a reference to be recreated after it's deconstructed.
+  static ObjectRef getFromInternalRef(const ObjectStore &CAS,
+                                      uint64_t InternalRef) {
+    return ObjectRef(CAS, InternalRef);
+  }
+
+  static ObjectRef getDenseMapEmptyKey() {
+    return ObjectRef(DenseMapEmptyTag{});
+  }
+  static ObjectRef getDenseMapTombstoneKey() {
+    return ObjectRef(DenseMapTombstoneTag{});
+  }
+
+  /// Print internal ref and/or CASID. Only suitable for debugging.
+  void print(raw_ostream &OS) const { return ReferenceBase::print(OS, *this); }
+
+  LLVM_DUMP_METHOD void dump() const;
+
+private:
+  friend class ObjectStore;
+  friend class ReferenceBase;
+  using ReferenceBase::ReferenceBase;
+  ObjectRef(const ObjectStore &CAS, uint64_t InternalRef)
+      : ReferenceBase(&CAS, InternalRef, /*IsHandle=*/false) {
+    assert(InternalRef != -1ULL && "Reserved for DenseMapInfo");
+    assert(InternalRef != -2ULL && "Reserved for DenseMapInfo");
+  }
+  explicit ObjectRef(DenseMapEmptyTag T) : ReferenceBase(T) {}
+  explicit ObjectRef(DenseMapTombstoneTag T) : ReferenceBase(T) {}
+  explicit ObjectRef(ReferenceBase) = delete;
+};
+
+/// Handle to a loaded object in a \a ObjectStore instance.
+///
+/// ObjectHandle encapsulates a *loaded* object in the CAS. You need one
+/// of these to inspect the content of an object: to look at its stored
+/// data and references.
+class ObjectHandle : public ReferenceBase {
+public:
+  friend bool operator==(const ObjectHandle &LHS, const ObjectHandle &RHS) {
+    return LHS.hasSameInternalRef(RHS);
+  }
+  friend bool operator!=(const ObjectHandle &LHS, const ObjectHandle &RHS) {
+    return !(LHS == RHS);
+  }
+
+  /// Print internal ref and/or CASID. Only suitable for debugging.
+  void print(raw_ostream &OS) const { return ReferenceBase::print(OS, *this); }
+
+  LLVM_DUMP_METHOD void dump() const;
+
+private:
+  friend class ObjectStore;
+  friend class ReferenceBase;
+  using ReferenceBase::ReferenceBase;
+  explicit ObjectHandle(ReferenceBase) = delete;
+  ObjectHandle(const ObjectStore &CAS, uint64_t InternalRef)
+      : ReferenceBase(&CAS, InternalRef, /*IsHandle=*/true) {}
+};
+
+} // namespace cas
+
+template <> struct DenseMapInfo<cas::ObjectRef> {
+  static cas::ObjectRef getEmptyKey() {
+    return cas::ObjectRef::getDenseMapEmptyKey();
+  }
+
+  static cas::ObjectRef getTombstoneKey() {
+    return cas::ObjectRef::getDenseMapTombstoneKey();
+  }
+
+  static unsigned getHashValue(cas::ObjectRef Ref) {
+    return Ref.getDenseMapHash();
+  }
+
+  static bool isEqual(cas::ObjectRef LHS, cas::ObjectRef RHS) {
+    return LHS == RHS;
+  }
+};
+
+} // namespace llvm
+
+#endif // LLVM_CAS_CASREFERENCE_H
diff --git a/llvm/include/llvm/CAS/ObjectStore.h b/llvm/include/llvm/CAS/ObjectStore.h
new file mode 100644
index 00000000000000..b4720c7edc1543
--- /dev/null
+++ b/llvm/include/llvm/CAS/ObjectStore.h
@@ -0,0 +1,302 @@
+//===- llvm/CAS/ObjectStore.h -----------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CAS_OBJECTSTORE_H
+#define LLVM_CAS_OBJECTSTORE_H
+
+#include "llvm/ADT/StringRef.h"
+#include "llvm/CAS/CASID.h"
+#include "llvm/CAS/CASReference.h"
+#include "llvm/Support/Error.h"
+#include "llvm/Support/FileSystem.h"
+#include <cstddef>
+
+namespace llvm {
+
+class MemoryBuffer;
+template <typename T> class unique_function;
+
+namespace cas {
+
+class ObjectStore;
+class ObjectProxy;
+
+/// Content-addressable storage for objects.
+///
+/// Conceptually, objects are stored in a "unique set".
+///
+/// - Objects are immutable ("value objects") that are defined by their
+///   content. They are implicitly deduplicated by content.
+/// - Each object has a unique identifier (UID) that's derived from its content,
+///   called a \a CASID.
+///     - This UID is a fixed-size (strong) hash of the transitive content of a
+///       CAS object.
+///     - It's comparable between any two CAS instances that have the same \a
+///       CASContext::getHashSchemaIdentifier().
+///     - The UID can be printed (e.g., \a CASID::toString()) and it can be
+///       parsed by the same or a different CAS instance with \a
+///       ObjectStore::parseID().
+/// - An object can be looked up by content or by UID.
+///     - \a store() is a "get-or-create" method, writing an object if it
+///       doesn't exist yet, and returning a ref to it in any case.
+///     - \a loadObject(const CASID&) looks up an object by its UID.
+/// - Objects can reference other objects, forming an arbitrary DAG.
+///
+/// The \a ObjectStore interface has a few ways of referencing objects:
+///
+/// - \a ObjectRef encapsulates a reference to something in the CAS. It is an
+///   opaque type that references an object inside a specific CAS. It is
+///   implementation defined if the underlying object exists or not for an
+///   ObjectRef, and it can be used to speed up CAS lookup as an implementation
+///   detail. However, you don't know anything about the underlying objects.
+///   "Loading" the object is a separate step that may not have happened
+///   yet, and which can fail (e.g. due to filesystem corruption) or introduce
+///   latency (if downloading from a remote store).
+/// - \a ObjectHandle encapsulates a *loaded* object in the CAS. You need one of
+///   these to inspect the content of an object: to look at its stored
+///   data and references. This is internal to the CAS implementation and not
+///   available from the CAS public APIs.
+/// - \a CASID: the UID for an object in the CAS, obtained through \a
+///   ObjectStore::getID() or \a ObjectStore::parseID(). This is a valid CAS
+///   identifier, but may reference an object that is unknown to this CAS
+///   instance.
+/// - \a ObjectProxy pairs an ObjectHandle (subclass) with an ObjectStore, and
+///   wraps access APIs to avoid having to pass extra parameters. It is the
+///   object used for accessing underlying data and refs by CAS users.
+///
+/// Both ObjectRef and ObjectHandle are lightweight, wrapping a `uint64_t` and
+/// are only valid with the associated ObjectStore instance.
+///
+/// There are a few options for accessing content of objects, with different
+/// lifetime tradeoffs:
+///
+/// - \a getData() accesses data without exposing lifetime at all.
+/// - \a getMemoryBuffer() returns a \a MemoryBuffer whose lifetime
+///   is independent of the CAS (it can live longer).
+/// - \a getDataString() returns a StringRef whose lifetime is guaranteed to
+///   last as long as the \a ObjectStore.
+/// - \a readRef() and \a forEachRef() iterate through the references in an
+///   object. There is no lifetime assumption.
+class ObjectStore {
+  friend class ObjectProxy;
+  void anchor();
+
+public:
+  /// Get a \p CASID from a \p ID, which should have been generated by \a
+  /// CASID::print(). This succeeds as long as \a validateID() would pass. The
+  /// object may be unknown to this CAS instance.
+  ///
+  /// TODO: Remove, and update callers to use \a validateID() or \a
+  /// extractHashFromID().
+  virtual Expected<CASID> parseID(StringRef ID) = 0;
+
+  /// Store object into ObjectStore.
+  virtual Expected<ObjectRef> store(ArrayRef<ObjectRef> Refs,
+                                    ArrayRef<char> Data) = 0;
+  /// Get an ID for \p Ref.
+  virtual CASID getID(ObjectRef Ref) const = 0;
+
+  /// Get an existing reference to the object called \p ID.
+  ///
+  /// Returns \c std::nullopt if the object is not stored in this CAS.
+  virtual std::optional<ObjectRef> getReference(const CASID &ID) const = 0;
+
+  /// \returns true if the object is directly available from the local CAS, for
+  /// implementations that have this kind of distinction.
+  virtual Expected<bool> isMaterialized(ObjectRef Ref) const = 0;
+
+  /// Validate the underlying object referred to by \p ID.
+  virtual Error validate(const CASID &ID) = 0;
+
+protected:
+  /// Load the object referenced by \p Ref.
+  ///
+  /// Errors if the object cannot be loaded.
+  /// \returns \c std::nullopt if the object is missing from the CAS.
+  virtual Expected<std::optional<ObjectHandle>> loadIfExists(ObjectRef Ref) = 0;
+
+  /// Like \c loadIfExists but returns an error if the object is missing.
+  Expected<ObjectHandle> load(ObjectRef Ref);
+
+  /// Get the size of the data stored in \p Node.
+  virtual uint64_t getDataSize(ObjectHandle Node) const = 0;
+
+  /// Methods for handling objects.
+  virtual Error forEachRef(ObjectHandle Node,
+                           function_ref<Error(ObjectRef)> Callback) const = 0;
+  virtual ObjectRef readRef(ObjectHandle Node, size_t I) const = 0;
+  virtual size_t getNumRefs(ObjectHandle Node) const = 0;
+  virtual ArrayRef<char> getData(ObjectHandle Node,
+                                 bool RequiresNullTerminator = false) const = 0;
+
+  /// Get ObjectRef from open file.
+  virtual Expected<ObjectRef>
+  storeFromOpenFileImpl(sys::fs::file_t FD,
+                        std::optional<sys::fs::file_status> Status);
+
+  /// Get a lifetime-extended StringRef pointing at the data of \p Node.
+  ///
+  /// Depending on the CAS implementation, this may involve in-memory storage
+  /// overhead.
+  StringRef getDataString(ObjectHandle Node) {
+    return toStringRef(getData(Node));
+  }
+
+  /// Get a lifetime-extended MemoryBuffer pointing at the data of \p Node.
+  ///
+  /// Depending on the CAS implementation, this may involve in-memory storage
+  /// overhead.
+  std::unique_ptr<MemoryBuffer>
+  getMemoryBuffer(ObjectHandle Node, StringRef Name = "",
+                  bool RequiresNullTerminator = true);
+
+  /// Read all the refs from the object into a SmallVector.
+  virtual void readRefs(ObjectHandle Node,
+                        SmallVectorImpl<ObjectRef> &Refs) const;
+
+  /// Allow ObjectStore implementations to create internal handles.
+#define MAKE_CAS_HANDLE_CONSTRUCTOR(HandleKind)                                \
+  HandleKind make##HandleKind(uint64_t InternalRef) const {                    \
+    return HandleKind(*this, InternalRef);                                     \
+  }
+  MAKE_CAS_HANDLE_CONSTRUCTOR(ObjectHandle)
+  MAKE_CAS_HANDLE_CONSTRUCTOR(ObjectRef)
+#undef MAKE_CAS_HANDLE_CONSTRUCTOR
+
+public:
+  /// Helper function to store an object and return an ObjectProxy.
+  Expected<ObjectProxy> createProxy(ArrayRef<ObjectRef> Refs, StringRef Data);
+
+  /// Store object from StringRef.
+  Expected<ObjectRef> storeFromString(ArrayRef<ObjectRef> Refs,
+                                      StringRef String) {
+    return store(Refs, arrayRefFromStringRef<char>(String));
+  }
+
+  /// Default implementation reads \p FD and calls \a store(). Does not
+  /// take ownership of \p FD; the caller is responsible for closing it.
+  ///
+  /// If \p Status is passed in, it is treated as a hint. Implementations
+  /// must protect against the file size potentially growing after the status
+  /// was taken (i.e., they cannot assume that an mmap will be null-terminated
+  /// where \p Status implies).
+  ///
+  /// Returns an \a ObjectRef for the stored object.
+  Expected<ObjectRef>
+  storeFromOpenFile(sys::fs::file_t FD,
+                    std::optional<sys::fs::file_status> Status = std::nullopt) {
+    return storeFromOpenFileImpl(FD, Status);
+  }
+
+  static Error createUnknownObjectError(const CASID &ID);
+
+  /// Create ObjectProxy from CASID. If the object doesn't exist, get an error.
+  Expected<ObjectProxy> getProxy(const CASID &ID);
+  /// Create ObjectProxy from ObjectRef. If the object can't be loaded, get an
+  /// error.
+  Expected<ObjectProxy> getProxy(ObjectRef Ref);
+
+  /// \returns \c std::nullopt if the object is missing from the CAS.
+  Expected<std::optional<ObjectProxy>> getProxyIfExists(ObjectRef Ref);
+
+  /// Read the data from \p Node into \p OS.
+  uint64_t readData(ObjectHandle Node, raw_ostream &OS, uint64_t Offset = 0,
+                    uint64_t MaxBytes = -1ULL) const {
+    ArrayRef<char> Data = getData(Node);
+    assert(Offset < Data.size() && "Expected valid offset");
+    Data = Data.drop_front(Offset).take_front(MaxBytes);
+    OS << toStringRef(Data);
+    return Data.size();
+  }
+
+  /// Validate the whole node tree.
+  Error validateTree(ObjectRef Ref);
+
+  /// Print the ObjectStore internals for debugging purposes.
+  virtual void print(raw_ostream &) const {}
+  void dump() const;
+
+  /// Get CASContext
+  const CASContext &getContext() const { return Context; }
+
+  virtual ~ObjectStore() = default;
+
+protected:
+  ObjectStore(const CASContext &Context) : Context(Context) {}
+
+private:
+  const CASContext &Context;
+};
+
+/// Reference to an abstract hierarchical node, with data and references.
+/// Reference is passed by value and is expected to be valid as long as the \a
+/// ObjectStore is.
+class ObjectProxy {
+public:
+  const ObjectStore &getCAS() const { return *CAS; }
+  ObjectStore &getCAS() { return *CAS; }
+  CASID getID() const { return CAS->getID(Ref); }
+  ObjectRef getRef() const { return Ref; }
+  size_t getNumReferences() const { return CAS->getNumRefs(H); }
+  ObjectRef getReference(size_t I) const { return CAS->readRef(H, I); }
+
+  operator CASID() const { return getID(); }
+  CASID getReferenceID(size_t I) const {
+    std::optional<CASID> ID = getCAS().getID(getReference(I));
+    assert(ID && "Expected reference to be first-class object");
+    return *ID;
+  }
+
+  /// Visit each reference in order, returning an error from \p Callback to
+  /// stop early.
+  Error forEachReference(function_ref<Error(ObjectRef)> Callback) const {
+    return CAS->forEachRef(H, Callback);
+  }
+
+  std::unique_ptr<MemoryBuffer>
+  getMemoryBuffer(StringRef Name = "",
+                  bool RequiresNullTerminator = true) const;
+
+  /// Get the content of the node. Valid as long as the CAS is valid.
+  StringRef getData() const { return CAS->getDataString(H); }
+
+  friend bool operator==(const ObjectProxy &Proxy, ObjectRef Ref) {
+    return Proxy.getRef() == Ref;
+  }
+  friend bool operator==(ObjectRef Ref, const ObjectProxy &Proxy) {
+    return Proxy.getRef() == Ref;
+  }
+  friend bool operator!=(const ObjectProxy &Proxy, ObjectRef Ref) {
+    return !(Proxy.getRef() == Ref);
+  }
+  friend bool operator!=(ObjectRef Ref, const ObjectProxy &Proxy) {
+    return !(Proxy.getRef() == Ref);
+  }
+
+public:
+  ObjectProxy() = delete;
+
+  static ObjectProxy load(ObjectStore &CAS, ObjectRef Ref, ObjectHandle Node) {
+    return ObjectProxy(CAS, Ref, Node);
+  }
+
+private:
+  ObjectProxy(ObjectStore &CAS, ObjectRef Ref, ObjectHandle H)
+      : CAS(&CAS), Ref(Ref), H(H) {}
+
+  ObjectStore *CAS;
+  ObjectRef Ref;
+  ObjectHandle H;
+};
+
+std::unique_ptr<ObjectStore> createInMemoryCAS();
+
+} // namespace cas
+} // namespace llvm
+
+#endif // LLVM_CAS_OBJECTSTORE_H
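
Review note on the ObjectStore contract above: the doc comments imply that storing identical refs and data yields the same reference, and that getReference() returns std::nullopt for an ID the store has never seen. The following is a minimal standalone sketch of that contract, not the LLVM API. ToyStore, ToyID, ToyRef and the FNV-1a digest are all hypothetical stand-ins chosen so the example compiles with only the standard library; a real CAS uses a cryptographic hash, and this toy ignores collisions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

// Hypothetical toy model of a content-addressed store (not the LLVM classes).
using ToyID = uint64_t; // stands in for CASID
using ToyRef = size_t;  // stands in for ObjectRef: an index into Objects

struct ToyObject {
  ToyID ID;
  std::vector<ToyRef> Refs;
  std::string Data;
};

class ToyStore {
  std::vector<ToyObject> Objects;
  std::unordered_map<ToyID, ToyRef> Index;

  // One FNV-1a step per byte; an assumption of this sketch, not the patch.
  static void mix(ToyID &H, const void *P, size_t N) {
    const auto *B = static_cast<const unsigned char *>(P);
    for (size_t I = 0; I < N; ++I) {
      H ^= B[I];
      H *= 1099511628211ULL;
    }
  }

public:
  // The ID depends on the number of refs, the IDs of the refs, and the data.
  ToyRef store(const std::vector<ToyRef> &Refs, std::string_view Data) {
    ToyID H = 14695981039346656037ULL;
    uint64_t NumRefs = Refs.size();
    mix(H, &NumRefs, sizeof(NumRefs));
    for (ToyRef R : Refs)
      mix(H, &Objects[R].ID, sizeof(ToyID));
    mix(H, Data.data(), Data.size());

    // Only create a new object if this ID has not been stored before.
    auto [It, Inserted] = Index.try_emplace(H, Objects.size());
    if (Inserted)
      Objects.push_back({H, Refs, std::string(Data)});
    return It->second;
  }

  ToyID getID(ToyRef R) const { return Objects[R].ID; }

  // Unknown IDs yield std::nullopt rather than an error.
  std::optional<ToyRef> getReference(ToyID ID) const {
    auto It = Index.find(ID);
    if (It == Index.end())
      return std::nullopt;
    return It->second;
  }
};
```

Storing "hello" twice with no refs returns the same ToyRef, while storing "hello" with a ref to that object produces a distinct one, which mirrors how refs participate in the object's identity.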
diff --git a/llvm/include/module.modulemap b/llvm/include/module.modulemap
index b00da6d7cd28c7..d44d395fa8ef46 100644
--- a/llvm/include/module.modulemap
+++ b/llvm/include/module.modulemap
@@ -105,6 +105,12 @@ module LLVM_BinaryFormat {
     textual header "llvm/BinaryFormat/MsgPack.def"
 }
 
+module LLVM_CAS {
+  requires cplusplus
+  umbrella "llvm/CAS"
+  module * { export * }
+}
+
 module LLVM_Config {
   requires cplusplus
   umbrella "llvm/Config"
diff --git a/llvm/lib/CAS/BuiltinCAS.cpp b/llvm/lib/CAS/BuiltinCAS.cpp
new file mode 100644
index 00000000000000..73646ad2c3528e
--- /dev/null
+++ b/llvm/lib/CAS/BuiltinCAS.cpp
@@ -0,0 +1,94 @@
+//===- BuiltinCAS.cpp -------------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "BuiltinCAS.h"
+#include "llvm/ADT/StringExtras.h"
+#include "llvm/CAS/BuiltinObjectHasher.h"
+#include "llvm/Support/Process.h"
+
+using namespace llvm;
+using namespace llvm::cas;
+using namespace llvm::cas::builtin;
+
+static StringRef getCASIDPrefix() { return "llvmcas://"; }
+void BuiltinCASContext::anchor() {}
+
+Expected<HashType> BuiltinCASContext::parseID(StringRef Reference) {
+  if (!Reference.consume_front(getCASIDPrefix()))
+    return createStringError(std::make_error_code(std::errc::invalid_argument),
+                             "invalid cas-id '" + Reference + "'");
+
+  // FIXME: Allow shortened references?
+  if (Reference.size() != 2 * sizeof(HashType))
+    return createStringError(std::make_error_code(std::errc::invalid_argument),
+                             "wrong size for cas-id hash '" + Reference + "'");
+
+  std::string Binary;
+  if (!tryGetFromHex(Reference, Binary))
+    return createStringError(std::make_error_code(std::errc::invalid_argument),
+                             "invalid hash in cas-id '" + Reference + "'");
+
+  assert(Binary.size() == sizeof(HashType));
+  HashType Digest;
+  llvm::copy(Binary, Digest.data());
+  return Digest;
+}
+
+Expected<CASID> BuiltinCAS::parseID(StringRef Reference) {
+  Expected<HashType> Digest = BuiltinCASContext::parseID(Reference);
+  if (!Digest)
+    return Digest.takeError();
+
+  return CASID::create(&getContext(), toStringRef(*Digest));
+}
+
+void BuiltinCASContext::printID(ArrayRef<uint8_t> Digest, raw_ostream &OS) {
+  SmallString<64> Hash;
+  toHex(Digest, /*LowerCase=*/true, Hash);
+  OS << getCASIDPrefix() << Hash;
+}
+
+void BuiltinCASContext::printIDImpl(raw_ostream &OS, const CASID &ID) const {
+  BuiltinCASContext::printID(ID.getHash(), OS);
+}
+
+const BuiltinCASContext &BuiltinCASContext::getDefaultContext() {
+  static BuiltinCASContext DefaultContext;
+  return DefaultContext;
+}
+
+Expected<ObjectRef> BuiltinCAS::store(ArrayRef<ObjectRef> Refs,
+                                      ArrayRef<char> Data) {
+  return storeImpl(BuiltinObjectHasher<HasherT>::hashObject(*this, Refs, Data),
+                   Refs, Data);
+}
+
+Error BuiltinCAS::validate(const CASID &ID) {
+  auto Ref = getReference(ID);
+  if (!Ref)
+    return createUnknownObjectError(ID);
+
+  auto Handle = load(*Ref);
+  if (!Handle)
+    return Handle.takeError();
+
+  auto Proxy = ObjectProxy::load(*this, *Ref, *Handle);
+  SmallVector<ObjectRef> Refs;
+  if (auto E = Proxy.forEachReference([&](ObjectRef Ref) -> Error {
+        Refs.push_back(Ref);
+        return Error::success();
+      }))
+    return E;
+
+  ArrayRef<char> Data(Proxy.getData().data(), Proxy.getData().size());
+  auto Hash = BuiltinObjectHasher<HasherT>::hashObject(*this, Refs, Data);
+  if (!ID.getHash().equals(Hash))
+    return createCorruptObjectError(ID);
+
+  return Error::success();
+}
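
Review note on BuiltinCASContext::parseID() and printID() above: the textual ID is the "llvmcas://" prefix followed by the lowercase hex digest, and parsing rejects a wrong prefix, a wrong length, and non-hex characters in that order. The sketch below reproduces that round trip with only the standard library. The free functions printID, parseID and hexValue are hypothetical stand-ins, the hash size is passed as a parameter instead of coming from HashType, and failures are reported as std::nullopt rather than llvm::Error.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

constexpr std::string_view Prefix = "llvmcas://";

// Print a digest as prefix + lowercase hex.
std::string printID(const std::vector<uint8_t> &Digest) {
  static const char Hex[] = "0123456789abcdef";
  std::string Out(Prefix);
  for (uint8_t B : Digest) {
    Out.push_back(Hex[B >> 4]);
    Out.push_back(Hex[B & 0xF]);
  }
  return Out;
}

// Returns the value of a hex digit, or -1 if C is not a hex digit.
static int hexValue(char C) {
  if (C >= '0' && C <= '9')
    return C - '0';
  if (C >= 'a' && C <= 'f')
    return C - 'a' + 10;
  if (C >= 'A' && C <= 'F')
    return C - 'A' + 10;
  return -1;
}

// Parse prefix + hex back into a digest of exactly HashSize bytes.
std::optional<std::vector<uint8_t>> parseID(std::string_view Text,
                                            size_t HashSize) {
  if (Text.substr(0, Prefix.size()) != Prefix)
    return std::nullopt; // missing or wrong prefix
  Text.remove_prefix(Prefix.size());
  if (Text.size() != 2 * HashSize)
    return std::nullopt; // shortened or overlong hash
  std::vector<uint8_t> Digest;
  Digest.reserve(HashSize);
  for (size_t I = 0; I < Text.size(); I += 2) {
    int Hi = hexValue(Text[I]);
    int Lo = hexValue(Text[I + 1]);
    if (Hi < 0 || Lo < 0)
      return std::nullopt; // not a hex digit
    Digest.push_back(static_cast<uint8_t>((Hi << 4) | Lo));
  }
  return Digest;
}
```

Because the length check requires exactly 2 * HashSize characters, shortened references are rejected, which matches the FIXME in the patch.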
diff --git a/llvm/lib/CAS/BuiltinCAS.h b/llvm/lib/CAS/BuiltinCAS.h
new file mode 100644
index 00000000000000..1a4f640e4e2da8
--- /dev/null
+++ b/llvm/lib/CAS/BuiltinCAS.h
@@ -0,0 +1,74 @@
+//===- BuiltinCAS.h ---------------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIB_CAS_BUILTINCAS_H
+#define LLVM_LIB_CAS_BUILTINCAS_H
+
+#include "llvm/ADT/StringRef.h"
+#include "llvm/CAS/BuiltinCASContext.h"
+#include "llvm/CAS/ObjectStore.h"
+
+namespace llvm::cas {
+class ActionCache;
+namespace builtin {
+
+class BuiltinCAS : public ObjectStore {
+public:
+  BuiltinCAS() : ObjectStore(BuiltinCASContext::getDefaultContext()) {}
+
+  Expected<CASID> parseID(StringRef Reference) final;
+
+  Expected<ObjectRef> store(ArrayRef<ObjectRef> Refs,
+                            ArrayRef<char> Data) final;
+  virtual Expected<ObjectRef> storeImpl(ArrayRef<uint8_t> ComputedHash,
+                                        ArrayRef<ObjectRef> Refs,
+                                        ArrayRef<char> Data) = 0;
+
+  virtual Expected<ObjectRef>
+  storeFromNullTerminatedRegion(ArrayRef<uint8_t> ComputedHash,
+                                sys::fs::mapped_file_region Map) {
+    return storeImpl(ComputedHash, std::nullopt,
+                     ArrayRef(Map.data(), Map.size()));
+  }
+
+  /// Both builtin CAS implementations provide lifetime for free, so this can
+  /// be const, and readData() and getDataSize() can be implemented on top of
+  /// it.
+  virtual ArrayRef<char> getDataConst(ObjectHandle Node) const = 0;
+
+  ArrayRef<char> getData(ObjectHandle Node,
+                         bool RequiresNullTerminator) const final {
+    // BuiltinCAS Objects are always null terminated.
+    return getDataConst(Node);
+  }
+  uint64_t getDataSize(ObjectHandle Node) const final {
+    return getDataConst(Node).size();
+  }
+
+  Error createUnknownObjectError(const CASID &ID) const {
+    return createStringError(std::make_error_code(std::errc::invalid_argument),
+                             "unknown object '" + ID.toString() + "'");
+  }
+
+  Error createCorruptObjectError(const CASID &ID) const {
+    return createStringError(std::make_error_code(std::errc::invalid_argument),
+                             "corrupt object '" + ID.toString() + "'");
+  }
+
+  Error createCorruptStorageError() const {
+    return createStringError(std::make_error_code(std::errc::invalid_argument),
+                             "corrupt storage");
+  }
+
+  Error validate(const CASID &ID) final;
+};
+
+} // end namespace builtin
+} // end namespace llvm::cas
+
+#endif // LLVM_LIB_CAS_BUILTINCAS_H
diff --git a/llvm/lib/CAS/CMakeLists.txt b/llvm/lib/CAS/CMakeLists.txt
new file mode 100644
index 00000000000000..a486ab66ae4266
--- /dev/null
+++ b/llvm/lib/CAS/CMakeLists.txt
@@ -0,0 +1,8 @@
+add_llvm_component_library(LLVMCAS
+  BuiltinCAS.cpp
+  InMemoryCAS.cpp
+  ObjectStore.cpp
+
+  ADDITIONAL_HEADER_DIRS
+  ${LLVM_MAIN_INCLUDE_DIR}/llvm/CAS
+)
diff --git a/llvm/lib/CAS/InMemoryCAS.cpp b/llvm/lib/CAS/InMemoryCAS.cpp
new file mode 100644
index 00000000000000..abdd7ed3ef8051
--- /dev/null
+++ b/llvm/lib/CAS/InMemoryCAS.cpp
@@ -0,0 +1,320 @@
+//===- InMemoryCAS.cpp ------------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "BuiltinCAS.h"
+#include "llvm/ADT/LazyAtomicPointer.h"
+#include "llvm/ADT/PointerIntPair.h"
+#include "llvm/ADT/TrieRawHashMap.h"
+#include "llvm/Support/Allocator.h"
+#include "llvm/Support/Casting.h"
+#include "llvm/Support/ThreadSafeAllocator.h"
+
+using namespace llvm;
+using namespace llvm::cas;
+using namespace llvm::cas::builtin;
+
+namespace {
+
+class InMemoryObject;
+
+/// Index of referenced IDs (map: Hash -> InMemoryObject*). Uses
+/// LazyAtomicPointer to coordinate creation of objects.
+using InMemoryIndexT =
+    ThreadSafeTrieRawHashMap<LazyAtomicPointer<const InMemoryObject>,
+                             sizeof(HashType)>;
+
+/// Values in \a InMemoryIndexT. Each \a InMemoryObject points at one of these
+/// to access its hash.
+using InMemoryIndexValueT = InMemoryIndexT::value_type;
+
+class InMemoryObject {
+public:
+  enum class Kind {
+    /// Node with refs and data.
+    RefNode,
+
+    /// Node with refs and data co-allocated.
+    InlineNode,
+
+    Max = InlineNode,
+  };
+
+  Kind getKind() const { return IndexAndKind.getInt(); }
+  const InMemoryIndexValueT &getIndex() const {
+    assert(IndexAndKind.getPointer());
+    return *IndexAndKind.getPointer();
+  }
+
+  ArrayRef<uint8_t> getHash() const { return getIndex().Hash; }
+
+  InMemoryObject() = delete;
+  InMemoryObject(InMemoryObject &&) = delete;
+  InMemoryObject(const InMemoryObject &) = delete;
+
+protected:
+  InMemoryObject(Kind K, const InMemoryIndexValueT &I) : IndexAndKind(&I, K) {}
+
+private:
+  enum Counts : int {
+    NumKindBits = 2,
+  };
+  PointerIntPair<const InMemoryIndexValueT *, NumKindBits, Kind> IndexAndKind;
+  static_assert((1U << NumKindBits) <= alignof(InMemoryIndexValueT),
+                "Kind will clobber pointer");
+  static_assert(((int)Kind::Max >> NumKindBits) == 0, "Kind will be truncated");
+
+public:
+  inline ArrayRef<char> getData() const;
+
+  inline ArrayRef<const InMemoryObject *> getRefs() const;
+};
+
+class InMemoryRefObject : public InMemoryObject {
+public:
+  static constexpr Kind KindValue = Kind::RefNode;
+  static bool classof(const InMemoryObject *O) {
+    return O->getKind() == KindValue;
+  }
+
+  ArrayRef<const InMemoryObject *> getRefsImpl() const { return Refs; }
+  ArrayRef<const InMemoryObject *> getRefs() const { return Refs; }
+  ArrayRef<char> getDataImpl() const { return Data; }
+  ArrayRef<char> getData() const { return Data; }
+
+  static InMemoryRefObject &create(function_ref<void *(size_t Size)> Allocate,
+                                   const InMemoryIndexValueT &I,
+                                   ArrayRef<const InMemoryObject *> Refs,
+                                   ArrayRef<char> Data) {
+    void *Mem = Allocate(sizeof(InMemoryRefObject));
+    return *new (Mem) InMemoryRefObject(I, Refs, Data);
+  }
+
+private:
+  InMemoryRefObject(const InMemoryIndexValueT &I,
+                    ArrayRef<const InMemoryObject *> Refs, ArrayRef<char> Data)
+      : InMemoryObject(KindValue, I), Refs(Refs), Data(Data) {
+    assert(isAddrAligned(Align(8), this) && "Expected 8-byte alignment");
+    assert(isAddrAligned(Align(8), Data.data()) && "Expected 8-byte alignment");
+    assert(*Data.end() == 0 && "Expected null-termination");
+  }
+
+  ArrayRef<const InMemoryObject *> Refs;
+  ArrayRef<char> Data;
+};
+
+class InMemoryInlineObject : public InMemoryObject {
+public:
+  static constexpr Kind KindValue = Kind::InlineNode;
+  static bool classof(const InMemoryObject *O) {
+    return O->getKind() == KindValue;
+  }
+
+  ArrayRef<const InMemoryObject *> getRefs() const { return getRefsImpl(); }
+  ArrayRef<const InMemoryObject *> getRefsImpl() const {
+    return ArrayRef(reinterpret_cast<const InMemoryObject *const *>(this + 1),
+                    NumRefs);
+  }
+
+  ArrayRef<char> getData() const { return getDataImpl(); }
+  ArrayRef<char> getDataImpl() const {
+    ArrayRef<const InMemoryObject *> Refs = getRefs();
+    return ArrayRef(reinterpret_cast<const char *>(Refs.data() + Refs.size()),
+                    DataSize);
+  }
+
+  static InMemoryInlineObject &
+  create(function_ref<void *(size_t Size)> Allocate,
+         const InMemoryIndexValueT &I, ArrayRef<const InMemoryObject *> Refs,
+         ArrayRef<char> Data) {
+    void *Mem = Allocate(sizeof(InMemoryInlineObject) +
+                         sizeof(uintptr_t) * Refs.size() + Data.size() + 1);
+    return *new (Mem) InMemoryInlineObject(I, Refs, Data);
+  }
+
+private:
+  InMemoryInlineObject(const InMemoryIndexValueT &I,
+                       ArrayRef<const InMemoryObject *> Refs,
+                       ArrayRef<char> Data)
+      : InMemoryObject(KindValue, I), NumRefs(Refs.size()),
+        DataSize(Data.size()) {
+    auto *BeginRefs = reinterpret_cast<const InMemoryObject **>(this + 1);
+    llvm::copy(Refs, BeginRefs);
+    auto *BeginData = reinterpret_cast<char *>(BeginRefs + NumRefs);
+    llvm::copy(Data, BeginData);
+    BeginData[Data.size()] = 0;
+  }
+  uint32_t NumRefs;
+  uint32_t DataSize;
+};
+
+/// In-memory CAS database.
+class InMemoryCAS : public BuiltinCAS {
+public:
+  Expected<ObjectRef> storeImpl(ArrayRef<uint8_t> ComputedHash,
+                                ArrayRef<ObjectRef> Refs,
+                                ArrayRef<char> Data) final;
+
+  Expected<ObjectRef>
+  storeFromNullTerminatedRegion(ArrayRef<uint8_t> ComputedHash,
+                                sys::fs::mapped_file_region Map) override;
+
+  CASID getID(const InMemoryIndexValueT &I) const {
+    StringRef Hash = toStringRef(I.Hash);
+    return CASID::create(&getContext(), Hash);
+  }
+  CASID getID(const InMemoryObject &O) const { return getID(O.getIndex()); }
+
+  ObjectHandle getObjectHandle(const InMemoryObject &Node) const {
+    assert(!(reinterpret_cast<uintptr_t>(&Node) & 0x1ULL));
+    return makeObjectHandle(reinterpret_cast<uintptr_t>(&Node));
+  }
+
+  Expected<std::optional<ObjectHandle>> loadIfExists(ObjectRef Ref) override {
+    return getObjectHandle(asInMemoryObject(Ref));
+  }
+
+  InMemoryIndexValueT &indexHash(ArrayRef<uint8_t> Hash) {
+    return *Index.insertLazy(
+        Hash, [](auto ValueConstructor) { ValueConstructor.emplace(nullptr); });
+  }
+
+  /// TODO: Consider callers to actually do an insert and to return a handle to
+  /// the slot in the trie.
+  const InMemoryObject *getInMemoryObject(CASID ID) const {
+    assert(ID.getContext().getHashSchemaIdentifier() ==
+               getContext().getHashSchemaIdentifier() &&
+           "Expected ID from same hash schema");
+    if (InMemoryIndexT::const_pointer P = Index.find(ID.getHash()))
+      return P->Data;
+    return nullptr;
+  }
+
+  const InMemoryObject &getInMemoryObject(ObjectHandle OH) const {
+    return *reinterpret_cast<const InMemoryObject *>(
+        (uintptr_t)OH.getInternalRef(*this));
+  }
+
+  const InMemoryObject &asInMemoryObject(ReferenceBase Ref) const {
+    uintptr_t P = Ref.getInternalRef(*this);
+    return *reinterpret_cast<const InMemoryObject *>(P);
+  }
+  ObjectRef toReference(const InMemoryObject &O) const {
+    return makeObjectRef(reinterpret_cast<uintptr_t>(&O));
+  }
+
+  CASID getID(ObjectRef Ref) const final { return getIDImpl(Ref); }
+  CASID getIDImpl(ReferenceBase Ref) const {
+    return getID(asInMemoryObject(Ref));
+  }
+
+  std::optional<ObjectRef> getReference(const CASID &ID) const final {
+    if (const InMemoryObject *Object = getInMemoryObject(ID))
+      return toReference(*Object);
+    return std::nullopt;
+  }
+
+  Expected<bool> isMaterialized(ObjectRef Ref) const final { return true; }
+
+  ArrayRef<char> getDataConst(ObjectHandle Node) const final {
+    return cast<InMemoryObject>(asInMemoryObject(Node)).getData();
+  }
+
+  InMemoryCAS() = default;
+
+private:
+  size_t getNumRefs(ObjectHandle Node) const final {
+    return getInMemoryObject(Node).getRefs().size();
+  }
+  ObjectRef readRef(ObjectHandle Node, size_t I) const final {
+    return toReference(*getInMemoryObject(Node).getRefs()[I]);
+  }
+  Error forEachRef(ObjectHandle Node,
+                   function_ref<Error(ObjectRef)> Callback) const final;
+
+  /// Index of referenced IDs (map: Hash -> InMemoryObject*). Mapped to nullptr
+  /// as a convenient way to store hashes.
+  ///
+  /// - Insert nullptr on lookups.
+  /// - InMemoryObject points back to here.
+  InMemoryIndexT Index;
+
+  ThreadSafeAllocator<BumpPtrAllocator> Objects;
+  ThreadSafeAllocator<SpecificBumpPtrAllocator<sys::fs::mapped_file_region>>
+      MemoryMaps;
+};
+
+} // end anonymous namespace
+
+ArrayRef<char> InMemoryObject::getData() const {
+  if (auto *Derived = dyn_cast<InMemoryRefObject>(this))
+    return Derived->getDataImpl();
+  return cast<InMemoryInlineObject>(this)->getDataImpl();
+}
+
+ArrayRef<const InMemoryObject *> InMemoryObject::getRefs() const {
+  if (auto *Derived = dyn_cast<InMemoryRefObject>(this))
+    return Derived->getRefsImpl();
+  return cast<InMemoryInlineObject>(this)->getRefsImpl();
+}
+
+Expected<ObjectRef>
+InMemoryCAS::storeFromNullTerminatedRegion(ArrayRef<uint8_t> ComputedHash,
+                                           sys::fs::mapped_file_region Map) {
+  // Look up the hash in the index, initializing to nullptr if it's new.
+  ArrayRef<char> Data(Map.data(), Map.size());
+  auto &I = indexHash(ComputedHash);
+
+  // Load or generate.
+  auto Allocator = [&](size_t Size) -> void * {
+    return Objects.Allocate(Size, alignof(InMemoryObject));
+  };
+  auto Generator = [&]() -> const InMemoryObject * {
+    return &InMemoryRefObject::create(Allocator, I, std::nullopt, Data);
+  };
+  const InMemoryObject &Node =
+      cast<InMemoryObject>(I.Data.loadOrGenerate(Generator));
+
+  // Save Map if the winning node uses it.
+  if (auto *RefNode = dyn_cast<InMemoryRefObject>(&Node))
+    if (RefNode->getData().data() == Map.data())
+      new (MemoryMaps.Allocate(1)) sys::fs::mapped_file_region(std::move(Map));
+
+  return toReference(Node);
+}
+
+Expected<ObjectRef> InMemoryCAS::storeImpl(ArrayRef<uint8_t> ComputedHash,
+                                           ArrayRef<ObjectRef> Refs,
+                                           ArrayRef<char> Data) {
+  // Look up the hash in the index, initializing to nullptr if it's new.
+  auto &I = indexHash(ComputedHash);
+
+  // Create the node.
+  SmallVector<const InMemoryObject *> InternalRefs;
+  for (ObjectRef Ref : Refs)
+    InternalRefs.push_back(&asInMemoryObject(Ref));
+  auto Allocator = [&](size_t Size) -> void * {
+    return Objects.Allocate(Size, alignof(InMemoryObject));
+  };
+  auto Generator = [&]() -> const InMemoryObject * {
+    return &InMemoryInlineObject::create(Allocator, I, InternalRefs, Data);
+  };
+  return toReference(cast<InMemoryObject>(I.Data.loadOrGenerate(Generator)));
+}
+
+Error InMemoryCAS::forEachRef(ObjectHandle Handle,
+                              function_ref<Error(ObjectRef)> Callback) const {
+  auto &Node = getInMemoryObject(Handle);
+  for (const InMemoryObject *Ref : Node.getRefs())
+    if (Error E = Callback(toReference(*Ref)))
+      return E;
+  return Error::success();
+}
+
+std::unique_ptr<ObjectStore> cas::createInMemoryCAS() {
+  return std::make_unique<InMemoryCAS>();
+}
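
Review note on InMemoryObject::IndexAndKind above: the object kind is packed into the spare low bits of the aligned index pointer, and the two static_asserts guard that the kind fits without clobbering the pointer. The sketch below shows the same idea with plain integer masking. TaggedIndexPtr and IndexValue are hypothetical names, and llvm::PointerIntPair may place the integer in different bits than this sketch does.

```cpp
#include <cassert>
#include <cstdint>

enum class Kind : unsigned { RefNode = 0, InlineNode = 1, Max = InlineNode };

constexpr unsigned NumKindBits = 2;
constexpr uintptr_t KindMask = (uintptr_t(1) << NumKindBits) - 1;

// Stand-in for the trie value type; 8-byte alignment leaves 3 low bits free.
struct alignas(8) IndexValue {
  int Dummy;
};

static_assert((1u << NumKindBits) <= alignof(IndexValue),
              "Kind will clobber pointer");
static_assert((static_cast<unsigned>(Kind::Max) >> NumKindBits) == 0,
              "Kind will be truncated");

class TaggedIndexPtr {
  uintptr_t Bits;

public:
  TaggedIndexPtr(const IndexValue *P, Kind K)
      : Bits(reinterpret_cast<uintptr_t>(P) | static_cast<uintptr_t>(K)) {
    assert((reinterpret_cast<uintptr_t>(P) & KindMask) == 0 &&
           "pointer is not sufficiently aligned");
  }

  // Mask the kind bits off to recover the original pointer.
  const IndexValue *getPointer() const {
    return reinterpret_cast<const IndexValue *>(Bits & ~KindMask);
  }

  // Keep only the kind bits.
  Kind getInt() const { return static_cast<Kind>(Bits & KindMask); }
};
```

The payoff is that each object header stays one pointer wide while still supporting the classof() checks used by dyn_cast and cast.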
diff --git a/llvm/lib/CAS/ObjectStore.cpp b/llvm/lib/CAS/ObjectStore.cpp
new file mode 100644
index 00000000000000..a938c4e215382e
--- /dev/null
+++ b/llvm/lib/CAS/ObjectStore.cpp
@@ -0,0 +1,168 @@
+//===- ObjectStore.cpp ------------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/CAS/ObjectStore.h"
+#include "llvm/ADT/DenseSet.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Support/Errc.h"
+#include "llvm/Support/FileSystem.h"
+#include "llvm/Support/MemoryBuffer.h"
+
+using namespace llvm;
+using namespace llvm::cas;
+
+void CASContext::anchor() {}
+void ObjectStore::anchor() {}
+
+LLVM_DUMP_METHOD void CASID::dump() const { print(dbgs()); }
+LLVM_DUMP_METHOD void ObjectStore::dump() const { print(dbgs()); }
+LLVM_DUMP_METHOD void ObjectRef::dump() const { print(dbgs()); }
+LLVM_DUMP_METHOD void ObjectHandle::dump() const { print(dbgs()); }
+
+std::string CASID::toString() const {
+  std::string S;
+  raw_string_ostream(S) << *this;
+  return S;
+}
+
+static void printReferenceBase(raw_ostream &OS, StringRef Kind,
+                               uint64_t InternalRef, std::optional<CASID> ID) {
+  OS << Kind << "=" << InternalRef;
+  if (ID)
+    OS << "[" << *ID << "]";
+}
+
+void ReferenceBase::print(raw_ostream &OS, const ObjectHandle &This) const {
+  assert(this == &This);
+  printReferenceBase(OS, "object-handle", InternalRef, std::nullopt);
+}
+
+void ReferenceBase::print(raw_ostream &OS, const ObjectRef &This) const {
+  assert(this == &This);
+
+  std::optional<CASID> ID;
+#if LLVM_ENABLE_ABI_BREAKING_CHECKS
+  if (CAS)
+    ID = CAS->getID(This);
+#endif
+  printReferenceBase(OS, "object-ref", InternalRef, ID);
+}
+
+Expected<ObjectHandle> ObjectStore::load(ObjectRef Ref) {
+  std::optional<ObjectHandle> Handle;
+  if (Error E = loadIfExists(Ref).moveInto(Handle))
+    return std::move(E);
+  if (!Handle)
+    return createStringError(errc::invalid_argument,
+                             "missing object '" + getID(Ref).toString() + "'");
+  return *Handle;
+}
+
+std::unique_ptr<MemoryBuffer>
+ObjectStore::getMemoryBuffer(ObjectHandle Node, StringRef Name,
+                             bool RequiresNullTerminator) {
+  return MemoryBuffer::getMemBuffer(
+      toStringRef(getData(Node, RequiresNullTerminator)), Name,
+      RequiresNullTerminator);
+}
+
+void ObjectStore::readRefs(ObjectHandle Node,
+                           SmallVectorImpl<ObjectRef> &Refs) const {
+  consumeError(forEachRef(Node, [&Refs](ObjectRef Ref) -> Error {
+    Refs.push_back(Ref);
+    return Error::success();
+  }));
+}
+
+Expected<ObjectProxy> ObjectStore::getProxy(const CASID &ID) {
+  std::optional<ObjectRef> Ref = getReference(ID);
+  if (!Ref)
+    return createUnknownObjectError(ID);
+
+  return getProxy(*Ref);
+}
+
+Expected<ObjectProxy> ObjectStore::getProxy(ObjectRef Ref) {
+  std::optional<ObjectHandle> H;
+  if (Error E = load(Ref).moveInto(H))
+    return std::move(E);
+
+  return ObjectProxy::load(*this, Ref, *H);
+}
+
+Expected<std::optional<ObjectProxy>>
+ObjectStore::getProxyIfExists(ObjectRef Ref) {
+  std::optional<ObjectHandle> H;
+  if (Error E = loadIfExists(Ref).moveInto(H))
+    return std::move(E);
+  if (!H)
+    return std::nullopt;
+  return ObjectProxy::load(*this, Ref, *H);
+}
+
+Error ObjectStore::createUnknownObjectError(const CASID &ID) {
+  return createStringError(std::make_error_code(std::errc::invalid_argument),
+                           "unknown object '" + ID.toString() + "'");
+}
+
+Expected<ObjectProxy> ObjectStore::createProxy(ArrayRef<ObjectRef> Refs,
+                                               StringRef Data) {
+  Expected<ObjectRef> Ref = store(Refs, arrayRefFromStringRef<char>(Data));
+  if (!Ref)
+    return Ref.takeError();
+  return getProxy(*Ref);
+}
+
+Expected<ObjectRef>
+ObjectStore::storeFromOpenFileImpl(sys::fs::file_t FD,
+                                   std::optional<sys::fs::file_status> Status) {
+  // Copy the file into an immutable memory buffer and call \c store on that.
+  // Using \c mmap would be unsafe because there's a race window between when we
+  // get the digest hash for the \c mmap contents and when we store the data; if
+  // the file changes in-between we will create an invalid object.
+
+  // FIXME: For the on-disk CAS implementation use cloning to store it as a
+  // standalone file if the file-system supports it and the file is large.
+
+  constexpr size_t ChunkSize = 4 * 4096;
+  SmallString<0> Data;
+  Data.reserve(ChunkSize * 2);
+  if (Error E = sys::fs::readNativeFileToEOF(FD, Data, ChunkSize))
+    return std::move(E);
+  return store(std::nullopt, ArrayRef(Data.data(), Data.size()));
+}
+
+Error ObjectStore::validateTree(ObjectRef Root) {
+  SmallDenseSet<ObjectRef> ValidatedRefs;
+  SmallVector<ObjectRef, 16> RefsToValidate;
+  RefsToValidate.push_back(Root);
+
+  while (!RefsToValidate.empty()) {
+    ObjectRef Ref = RefsToValidate.pop_back_val();
+    auto [I, Inserted] = ValidatedRefs.insert(Ref);
+    if (!Inserted)
+      continue; // already validated.
+    if (Error E = validate(getID(Ref)))
+      return E;
+    Expected<ObjectHandle> Obj = load(Ref);
+    if (!Obj)
+      return Obj.takeError();
+    if (Error E = forEachRef(*Obj, [&RefsToValidate](ObjectRef R) -> Error {
+          RefsToValidate.push_back(R);
+          return Error::success();
+        }))
+      return E;
+  }
+  return Error::success();
+}
+
+std::unique_ptr<MemoryBuffer>
+ObjectProxy::getMemoryBuffer(StringRef Name,
+                             bool RequiresNullTerminator) const {
+  return CAS->getMemoryBuffer(H, Name, RequiresNullTerminator);
+}
diff --git a/llvm/lib/CMakeLists.txt b/llvm/lib/CMakeLists.txt
index 503c77cb13bd07..b06f4ffd83ff5a 100644
--- a/llvm/lib/CMakeLists.txt
+++ b/llvm/lib/CMakeLists.txt
@@ -9,6 +9,7 @@ add_subdirectory(FileCheck)
 add_subdirectory(InterfaceStub)
 add_subdirectory(IRPrinter)
 add_subdirectory(IRReader)
+add_subdirectory(CAS)
 add_subdirectory(CGData)
 add_subdirectory(CodeGen)
 add_subdirectory(CodeGenTypes)
diff --git a/llvm/unittests/CAS/CASTestConfig.cpp b/llvm/unittests/CAS/CASTestConfig.cpp
new file mode 100644
index 00000000000000..bb06ee5573134f
--- /dev/null
+++ b/llvm/unittests/CAS/CASTestConfig.cpp
@@ -0,0 +1,22 @@
+//===- CASTestConfig.cpp --------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "CASTestConfig.h"
+#include "llvm/CAS/ObjectStore.h"
+#include "gtest/gtest.h"
+
+using namespace llvm;
+using namespace llvm::cas;
+
+CASTestingEnv createInMemory(int I) {
+  std::unique_ptr<ObjectStore> CAS = createInMemoryCAS();
+  return CASTestingEnv{std::move(CAS)};
+}
+
+INSTANTIATE_TEST_SUITE_P(InMemoryCAS, CASTest,
+                         ::testing::Values(createInMemory));
diff --git a/llvm/unittests/CAS/CASTestConfig.h b/llvm/unittests/CAS/CASTestConfig.h
new file mode 100644
index 00000000000000..d9f9e52033c2da
--- /dev/null
+++ b/llvm/unittests/CAS/CASTestConfig.h
@@ -0,0 +1,32 @@
+//===- CASTestConfig.h ----------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/CAS/ObjectStore.h"
+#include "gtest/gtest.h"
+
+#ifndef LLVM_UNITTESTS_CASTESTCONFIG_H
+#define LLVM_UNITTESTS_CASTESTCONFIG_H
+
+struct CASTestingEnv {
+  std::unique_ptr<llvm::cas::ObjectStore> CAS;
+};
+
+class CASTest
+    : public testing::TestWithParam<std::function<CASTestingEnv(int)>> {
+protected:
+  std::optional<int> NextCASIndex;
+
+  std::unique_ptr<llvm::cas::ObjectStore> createObjectStore() {
+    auto TD = GetParam()(++(*NextCASIndex));
+    return std::move(TD.CAS);
+  }
+  void SetUp() { NextCASIndex = 0; }
+  void TearDown() { NextCASIndex = std::nullopt; }
+};
+
+#endif
diff --git a/llvm/unittests/CAS/CMakeLists.txt b/llvm/unittests/CAS/CMakeLists.txt
new file mode 100644
index 00000000000000..39a2100c4909ee
--- /dev/null
+++ b/llvm/unittests/CAS/CMakeLists.txt
@@ -0,0 +1,12 @@
+set(LLVM_LINK_COMPONENTS
+  Support
+  CAS
+  TestingSupport
+  )
+
+add_llvm_unittest(CASTests
+  CASTestConfig.cpp
+  ObjectStoreTest.cpp
+  )
+
+target_link_libraries(CASTests PRIVATE LLVMTestingSupport)
diff --git a/llvm/unittests/CAS/ObjectStoreTest.cpp b/llvm/unittests/CAS/ObjectStoreTest.cpp
new file mode 100644
index 00000000000000..0d94731330b1d3
--- /dev/null
+++ b/llvm/unittests/CAS/ObjectStoreTest.cpp
@@ -0,0 +1,360 @@
+//===- ObjectStoreTest.cpp ------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/CAS/ObjectStore.h"
+#include "llvm/Support/Process.h"
+#include "llvm/Support/ThreadPool.h"
+#include "llvm/Testing/Support/Error.h"
+#include "gtest/gtest.h"
+
+#include "CASTestConfig.h"
+
+using namespace llvm;
+using namespace llvm::cas;
+
+TEST_P(CASTest, PrintIDs) {
+  std::unique_ptr<ObjectStore> CAS = createObjectStore();
+
+  std::optional<CASID> ID1, ID2;
+  ASSERT_THAT_ERROR(CAS->createProxy(std::nullopt, "1").moveInto(ID1),
+                    Succeeded());
+  ASSERT_THAT_ERROR(CAS->createProxy(std::nullopt, "2").moveInto(ID2),
+                    Succeeded());
+  EXPECT_NE(ID1, ID2);
+  std::string PrintedID1 = ID1->toString();
+  std::string PrintedID2 = ID2->toString();
+  EXPECT_NE(PrintedID1, PrintedID2);
+
+  std::optional<CASID> ParsedID1, ParsedID2;
+  ASSERT_THAT_ERROR(CAS->parseID(PrintedID1).moveInto(ParsedID1), Succeeded());
+  ASSERT_THAT_ERROR(CAS->parseID(PrintedID2).moveInto(ParsedID2), Succeeded());
+  EXPECT_EQ(ID1, ParsedID1);
+  EXPECT_EQ(ID2, ParsedID2);
+}
+
+TEST_P(CASTest, Blobs) {
+  std::unique_ptr<ObjectStore> CAS1 = createObjectStore();
+  StringRef ContentStrings[] = {
+      "word",
+      "some longer text std::string's local memory",
+      R"(multiline text multiline text multiline text multiline text
+multiline text multiline text multiline text multiline text multiline text
+multiline text multiline text multiline text multiline text multiline text
+multiline text multiline text multiline text multiline text multiline text
+multiline text multiline text multiline text multiline text multiline text
+multiline text multiline text multiline text multiline text multiline text)",
+  };
+
+  SmallVector<CASID> IDs;
+  for (StringRef Content : ContentStrings) {
+    // Use StringRef::str() to create a temporary std::string. This could cause
+    // problems if the CAS is storing references to the input string instead of
+    // copying it.
+    std::optional<ObjectProxy> Blob;
+    ASSERT_THAT_ERROR(CAS1->createProxy(std::nullopt, Content).moveInto(Blob),
+                      Succeeded());
+    IDs.push_back(Blob->getID());
+
+    // Check basic printing of IDs.
+    EXPECT_EQ(IDs.back().toString(), IDs.back().toString());
+    if (IDs.size() > 2)
+      EXPECT_NE(IDs.front().toString(), IDs.back().toString());
+  }
+
+  // Check that the blobs give the same IDs later.
+  for (int I = 0, E = IDs.size(); I != E; ++I) {
+    std::optional<ObjectProxy> Blob;
+    ASSERT_THAT_ERROR(
+        CAS1->createProxy(std::nullopt, ContentStrings[I]).moveInto(Blob),
+        Succeeded());
+    EXPECT_EQ(IDs[I], Blob->getID());
+  }
+
+  // Run validation on all CASIDs.
+  for (int I = 0, E = IDs.size(); I != E; ++I)
+    ASSERT_THAT_ERROR(CAS1->validate(IDs[I]), Succeeded());
+
+  // Check that the blobs can be retrieved multiple times.
+  for (int I = 0, E = IDs.size(); I != E; ++I) {
+    for (int J = 0, JE = 3; J != JE; ++J) {
+      std::optional<ObjectProxy> Buffer;
+      ASSERT_THAT_ERROR(CAS1->getProxy(IDs[I]).moveInto(Buffer), Succeeded());
+      EXPECT_EQ(ContentStrings[I], Buffer->getData());
+    }
+  }
+
+  // Confirm these blobs don't exist in a fresh CAS instance.
+  std::unique_ptr<ObjectStore> CAS2 = createObjectStore();
+  for (int I = 0, E = IDs.size(); I != E; ++I) {
+    std::optional<ObjectProxy> Proxy;
+    EXPECT_THAT_ERROR(CAS2->getProxy(IDs[I]).moveInto(Proxy), Failed());
+  }
+
+  // Insert into the second CAS and confirm the IDs are stable. Getting them
+  // should work now.
+  for (int I = IDs.size(), E = 0; I != E; --I) {
+    auto &ID = IDs[I - 1];
+    auto &Content = ContentStrings[I - 1];
+    std::optional<ObjectProxy> Blob;
+    ASSERT_THAT_ERROR(CAS2->createProxy(std::nullopt, Content).moveInto(Blob),
+                      Succeeded());
+    EXPECT_EQ(ID, Blob->getID());
+
+    std::optional<ObjectProxy> Buffer;
+    ASSERT_THAT_ERROR(CAS2->getProxy(ID).moveInto(Buffer), Succeeded());
+    EXPECT_EQ(Content, Buffer->getData());
+  }
+}
+
+TEST_P(CASTest, BlobsBig) {
+  // A little bit of validation that bigger blobs are okay. Climb up to 1MB.
+  std::unique_ptr<ObjectStore> CAS = createObjectStore();
+  SmallString<256> String1 = StringRef("a few words");
+  SmallString<256> String2 = StringRef("others");
+  while (String1.size() < 1024U * 1024U) {
+    std::optional<CASID> ID1;
+    std::optional<CASID> ID2;
+    ASSERT_THAT_ERROR(CAS->createProxy(std::nullopt, String1).moveInto(ID1),
+                      Succeeded());
+    ASSERT_THAT_ERROR(CAS->createProxy(std::nullopt, String1).moveInto(ID2),
+                      Succeeded());
+    ASSERT_THAT_ERROR(CAS->validate(*ID1), Succeeded());
+    ASSERT_THAT_ERROR(CAS->validate(*ID2), Succeeded());
+    ASSERT_EQ(ID1, ID2);
+
+    String1.append(String2);
+    ASSERT_THAT_ERROR(CAS->createProxy(std::nullopt, String2).moveInto(ID1),
+                      Succeeded());
+    ASSERT_THAT_ERROR(CAS->createProxy(std::nullopt, String2).moveInto(ID2),
+                      Succeeded());
+    ASSERT_THAT_ERROR(CAS->validate(*ID1), Succeeded());
+    ASSERT_THAT_ERROR(CAS->validate(*ID2), Succeeded());
+    ASSERT_EQ(ID1, ID2);
+    String2.append(String1);
+  }
+
+  // Specifically check near 1MB for objects large enough they're likely to be
+  // stored externally in an on-disk CAS and will be near a page boundary.
+  SmallString<0> Storage;
+  const size_t InterestingSize = 1024U * 1024ULL;
+  const size_t SizeE = InterestingSize + 2;
+  if (Storage.size() < SizeE)
+    Storage.resize(SizeE, '\01');
+  for (size_t Size = InterestingSize - 2; Size != SizeE; ++Size) {
+    StringRef Data(Storage.data(), Size);
+    std::optional<ObjectProxy> Blob;
+    ASSERT_THAT_ERROR(CAS->createProxy(std::nullopt, Data).moveInto(Blob),
+                      Succeeded());
+    ASSERT_EQ(Data, Blob->getData());
+    ASSERT_EQ(0, Blob->getData().end()[0]);
+  }
+}
+
+TEST_P(CASTest, LeafNodes) {
+  std::unique_ptr<ObjectStore> CAS1 = createObjectStore();
+  StringRef ContentStrings[] = {
+      "word",
+      "some longer text std::string's local memory",
+      R"(multiline text multiline text multiline text multiline text
+multiline text multiline text multiline text multiline text multiline text
+multiline text multiline text multiline text multiline text multiline text
+multiline text multiline text multiline text multiline text multiline text
+multiline text multiline text multiline text multiline text multiline text
+multiline text multiline text multiline text multiline text multiline text)",
+  };
+
+  SmallVector<ObjectRef> Nodes;
+  SmallVector<CASID> IDs;
+  for (StringRef Content : ContentStrings) {
+    // Use StringRef::str() to create a temporary std::string. This could cause
+    // problems if the CAS is storing references to the input string instead of
+    // copying it.
+    std::optional<ObjectRef> Node;
+    ASSERT_THAT_ERROR(
+        CAS1->store(std::nullopt, arrayRefFromStringRef<char>(Content))
+            .moveInto(Node),
+        Succeeded());
+    Nodes.push_back(*Node);
+
+    // Check basic printing of IDs.
+    IDs.push_back(CAS1->getID(*Node));
+    EXPECT_EQ(IDs.back().toString(), IDs.back().toString());
+    EXPECT_EQ(Nodes.front(), Nodes.front());
+    EXPECT_EQ(Nodes.back(), Nodes.back());
+    EXPECT_EQ(IDs.front(), IDs.front());
+    EXPECT_EQ(IDs.back(), IDs.back());
+    if (Nodes.size() <= 1)
+      continue;
+    EXPECT_NE(Nodes.front(), Nodes.back());
+    EXPECT_NE(IDs.front(), IDs.back());
+  }
+
+  // Check that the blobs give the same IDs later.
+  for (int I = 0, E = IDs.size(); I != E; ++I) {
+    std::optional<ObjectRef> Node;
+    ASSERT_THAT_ERROR(CAS1->store(std::nullopt, arrayRefFromStringRef<char>(
+                                                    ContentStrings[I]))
+                          .moveInto(Node),
+                      Succeeded());
+    EXPECT_EQ(IDs[I], CAS1->getID(*Node));
+  }
+
+  // Check that the blobs can be retrieved multiple times.
+  for (int I = 0, E = IDs.size(); I != E; ++I) {
+    for (int J = 0, JE = 3; J != JE; ++J) {
+      std::optional<ObjectProxy> Object;
+      ASSERT_THAT_ERROR(CAS1->getProxy(IDs[I]).moveInto(Object), Succeeded());
+      ASSERT_TRUE(Object);
+      EXPECT_EQ(ContentStrings[I], Object->getData());
+    }
+  }
+
+  // Confirm these blobs don't exist in a fresh CAS instance.
+  std::unique_ptr<ObjectStore> CAS2 = createObjectStore();
+  for (int I = 0, E = IDs.size(); I != E; ++I) {
+    std::optional<ObjectProxy> Object;
+    EXPECT_THAT_ERROR(CAS2->getProxy(IDs[I]).moveInto(Object), Failed());
+  }
+
+  // Insert into the second CAS and confirm the IDs are stable. Getting them
+  // should work now.
+  for (int I = IDs.size(), E = 0; I != E; --I) {
+    auto &ID = IDs[I - 1];
+    auto &Content = ContentStrings[I - 1];
+    std::optional<ObjectRef> Node;
+    ASSERT_THAT_ERROR(
+        CAS2->store(std::nullopt, arrayRefFromStringRef<char>(Content))
+            .moveInto(Node),
+        Succeeded());
+    EXPECT_EQ(ID, CAS2->getID(*Node));
+
+    std::optional<ObjectProxy> Object;
+    ASSERT_THAT_ERROR(CAS2->getProxy(ID).moveInto(Object), Succeeded());
+    ASSERT_TRUE(Object);
+    EXPECT_EQ(Content, Object->getData());
+  }
+}
+
+TEST_P(CASTest, NodesBig) {
+  std::unique_ptr<ObjectStore> CAS = createObjectStore();
+
+  // Specifically check near 1MB for objects large enough they're likely to be
+  // stored externally in an on-disk CAS, and such that one of them will be
+  // near a page boundary.
+  SmallString<0> Storage;
+  constexpr size_t InterestingSize = 1024U * 1024ULL;
+  constexpr size_t WordSize = sizeof(void *);
+
+  // Start much smaller to account for headers.
+  constexpr size_t SizeB = InterestingSize - 8 * WordSize;
+  constexpr size_t SizeE = InterestingSize + 1;
+  if (Storage.size() < SizeE)
+    Storage.resize(SizeE, '\01');
+
+  SmallVector<ObjectRef, 4> CreatedNodes;
+  // Avoid checking every size because this is an expensive test. Just check
+  // for data that is 8B-word-aligned, and one less. Also pass the nodes created
+  // so far as the references of each new node, to check that references are
+  // created correctly.
+  for (size_t Size = SizeB; Size < SizeE; Size += WordSize) {
+    for (bool IsAligned : {false, true}) {
+      StringRef Data(Storage.data(), Size - (IsAligned ? 0 : 1));
+      std::optional<ObjectProxy> Node;
+      ASSERT_THAT_ERROR(CAS->createProxy(CreatedNodes, Data).moveInto(Node),
+                        Succeeded());
+      ASSERT_EQ(Data, Node->getData());
+      ASSERT_EQ(0, Node->getData().end()[0]);
+      ASSERT_EQ(Node->getNumReferences(), CreatedNodes.size());
+      CreatedNodes.emplace_back(Node->getRef());
+    }
+  }
+
+  for (auto ID : CreatedNodes)
+    ASSERT_THAT_ERROR(CAS->validate(CAS->getID(ID)), Succeeded());
+}
+
+/// Common test functionality for creating blobs in parallel. You can vary which
+/// cas instances are the same or different, and the size of the created blobs.
+static void testBlobsParallel(ObjectStore &Read1, ObjectStore &Read2,
+                              ObjectStore &Write1, ObjectStore &Write2,
+                              uint64_t BlobSize) {
+  SCOPED_TRACE("testBlobsParallel");
+  unsigned BlobCount = 100;
+  std::vector<std::string> Blobs;
+  Blobs.reserve(BlobCount);
+  for (unsigned I = 0; I < BlobCount; ++I) {
+    std::string Blob;
+    Blob.reserve(BlobSize);
+    while (Blob.size() < BlobSize) {
+      auto R = sys::Process::GetRandomNumber();
+      Blob.append((char *)&R, sizeof(R));
+    }
+    assert(Blob.size() >= BlobSize);
+    Blob.resize(BlobSize);
+    Blobs.push_back(std::move(Blob));
+  }
+
+  std::mutex NodesMtx;
+  std::vector<std::optional<CASID>> CreatedNodes(BlobCount);
+
+  auto Producer = [&](unsigned I, ObjectStore *CAS) {
+    std::optional<ObjectProxy> Node;
+    EXPECT_THAT_ERROR(CAS->createProxy({}, Blobs[I]).moveInto(Node),
+                      Succeeded());
+    {
+      std::lock_guard<std::mutex> L(NodesMtx);
+      CreatedNodes[I] = Node ? Node->getID() : CASID::getDenseMapTombstoneKey();
+    }
+  };
+
+  auto Consumer = [&](unsigned I, ObjectStore *CAS) {
+    std::optional<CASID> ID;
+    while (!ID) {
+      // Busy wait.
+      std::lock_guard<std::mutex> L(NodesMtx);
+      ID = CreatedNodes[I];
+    }
+    if (ID == CASID::getDenseMapTombstoneKey())
+      // Producer failed; already reported.
+      return;
+
+    std::optional<ObjectProxy> Node;
+    ASSERT_THAT_ERROR(CAS->getProxy(*ID).moveInto(Node), Succeeded());
+    EXPECT_EQ(Node->getData(), Blobs[I]);
+  };
+
+  DefaultThreadPool Threads;
+  for (unsigned I = 0; I < BlobCount; ++I) {
+    Threads.async(Consumer, I, &Read1);
+    Threads.async(Consumer, I, &Read2);
+    Threads.async(Producer, I, &Write1);
+    Threads.async(Producer, I, &Write2);
+  }
+
+  Threads.wait();
+}
+
+static void testBlobsParallel1(ObjectStore &CAS, uint64_t BlobSize) {
+  SCOPED_TRACE("testBlobsParallel1");
+  testBlobsParallel(CAS, CAS, CAS, CAS, BlobSize);
+}
+
+TEST_P(CASTest, BlobsParallel) {
+  std::shared_ptr<ObjectStore> CAS = createObjectStore();
+  uint64_t Size = 1ULL * 1024;
+  ASSERT_NO_FATAL_FAILURE(testBlobsParallel1(*CAS, Size));
+}
+
+#ifdef EXPENSIVE_CHECKS
+TEST_P(CASTest, BlobsBigParallel) {
+  std::shared_ptr<ObjectStore> CAS = createObjectStore();
+  // 100k is large enough to be standalone files in our on-disk cas.
+  uint64_t Size = 100ULL * 1024;
+  ASSERT_NO_FATAL_FAILURE(testBlobsParallel1(*CAS, Size));
+}
+#endif
diff --git a/llvm/unittests/CMakeLists.txt b/llvm/unittests/CMakeLists.txt
index 8892f3e75729ab..5ebdc3bb4cac13 100644
--- a/llvm/unittests/CMakeLists.txt
+++ b/llvm/unittests/CMakeLists.txt
@@ -34,6 +34,7 @@ add_subdirectory(AsmParser)
 add_subdirectory(BinaryFormat)
 add_subdirectory(Bitcode)
 add_subdirectory(Bitstream)
+add_subdirectory(CAS)
 add_subdirectory(CGData)
 add_subdirectory(CodeGen)
 add_subdirectory(DebugInfo)

>From ee98c85d7f5274a7e0b86cc839cc9d0ad5a1e05f Mon Sep 17 00:00:00 2001
From: Steven Wu <stevenwu at apple.com>
Date: Wed, 30 Oct 2024 14:54:44 -0700
Subject: [PATCH 2/2] Address review feedback

Created using spr 1.3.5
---
 llvm/docs/ContentAddressableStorage.md | 55 +++++++++++++-------------
 llvm/include/llvm/CAS/CASReference.h   | 14 +------
 llvm/lib/CAS/InMemoryCAS.cpp           | 23 ++++++-----
 llvm/lib/CAS/ObjectStore.cpp           | 20 ++++------
 4 files changed, 50 insertions(+), 62 deletions(-)
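
Note for readers of the documentation change below: the `ObjectStore`
contracts (the same CASObject always yields the same `ObjectRef`, and objects
with identical data but different references are distinct) can be illustrated
with a small standalone sketch. This is not the LLVM API: `ToyStore`, its `Ref`
type, and its methods are hypothetical names, and the sketch uses the full
object content as the lookup key where a real CAS would use a hash digest.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy store for illustration only; not llvm::cas::ObjectStore.
class ToyStore {
public:
  // A reference is an index into Objects. It is only meaningful for the
  // ToyStore instance that produced it.
  using Ref = std::size_t;

  struct Object {
    std::vector<Ref> Refs; // references to other objects in this store
    std::string Data;      // opaque payload
  };

  // Store an object. Identical (Refs, Data) pairs always yield the same Ref;
  // any difference in Refs or Data yields a different Ref.
  Ref store(std::vector<Ref> Refs, std::string Data) {
    auto Key = std::make_pair(Refs, Data);
    auto It = Index.find(Key);
    if (It != Index.end())
      return It->second; // already stored: deduplicate
    Ref R = Objects.size();
    Objects.push_back(Object{std::move(Refs), std::move(Data)});
    Index.emplace(std::move(Key), R);
    return R;
  }

  // Load a previously stored object; throws std::out_of_range on a bad Ref.
  const Object &load(Ref R) const { return Objects.at(R); }

private:
  std::vector<Object> Objects;
  // Assumption: the whole content stands in for the digest a real CAS uses.
  std::map<std::pair<std::vector<Ref>, std::string>, Ref> Index;
};
```

Because every stored object is deduplicated, two DAGs in the same `ToyStore`
are equal exactly when their root `Ref` values are equal, which mirrors the
"compare two DAGs by comparing the root nodes" remark in the document.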

diff --git a/llvm/docs/ContentAddressableStorage.md b/llvm/docs/ContentAddressableStorage.md
index 4f2d9a6a3a9185..1cd788382c653f 100644
--- a/llvm/docs/ContentAddressableStorage.md
+++ b/llvm/docs/ContentAddressableStorage.md
@@ -6,8 +6,8 @@ Content Addressable Storage, or `CAS`, is a storage system where it assigns
 unique addresses to the data stored. It is very useful for data deduplicaton
 and creating unique identifiers.
 
-Unlikely other kind of storage system like file system, CAS is immutable. It
-is more reliable to model a computation when representing the inputs and outputs
+Unlike other storage systems such as file systems, CAS is immutable. It
+is more reliable to model a computation by representing the inputs and outputs
 of the computation using objects stored in CAS.
 
 The basic unit of the CAS library is a CASObject, where it contains:
@@ -24,11 +24,10 @@ struct CASObject {
 }
 ```
 
-Such abstraction can allow simple composition of CASObjects into a DAG to
-represent complicated data structure while still allowing data deduplication.
-Note you can compare two DAGs by just comparing the CASObject hash of two
-root nodes.
-
+With this abstraction, it is possible to compose CASObjects into a DAG that is
+capable of representing complicated data structures, while still allowing data
+deduplication. Note you can compare two DAGs by just comparing the CASObject
+hash of two root nodes.
 
 
 ## LLVM CAS Library User Guide
@@ -47,11 +46,11 @@ along. It has following properties:
 `ObjectRef` created by different `ObjectStore` cannot be cross-referenced or
 compared.
 * `ObjectRef` doesn't guarantee the existence of the CASObject it points to. An
-explicitly load is required before accessing the data stored in CASObject.
-This load can also fail, for reasons like but not limited to: object does
+explicit load is required before accessing the data stored in CASObject.
+This load can also fail, for reasons like (but not limited to): object does
 not exist, corrupted CAS storage, operation timeout, etc.
-* If two `ObjectRef` are equal, it is guarantee that the object they point to
-(if exists) are identical. If they are not equal, the underlying objects are
+* If two `ObjectRef` are equal, it is guaranteed that the objects they point to
+are identical (if they exist). If they are not equal, the underlying objects are
 guaranteed to be not the same.
 
 ### ObjectProxy
@@ -88,33 +87,33 @@ It also provides APIs to convert between `ObjectRef`, `ObjectProxy` and
 
 ## CAS Library Implementation Guide
 
-The LLVM ObjectStore APIs are designed so that it is easy to add
-customized CAS implementation that are interchangeable with builtin
-CAS implementations.
+The LLVM ObjectStore API was designed so that it is easy to add
+customized CAS implementations that are interchangeable with the builtin
+ones.
 
 To add your own implementation, you just need to add a subclass to
 `llvm::cas::ObjectStore` and implement all its pure virtual methods.
 To be interchangeable with LLVM ObjectStore, the new CAS implementation
 needs to conform to following contracts:
 
-* Different CASObject stored in the ObjectStore needs to have a different hash
-and result in a different `ObjectRef`. Vice versa, same CASObject should have
-same hash and same `ObjectRef`. Note two different CASObjects with identical
-data but different references are considered different objects.
-* `ObjectRef`s are comparable within the same `ObjectStore` instance, and can
-be used to determine the equality of the underlying CASObjects.
-* The loaded objects from the ObjectStore need to have the lifetime to be at
-least as long as the ObjectStore itself.
+* Different CASObjects stored in the ObjectStore need to have a different hash
+and result in a different `ObjectRef`. Similarly, the same CASObject should have
+the same hash and the same `ObjectRef`. Note: two different CASObjects with
+identical data but different references are considered different objects.
+* `ObjectRef`s are only comparable within the same `ObjectStore` instance, and
+can be used to determine the equality of the underlying CASObjects.
+* The loaded objects from the ObjectStore need to have a lifetime at least as
+long as the ObjectStore itself.
 
 If not specified, the behavior can be implementation defined. For example,
 `ObjectRef` can be used to point to a loaded CASObject so
 `ObjectStore` never fails to load. It is also legal to use a stricter model
-than required. For example, an `ObjectRef` that can be used to compare
-objects between different `ObjectStore` instances is legal but user
-of the ObjectStore should not depend on this behavior.
+than required. For example, an `ObjectRef` can be a unique identity of
+the objects across multiple `ObjectStore` instances but users of the LLVMCAS
+should not depend on this behavior.
 
-For CAS library implementer, there is also a `ObjectHandle` class that
+For CAS library implementers, there is also an `ObjectHandle` class that
 is an internal representation of a loaded CASObject reference.
-`ObjectProxy` is just a pair of `ObjectHandle` and `ObjectStore`, because
+`ObjectProxy` is just a pair of `ObjectHandle` and `ObjectStore`, and
 just like `ObjectRef`, `ObjectHandle` is only useful when paired with
-the ObjectStore that knows about the loaded CASObject.
+the `ObjectStore` that knows about the loaded CASObject.
diff --git a/llvm/include/llvm/CAS/CASReference.h b/llvm/include/llvm/CAS/CASReference.h
index 1f435cf306c4ca..e41c04ca2655d8 100644
--- a/llvm/include/llvm/CAS/CASReference.h
+++ b/llvm/include/llvm/CAS/CASReference.h
@@ -89,7 +89,7 @@ class ReferenceBase {
 #endif
 };
 
-/// Reference to an object in a \a ObjectStore instance.
+/// Reference to an object in an \a ObjectStore instance.
 ///
 /// If you have an ObjectRef, you know the object exists, and you can point at
 /// it from new nodes with \a ObjectStore::store(), but you don't know anything
@@ -105,12 +105,6 @@ class ReferenceBase {
 /// ObjectHandle, a variant that knows what kind of entity it is. \a
 /// ObjectStore::getReferenceKind() can expect the type of reference without
 /// asking for unloaded objects to be loaded.
-///
-/// This is a wrapper around a \c uint64_t (and a \a ObjectStore instance when
-/// assertions are on). If necessary, it can be deconstructed and reconstructed
-/// using \a Reference::getInternalRef() and \a
-/// Reference::getFromInternalRef(), but clients aren't expected to need to do
-/// this. These both require the right \a ObjectStore instance.
 class ObjectRef : public ReferenceBase {
   struct DenseMapTag {};
 
@@ -122,12 +116,6 @@ class ObjectRef : public ReferenceBase {
     return !(LHS == RHS);
   }
 
-  /// Allow a reference to be recreated after it's deconstructed.
-  static ObjectRef getFromInternalRef(const ObjectStore &CAS,
-                                      uint64_t InternalRef) {
-    return ObjectRef(CAS, InternalRef);
-  }
-
   static ObjectRef getDenseMapEmptyKey() {
     return ObjectRef(DenseMapEmptyTag{});
   }
diff --git a/llvm/lib/CAS/InMemoryCAS.cpp b/llvm/lib/CAS/InMemoryCAS.cpp
index abdd7ed3ef8051..f0305e0d4eafae 100644
--- a/llvm/lib/CAS/InMemoryCAS.cpp
+++ b/llvm/lib/CAS/InMemoryCAS.cpp
@@ -13,6 +13,7 @@
 #include "llvm/Support/Allocator.h"
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/ThreadSafeAllocator.h"
+#include "llvm/Support/TrailingObjects.h"
 
 using namespace llvm;
 using namespace llvm::cas;
@@ -69,12 +70,12 @@ class InMemoryObject {
   static_assert(((int)Kind::Max >> NumKindBits) == 0, "Kind will be truncated");
 
 public:
-  inline ArrayRef<char> getData() const;
+  ArrayRef<char> getData() const;
 
-  inline ArrayRef<const InMemoryObject *> getRefs() const;
+  ArrayRef<const InMemoryObject *> getRefs() const;
 };
 
-class InMemoryRefObject : public InMemoryObject {
+class InMemoryRefObject final : public InMemoryObject {
 public:
   static constexpr Kind KindValue = Kind::RefNode;
   static bool classof(const InMemoryObject *O) {
@@ -107,7 +108,10 @@ class InMemoryRefObject : public InMemoryObject {
   ArrayRef<char> Data;
 };
 
-class InMemoryInlineObject : public InMemoryObject {
+class InMemoryInlineObject final
+    : public InMemoryObject,
+      public TrailingObjects<InMemoryInlineObject, const InMemoryObject *,
+                             char> {
 public:
   static constexpr Kind KindValue = Kind::InlineNode;
   static bool classof(const InMemoryObject *O) {
@@ -116,15 +120,12 @@ class InMemoryInlineObject : public InMemoryObject {
 
   ArrayRef<const InMemoryObject *> getRefs() const { return getRefsImpl(); }
   ArrayRef<const InMemoryObject *> getRefsImpl() const {
-    return ArrayRef(reinterpret_cast<const InMemoryObject *const *>(this + 1),
-                    NumRefs);
+    return ArrayRef(getTrailingObjects<const InMemoryObject *>(), NumRefs);
   }
 
   ArrayRef<char> getData() const { return getDataImpl(); }
   ArrayRef<char> getDataImpl() const {
-    ArrayRef<const InMemoryObject *> Refs = getRefs();
-    return ArrayRef(reinterpret_cast<const char *>(Refs.data() + Refs.size()),
-                    DataSize);
+    return ArrayRef(getTrailingObjects<char>(), DataSize);
   }
 
   static InMemoryInlineObject &
@@ -136,6 +137,10 @@ class InMemoryInlineObject : public InMemoryObject {
     return *new (Mem) InMemoryInlineObject(I, Refs, Data);
   }
 
+  size_t numTrailingObjects(OverloadToken<const InMemoryObject *>) const {
+    return NumRefs;
+  }
+
 private:
   InMemoryInlineObject(const InMemoryIndexValueT &I,
                        ArrayRef<const InMemoryObject *> Refs,
diff --git a/llvm/lib/CAS/ObjectStore.cpp b/llvm/lib/CAS/ObjectStore.cpp
index a938c4e215382e..179621cfa296c3 100644
--- a/llvm/lib/CAS/ObjectStore.cpp
+++ b/llvm/lib/CAS/ObjectStore.cpp
@@ -12,6 +12,7 @@
 #include "llvm/Support/Errc.h"
 #include "llvm/Support/FileSystem.h"
 #include "llvm/Support/MemoryBuffer.h"
+#include <optional>
 
 using namespace llvm;
 using namespace llvm::cas;
@@ -121,20 +122,15 @@ Expected<ObjectProxy> ObjectStore::createProxy(ArrayRef<ObjectRef> Refs,
 Expected<ObjectRef>
 ObjectStore::storeFromOpenFileImpl(sys::fs::file_t FD,
                                    std::optional<sys::fs::file_status> Status) {
-  // Copy the file into an immutable memory buffer and call \c store on that.
-  // Using \c mmap would be unsafe because there's a race window between when we
-  // get the digest hash for the \c mmap contents and when we store the data; if
-  // the file changes in-between we will create an invalid object.
-
-  // FIXME: For the on-disk CAS implementation use cloning to store it as a
+  // TODO: For the on-disk CAS implementation use cloning to store it as a
   // standalone file if the file-system supports it and the file is large.
+  uint64_t Size = Status ? Status->getSize() : -1;
+  auto Buffer = MemoryBuffer::getOpenFile(FD, /*Filename=*/"", Size);
+  if (!Buffer)
+    return errorCodeToError(Buffer.getError());
 
-  constexpr size_t ChunkSize = 4 * 4096;
-  SmallString<0> Data;
-  Data.reserve(ChunkSize * 2);
-  if (Error E = sys::fs::readNativeFileToEOF(FD, Data, ChunkSize))
-    return std::move(E);
-  return store(std::nullopt, ArrayRef(Data.data(), Data.size()));
+  return store(std::nullopt,
+               arrayRefFromStringRef<char>((*Buffer)->getBuffer()));
 }
 
 Error ObjectStore::validateTree(ObjectRef Root) {
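
For reference, the worklist traversal used by `ObjectStore::validateTree` in
the first patch (pop a reference, skip it if already seen, validate it, then
queue its references) can be sketched without any LLVM types. The names
`ToyNode` and `validateReachable` are hypothetical, and the boolean `Check`
callback stands in for the `Error`-returning `validate` call.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <vector>

// Hypothetical node type: each node lists the indices of the nodes it
// references. Indices are assumed to be valid positions in the graph vector.
struct ToyNode {
  std::vector<std::size_t> Refs;
};

// Visit every node reachable from Root exactly once, calling Check on each.
// Returns false as soon as Check fails, and true if every node passes.
inline bool validateReachable(const std::vector<ToyNode> &Graph,
                              std::size_t Root,
                              const std::function<bool(std::size_t)> &Check) {
  std::unordered_set<std::size_t> Seen;
  std::vector<std::size_t> Worklist{Root};
  while (!Worklist.empty()) {
    std::size_t N = Worklist.back();
    Worklist.pop_back();
    if (!Seen.insert(N).second)
      continue; // already validated
    if (!Check(N))
      return false;
    for (std::size_t R : Graph[N].Refs)
      Worklist.push_back(R);
  }
  return true;
}
```

The visited set matters for DAGs with shared children: a node referenced from
several parents is still checked only once.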


