[llvm] [CAS] Add LLVMCAS library with InMemoryCAS implementation (PR #114096)
Paul Kirth via llvm-commits
llvm-commits at lists.llvm.org
Fri Aug 8 15:55:40 PDT 2025
================
@@ -0,0 +1,298 @@
+//===- llvm/CAS/ObjectStore.h -----------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CAS_OBJECTSTORE_H
+#define LLVM_CAS_OBJECTSTORE_H
+
+#include "llvm/ADT/StringRef.h"
+#include "llvm/CAS/CASID.h"
+#include "llvm/CAS/CASReference.h"
+#include "llvm/Support/Error.h"
+#include "llvm/Support/FileSystem.h"
+#include <cstddef>
+
+namespace llvm {
+
+class MemoryBuffer;
+template <typename T> class unique_function;
+
+namespace cas {
+
+class ObjectStore;
+class ObjectProxy;
+
+/// Content-addressable storage for objects.
+///
+/// Conceptually, objects are stored in a "unique set".
+///
+/// - Objects are immutable ("value objects") that are defined by their
+/// content. They are implicitly deduplicated by content.
+/// - Each object has a unique identifier (UID) that's derived from its content,
+/// called a \a CASID.
+/// - This UID is a fixed-size (strong) hash of the transitive content of a
+/// CAS object.
+/// - It's comparable between any two CAS instances that have the same \a
+/// CASIDContext::getHashSchemaIdentifier().
+/// - The UID can be printed (e.g., \a CASID::toString()) and it can be
+/// parsed by the same or a different CAS instance with \a
+/// ObjectStore::parseID().
+/// - An object can be looked up by content or by UID.
+/// - \a store() is a "get-or-create" method: it writes an object if it
+/// doesn't exist yet and returns a ref to it in either case.
+/// - \a loadObject(const CASID&) looks up an object by its UID.
+/// - Objects can reference other objects, forming an arbitrary DAG.
+///
+/// The \a ObjectStore interface has a few ways of referencing objects:
+///
+/// - \a ObjectRef encapsulates a reference to something in the CAS. It is an
+/// opaque type that references an object inside a specific CAS. It is
+/// implementation-defined whether the underlying object exists for a given
+/// ObjectRef, and an implementation may use it to speed up CAS lookups.
+/// However, holding one tells you nothing about the underlying object.
+/// "Loading" the object is a separate step that may not have happened
+/// yet, and which can fail (e.g. due to filesystem corruption) or introduce
+/// latency (if downloading from a remote store).
+/// - \a ObjectHandle encapsulates a *loaded* object in the CAS. You need one of
+/// these to inspect the content of an object: to look at its stored
+/// data and references. This is internal to the CAS implementation and not
+/// available from the public CAS APIs.
+/// - \a CASID: the UID for an object in the CAS, obtained through \a
+/// ObjectStore::getID() or \a ObjectStore::parseID(). This is a valid CAS
+/// identifier, but may reference an object that is unknown to this CAS
+/// instance.
+/// - \a ObjectProxy pairs an ObjectHandle (subclass) with an ObjectStore, and
+/// wraps access APIs to avoid having to pass extra parameters. It is the
+/// object used for accessing underlying data and refs by CAS users.
+///
+/// Both ObjectRef and ObjectHandle are lightweight wrappers around a `uint64_t`
+/// are only valid with the associated ObjectStore instance.
+///
+/// There are a few options for accessing content of objects, with different
+/// lifetime tradeoffs:
+///
+/// - \a getData() accesses data without exposing lifetime at all.
+/// - \a getMemoryBuffer() returns a \a MemoryBuffer whose lifetime
+/// is independent of the CAS (it can live longer).
+/// - \a getDataString() returns a StringRef whose lifetime is guaranteed to
+/// last as long as the \a ObjectStore.
+/// - \a readRef() and \a forEachRef() iterate through the references in an
+/// object. There is no lifetime assumption.
+class ObjectStore {
+ friend class ObjectProxy;
+ void anchor();
+
+public:
+ /// Get a \a CASID from \p ID, which should have been generated by \a
+ /// CASID::print(). This succeeds as long as \a validateID() would pass. The
+ /// object may be unknown to this CAS instance.
+ ///
+ /// TODO: Remove, and update callers to use \a validateID() or \a
+ /// extractHashFromID().
+ virtual Expected<CASID> parseID(StringRef ID) = 0;
+
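+ /// Store an object with references \p Refs and data \p Data. If an
+ /// identical object already exists, return a reference to it instead.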
+ virtual Expected<ObjectRef> store(ArrayRef<ObjectRef> Refs,
+ ArrayRef<char> Data) = 0;
+ /// Get an ID for \p Ref.
+ virtual CASID getID(ObjectRef Ref) const = 0;
+
+ /// Get an existing reference to the object called \p ID.
+ ///
+ /// \returns \c std::nullopt if the object is not stored in this CAS.
+ virtual std::optional<ObjectRef> getReference(const CASID &ID) const = 0;
+
+ /// \returns true if the object is directly available from the local CAS, for
+ /// implementations that have this kind of distinction.
+ virtual Expected<bool> isMaterialized(ObjectRef Ref) const = 0;
+
+ /// Validate the underlying object referred to by \p ID.
+ virtual Error validate(const CASID &ID) = 0;
+
+protected:
+ /// Load the object referenced by \p Ref.
+ ///
+ /// Errors if the object cannot be loaded.
+ /// \returns \c std::nullopt if the object is missing from the CAS.
+ virtual Expected<std::optional<ObjectHandle>> loadIfExists(ObjectRef Ref) = 0;
+
+ /// Like \c loadIfExists but returns an error if the object is missing.
+ Expected<ObjectHandle> load(ObjectRef Ref);
+
+ /// Get the size of the data stored in \p Node.
+ virtual uint64_t getDataSize(ObjectHandle Node) const = 0;
+
+ /// Methods for handling objects.
----------------
ilovepi wrote:
This comment implies it's for the following methods and sounds more like it's about the class's organization (for the reader), but it is a doc comment and will only be attached to the `forEachRef` method.
I'd suggest adding a bit more documentation here, since these are `virtual`.
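To make the distinction concrete, here is a minimal sketch of the two usual options, assuming Doxygen's member-group syntax as it is commonly written in LLVM headers (`\name` followed by `@{` ... `@}`). The `sketch` namespace, `ObjectStoreDocSketch`, and the `getNumRefs` member are hypothetical; only `getDataSize` comes from the quoted header, and `ObjectHandle` is a stand-in struct so the fragment compiles on its own.

```cpp
#include <cstdint>

namespace sketch {

// Hypothetical stand-in for the CAS handle type so the fragment compiles.
struct ObjectHandle {
  uint64_t Opaque;
};

class ObjectStoreDocSketch {
public:
  virtual ~ObjectStoreDocSketch() = default;

protected:
  // Option 1: a plain "//" comment is for readers of the header only;
  // Doxygen does not attach it to the declaration that follows.

  /// \name Object content accessors
  /// Option 2: a Doxygen member group. These lines describe the group as a
  /// whole, and each member below keeps its own "///" comment.
  /// @{

  /// Get the size of the data stored in \p Node.
  virtual uint64_t getDataSize(ObjectHandle Node) const = 0;

  /// Get the number of references held by \p Node. (Hypothetical member,
  /// present only to show a group with more than one method.)
  virtual uint64_t getNumRefs(ObjectHandle Node) const = 0;

  /// @}
};

} // namespace sketch
```

Either way, each `virtual` hook still gets its own doc comment describing what an implementation must provide.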
https://github.com/llvm/llvm-project/pull/114096
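The quoted header describes `store()` as a get-or-create operation, with objects implicitly deduplicated by content and able to reference other objects to form a DAG. A toy in-memory store illustrates that behavior. This is a hypothetical sketch, not the LLVM CAS API or the InMemoryCAS added in this PR: `ToyContentStore` and `ToyRef` are made-up names, and objects are keyed by an exact encoding of their content instead of the strong fixed-size hash a real CAS uses, so the keys are only meaningful within one store instance.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

// Opaque handle to an object in one ToyContentStore (cf. ObjectRef wrapping a
// uint64_t). Only meaningful together with the store that produced it.
struct ToyRef {
  uint64_t Index;
  friend bool operator==(ToyRef A, ToyRef B) { return A.Index == B.Index; }
};

// Hypothetical in-memory content-addressed store; not the LLVM CAS API.
class ToyContentStore {
public:
  // Get-or-create: if an object with the same refs and data already exists,
  // return a ref to it; otherwise record a new object and return its ref.
  ToyRef store(const std::vector<ToyRef> &Refs, const std::string &Data) {
    std::string Key = makeKey(Refs, Data);
    auto It = ByContent.find(Key);
    if (It != ByContent.end())
      return It->second; // Deduplicated: identical content already stored.
    ToyRef Ref{static_cast<uint64_t>(Objects.size())};
    Objects.push_back({Refs, Data});
    ByContent.emplace(std::move(Key), Ref);
    return Ref;
  }

  const std::string &getData(ToyRef Ref) const { return get(Ref).Data; }
  const std::vector<ToyRef> &getRefs(ToyRef Ref) const { return get(Ref).Refs; }
  std::size_t numObjects() const { return Objects.size(); }

private:
  struct Object {
    std::vector<ToyRef> Refs;
    std::string Data;
  };

  const Object &get(ToyRef Ref) const {
    assert(Ref.Index < Objects.size() && "ref does not belong to this store");
    return Objects[Ref.Index];
  }

  // Unambiguous encoding of an object's content: ref count, each ref index,
  // then the raw data. Refs are already deduplicated, so an index identifies
  // its transitive content within this store. A real CAS would hash the
  // transitive content so IDs are comparable across store instances.
  static std::string makeKey(const std::vector<ToyRef> &Refs,
                             const std::string &Data) {
    std::string Key = std::to_string(Refs.size()) + ':';
    for (ToyRef R : Refs)
      Key += std::to_string(R.Index) + ',';
    Key += '|';
    Key += Data;
    return Key;
  }

  std::vector<Object> Objects;
  std::map<std::string, ToyRef> ByContent;
};

} // namespace sketch
```

Storing `"hello"` twice yields the same `ToyRef`, and a parent object whose only reference is either copy is likewise stored once.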