[PATCH] D133716: [CAS] Add LLVMCAS library with InMemoryCAS implementation

Wed Dec 7 13:54:11 PST 2022

steven_wu added a comment.

@dblaikie Thanks for reviewing. Feedback inline.

================
Comment at: llvm/include/llvm/CAS/CASID.h:26
+class CASContext {
+  virtual void anchor();
+
----------------
dblaikie wrote:
> This could be private to avoid polluting autocomplete and such, maybe? (alternatively it's probably not too costly to have the dtor virtual and out of line to act as an anchor - I assume CASContexts aren't being created/torn down with any great frequency such that some dtor indirection would be especially costly?)
This is a private method? But sure this is not a common object to create or destory. dtor as anchor sounds fine.

================
Comment at: llvm/include/llvm/CAS/CASID.h:108-109
+private:
+  CASID(const CASContext *Context, StringRef Hash)
+      : Context(Context), Hash(Hash) {}
+
----------------
dblaikie wrote:
> if you have a `std::string` member, should probably take the parameter as `std::string` and `std::move` it into the member - in cases where the caller can discard a string that'll be more efficient, and in cases where the caller can't, it's not especially worse.
> 
> (though shoudl the member be `std::string`? should it be `SmallString` or anything else?)
I don't have much preference for the internal data type. CASID should only be relevant when you actually need the hash representation, often when you need to print out the value for outside users. Otherwise, `ObjectRef` is always a better choice.

================
Comment at: llvm/include/llvm/CAS/CASReference.h:106
+/// ObjectHandle, a variant that knows what kind of entity it is. \a
+/// ObjectStore::getReferenceKind() can expect the type of reference without
+/// asking for unloaded objects to be loaded.
----------------
dblaikie wrote:
> expect -> extract? or something else?
This comment is out-dated. The builtin kind of the object is removed from the latest implementation to simplify the interface so it is easier to use and implement

================
Comment at: llvm/include/llvm/CAS/CASReference.h:110-113
+/// assertions are on). If necessary, it can be deconstructed and reconstructed
+/// using \a Reference::getInternalRef() and \a
+/// Reference::getFromInternalRef(), but clients aren't expected to need to do
+/// this. These both require the right \a ObjectStore instance.
----------------
dblaikie wrote:
> If clients aren't expected to need to do it, could we leave that out until some necessary use comes up?
It actually means that only the CAS implementers should use those method (for example, used in BuiltinCAS) but users of the CAS should not use those methods because `InternalRef` has no meaning outside the CAS internal. I rewrite it to clear up a bit.

================
Comment at: llvm/include/llvm/CAS/CASReference.h:162
+/// data and references.
+class ObjectHandle : public ReferenceBase {
+public:
----------------
dblaikie wrote:
> Since `Handle` and `Ref` aren't especially self documenting (while reading the docs for `Ref` I wondered if it should be called `Handle` - until I got to the bit where it explained that that exists too/separately) - is there something that'd be more explicit about the difference between these? (like, I wonder if `ObjectHandle` could be `Object`? - not sure that'd make `ObjectRef` more obvious, though... )
`ObjectRef` is just a reference, and you can't access anything in the object it points to, until you load the object using ObjectRef, which will turn into an `ObjectHandle`. The load can have latency or can fail.

Maybe the comments in ObjectStore.h is a better explanation with a bigger picture? Or should we put all the documentation into one place so it is easier to read?

================
Comment at: llvm/include/llvm/CAS/ObjectStore.h:49
+///     - \a loadObject(const CASID&) looks up an object by its UID.
+/// - Objects can reference other objects, forming an arbitrary DAG.
+///
----------------
dblaikie wrote:
> do they "reference" other objects - or does an object consist of other objects/include those objects in some sense? (maybe there's no real difference, and this current language is better... not sure)
It is a real `reference`, like a pointer reference. But this is up to the CAS implementation, as I can see it is totally possible to implement a CAS the directly embeds the "referenced" object without breaking the contract of the API, but there is really no good reason to do that.

================
Comment at: llvm/include/llvm/CAS/ObjectStore.h:53-57
+/// - \a ObjectRef encapsulates a reference to something in the CAS. It is an
+///   opaque type that references an object inside a specific CAS. It is
+///   implementation defined if the underlying object exists or not for an
+///   ObjectRef, and it can used to speed up CAS lookup as an implementation
+///   detail. However, you don't know anything about the underlying objects.
----------------
dblaikie wrote:
> Not clear from this what the difference between an `ObjectRef` and a `CASID` are - is that worth more details?
I refine the docs a bit. Let me know if it is clearer to read.

================
Comment at: llvm/include/llvm/CAS/ObjectStore.h:109-110
+
+  /// Validate the underlying object referred by CASID.
+  virtual Error validate(const CASID &ID) = 0;
+
----------------
dblaikie wrote:
> what sort of validity is of concern here?
This is more a debugging function that can check if the integrity of the CAS has been broken. The current implementation for builtin CAS is to rehash the object fetched from CASID and make sure the hash matches. If not, there is either a bug in CAS or the data has been corrupted.

================
Comment at: llvm/include/llvm/CAS/ObjectStore.h:174-175
+  /// Methods for handling objects.
+  virtual Error forEachRef(ObjectHandle Node,
+                           function_ref<Error(ObjectRef)> Callback) const = 0;
+  virtual ObjectRef readRef(ObjectHandle Node, size_t I) const = 0;
----------------
dblaikie wrote:
> this lets you walk which other objects reference this object?
> I guess you've probably got some pretty core use cases/need for this - but dealing with updating use lists in LLVM IR at least makes me ask: Do you need to be able to walk the uses of an object?
No, the reference in CAS goes only one direction. This iterates through all the objects that is referenced by the current object and you don't know how many parents references you.

In general, there is even no API for CAS to iterate through all the objects to compute a use-list and that is by design. 

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133716/new/

https://reviews.llvm.org/D133716