[PATCH] D130516: [llvm] compression classes

Tue Aug 2 12:17:39 PDT 2022

ckissane added a comment.

In D130516#3694366 <https://reviews.llvm.org/D130516#3694366>, @dblaikie wrote:

> In D130516#3694151 <https://reviews.llvm.org/D130516#3694151>, @MaskRay wrote:
>
>> In D130516#3688236 <https://reviews.llvm.org/D130516#3688236>, @dblaikie wrote:
>>
>>> In D130516#3688123 <https://reviews.llvm.org/D130516#3688123>, @MaskRay wrote:
>>>
>>>> I'd like to make a few arguments for the current namespace+free function design, as opposed to the class+member function design as explored in this patch (but thanks for the exploration!).
>>>> Let's discuss several use cases.
>>>>
>>>> (a) if a use case just calls compress/uncompress. The class design has slightly more boilerplate as it needs to get the algorithm class, a new instance, or a singleton instance.
>>>> For each new use, the number of lines may not differ, but the involvement of a a static class member or an instance make the reader wonder whether the object will be reused or thrown away.
>>>> There is some slight cognitive burden.
>>>> The class design has a non-trivial one-shot cost to have a function returning the singleton instance.
>>>
>>> Though there must've been a condition that dominates this use somewhere - I'd suggest that condition could be where the algorithm is retrieved, and then passed to this code to use unconditionally.
>>>
>>> If the algorithm object is const and raw pointers/references are used, I think it makes it clear to the reader that there's no ownership here, and it's not stateful when compressing/decompressing.
>>
>> A pointer to a singleton compression class is isomorphic to an `enum class CompressionType` variable.
>
> I don't mean to suggest that either design is fundamentally more or less functional - I'm totally OK with/agree that both design directions allow the implementation of all the desired final/end-user-visible functionality.
>
> I'm trying to make a point about which, I think, achieves that goal in a "better" way - that's the space of design discussions, I think - what kinds of (developer, maintenance, etc) costs different designs incur.
>
>> Using an enum variable doesn't lose any usage pattern we can do with a pointer to a singleton compression class.
>
> I agree that either design doesn't change what's possible - I do, though, think that the "usage patterns" are meaningfully different between the two designs.
>
>> An enum variable allows more patterns, as the allowed values are enumerable (we don't need to worry about -Wswitch for the uses).
>>
>> Say, we do
>>
>>   auto *algo = !compression::ZlibCompression;
>>   if (!algo)
>>     ...
>>   
>>   
>>   algo->compress(...);
>>
>> either together or apart, the result is similar to the following but with (IMO) slightly larger cognitive burden:
>>
>>   if (!compression::isAvailable(format))
>>     ...
>>   
>>   compression::compress(format);
>
> Specifically two APIs that are related (it's important/necessary to check for availability before calling compress or decompress) in their contracts but unrelated in their API use makes it easier to misuse the APIs and have a situation where the availability check doesn't cover the usage. That's what I think is important/important to discuss here.
>
>>>> (b) zlib compress/uncompress immediately following an availability check.
>>>>
>>>>   // free function
>>>>   if (!compression::zlib::isAvailable())
>>>>     errs() << "cannot compress: " << compression::zlib::buildConfigurationHint();
>>>>   
>>>>   // class
>>>>   auto *algo = !compression::ZlibCompression;
>>>>   if (!algo->isAvailable()) {
>>>>     errs() << "cannot compress: " << algo->buildConfigurationHint();
>>>>   }
>>>
>>> I think maybe this code might end up looking like:
>>>
>>>   Algo *algo = getAlgo(Zlib)
>>>   if (!algo)
>>>     errs() ...
>>>
>>> It's possible that this function would return non-null even for a non-available algorithm if we wanted to communicate other things (like the cmake macro name to enable to add the functionality)
>>
>> I think this is similarly achieved with an enum variable.
>> With the class based approach, a pointer has a static type of the ancestor compression class and a dynamic type of any possible algorithm.
>> This is not different from that: the enum variable may have a value the enum class supports.
>
> I agree that the code is similar in either case, but with a small difference that is important to me - that accessing the algorithm necessarily (to some degree - you could still have code that doesn't test the condition/dereferences null, the same way that code can dereference an empty Optional without checking first - but at least the API I'm suggesting makes clear there's a connection between availability and usage).
>
>>>> (c) zlib/zstd compress/uncompress immediately following an availability check.
>>>>
>>>>   // free function
>>>>   if (!compression::isAvailable(format))
>>>>     errs() << "cannot compress: " << compression::buildConfigurationHint(format);
>>>>   
>>>>   // class
>>>>   std::unique_ptr<Compression> algo = make_compression(format);
>>>>   if (!algo->isAvailable()) {
>>>>     errs() << "cannot compress: " << algo->buildConfigurationHint();
>>>>   }
>>>
>>> I don't think there's a need for unique_ptr here - algorithms can be constant singletons, referenced via raw const pointers/references without ownership.
>>>
>>> & this example doesn't include the code that does the compression/decompression,  which seems part of the discussion & part I find nice in that the type of compression used matches the type used in the check necessarily rather than being passed into two APIs independently.
>>
>> Thanks for clarification. Then this fits my "singleton compression classes are isomorphic to an `enum CompressionType` variable" argument :)
>
> I don't understand what you're saying here. Could you rephrase/expand a bit?

@dblaikie, @MaskRay I think I have worked out something that is the best of both worlds:
none compression is represented simply as a none type for use cases that will use `Optional<CompressionKind>`.
once you have a CompressionKind itself you can pass it around as a value (because a CompressionKind is just a struct containing one uint8_t (a fake "enum")).
due to my operator overloading, you can do stuff like this:

  llvm::compression::OptionalCompressionKind OptionalCompressionScheme =
            llvm::compression::getOptionalCompressionKind(CompressionSchemeId);
        if (!OptionalCompressionScheme) {
          return llvm::MemoryBuffer::getMemBuffer(Blob, Name, true);
        }
        llvm::compression::CompressionKind CompressionScheme =
            *OptionalCompressionScheme;
        if (!CompressionScheme) {
          Error("compression class " +
                (CompressionScheme->getName() + " is not available").str());
          return nullptr;
        }
        SmallVector<uint8_t, 0> Uncompressed;
        if (llvm::Error E = CompressionScheme->decompress(
                llvm::arrayRefFromStringRef(Blob), Uncompressed, Record[0])) {
          Error("could not decompress embedded file contents: " +
                llvm::toString(std::move(E)));
          return nullptr;
        }
        return llvm::MemoryBuffer::getMemBufferCopy(
            llvm::toStringRef(Uncompressed), Name);

(excerpt from `ASTReader.cpp`)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130516/new/

https://reviews.llvm.org/D130516