[PATCH] D130516: [llvm] compression classes

Tue Aug 2 12:10:20 PDT 2022

dblaikie added a comment.

In D130516#3694151 <https://reviews.llvm.org/D130516#3694151>, @MaskRay wrote:

> In D130516#3688236 <https://reviews.llvm.org/D130516#3688236>, @dblaikie wrote:
>
>> In D130516#3688123 <https://reviews.llvm.org/D130516#3688123>, @MaskRay wrote:
>>
>>> I'd like to make a few arguments for the current namespace+free function design, as opposed to the class+member function design as explored in this patch (but thanks for the exploration!).
>>> Let's discuss several use cases.
>>>
>>> (a) if a use case just calls compress/uncompress. The class design has slightly more boilerplate as it needs to get the algorithm class, a new instance, or a singleton instance.
>>> For each new use, the number of lines may not differ, but the involvement of a a static class member or an instance make the reader wonder whether the object will be reused or thrown away.
>>> There is some slight cognitive burden.
>>> The class design has a non-trivial one-shot cost to have a function returning the singleton instance.
>>
>> Though there must've been a condition that dominates this use somewhere - I'd suggest that condition could be where the algorithm is retrieved, and then passed to this code to use unconditionally.
>>
>> If the algorithm object is const and raw pointers/references are used, I think it makes it clear to the reader that there's no ownership here, and it's not stateful when compressing/decompressing.
>
> A pointer to a singleton compression class is isomorphic to an `enum class CompressionType` variable.

I don't mean to suggest that either design is fundamentally more or less functional - I'm totally OK with/agree that both design directions allow the implementation of all the desired final/end-user-visible functionality.

I'm trying to make a point about which, I think, achieves that goal in a "better" way - that's the space of design discussions, I think - what kinds of (developer, maintenance, etc) costs different designs incur.

> Using an enum variable doesn't lose any usage pattern we can do with a pointer to a singleton compression class.

I agree that either design doesn't change what's possible - I do, though, think that the "usage patterns" are meaningfully different between the two designs.

> An enum variable allows more patterns, as the allowed values are enumerable (we don't need to worry about -Wswitch for the uses).
>
> Say, we do
>
>   auto *algo = !compression::ZlibCompression;
>   if (!algo)
>     ...
>   
>   
>   algo->compress(...);
>
> either together or apart, the result is similar to the following but with (IMO) slightly larger cognitive burden:
>
>   if (!compression::isAvailable(format))
>     ...
>   
>   compression::compress(format);

Specifically two APIs that are related (it's important/necessary to check for availability before calling compress or decompress) in their contracts but unrelated in their API use makes it easier to misuse the APIs and have a situation where the availability check doesn't cover the usage. That's what I think is important/important to discuss here.

>>> (b) zlib compress/uncompress immediately following an availability check.
>>>
>>>   // free function
>>>   if (!compression::zlib::isAvailable())
>>>     errs() << "cannot compress: " << compression::zlib::buildConfigurationHint();
>>>   
>>>   // class
>>>   auto *algo = !compression::ZlibCompression;
>>>   if (!algo->isAvailable()) {
>>>     errs() << "cannot compress: " << algo->buildConfigurationHint();
>>>   }
>>
>> I think maybe this code might end up looking like:
>>
>>   Algo *algo = getAlgo(Zlib)
>>   if (!algo)
>>     errs() ...
>>
>> It's possible that this function would return non-null even for a non-available algorithm if we wanted to communicate other things (like the cmake macro name to enable to add the functionality)
>
> I think this is similarly achieved with an enum variable.
> With the class based approach, a pointer has a static type of the ancestor compression class and a dynamic type of any possible algorithm.
> This is not different from that: the enum variable may have a value the enum class supports.

I agree that the code is similar in either case, but with a small difference that is important to me - that accessing the algorithm necessarily (to some degree - you could still have code that doesn't test the condition/dereferences null, the same way that code can dereference an empty Optional without checking first - but at least the API I'm suggesting makes clear there's a connection between availability and usage).

>>> (c) zlib/zstd compress/uncompress immediately following an availability check.
>>>
>>>   // free function
>>>   if (!compression::isAvailable(format))
>>>     errs() << "cannot compress: " << compression::buildConfigurationHint(format);
>>>   
>>>   // class
>>>   std::unique_ptr<Compression> algo = make_compression(format);
>>>   if (!algo->isAvailable()) {
>>>     errs() << "cannot compress: " << algo->buildConfigurationHint();
>>>   }
>>
>> I don't think there's a need for unique_ptr here - algorithms can be constant singletons, referenced via raw const pointers/references without ownership.
>>
>> & this example doesn't include the code that does the compression/decompression,  which seems part of the discussion & part I find nice in that the type of compression used matches the type used in the check necessarily rather than being passed into two APIs independently.
>
> Thanks for clarification. Then this fits my "singleton compression classes are isomorphic to an `enum CompressionType` variable" argument :)

I don't understand what you're saying here. Could you rephrase/expand a bit?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130516/new/

https://reviews.llvm.org/D130516