[PATCH] D130516: [llvm] compression classes

Tue Aug 2 11:01:04 PDT 2022

MaskRay added a comment.

In D130516#3688236 <https://reviews.llvm.org/D130516#3688236>, @dblaikie wrote:

> In D130516#3688123 <https://reviews.llvm.org/D130516#3688123>, @MaskRay wrote:
>
>> I'd like to make a few arguments for the current namespace+free function design, as opposed to the class+member function design as explored in this patch (but thanks for the exploration!).
>> Let's discuss several use cases.
>>
>> (a) if a use case just calls compress/uncompress. The class design has slightly more boilerplate as it needs to get the algorithm class, a new instance, or a singleton instance.
>> For each new use, the number of lines may not differ, but the involvement of a a static class member or an instance make the reader wonder whether the object will be reused or thrown away.
>> There is some slight cognitive burden.
>> The class design has a non-trivial one-shot cost to have a function returning the singleton instance.
>
> Though there must've been a condition that dominates this use somewhere - I'd suggest that condition could be where the algorithm is retrieved, and then passed to this code to use unconditionally.
>
> If the algorithm object is const and raw pointers/references are used, I think it makes it clear to the reader that there's no ownership here, and it's not stateful when compressing/decompressing.

Singleton compression classes are isomorphic to an `enum CompressionType` variable.
Using an enum variable doesn't lose any usage pattern we can do with a pointer to a singleton compression class.

Say, we do

  auto *algo = !compression::ZlibCompression;
  if (!algo)
    ...

  algo->compress(...);

either together or apart, the result is similar to the following but with (IMO) slightly larger cognitive burden:

  if (!compression::isAvailable(format))
    ...

  compression::compress(format);

>> (b) zlib compress/uncompress immediately following an availability check.
>>
>>   // free function
>>   if (!compression::zlib::isAvailable())
>>     errs() << "cannot compress: " << compression::zlib::buildConfigurationHint();
>>   
>>   // class
>>   auto *algo = !compression::ZlibCompression;
>>   if (!algo->isAvailable()) {
>>     errs() << "cannot compress: " << algo->buildConfigurationHint();
>>   }
>
> I think maybe this code might end up looking like:
>
>   Algo *algo = getAlgo(Zlib)
>   if (!algo)
>     errs() ...
>
> It's possible that this function would return non-null even for a non-available algorithm if we wanted to communicate other things (like the cmake macro name to enable to add the functionality)

I think this is similarly achieved with an enum variable.
With the class based approach, a pointer has a static type of the ancestor compression class and a dynamic type of any possible algorithm.
This is not different from that: the enum variable may have a value the enum class supports.

>> (c) zlib/zstd compress/uncompress immediately following an availability check.
>>
>>   // free function
>>   if (!compression::isAvailable(format))
>>     errs() << "cannot compress: " << compression::buildConfigurationHint(format);
>>   
>>   // class
>>   std::unique_ptr<Compression> algo = make_compression(format);
>>   if (!algo->isAvailable()) {
>>     errs() << "cannot compress: " << algo->buildConfigurationHint();
>>   }
>
> I don't think there's a need for unique_ptr here - algorithms can be constant singletons, referenced via raw const pointers/references without ownership.
>
> & this example doesn't include the code that does the compression/decompression,  which seems part of the discussion & part I find nice in that the type of compression used matches the type used in the check necessarily rather than being passed into two APIs independently.

Thanks for clarification. Then this fits my "singleton compression classes are isomorphic to an `enum CompressionType` variable" argument :)

>> (d) compress/uncompress and an availability check are apart.
>>
>>   // free function
>>   no change
>>   
>>   // class
>>   Store (the pointer to the) the algorithm object somewhere, or construct the pointer/object twice.
>
> The benefit here is that it's harder for the test to become separated from the usage - for the usage to end up becoming unconditional/incorrectly guarded.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130516/new/

https://reviews.llvm.org/D130516