[PATCH] D130516: [llvm] compression classes
Cole Kissane via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Tue Aug 2 12:17:39 PDT 2022
ckissane added a comment.
In D130516#3694366 <https://reviews.llvm.org/D130516#3694366>, @dblaikie wrote:
> In D130516#3694151 <https://reviews.llvm.org/D130516#3694151>, @MaskRay wrote:
>
>> In D130516#3688236 <https://reviews.llvm.org/D130516#3688236>, @dblaikie wrote:
>>
>>> In D130516#3688123 <https://reviews.llvm.org/D130516#3688123>, @MaskRay wrote:
>>>
>>>> I'd like to make a few arguments for the current namespace+free function design, as opposed to the class+member function design as explored in this patch (but thanks for the exploration!).
>>>> Let's discuss several use cases.
>>>>
>>>> (a) if a use case just calls compress/uncompress. The class design has slightly more boilerplate as it needs to get the algorithm class, a new instance, or a singleton instance.
>>>> For each new use, the number of lines may not differ, but the involvement of a static class member or an instance makes the reader wonder whether the object will be reused or thrown away.
>>>> There is some slight cognitive burden.
>>>> The class design has a non-trivial one-shot cost to have a function returning the singleton instance.
>>>
>>> Though there must've been a condition that dominates this use somewhere - I'd suggest that condition could be where the algorithm is retrieved, and then passed to this code to use unconditionally.
>>>
>>> If the algorithm object is const and raw pointers/references are used, I think it makes it clear to the reader that there's no ownership here, and it's not stateful when compressing/decompressing.
>>
>> A pointer to a singleton compression class is isomorphic to an `enum class CompressionType` variable.
>
> I don't mean to suggest that either design is fundamentally more or less functional - I'm totally OK with/agree that both design directions allow the implementation of all the desired final/end-user-visible functionality.
>
> I'm trying to make a point about which, I think, achieves that goal in a "better" way - that's the space of design discussions, I think - what kinds of (developer, maintenance, etc) costs different designs incur.
>
>> Using an enum variable doesn't lose any usage pattern we can do with a pointer to a singleton compression class.
>
> I agree that either design doesn't change what's possible - I do, though, think that the "usage patterns" are meaningfully different between the two designs.
>
>> An enum variable allows more patterns, as the allowed values are enumerable (we don't need to worry about -Wswitch for the uses).
>>
>> Say, we do
>>
>> auto *algo = &compression::ZlibCompression;
>> if (!algo)
>> ...
>>
>>
>> algo->compress(...);
>>
>> either together or apart, the result is similar to the following but with (IMO) slightly larger cognitive burden:
>>
>> if (!compression::isAvailable(format))
>> ...
>>
>> compression::compress(format);
>
> Specifically, two APIs that are related in their contracts (it's important/necessary to check for availability before calling compress or decompress) but unrelated in their API use make it easier to misuse the APIs and end up in a situation where the availability check doesn't cover the usage. That's what I think is important to discuss here.
>
>>>> (b) zlib compress/uncompress immediately following an availability check.
>>>>
>>>> // free function
>>>> if (!compression::zlib::isAvailable())
>>>> errs() << "cannot compress: " << compression::zlib::buildConfigurationHint();
>>>>
>>>> // class
>>>> auto *algo = &compression::ZlibCompression;
>>>> if (!algo->isAvailable()) {
>>>> errs() << "cannot compress: " << algo->buildConfigurationHint();
>>>> }
>>>
>>> I think maybe this code might end up looking like:
>>>
>>> Algo *algo = getAlgo(Zlib)
>>> if (!algo)
>>> errs() ...
>>>
>>> It's possible that this function would return non-null even for a non-available algorithm if we wanted to communicate other things (like the cmake macro name to enable to add the functionality)
>>
>> I think this is similarly achieved with an enum variable.
>> With the class based approach, a pointer has a static type of the ancestor compression class and a dynamic type of any possible algorithm.
>> This is not different from that: the enum variable may have a value the enum class supports.
>
> I agree that the code is similar in either case, but with a small difference that is important to me - that accessing the algorithm necessarily goes through the availability check (to some degree - you could still have code that doesn't test the condition/dereferences null, the same way that code can dereference an empty Optional without checking first - but at least the API I'm suggesting makes clear there's a connection between availability and usage).
>
>>>> (c) zlib/zstd compress/uncompress immediately following an availability check.
>>>>
>>>> // free function
>>>> if (!compression::isAvailable(format))
>>>> errs() << "cannot compress: " << compression::buildConfigurationHint(format);
>>>>
>>>> // class
>>>> std::unique_ptr<Compression> algo = make_compression(format);
>>>> if (!algo->isAvailable()) {
>>>> errs() << "cannot compress: " << algo->buildConfigurationHint();
>>>> }
>>>
>>> I don't think there's a need for unique_ptr here - algorithms can be constant singletons, referenced via raw const pointers/references without ownership.
>>>
>>> & this example doesn't include the code that does the compression/decompression, which seems part of the discussion & part I find nice in that the type of compression used matches the type used in the check necessarily rather than being passed into two APIs independently.
>>
>> Thanks for clarification. Then this fits my "singleton compression classes are isomorphic to an `enum CompressionType` variable" argument :)
>
> I don't understand what you're saying here. Could you rephrase/expand a bit?
@dblaikie, @MaskRay I think I have worked out something that gives the best of both worlds:
The absence of compression is represented simply as None, for use cases that use `Optional<CompressionKind>`.
Once you have a CompressionKind, you can pass it around by value, because a CompressionKind is just a struct containing one uint8_t (a fake "enum").
Thanks to my operator overloading, you can write code like this:
  llvm::compression::OptionalCompressionKind OptionalCompressionScheme =
      llvm::compression::getOptionalCompressionKind(CompressionSchemeId);
  if (!OptionalCompressionScheme) {
    return llvm::MemoryBuffer::getMemBuffer(Blob, Name, true);
  }
  llvm::compression::CompressionKind CompressionScheme =
      *OptionalCompressionScheme;
  if (!CompressionScheme) {
    Error("compression class " +
          (CompressionScheme->getName() + " is not available").str());
    return nullptr;
  }
  SmallVector<uint8_t, 0> Uncompressed;
  if (llvm::Error E = CompressionScheme->decompress(
          llvm::arrayRefFromStringRef(Blob), Uncompressed, Record[0])) {
    Error("could not decompress embedded file contents: " +
          llvm::toString(std::move(E)));
    return nullptr;
  }
  return llvm::MemoryBuffer::getMemBufferCopy(
      llvm::toStringRef(Uncompressed), Name);
(excerpt from `ASTReader.cpp`)
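To make the "fake enum" idea concrete, here is a minimal standalone sketch of how such a value type could be wired up. It is not the code in the patch: the `sketch` namespace, the `Algorithm` interface, the id values (0 = none, 1 = zlib, 2 = zstd), the hard-coded availability results, and the `describe` helper are all assumptions made for illustration, and `std::optional` stands in for `llvm::Optional`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

namespace sketch {

// Hypothetical stateless algorithm interface (illustrative, not from the patch).
struct Algorithm {
  virtual ~Algorithm() = default;
  virtual const char *getName() const = 0;
  virtual bool isAvailable() const = 0;
};

struct ZlibAlgorithm final : Algorithm {
  const char *getName() const override { return "zlib"; }
  // Assumption for the sketch: pretend zlib was enabled at build time.
  bool isAvailable() const override { return true; }
};

struct ZstdAlgorithm final : Algorithm {
  const char *getName() const override { return "zstd"; }
  // Assumption for the sketch: pretend zstd was not enabled.
  bool isAvailable() const override { return false; }
};

// A "fake enum": one byte, cheap to copy, passed around by value.
class CompressionKind {
  uint8_t Id;

public:
  explicit CompressionKind(uint8_t Id) : Id(Id) {}

  // operator-> forwards to a const singleton, so no ownership is implied.
  const Algorithm *operator->() const {
    static const ZlibAlgorithm Zlib{};
    static const ZstdAlgorithm Zstd{};
    assert((Id == 1 || Id == 2) && "unknown compression kind");
    if (Id == 1)
      return &Zlib;
    return &Zstd;
  }

  // Contextual bool conversion answers "is this kind usable in this build?".
  explicit operator bool() const { return (*this)->isAvailable(); }
};

// Hypothetical ids: 0 = none, 1 = zlib, 2 = zstd. For simplicity the sketch
// maps every other id to "none" as well.
std::optional<CompressionKind> getOptionalCompressionKind(uint8_t Id) {
  if (Id == 1 || Id == 2)
    return CompressionKind(Id);
  return std::nullopt;
}

// Mirrors the two checks in the excerpt: first "is there a kind at all?",
// then "is that kind available?".
std::string describe(uint8_t Id) {
  std::optional<CompressionKind> Kind = getOptionalCompressionKind(Id);
  if (!Kind)
    return "none";
  if (!*Kind)
    return std::string((*Kind)->getName()) + " is not available";
  return std::string((*Kind)->getName()) + " is available";
}

} // namespace sketch
```

With the assumed availability values, `sketch::describe(0)` yields "none", `sketch::describe(1)` yields "zlib is available", and `sketch::describe(2)` yields "zstd is not available". The point of the shape is that the same value is used for the availability check and for the call through `->`, while still being as cheap to copy as an enum.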
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D130516/new/
https://reviews.llvm.org/D130516