[PATCH] D131992: [Support] compression proposal for a enum->spec->impl approach

David Blaikie via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Aug 25 01:22:17 PDT 2022


dblaikie added a comment.

@MaskRay I think this change is probably the best point of comparison/demonstration of this patch direction (taking some minor liberties with this patch to simplify things somewhat in ways that have already been discussed/seem quite reasonable - specifically having `getCompresisonSpec` return a reference and the enum having either no "unknown" value, or `getCompressionSpec` asserting on the unknown value):

  // removing the hardcoded zlib compression kind parameter in favor of what the code would look like in the near future once it's parameterized on a compression kind
  CompressionImplementation *Compressor = getCompressionSpec(CompressionAlgorithmKind)->Implementation;
  bool DoCompression = Compress && DoInstrProfNameCompression && Compressor;
  if (DoCompression) {
    Compressor->compress(arrayRefFromStringRef(FilenamesStr),
                                    CompressedStr,
                                    Compressor->BestSizeLevel);
  }

I think a reasonable speculation about what the alternative would look like in the a non-OO design would be something like:

  bool DoCompression = Compress && DoInstrProfNameCompression && isAvailable(CompressionAlgorithmKind);
  if (DoCompression) {
    compress(CompressionAlgorithmKind, arrayRefFromStringRef(FilenamesStr), CompressedStr, getBestSizeLevel(CompressionAlgorithmKind));

And having these three correlated calls (`isAvailable`, `compress`, and `getBestSizeLevel`) all made independently/have to have the same compression algorithm kind passed to them independently seems more error prone & redundant (not in an actual execution/runtime efficiency perspective - I don't think any of these different designs would have materially different runtime performance - but in terms of "duplicate work, and work that could diverge/become inconsistent" - essentially the duplicate work/repeated switch statements, one in each `compression::*` generic entry point seems like a code smell to me that points towards a design that doesn't have that duplication) rather than tying these calls together and making the lookup happen once and then using the features after that. Also exposing the compression algorithm along with the availability in one operation (not exposing compress/decompress/size APIs when the algorithm isn't available) seems to have similar benefits.

I agree it's somewhat overkill for two types of compression, but I don't think it's so burdensome as to be avoided either. I probably wouldn't've put quite this much consideration into the design if I were doing it myself amongst other things, but I appreciate that @ckissane has & I'm pretty happy with where this is now.

A few things still leftover that I think I've mentioned to @ckissane offline:

- getCompressionSpec returning by reference
- more incremental patches - implementing the generic API in terms of the old one (or keeping the old one as a wrapper around the generic one) with at most one or two uses of the API (if any) in the first commit - then subsequent API migrations in future commits, eventually removing the old API once the migration is complete
- having only the `getCompressionSpec(CompressionKind)` and dropping the version that takes uint8_t - callers should handle mapping from whatever domain-specific format/representation they have into the CompressionKind enum, there's no need for the raw version here
- Probably removing the 'unknown' CompressionKind - if a caller needs to handle having "maybe a compression kind, or none" they can wrap CompressionKind in Optional themselves and handle not looking up the algorithm if no compression kind is desired/requested



================
Comment at: llvm/lib/ProfileData/InstrProf.cpp:495
   return collectPGOFuncNameStrings(
-      NameStrs, compression::zlib::isAvailable() && doCompression, Result);
+      NameStrs, doCompression ? OptionalCompressionScheme : nullptr, Result);
 }
----------------
ckissane wrote:
> dblaikie wrote:
> > looks like this could be changed to pass the implementation, without the spec? (the caller doesn't need/use the spec)
> `(the caller doesn't need/use the spec)`...
> It definitely might in the future because it writes a header, that (in the conceivably near future) (like one of my follow up patches) could contain the uint 8 value of the compression type from the Spec, and 0 if uncompressed.
That should/can wait for the future change.

& I think at that point we could get into what the right design might be - and I was thinking maybe the right design might be for the CompressionImplementation to have a pointer back to its spec. But in general, lets defer that design decision for later - it's not too expensive to change what the parameter types are once the data is needed later.

(again, this might be even simpler addressed by not including this change at all up-front - keeping the old API and leaving these callers until you're patching the callers to add the new functionality of being able to choose between Zlib and Zstd - rather than, for now, plumbing in the new API without any change in functionality - the plumbing + functionality together might be easire to understand the motivation for and review, and doing it separately from most of the generic design work will isolate the usage design discussion so each piece can move forward separately, for the most part)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131992/new/

https://reviews.llvm.org/D131992



More information about the cfe-commits mailing list