[clang] [llvm] [HIP] add --offload-compression-level= option (PR #83605)

Artem Belevich via cfe-commits cfe-commits at lists.llvm.org
Mon Mar 4 11:28:54 PST 2024


================
@@ -906,6 +906,16 @@ CreateFileHandler(MemoryBuffer &FirstInput,
 }
 
 OffloadBundlerConfig::OffloadBundlerConfig() {
+  if (llvm::compression::zstd::isAvailable()) {
+    CompressionFormat = llvm::compression::Format::Zstd;
+    // Use a high zstd compress level by default for better size reduction.
----------------
Artem-B wrote:

I'd add more details here. While higher compression levels usually do improve the compression ratio, in the typical use case it's an incremental improvement. Here, we do it to achieve a dramatic increase in compression ratio by exploiting the fact that we carry multiple sets of very similar large bitcode blobs, and that the compression level has to be high enough for one complete blob to fit into the compression window. At least that's the theory.

Should we print a warning (or just document it?) when the achieved compression ratio ends up below what we'd expect? Considering that good compression starts at zstd-20, I suspect the compression ratio will drop back to ~2.5x if the binary for one GPU doubles in size and no longer fits into the window. On top of that, compression time will also increase, a lot. That will be a rather unpleasant surprise for whoever runs into it.
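
As for detecting that case, a minimal sketch of what I have in mind (the helper name is invented and it's not wired into the bundler's diagnostics; `ZSTD_getCParams()` lives behind `ZSTD_STATIC_LINKING_ONLY`):

```
// Passing 0 as the estimated source size makes ZSTD_getCParams() return the
// level's raw table parameters instead of shrinking them to fit the input.
#define ZSTD_STATIC_LINKING_ONLY
#include <zstd.h>

#include "llvm/Support/raw_ostream.h"
#include <cstdint>

// Hypothetical helper, for illustration only.
static void warnIfWindowTooSmall(int Level, uint64_t UncompressedSize) {
  ZSTD_compressionParameters CParams =
      ZSTD_getCParams(Level, /*estimatedSrcSize=*/0, /*dictSize=*/0);
  if (UncompressedSize > (1ULL << CParams.windowLog))
    llvm::errs() << "warning: bundle (" << UncompressedSize
                 << " bytes) exceeds the zstd window (2^" << CParams.windowLog
                 << " bytes); expect a much lower compression ratio\n";
}
```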

ZSTD's current compression parameters are set this way:
https://github.com/facebook/zstd/blob/dev/lib/compress/clevels.h#L47

```
{ 23, 24, 22,  7,  3,256, ZSTD_btultra2},  /* level 19 */
{ 25, 25, 23,  7,  3,256, ZSTD_btultra2},  /* level 20 */
```
The first three numbers are the log2 of the largest match distance (windowLog), the fully searched segment (chainLog), and the dispatch table (hashLog).

2^25 = 32MB, which happens to be about the size of a single GPU binary in your example. I'm pretty sure this explains why `zstd-20` works so well on it, while `zstd-19` does not. It will work well for smaller binaries, but I'm pretty sure it will regress for a slightly larger binary.
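
Put differently, the window has to span at least one complete blob, so the minimal windowLog is easy to compute (hypothetical helper, not existing bundler code):

```
#include "llvm/Support/MathExtras.h"
#include <cstdint>

// Smallest windowLog whose window covers one complete GPU binary, keeping a
// repeated blob within matching distance of the previous copy.
static unsigned minWindowLogFor(uint64_t BlobSize) {
  return llvm::Log2_64_Ceil(BlobSize); // 32MB -> 25, 33MB -> 26
}
```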

I think it may be worth experimenting with fine-tuning the compression settings: instead of blindly setting `zstd-20`, consider the size of the binary we need to deal with and adjust only windowLog/chainLog appropriately.

Or we could set the default to a lower compression level plus a large windowLog. This should still give us most of the compression benefits for binaries that fit into the window, but would avoid the performance cliff if the binary is too large. See the sketch below.
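
Roughly, via zstd's advanced API (just a sketch; level 12 and windowLog 27 are placeholder values, not tuned recommendations, and the real OffloadBundler plumbing would differ):

```
#include <zstd.h>

#include <cstddef>
#include <string>

static std::string compressWithLargeWindow(const char *Src, size_t SrcSize) {
  ZSTD_CCtx *Ctx = ZSTD_createCCtx();
  // A modest level keeps compression time reasonable...
  ZSTD_CCtx_setParameter(Ctx, ZSTD_c_compressionLevel, 12);
  // ...while an explicit 2^27 = 128MB window keeps several ~32MB GPU
  // binaries within matching range even if one of them grows.
  ZSTD_CCtx_setParameter(Ctx, ZSTD_c_windowLog, 27);

  std::string Out(ZSTD_compressBound(SrcSize), '\0');
  size_t N = ZSTD_compress2(Ctx, Out.data(), Out.size(), Src, SrcSize);
  ZSTD_freeCCtx(Ctx);
  if (ZSTD_isError(N))
    return std::string();
  Out.resize(N);
  return Out;
}
```

Note that 27 is also ZSTD_WINDOWLOG_LIMIT_DEFAULT, the largest window a decoder accepts without opting in, so going beyond that would add its own compatibility wrinkle.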

Then again, I may be overcomplicating this. If someone does run into the problem, they now have a way to work around it by tweaking the compression level.


https://github.com/llvm/llvm-project/pull/83605

