[llvm] [LLVM] add LZMA for compression/decompression (PR #83297)

Mon Mar 4 13:54:43 PST 2024

Artem-B wrote:

> Excuse the outlandish suggestion, but given:
> 
> > I do get the part that multiple GPU variants give us a lot of redundancy in the data to compress away.
> 
> Is there any chance of some sort of domain-specific compression, 

The key domain-specific quirk we can exploit here is that we produce N very similar blobs (same code, with minor differences due to GPU-specific intrinsics, etc.). There's nothing particularly interesting about the individual blobs.

> especially that would be more resilient to the size of the kernels? (seems like increasing the compression level increases the compression window size, which has some cliff/break points for kernels of certain sizes, which seems unfortunately non-general - like it'd be nice to not have to push the compression algorithm so hard for smaller kernels, and it'd be nice if larger kernels could still be deduplicated)

One way to achieve that would be to interleave the GPU blobs. Instead of `AAAAABBBBBCCCCC`, pack them as `ABCABCABCABCABC`. This way the compression window only needs to cover one slice from each blob, not a whole blob.
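To make the interleaving idea concrete, here is a minimal sketch in Python. It is not what the PR implements: the names `interleave`, `deinterleave`, and `compare_sizes`, the 4 KiB chunk size, and the synthetic "blobs" (random bytes with a few mutated positions) are all assumptions made for illustration. It uses `zlib`, whose 32 KiB window is small enough to show the effect on modest inputs.

```python
import random
import zlib


def interleave(blobs, chunk_size):
    """Round-robin fixed-size chunks: A0 B0 C0 A1 B1 C1 ..."""
    out = bytearray()
    longest = max((len(b) for b in blobs), default=0)
    for offset in range(0, longest, chunk_size):
        for blob in blobs:
            # Slicing past the end of a shorter blob just yields fewer bytes.
            out += blob[offset:offset + chunk_size]
    return bytes(out)


def deinterleave(data, sizes, chunk_size):
    """Inverse of interleave(); needs the original blob sizes."""
    blobs = [bytearray() for _ in sizes]
    pos = 0
    for offset in range(0, max(sizes, default=0), chunk_size):
        for i, size in enumerate(sizes):
            n = min(chunk_size, max(0, size - offset))
            blobs[i] += data[pos:pos + n]
            pos += n
    return [bytes(b) for b in blobs]


def compare_sizes(blob_size=128 * 1024, variants=3, chunk_size=4096):
    """Compress near-identical synthetic blobs both ways with zlib."""
    rng = random.Random(0)
    base = rng.getrandbits(8 * blob_size).to_bytes(blob_size, "little")
    blobs = []
    for _ in range(variants):
        variant = bytearray(base)
        for _ in range(50):  # a few per-variant differences
            variant[rng.randrange(blob_size)] = rng.randrange(256)
        blobs.append(bytes(variant))
    concatenated = len(zlib.compress(b"".join(blobs), 6))
    interleaved = len(zlib.compress(interleave(blobs, chunk_size), 6))
    return concatenated, interleaved


if __name__ == "__main__":
    concatenated, interleaved = compare_sizes()
    print("concatenated:", concatenated, "interleaved:", interleaved)
```

In the concatenated layout the matching bytes of two variants are a full blob apart (128 KiB here), which is outside the window; in the interleaved layout they are only one chunk apart. The cost is that the unpacker has to know the original blob sizes and the chunk size to undo the layout.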

Increasing the compression window while keeping the rest of the parameters at a lower compression level may work, too. At least in my experiments, `zstd -9 --zstd=wlog=25` does not seem to affect compression time much. It still works much faster than `zstd -20`.
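The same "bigger window, otherwise cheap settings" idea can be sketched with Python's standard `lzma` module, where the window is the LZMA2 dictionary size. Note this swaps in LZMA for zstd, so it says nothing about zstd timings; the helper names, the 512 KiB blob size, and the two dictionary sizes are assumptions chosen only so that one dictionary is smaller than a blob and the other spans all of them.

```python
import lzma
import random


def compress_with_dict(data, dict_size, preset=0):
    """XZ-compress with a low preset but an explicitly chosen dictionary size."""
    filters = [{
        "id": lzma.FILTER_LZMA2,
        "preset": preset,        # cheap match finder and encoder settings
        "dict_size": dict_size,  # overrides the preset's default window
    }]
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)


def compare_dict_sizes(blob_size=512 * 1024, variants=3):
    """Compress concatenated near-identical blobs with a small and a large window."""
    rng = random.Random(0)
    base = rng.getrandbits(8 * blob_size).to_bytes(blob_size, "little")
    blobs = []
    for _ in range(variants):
        variant = bytearray(base)
        for _ in range(50):  # a few per-variant differences
            variant[rng.randrange(blob_size)] = rng.randrange(256)
        blobs.append(bytes(variant))
    data = b"".join(blobs)
    small = compress_with_dict(data, 1 << 18)  # 256 KiB: smaller than one blob
    large = compress_with_dict(data, 1 << 22)  # 4 MiB: spans all blobs
    assert lzma.decompress(large) == data      # round-trip sanity check
    return len(small), len(large)


if __name__ == "__main__":
    small, large = compare_dict_sizes()
    print("small window:", small, "large window:", large)
```

One trade-off to keep in mind with this approach is that the decompressor also needs memory proportional to the dictionary size, so the window cannot grow for free.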

https://github.com/llvm/llvm-project/pull/83297