[llvm] [LLVM] add LZMA for compression/decompression (PR #83297)

Sat Mar 2 06:38:44 PST 2024

yxsamliu wrote:

> This compression ratio cliff bothers me a bit. I wonder if there's something special about the data the benchmark was ran on that triggers it. for _both_ compression algorithms.
> 
> @yxsamliu would it be possible for you to rerun the benchmarks one more with the data set split into 1/3 and 2/3 of the original input in size and see if compression ratio cliff happens at lower compression levels for smaller inputs?

What is special about the data is that the bitcode for different GPU archs is very similar, which is common for HIP. The file to be compressed therefore contains N similar portions for N GPU archs.

The following table shows zstd level 20 results for bundled bitcode for 2, 4, and 6 GPU archs:

| GPU Archs | Size Before (bytes) | Size After (bytes) | Compression Ratio | Compress Time (s) | Decompress Time (s) |
|-----------|---------------------|--------------------|-------------------|-------------------|---------------------|
| 2         | 22,819,940          | 4,390,242          | 5.20              | 2.0094            | 0.0293              |
| 4         | 45,639,848          | 4,392,548          | 10.39             | 2.0127            | 0.0391              |
| 6         | 68,459,756          | 4,394,991          | 15.58             | 2.1567            | 0.0429              |

You can see that the compressed size and the compression time are almost the same in all three cases, and the decompression time grows only slightly (from about 0.03 s to about 0.04 s).

This means that the more GPU archs are bundled, the better the compression ratio we get.
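As a rough check of the table, the following Python sketch recomputes the ratio column (size before divided by size after) and the growth in input and output size between rows. The row values are copied from the table; the variable and function names are only illustrative.

```python
# Rows copied from the table: (GPU archs, size before, size after), in bytes.
rows = [
    (2, 22_819_940, 4_390_242),
    (4, 45_639_848, 4_392_548),
    (6, 68_459_756, 4_394_991),
]


def compression_ratio(size_before, size_after):
    """Ratio as used in the table: uncompressed size / compressed size."""
    return size_before / size_after


ratios = [round(compression_ratio(before, after), 2) for _, before, after in rows]

# Growth between consecutive rows, i.e. the cost of two additional archs.
input_growth = [rows[i + 1][1] - rows[i][1] for i in range(len(rows) - 1)]
output_growth = [rows[i + 1][2] - rows[i][2] for i in range(len(rows) - 1)]

for (archs, before, after), ratio in zip(rows, ratios):
    print(f"{archs} archs: {before:,} -> {after:,} bytes, ratio {ratio:.2f}")
print("input growth per step: ", input_growth)
print("output growth per step:", output_growth)
```

Each step adds 22,819,908 bytes of input but only 2,306 and 2,443 bytes of compressed output, which is why the ratio scales almost linearly with the number of archs.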

Only zstd levels 20 and above achieve this.
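One plausible explanation, which is an assumption and not something stated in this comment, is that lower levels search for matches in a window smaller than one per-arch portion (roughly 11 MB here if the portions are equal-sized), so they cannot reference the previous copy. The sketch below illustrates that general effect with Python's standard `zlib` (32 KiB window) and `lzma` (8 MiB dictionary at preset 6) modules as stand-ins. It does not use zstd or real bitcode; the synthetic data, the sizes, and the helper names `make_bundle` and `compressed_sizes` are all hypothetical.

```python
import lzma
import random
import zlib


def make_bundle(num_archs, portion_size=100_000, seed=0):
    """Build a synthetic bundle of num_archs near-identical portions."""
    rng = random.Random(seed)
    # Random bytes are incompressible on their own, so any savings must
    # come from matching one portion against an earlier one.
    base = bytes(rng.getrandbits(8) for _ in range(portion_size))
    portions = []
    for _ in range(num_archs):
        portion = bytearray(base)
        # Overwrite a few bytes so portions are similar but not identical.
        for _ in range(100):
            portion[rng.randrange(portion_size)] = rng.getrandbits(8)
        portions.append(bytes(portion))
    return b"".join(portions)


def compressed_sizes(num_archs):
    """Return (raw size, small-window size, large-window size) in bytes."""
    data = make_bundle(num_archs)
    # zlib's window is at most 32 KiB, smaller than one portion.
    small_window = len(zlib.compress(data, 9))
    # lzma preset 6 uses an 8 MiB dictionary, larger than the whole bundle.
    large_window = len(lzma.compress(data, preset=6))
    return len(data), small_window, large_window


for n in (2, 4, 6):
    raw, small, large = compressed_sizes(n)
    print(f"{n} portions: raw={raw:,} zlib={small:,} lzma={large:,}")
```

With the small window the compressed size grows with the number of portions, while with the large window it stays close to the size of a single portion, which mirrors the pattern in the table.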

https://github.com/llvm/llvm-project/pull/83297