[llvm] [LLVM] add LZMA for compression/decompression (PR #83297)

Alexandre Ganea via llvm-commits llvm-commits at lists.llvm.org
Thu Feb 29 07:06:36 PST 2024


aganea wrote:

I do have a different perspective here. I have worked with LZMA in the past, and it is one of the best compression schemes out there in many regards. I do not understand the assertion about its decompression speed. Compression is certainly slower, but it is not *a lot* slower than the competition. I also **do have** practical use cases for it today, as opposed to "weaker" compression formats.

In the past I had a real-time streaming LZMA decompressor running on a 16 MHz ARM7TDMI, sharing timeslices with many other runtime jobs to render a video stream. Admittedly it was hand-optimized asm, but we had the same memory-latency issues as today, and the low bitrate of the LZMA stream meant less data had to be read from ROM. The gap has only widened since: memory reads are a lot more expensive than CPU cycles, even when the data is already in the caches. Most likely the LZMA window would have to be tuned for today's cache hierarchy and target CPU architecture.

Even though COFF doesn't support internal compression today AFAIK, I tried compressing the .OBJ files in this LLVM build folder: `stage1\tools\clang\unittests\Tooling\CMakeFiles\ToolingTests.dir` on a modern Ryzen 9 Windows machine:

| Compressor | Compression time | Decompression time | Size |
| ------------- | -------------------- | ----------------------- | ---- |
| None | | | 383 MB |
| 7z.exe 23.00 `-tzip a files.zip *.obj` | 3 sec | 1.2 sec | 44.7 MB |
| zstd.exe 1.5.5 `-9 -f *.obj -o files.zstd` | 3 sec | 0.008 sec | 36.9 MB |
| zstd.exe 1.5.5 `-19 -f *.obj -o files.zstd` | 1min 46 sec | 0.210 sec | 30.8 MB |
| 7z.exe 23.00 `a files.7z *.obj` | 5.5 sec | 0.500 sec | 26.2 MB |
| 7z.exe 23.00 `-mx9 a files.7z *.obj` | 26 sec | 0.791 sec | 18.2 MB |

All figures are single-threaded. The assumption is that `libzstd` and `liblzma` have the same performance as their executable counterparts.
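For anyone who wants to sanity-check the ratio/speed trade-off without 7z/zstd binaries, here is a minimal sketch using Python's stdlib `zlib` and `lzma` bindings on a synthetic buffer. It is illustrative only: the payload is made up, the presets only roughly correspond to the command lines in the table above, and it says nothing about the PR's actual liblzma integration.

```python
# Rough single-threaded comparison of deflate (zlib) vs LZMA (xz) on the
# same buffer. The payload below is synthetic; real .obj files will show
# different ratios, but the ordering (LZMA smaller, slower to compress)
# typically holds.
import lzma
import time
import zlib

data = b"some redundant object-file-like payload " * 100_000  # ~4 MB

candidates = [
    ("zlib level 9", lambda d: zlib.compress(d, 9), zlib.decompress),
    ("lzma preset 6", lambda d: lzma.compress(d, preset=6), lzma.decompress),
]

for name, compress, decompress in candidates:
    t0 = time.perf_counter()
    blob = compress(data)
    t1 = time.perf_counter()
    out = decompress(blob)
    t2 = time.perf_counter()
    assert out == data  # round-trip check
    print(f"{name}: {len(blob)} bytes, "
          f"comp {t1 - t0:.3f}s, decomp {t2 - t1:.3f}s")
```

On highly redundant input like this, both shrink the buffer dramatically; the interesting signal is the relative compressed sizes and the compression-time gap, which mirror the table above in direction if not in magnitude.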

A practical counter-argument on comp/decomp speed (which does not look that terrible in light of the figures above) is that people working from home are often on poor or mediocre internet connections. Upload speed on their end isn't great, but their CPU power is. To avoid cloud costs, it makes sense to distribute compilation across users' PCs on a private network, including at-home PCs. In that case, the size of the generated assets/.OBJs matters more than the time spent compressing/decompressing them, as long as it stays within reasonable bounds. If 40 sec are spent compiling an .OBJ and 2-3 sec on compression, that is great value if it yields 2x smaller assets.

However, I understand that these figures could look different when compressing individual sections within a DWARF file.

I'd like to give the OP the benefit of the doubt, if they can come up with tangible figures for their use case — compression/decompression speed and size, compared with the existing compression schemes in LLVM. @yxsamliu

https://github.com/llvm/llvm-project/pull/83297


More information about the llvm-commits mailing list