[cfe-dev] [RFC] Compressing AST files by default?

Gábor Horváth via cfe-dev cfe-dev at lists.llvm.org
Fri Oct 21 12:42:39 PDT 2016


On 21 October 2016 at 21:29, Mehdi Amini <mehdi.amini at apple.com> wrote:

>
> On Oct 21, 2016, at 12:13 PM, Gábor Horváth <xazax.hun at gmail.com> wrote:
>
> Hi!
>
> On 20 October 2016 at 18:12, Mehdi Amini via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>>
>> > On Oct 20, 2016, at 2:23 AM, Ilya Palachev via cfe-dev <
>> cfe-dev at lists.llvm.org> wrote:
>> >
>> > Hi,
>> >
>> > It seems that compressing AST files with a simple "gzip --fast" makes
>> them 30-40% smaller.
>> > So the questions are:
>> > 1. Is the current AST serialization format really uncompressed (only
>> abbreviations in the bitstream format)?
>> > 2. Is it worthwhile to compress AST by default (with -emit-ast)?
>> > 3. Will this break things like PCH?
>> > 4. What's the current trade-off between PCH compile time and disk
>> usage? If AST compression makes compilation a bit slower, but reduces the
>> disk usage significantly, will this be appropriate for users or not?
>>
>> Is there a need for this disk usage? If the main use of AST files is C++
>> modules / PCH, what is a typical size for a module cache directory?
>> (Compression is expensive)
>>
>
>
> In some cases compression can actually improve performance: when I/O is
> the bottleneck, reading less data from disk and decompressing it quickly
> can be faster overall than reading the uncompressed data.
>
>
> Are you speculating or do you have numbers on the actual AST writer/reader?
>
> Also, a good starting point would be to consider not storing the AST as a
> “blob” in the bitcode but using proper abbreviations instead.
>


I do not have any numbers for this use case. As far as I remember, though,
there were benchmarks showing that executable packers such as UPX can reduce
the startup time of some applications.
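
A minimal sketch of how such numbers could be collected, assuming zlib at
level 1 as a rough stand-in for "gzip --fast". The function name
measure_compression and the sample bytes are hypothetical; this is not
something the AST writer/reader does today. It reports the size ratio and
the time for one full decompression:

```python
import time
import zlib


def measure_compression(data: bytes, level: int = 1) -> dict:
    """Compress `data` with zlib and time one full decompression.

    Level 1 is zlib's fastest setting (an assumption standing in for
    "gzip --fast"); higher levels trade speed for a smaller output.
    """
    compressed = zlib.compress(data, level)

    start = time.perf_counter()
    restored = zlib.decompress(compressed)
    elapsed = time.perf_counter() - start

    # Guard against a broken round trip before trusting the numbers.
    if restored != data:
        raise ValueError("round trip changed the data")

    return {
        "original_bytes": len(data),
        "compressed_bytes": len(compressed),
        "ratio": len(compressed) / len(data) if data else 1.0,
        "decompress_seconds": elapsed,
    }


# Hypothetical stand-in input; real measurements would use the bytes of a
# file produced with -emit-ast.
sample = b"hypothetical stand-in for AST bytes " * 1000
print(measure_compression(sample))
```

Comparing decompress_seconds against the time needed to read the saved
bytes from disk would show whether compression helps or hurts on a given
machine.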


>
>
> In case someone wants to do a whole project analysis on merged ASTs, this
> compression can be a very significant saving. Dumping all of LLVM and Clang
> TUs to the disk occupies about 45 GB of disk space at the moment.
>
>
> Sure, adding a compression layer on top for this particular application
> seems interesting, but you don’t need to have it on *by default* to support
> your use case though.
> Making it always-on would require, as a starting point, a close look at
> the impact on memory and time, for example when including modules.
>
>
>
>
>
>>
>> Mehdi
>>
>>
>> >
>> > LLVM already has support for compression (functions
>> compress/uncompress in include/llvm/Support/Compression.h).
>> >
>> > Best regards,
>> > Ilya Palachev
>> > _______________________________________________
>> > cfe-dev mailing list
>> > cfe-dev at lists.llvm.org
>> > http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
>
>
>