[cfe-dev] [RFC] Compressing AST files by default?
Mehdi Amini via cfe-dev
cfe-dev at lists.llvm.org
Fri Oct 21 12:29:48 PDT 2016
> On Oct 21, 2016, at 12:13 PM, Gábor Horváth <xazax.hun at gmail.com> wrote:
>
> Hi!
>
> On 20 October 2016 at 18:12, Mehdi Amini via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>
> > On Oct 20, 2016, at 2:23 AM, Ilya Palachev via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> >
> > Hi,
> >
> > It seems that compressing AST files with a simple "gzip --fast" makes them 30-40% smaller.
> > So the questions are:
> > 1. Is the current AST serialization format really uncompressed (only abbreviations in the bitstream format)?
> > 2. Is it worthwhile to compress AST by default (with -emit-ast)?
> > 3. Will this break things like PCH?
> > 4. What's the current trade-off between PCH compile time and disk usage? If AST compression makes compilation a bit slower, but reduces the disk usage significantly, will this be appropriate for users or not?
>
> Is there a need for this disk usage? If the main use of AST files is C++ modules / PCH, what is a typical size for a module cache directory?
> (Compression is expensive)
>
>
> In some cases compression can actually improve performance: when I/O is the bottleneck, reading less data from disk combined with fast decompression can reduce the overall time.
Are you speculating or do you have numbers on the actual AST writer/reader?
Also, a good starting point would be to consider not storing the AST as a “blob” in the bitcode but using proper abbreviations.
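
For what it's worth, here is a minimal sketch of how one could start collecting such numbers on existing AST files. It is plain Python, not the Clang AST writer/reader, so it only measures the raw cost of a deflate pass at level 1 (roughly what "gzip --fast" does) and says nothing about end-to-end module loading. The function name `measure_compression` is made up for this illustration.

```python
import sys
import time
import zlib


def measure_compression(data: bytes, level: int = 1) -> dict:
    """Compress and decompress `data` once; return sizes and wall-clock timings.

    Hypothetical helper for illustration only. Level 1 is zlib's fastest
    setting, comparable to `gzip --fast`.
    """
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    compress_seconds = time.perf_counter() - start

    start = time.perf_counter()
    restored = zlib.decompress(compressed)
    decompress_seconds = time.perf_counter() - start

    # Sanity check: the round trip must reproduce the input exactly.
    if restored != data:
        raise ValueError("round trip did not reproduce the input")

    return {
        "original_bytes": len(data),
        "compressed_bytes": len(compressed),
        # Compressed size as a fraction of the original (smaller is better).
        "ratio": len(compressed) / len(data) if data else 1.0,
        "compress_seconds": compress_seconds,
        "decompress_seconds": decompress_seconds,
    }


if __name__ == "__main__":
    # Usage: python measure.py file1.ast file2.ast ...
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, measure_compression(f.read()))
```

Comparing `decompress_seconds` against the time saved by reading fewer bytes from disk would give a first indication of whether I/O really is the bottleneck on a given machine.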
>
> In case someone wants to do a whole project analysis on merged ASTs, this compression can be a very significant saving. Dumping all of LLVM and Clang TUs to the disk occupies about 45 GB of disk space at the moment.
Sure, adding a compression layer on top for this particular application seems interesting, but you don’t need to have it on *by default* to support your use case though.
Having it always on would first require a close look at the impact on memory and time, for example when including modules.
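
To illustrate what an opt-in layer on top could look like for an analysis tool, here is a rough sketch in Python. It assumes the tool stores its dumped AST files either raw or gzip-compressed and tells them apart by the two-byte gzip magic number; a raw file that happened to begin with those two bytes would be misdetected. The helper names `write_compressed` and `read_maybe_compressed` are hypothetical and nothing here is part of Clang.

```python
import gzip

# First two bytes of every gzip stream (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def write_compressed(path: str, data: bytes, level: int = 1) -> None:
    """Store `data` gzip-compressed; level 1 favours speed over size."""
    with gzip.open(path, "wb", compresslevel=level) as f:
        f.write(data)


def read_maybe_compressed(path: str) -> bytes:
    """Return the file contents, decompressing only if it looks like gzip."""
    with open(path, "rb") as f:
        head = f.read(len(GZIP_MAGIC))
    if head == GZIP_MAGIC:
        with gzip.open(path, "rb") as f:
            return f.read()
    # Not gzip: hand back the raw bytes unchanged.
    with open(path, "rb") as f:
        return f.read()
```

With something like this, only the tool that wants the disk savings pays for compression, and uncompressed files keep working as before.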
>
>
>
> —
> Mehdi
>
>
> >
> > LLVM already has support for compression (the compress/uncompress functions in include/llvm/Support/Compression.h).
> >
> > Best regards,
> > Ilya Palachev
> > _______________________________________________
> > cfe-dev mailing list
> > cfe-dev at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>