[cfe-dev] [RFC] Compressing AST files by default?

Mehdi Amini via cfe-dev cfe-dev at lists.llvm.org
Fri Oct 21 12:29:48 PDT 2016


> On Oct 21, 2016, at 12:13 PM, Gábor Horváth <xazax.hun at gmail.com> wrote:
> 
> Hi!
> 
> On 20 October 2016 at 18:12, Mehdi Amini via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> 
> > On Oct 20, 2016, at 2:23 AM, Ilya Palachev via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> >
> > Hi,
> >
> > It seems that compressing AST files with a simple "gzip --fast" makes them 30-40% smaller.
> > So the questions are:
> > 1. Is the current AST serialization format really uncompressed (apart from the abbreviations of the bitstream format)?
> > 2. Is it worthwhile to compress ASTs by default (with -emit-ast)?
> > 3. Will this break things like PCH?
> > 4. What is the current trade-off between PCH compile time and disk usage? If AST compression makes compilation slightly slower but reduces disk usage significantly, would that be acceptable to users?
> 
> Is this disk usage actually a problem? If the main use of AST files is C++ modules / PCH, how large is a typical module cache directory?
> (Compression is expensive.)
> 
> 
> Compression can sometimes improve performance: when the bottleneck is I/O, reading less data from disk and decompressing it quickly can make the overall load faster.

Are you speculating or do you have numbers on the actual AST writer/reader?

Also, a good starting point would be to stop storing the AST as a “blob” in the bitcode and use proper abbreviations instead.

> 
> For anyone who wants to do whole-project analysis on merged ASTs, this compression can be a very significant saving. Dumping the ASTs of all LLVM and Clang TUs to disk currently takes about 45 GB.

Sure, adding a compression layer on top for this particular application seems interesting, but you don’t need it on *by default* to support your use case.
Making it always-on would first require a close look at the memory and time impact, for example when importing modules.


> 
>> Mehdi
> 
> 
> >
> > LLVM already has support for compression (the compress/uncompress functions in include/llvm/Support/Compression.h).
> >
> > Best regards,
> > Ilya Palachev
> > _______________________________________________
> > cfe-dev mailing list
> > cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
> 
