[cfe-dev] [RFC] Compressing AST files by default?

Gábor Horváth via cfe-dev cfe-dev at lists.llvm.org
Fri Oct 21 12:48:23 PDT 2016


On 21 October 2016 at 21:26, Richard Smith via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> On Thu, Oct 20, 2016 at 2:23 AM, Ilya Palachev via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>> Hi,
>>
>> It seems that compressing AST files with simple "gzip --fast" makes them
>> 30-40% smaller.
>> So the questions are:
>>  1. Is the current AST serialization format really uncompressed (apart from
>> abbreviations in the bitstream format)?
>>  2. Is it worthwhile to compress AST by default (with -emit-ast)?
>>  3. Will this break things like PCH?
>>  4. What's the current trade-off between PCH compile time and disk usage?
>> If AST compression makes compilation a bit slower but reduces disk usage
>> significantly, would that be acceptable to users?
>>
>> LLVM already has support for compression (the compress/uncompress functions
>> in include/llvm/Support/Compression.h).
>
>
> The current AST format is designed for lazy, partial loading from disk; we
> make heavy use of file offsets to pull in only the small portions of AST
> files that are actually used. In a compilation using hundreds or thousands
> of AST files, it's essential that we don't load any more than we need to
> (just the file headers) since we should need essentially nothing from
> almost all loaded files.
>
> Any approach that requires the entire file to be decompressed seems like a
> non-starter. I would expect you could get something like the 30-40%
> improvements you're seeing under gzip by making better use of abbreviations
> and using smarter representations generally. There is some easy low-hanging
> fruit here.
>

I agree; I also noticed some low-hanging fruit in the serialized AST
format. I did one measurement to see which parts of the ASTs contribute
most to the size of the AST dumps. For the details, see the JSON attached
to the mail at this link:
http://clang-developers.42468.n3.nabble.com/Two-pass-analysis-framework-AST-merging-approach-tp4051301p4052577.html
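To illustrate the lazy-loading point Richard makes above, here is a minimal C++ sketch of offset-based random access. It assumes a hypothetical toy format (a record count, a table of absolute offsets, then length-prefixed records), not Clang's actual AST format, and the names buildFile, loadRecord, writePod and readPod are illustrative only. Loading one record reads a few header bytes plus that record. If the whole file were run through a single gzip stream, those offsets would refer to uncompressed positions, so reaching any record would mean decompressing everything before it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical toy format (NOT Clang's AST format):
//   [uint32 count][uint64 offset x count][record ...]
// Each record is [uint32 length][length payload bytes]; offsets are absolute.

template <typename T>
void writePod(std::ostream &os, T value) {
  os.write(reinterpret_cast<const char *>(&value), sizeof(T));
}

template <typename T>
T readPod(std::istream &is, std::size_t &bytesRead) {
  T value{};
  is.read(reinterpret_cast<char *>(&value), sizeof(T));
  bytesRead += sizeof(T);
  return value;
}

// Serialize records, computing each record's absolute offset up front.
std::string buildFile(const std::vector<std::string> &records) {
  std::ostringstream os;
  const auto count = static_cast<std::uint32_t>(records.size());
  std::uint64_t offset = sizeof(std::uint32_t) + count * sizeof(std::uint64_t);
  writePod(os, count);
  for (const auto &r : records) {  // offset table
    writePod(os, offset);
    offset += sizeof(std::uint32_t) + r.size();
  }
  for (const auto &r : records) {  // record bodies
    writePod(os, static_cast<std::uint32_t>(r.size()));
    os.write(r.data(), static_cast<std::streamsize>(r.size()));
  }
  return os.str();
}

// Lazily load one record: read the count, seek to its offset-table entry,
// then seek straight to the record. Nothing else in the file is read.
std::string loadRecord(std::istream &is, std::uint32_t index,
                       std::size_t &bytesRead) {
  is.seekg(0);
  const auto count = readPod<std::uint32_t>(is, bytesRead);
  if (index >= count) return {};
  is.seekg(static_cast<std::streamoff>(sizeof(std::uint32_t) +
                                       index * sizeof(std::uint64_t)));
  const auto offset = readPod<std::uint64_t>(is, bytesRead);
  is.seekg(static_cast<std::streamoff>(offset));
  const auto len = readPod<std::uint32_t>(is, bytesRead);
  std::string payload(len, '\0');
  if (len > 0) is.read(&payload[0], len);
  bytesRead += len;
  assert(is && "offset table pointed past the end of the file");
  return payload;
}
```

With 1000 records of 100 bytes each, the file is 112004 bytes, yet loadRecord(file, 500, bytesRead) reads only 116 of them (16 bytes of header and length fields plus the 100-byte payload).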

