[cfe-dev] Macro history (de-)serialization implementation, need help figuring out some things

Mon Sep 17 07:57:45 PDT 2012

On Fri, Sep 7, 2012 at 7:06 PM, Sebastian Redl <
sebastian.redl at getdesigned.at> wrote:

>
> On 07.09.2012, at 13:40, Alexander Kornienko wrote:
>
> On Fri, Sep 7, 2012 at 8:45 AM, Sebastian Redl <
> sebastian.redl at getdesigned.at> wrote:
>
>> On 06.09.2012 05:03, Richard Smith wrote:
>>
>>> There is some documentation for PCH and the bitcode format it uses:
>>>
>>> http://clang.llvm.org/docs/PCHInternals.html
>>> http://llvm.org/docs/BitCodeFormat.html
>>>
>>> I don't think we have any documentation for PCH chaining.
>>>
>> Well, that's grave neglect on my part. I need to rectify this
>> eventually.
>>
>> For the short term, PCH chaining is pretty simple in idea, if not in
>> implementation. Load an existing PCH file, parse some additional code, and
>> save the diff between the AST loaded from the PCH and the AST after parsing
>> as another PCH file that references the first. Now you have two chained PCH
>> files. Load the second, and it will automatically load the first and then
>> apply the diff.
>> This is easy for new AST nodes, but pretty hard for AST mutation.
>> Luckily, this is very rare.
>>
>> Sebastian
>
>
> Thanks for the explanation. But what problem does this feature intend to
> solve? What are use-cases for this?
>
>
> The main problem it was intended to solve was code completion speed in
> IDEs. Any given source file starts with a bunch of includes, typically
> (depending on one's preference) first some library headers, then some
> internal project headers. Only then comes the actual code.
>
> When you type an identifier in a Clang-powered IDE and request code
> completion, what happens is that the IDE calls on Clang to parse the file
> up to the point where code completion is requested, and Clang will return a
> list of possible completions, which the IDE then displays. In order to be
> useful, this has to be fast. If you've developed with Visual Studio, I'm
> sure you're familiar with how disruptive the delay in IntelliSense's
> reaction can be. If the project gets big and complicated, IntelliSense
> often becomes unusable simply because it takes several seconds to pop up.
>
> If you have to wait for Clang to parse your entire source file, including
> all the headers, you're going to wait just as long, especially for a
> complicated C++ project. The main way to speed this up is precompiled
> headers. Compile the headers first, then just load the binary format when
> you need to reparse the file. However, PCHs typically have to be
> configured. You take a set of headers that is common to your project and
> rarely changes (because rebuilding PCHs is slow) and tell the compiler to
> use it. But that still leaves the project-specific headers to be reparsed.
>
> So Clang has another feature, called the precompiled preamble (PCP).
> Basically, Clang will look at a source file, decide where the include
> directives for the file end (the preamble) and automatically build a PCH
> from that, which it will use when it needs to reparse the file. (I think
> the C API has clang_reparseTranslationUnit for this.) Once the preamble is
> built (which should happen once when the file is opened in the IDE), Clang
> only needs to reparse the actual source file, which can usually be done in
> less than a second, especially if the PCP is kept in memory.
>
> The downside of this approach is that it takes a long time to do the
> initial compiling of the preamble when you open the file. It would be a lot
> faster if you had a PCH of all the third party headers and just combined it
> with the file-specific part of the preamble into a PCP. And Clang used to
> be able to do that. You could use a PCH and it would load it completely
> (PCHs are usually loaded lazily by Clang), parse the new parts, and create
> a new PCH consisting of the combination. This is faster than reparsing
> everything, but it is still not fast enough (fully loading a PCH isn't very
> fast and needs a lot of memory), and each resulting PCP is rather big (tens
> of megabytes if not more), which is a problem if you want to keep all the
> PCPs for your open files (and I know enough programmers who keep dozens of
> files open) in memory for fast access.
>
> Enter chained PCH. You take your big third party library PCH as the
> primary. You create a diff for the rest of the preamble, which is usually
> nice and small (maybe a megabyte), and fast to create (because you only
> load the parts of the PCH that you need). You have one big block and a
> multitude of small blocks that reference it in memory, you have fast
> loading, fast parsing, and all-around goodness.
>
>
> As a side effect, the work on chained PCH made the PCH system more
> flexible and was thus the first step towards the true module system being
> developed in Clang.
>
> Sebastian
>
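
The chaining idea described above (a big primary PCH plus small per-file
diffs that reference it) can be sketched as a toy model. This is only an
illustration under loud assumptions: PCHFile, lookup and build_chained are
hypothetical names, a "declaration" is just a string in a dict, and none of
this reflects Clang's actual classes or on-disk format.

```python
class PCHFile:
    """Toy stand-in for a PCH: a declaration table plus an optional base PCH."""

    def __init__(self, decls, base=None):
        self.decls = dict(decls)  # name -> description (hypothetical payload)
        self.base = base          # the PCH this one is chained onto, if any

    def lookup(self, name):
        # Search the newest file first, then fall back along the chain.
        pch = self
        while pch is not None:
            if name in pch.decls:
                return pch.decls[name]
            pch = pch.base
        return None


def build_chained(base, ast_after_parsing):
    """Store only the entries that are new or changed relative to the base."""
    diff = {name: decl
            for name, decl in ast_after_parsing.items()
            if base.lookup(name) != decl}
    return PCHFile(diff, base=base)


# One big third-party PCH shared by every open file (assumed contents).
library = PCHFile({"std::string": "class", "std::vector": "class template"})

# Two source files whose preambles add one project declaration each.
foo_after = {**library.decls, "Widget": "class"}
bar_after = {**library.decls, "Gadget": "class"}

foo_preamble = build_chained(library, foo_after)
bar_preamble = build_chained(library, bar_after)
```

Each per-file preamble holds only its own additions and points at the same
library object, which is the "one big block and a multitude of small blocks"
layout. Note that this toy handles a changed entry by simply shadowing the
base entry; the real difficulty with AST mutation mentioned earlier in the
thread is not modeled here.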

Thank you for the extensive explanation. This and a live chat with
Doug helped me a lot!

-- 
Regards,
Alexander