[cfe-dev] Module build - tokenized form of intermediate source stream

Wed Oct 14 16:12:01 PDT 2015

On Wed, Oct 14, 2015 at 1:31 AM, Vassil Vassilev <vvasilev at cern.ch> wrote:

> On 12/10/15 21:13, Richard Smith via cfe-dev wrote:
>
> On Mon, Oct 12, 2015 at 11:33 AM, Serge Pavlov via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>> Hi all,
>>
>> Currently, building a module involves creating an intermediate source stream
>> that includes/imports each header composing the module. This source stream
>> is then parsed as if it were a source file. So building a module requires
>> several transformations:
>> - The module map is parsed to produce module objects (clang::Module),
>> - Module objects are used to build the source stream (llvm::MemoryBuffer),
>> which contains include directives,
>> - The source stream is parsed to produce module content.
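For illustration, the second step can be sketched as below. ModuleSketch and buildUmbrellaSource are hypothetical names, not Clang's API. The real buffer may use #import instead of #include and full header paths, so treat this only as a simplified picture of "a source stream that includes each header of the module":

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for clang::Module: a module name plus
// the headers that compose it. The real class carries far more state.
struct ModuleSketch {
  std::string Name;
  std::vector<std::string> Headers;
};

// Build the text of an intermediate source buffer that #includes every
// header of the module, one directive per line. Clang would wrap text like
// this in an llvm::MemoryBuffer; here we simply return a std::string.
std::string buildUmbrellaSource(const ModuleSketch &M) {
  std::string Source;
  for (const std::string &Header : M.Headers) {
    Source += "#include \"";
    Source += Header;
    Source += "\"\n";
  }
  return Source;
}
```

The resulting text is then lexed and parsed like any other file, which is the extra round trip the proposal wants to avoid.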
>>
>> The build process could be simpler if, instead of a textual source stream, we
>> prepared a sequence of annotation tokens: annot_module_begin,
>> annot_module_end and some new token, say annot_module_header, representing
>> a header of a module. It would be something like a pretokenized
>> header, but without a counterpart in the file system.
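A minimal sketch of such a token sequence, assuming headers arrive as (module, header) pairs in build order and ignoring submodule nesting. annot_module_begin and annot_module_end are existing Clang token kinds; TokKind, TokSketch, HeaderEntry and pretokenize are hypothetical stand-ins, and annot_module_header is only the kind proposed above. The sketch emits one begin/end pair per run of consecutive headers from the same module:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds mirroring annot_module_begin,
// annot_module_header (proposed) and annot_module_end.
enum class TokKind { ModuleBegin, ModuleHeader, ModuleEnd };

struct TokSketch {
  TokKind Kind;
  std::string Payload; // module name for begin/end, header path otherwise
};

struct HeaderEntry {
  std::string Module;
  std::string Header;
};

// One begin/end pair per run of consecutive headers belonging to the same
// module, with one header token per header inside the run.
std::vector<TokSketch> pretokenize(const std::vector<HeaderEntry> &Entries) {
  std::vector<TokSketch> Toks;
  for (std::size_t I = 0; I < Entries.size(); ++I) {
    const HeaderEntry &E = Entries[I];
    bool StartsRun = I == 0 || Entries[I - 1].Module != E.Module;
    if (StartsRun)
      Toks.push_back({TokKind::ModuleBegin, E.Module});
    Toks.push_back({TokKind::ModuleHeader, E.Header});
    bool EndsRun =
        I + 1 == Entries.size() || Entries[I + 1].Module != E.Module;
    if (EndsRun)
      Toks.push_back({TokKind::ModuleEnd, E.Module});
  }
  return Toks;
}
```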
>>
>> Such a redesign would help solve the performance degradation reported in
>> PR24667 ([Regression] Quadratic module build time due to
>> Preprocessor::LeaveSubmodule). The cause of the problem is that the module
>> is left after each header, even if the next header belongs to the same module.
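A toy cost model can make the quadratic behavior concrete. It assumes, purely for illustration and not as a description of what Preprocessor::LeaveSubmodule actually does, that each leave costs work proportional to the number of headers processed so far. leaveCost is a hypothetical helper:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy cost model (an assumption, not Clang's real accounting): leaving a
// module costs one unit per header processed so far, standing in for
// preprocessor state that grows as the build proceeds.
// HeaderModules[i] names the module that header i belongs to.
std::size_t leaveCost(const std::vector<std::string> &HeaderModules,
                      bool GroupConsecutive) {
  std::size_t Cost = 0;
  for (std::size_t I = 0; I < HeaderModules.size(); ++I) {
    bool LastOfRun = I + 1 == HeaderModules.size() ||
                     HeaderModules[I + 1] != HeaderModules[I];
    // Without grouping, the module is left after every header; with
    // grouping, only when the next header belongs to another module.
    if (!GroupConsecutive || LastOfRun)
      Cost += I + 1;
  }
  return Cost;
}
```

With N headers in a single module, leaving after every header costs 1 + 2 + ... + N = N(N+1)/2 units in this model, while leaving once per run costs N. If every header is its own submodule, every header ends a run and grouping saves nothing.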
>>
>
> We generally recommend that each header goes in its own submodule, so
> optimizing for this case doesn't address the problem in many situations.
>
> Is there a technical reason for this? Is there a difference (say, bigger
> module size or slower deserialization) between one header file per submodule
> and one header file per standalone module?
>

The technical reason is that it gives more precise control over name export
/ import -- that is, if you don't do this then #including a modular header
file can make too many names visible, and if you develop using that
approach then your builds will likely fail due to use of undeclared names
when you build without modules enabled.
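A toy model of that difference, ignoring transitive includes and export declarations (HeaderDecls and visibleNames are hypothetical names used only for illustration): with one module for all headers, importing any header exposes every name in the module; with one submodule per header, only that header's names become visible, as in a non-modular build.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical: map from header name to the names it declares.
using HeaderDecls = std::map<std::string, std::vector<std::string>>;

// Names made visible by #including `Header` with modules enabled.
// Toy model: transitive includes and re-exports are ignored.
std::set<std::string> visibleNames(const HeaderDecls &Module,
                                   const std::string &Header,
                                   bool SubmodulePerHeader) {
  std::set<std::string> Names;
  for (const auto &Entry : Module) {
    // One big module: every header's names become visible.
    // One submodule per header: only the included header's names.
    if (!SubmodulePerHeader || Entry.first == Header)
      Names.insert(Entry.second.begin(), Entry.second.end());
  }
  return Names;
}
```

Code that includes only a.h but uses a name declared in b.h compiles in the one-big-module configuration and then fails without modules, which is the failure mode described above.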

> We stumbled upon cases (see attachment) which compile just fine without
> modules and with standalone modules (i.e. one header per module). They do not
> compile with the submodule model.
>

That's somewhat separate from what we're talking about; this also doesn't
compile with the "all the headers in the same module with no submodules"
approach. The problem here is that you're violating
[basic.scope.declarative]p4, so your program is ill-formed, and with
modules enabled Clang is able to detect and diagnose this. (I'm inclined to
permit your example -- we should only really be diagnosing redeclaration
conflicts between entities if either both have linkage or the old
declaration is visible -- and if we did so then the
one-submodule-per-header approach would work and the one-big-module
approach would fail for your testcase.)
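The rule suggested in the parenthetical could be expressed as a predicate like the one below. shouldDiagnoseConflict is a hypothetical helper for illustration, not a Clang function:

```cpp
// Hypothetical sketch of the suggested rule (not Clang code): a conflict
// between an old and a new declaration of the same name is diagnosed only
// if both entities have linkage, or the old declaration is visible.
bool shouldDiagnoseConflict(bool OldHasLinkage, bool NewHasLinkage,
                            bool OldIsVisible) {
  return (OldHasLinkage && NewHasLinkage) || OldIsVisible;
}
```

Under this rule, a declaration without linkage from a header that was never imported stays invisible and does not clash, which is why one submodule per header would accept such code while a single module exposing every header would not.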