[cfe-dev] Module build - tokenized form of intermediate source stream

Sean Silva via cfe-dev cfe-dev at lists.llvm.org
Wed Oct 14 18:25:07 PDT 2015


On Wed, Oct 14, 2015 at 4:12 PM, Richard Smith via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> On Wed, Oct 14, 2015 at 1:31 AM, Vassil Vassilev <vvasilev at cern.ch> wrote:
>
>> On 12/10/15 21:13, Richard Smith via cfe-dev wrote:
>>
>> On Mon, Oct 12, 2015 at 11:33 AM, Serge Pavlov via cfe-dev <
>> cfe-dev at lists.llvm.org> wrote:
>>
>>> Hi all,
>>>
>>> Currently, building a module involves creating an intermediate source
>>> stream that includes/imports each header composing the module. This source
>>> stream is then parsed as if it were a source file. So to build a module,
>>> several transformations must be done:
>>> - The module map is parsed to produce module objects (clang::Module),
>>> - The module objects are used to build a source stream (llvm::MemoryBuffer),
>>> which contains include directives,
>>> - The source stream is parsed to produce the module content.
>>>
>>> The build process could be simpler if, instead of a textual source stream,
>>> we prepared a sequence of annotation tokens: annot_module_begin,
>>> annot_module_end and some new token, say annot_module_header, which
>>> would represent a header of a module. It would be something like a
>>> pretokenized header, but without a counterpart in the file system.
>>>
>>> Such a redesign would help in solving the performance degradation reported
>>> in PR24667 ([Regression] Quadratic module build time due to
>>> Preprocessor::LeaveSubmodule). The cause of the problem is that the module
>>> is left after each header, even if the next header belongs to the same
>>> module.
>>>
>>
>> We generally recommend that each header goes in its own submodule, so
>> optimizing for this case doesn't address the problem for a lot of cases.
>>
>> Is there a technical reason for this? Is there a difference (say bigger
>> module size or slower deserialization) between a header file per submodule
>> and a header file per standalone module?
>>
>
> The technical reason is that it gives more precise control over name
> export / import -- that is, if you don't do this then #including a modular
> header file can make too many names visible, and if you develop using that
> approach then your builds will likely fail due to use of undeclared names
> when you build without modules enabled.
>

My experience with modularizing (and the advice that I give to my
customers) is to first use the "all the headers in the same module with no
submodules" approach, and then treat the submodule feature as a way of
tightening things up once they work. This seems to be the easiest and most
understandable route, since during the initial step of making the
"one huge module with no submodules" the errors can be diagnosed much as
they would be with PCH/textual inclusion, which is intuitive for users.
Tightening things up incrementally then happens at fine granularity, and
the issues are easy to pinpoint.
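
As a rough sketch of those two stages (the module name MyLib and the
headers a.h and b.h are made up for illustration, not taken from any real
project), the module map might evolve like this. The two definitions are
alternative versions of the same module.modulemap, not meant to coexist in
one file:

```
// Stage 1: one big module, no submodules. Importing any header of the
// module makes the names from all of its headers visible.
module MyLib {
  header "a.h"
  header "b.h"
  export *
}

// Stage 2 (replaces stage 1 once everything builds): one submodule per
// header, so including a.h only makes a.h's names visible.
module MyLib {
  module a { header "a.h" export * }
  module b { header "b.h" export * }
}
```

The headers can be moved into submodules one at a time, so that each new
error points at the header that was just split out.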

The errors that occur when submodules are used tend to be extremely
difficult to decipher since they cannot be debugged with a "textual"/"PCH"
mental model. I say this as a person who has tried and failed to modularize
significant amounts of real-world code before devising this approach as a
way to systematically succeed at the task.



>> We stumble upon (see attachment) cases which compile just fine without
>> modules and with standalone modules (i.e. header per module). They do not
>> compile with the submodule model.
>>
>
> That's somewhat separate from what we're talking about; this also doesn't
> compile with the "all the headers in the same module with no submodules"
> approach. The problem here is that you're violating
> [basic.scope.declarative]p4, so your program is ill-formed, and with
> modules enabled Clang is able to detect and diagnose this. (I'm inclined to
> permit your example -- we should only really be diagnosing redeclaration
> conflicts between entities if either both have linkage or the old
> declaration is visible -- and if we did so then the
> one-submodule-per-header approach would work and the one-big-module
> approach would fail for your testcase.)
>