[cfe-dev] Module build - tokenized form of intermediate source stream

Thu Oct 15 02:41:33 PDT 2015

On 15/10/15 01:12, Richard Smith wrote:
> On Wed, Oct 14, 2015 at 1:31 AM, Vassil Vassilev <vvasilev at cern.ch>
> wrote:
>
>     On 12/10/15 21:13, Richard Smith via cfe-dev wrote:
>>     On Mon, Oct 12, 2015 at 11:33 AM, Serge Pavlov via cfe-dev
>>     <cfe-dev at lists.llvm.org> wrote:
>>
>>         Hi all,
>>
>>         Building a module currently involves creating an
>>         intermediate source stream that includes/imports each header
>>         composing the module. This source stream is then parsed as if
>>         it were a source file. So to build a module, several
>>         transformations must be done:
>>         - The module map is parsed to produce module objects
>>         (clang::Module),
>>         - The module objects are used to build a source stream
>>         (llvm::MemoryBuffer), which contains include directives,
>>         - The source stream is parsed to produce the module content.
>>
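As a rough illustration of the second step above, here is a minimal
sketch of how such an intermediate stream could be assembled from a list
of header paths. The function name buildIncludeStream is hypothetical and
this is not Clang's actual implementation; it only models the idea of a
text buffer made of include directives, and it ignores the #import form
mentioned above.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of Clang): builds the text of an
// intermediate source stream containing one #include line per header.
std::string buildIncludeStream(const std::vector<std::string> &headers) {
  std::string stream;
  for (const std::string &header : headers) {
    stream += "#include \"";
    stream += header;
    stream += "\"\n";
  }
  return stream;
}
```

Under these assumptions, calling buildIncludeStream with the headers of
a module yields the text that would then be handed to the parser as if
it were an ordinary source file.
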
>>         The build process could be simpler if, instead of a text
>>         source stream, we prepared a sequence of annotation tokens:
>>         annot_module_begin, annot_module_end and some new token, say
>>         annot_module_header, which would represent a header of a
>>         module. It would be something like a pretokenized header but
>>         without a counterpart in the file system.
>>
>>         Such a redesign would help in solving the performance
>>         degradation reported in PR24667 ([Regression] Quadratic module
>>         build time due to Preprocessor::LeaveSubmodule). The cause of
>>         the problem is that the module is left after each header, even
>>         if the next header belongs to the same module.
>>
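To make the proposal above more concrete, here is a minimal standalone
sketch of a token sequence in which a module is entered once per run of
consecutive headers that share it, rather than once per header. All
names here (TokKind, Tok, buildTokenStream) are hypothetical, and
ModuleHeader merely models the proposed annot_module_header token; none
of this is Clang's API.

```cpp
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins for the annotation tokens named in the proposal.
// ModuleHeader models the proposed (hypothetical) annot_module_header.
enum class TokKind { ModuleBegin, ModuleHeader, ModuleEnd };

struct Tok {
  TokKind kind;
  std::string text; // module name for Begin/End, header path for Header
};

// Input: (owning module name, header path) pairs in module-map order.
// Output: a token sequence that emits one Begin/End pair per run of
// consecutive headers owned by the same module.
std::vector<Tok> buildTokenStream(
    const std::vector<std::pair<std::string, std::string>> &headers) {
  std::vector<Tok> toks;
  const std::string *current = nullptr; // module we are currently inside
  for (const auto &entry : headers) {
    const std::string &owner = entry.first;
    if (!current || *current != owner) {
      if (current)
        toks.push_back({TokKind::ModuleEnd, *current});
      toks.push_back({TokKind::ModuleBegin, owner});
      current = &owner;
    }
    toks.push_back({TokKind::ModuleHeader, entry.second});
  }
  if (current)
    toks.push_back({TokKind::ModuleEnd, *current});
  return toks;
}
```

With headers a1.h and a2.h owned by module A and b.h owned by module B,
the sketch produces seven tokens with a single begin/end pair around the
two A headers. If every header is owned by a different (sub)module, it
still produces one begin/end pair per header, so the saving only applies
when adjacent headers share a module.
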
>>
>>     We generally recommend that each header goes in its own
>>     submodule, so optimizing for this case doesn't address the
>>     problem for a lot of cases.
>     Is there a technical reason for this? Is there a difference (say
>     bigger module size or slower deserialization) between a header
>     file per submodule and a header file per standalone module?
>
>
> The technical reason is that it gives more precise control over name 
> export / import -- that is, if you don't do this then #including a 
> modular header file can make too many names visible, and if you 
> develop using that approach then your builds will likely fail due to 
> use of undeclared names when you build without modules enabled.
Got it, thanks.
>
>     We have stumbled upon cases (see attachment) that compile just
>     fine without modules and with standalone modules (i.e. header per
>     module). They do not compile with the submodule model.
>
>
> That's somewhat separate from what we're talking about; this also 
> doesn't compile with the "all the headers in the same module with no 
> submodules" approach. The problem here is that you're violating 
> [basic.scope.declarative]p4, so your program is ill-formed, and with 
> modules enabled Clang is able to detect and diagnose this. (I'm 
> inclined to permit your example -- we should only really be diagnosing 
> redeclaration conflicts between entities if either both have linkage 
> or the old declaration is visible -- and if we did so then the 
> one-submodule-per-header approach would work and the one-big-module 
> approach would fail for your testcase.)
Yes, it doesn't compile with "all the headers in the same module with no 
submodules". It compiles just fine with standalone modules (commenting 
out module Top in the example).

I am totally for allowing this to work. It will make the migration to 
modules in our case (maybe not only ours?) *a lot* easier. Could I help 
address this issue, and where should I start?

Vassil