[cfe-dev] Module build - tokenized form of intermediate source stream

Mon Oct 12 11:33:32 PDT 2015

Hi all,

Currently, building a module involves creating an intermediate source stream
that includes/imports each header composing the module. This source stream is
then parsed as if it were a source file. So building a module requires several
transformations:
- The module map is parsed to produce module objects (clang::Module),
- The module objects are used to build a source stream (llvm::MemoryBuffer),
which contains include directives,
- The source stream is parsed to produce the module content.
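
To make the second step concrete, here is a minimal sketch of how such a
textual stream could be assembled. It does not use any Clang API: ToyModule
and buildIncludeStream are hypothetical names invented for this illustration,
and the real clang::Module carries far more state than a name and a header
list.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for clang::Module, reduced to what the sketch needs.
struct ToyModule {
  std::string Name;
  std::vector<std::string> Headers;
};

// Builds the textual intermediate stream: one include directive per header.
// The result is what would later be parsed as if it were a source file.
std::string buildIncludeStream(const ToyModule &M) {
  std::string Stream;
  for (const std::string &H : M.Headers) {
    assert(!H.empty() && "header name must not be empty");
    Stream += "#include \"" + H + "\"\n";
  }
  return Stream;
}
```

For a module with headers a.h and b.h this yields two lines of text, which
then have to be lexed and preprocessed again just to recover the list of
headers that was already known.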

The build process could be simpler if, instead of a textual source stream, we
prepared a sequence of annotation tokens: annot_module_begin,
annot_module_end and some new token, say annot_module_header, which
would represent a header of a module. It would be something like a
pretokenized header, but without a counterpart in the file system.
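
A rough sketch of what such a token sequence might look like, again as a toy
model rather than real Clang code. TokKind, ToyToken and buildTokenStream are
hypothetical; the three kinds only mirror annot_module_begin,
annot_module_header (the proposed new token) and annot_module_end.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy counterparts of the annotation tokens discussed above.
enum class TokKind { ModuleBegin, ModuleHeader, ModuleEnd };

struct ToyToken {
  TokKind Kind;
  std::string Payload; // module name or header path
};

// Hypothetical stand-in for clang::Module.
struct ToyModule {
  std::string Name;
  std::vector<std::string> Headers;
};

// Emits begin, one token per header, then end. No text is generated,
// so nothing has to be re-lexed to find the headers or the module end.
std::vector<ToyToken> buildTokenStream(const ToyModule &M) {
  assert(!M.Name.empty() && "module must have a name");
  std::vector<ToyToken> Toks;
  Toks.push_back({TokKind::ModuleBegin, M.Name});
  for (const std::string &H : M.Headers)
    Toks.push_back({TokKind::ModuleHeader, H});
  Toks.push_back({TokKind::ModuleEnd, M.Name});
  return Toks;
}
```

The point of the design is that the end of the module is an explicit element
of the sequence instead of something implied by the end of a buffer.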

Such a redesign would help solve the performance degradation reported in
PR24667 ([Regression] Quadratic module build time due to
Preprocessor::LeaveSubmodule). The cause of the problem is that the module is
left after each header, even if the next header belongs to the same module.
Leaving the module only after its last header would be a solution, but it is
hard to tell whether the header just parsed is the last one, since there is no
way to look ahead to the next include directive. With tokenized input, module
ends would be marked explicitly.
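
To illustrate the difference in the number of leave operations, here is a toy
count under simplifying assumptions: headers are listed in order together with
the name of the module that owns them, and one "leave" stands for one call
like Preprocessor::LeaveSubmodule. The type and function names are
hypothetical, and the sketch says nothing about the actual cost of each call.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Each entry names the module that owns the header at that position.
using HeaderOwners = std::vector<std::string>;

// Textual stream: the module is left after every header.
std::size_t leavesPerHeader(const HeaderOwners &Owners) {
  return Owners.size();
}

// Explicit end markers: the module is left only when the next header
// belongs to a different module, or when the input ends.
std::size_t leavesPerModuleRun(const HeaderOwners &Owners) {
  std::size_t Leaves = 0;
  for (std::size_t I = 0; I < Owners.size(); ++I) {
    bool IsLast = I + 1 == Owners.size();
    if (IsLast || Owners[I + 1] != Owners[I])
      ++Leaves;
  }
  return Leaves;
}
```

For three headers of module A followed by two headers of module B, the first
strategy leaves five times and the second only twice.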

Is there any reason why the textual form of the intermediate source stream
should be kept? Does implementing a tokenized form of it make sense?

Thanks,
--Serge