[PATCH] D131062: [docs] Add "C++20 Modules"

H. Vetinari via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Sat Aug 6 03:04:14 PDT 2022


h-vetinari added a comment.

> It would be greatly welcome for such comments!

OK, here goes. Sorry for the large volume of comments. In addition to typos and stylistic improvements, I've had a few questions where the content wasn't clear to me (but note I'm not experienced with modules at all, so this may be obvious to others). I've also added a few line breaks where Phab was awkwardly overflowing. The position of the linebreaks is obviously irrelevant, but I thought it might help further review.



================
Comment at: clang/docs/CPlusPlus20Modules.rst:11
+
+Modules have a lot of meanings. For the users of Clang compiler, modules may
+refer to ``Objective-C Modules``, ``Clang C++ Modules`` (or ``Clang Header Modules``,
----------------



================
Comment at: clang/docs/CPlusPlus20Modules.rst:13-14
+refer to ``Objective-C Modules``, ``Clang C++ Modules`` (or ``Clang Header Modules``,
+etc) and C++20 modules. The implementation of all kinds of the modules in Clang 
+share a big part of codes. But from the perspective of users, their semantics and
+command line interfaces are very different. So it should be helpful for the users
----------------



================
Comment at: clang/docs/CPlusPlus20Modules.rst:15-16
+share a big part of codes. But from the perspective of users, their semantics and
+command line interfaces are very different. So it should be helpful for the users
+to introduce how to use C++20 modules.
+
----------------
Not 100% what the intention of the last sentence is - I presume it sets the goal for this document?


================
Comment at: clang/docs/CPlusPlus20Modules.rst:20-21
+should be helpful to read `Clang modules <Modules.html>`_ if you want to know
+more about the general idea of modules. Due to the C++20 modules having very
+different semantics, it might be more friendly for users who care about C++20
+modules only to create a new page.
----------------
> Due to the C++20 modules having very different semantics, it might be more friendly for users who care about C++20 modules only to create a new page.

Isn't "C++20 modules" what this page intends to do? Which new page are we talking about then?


================
Comment at: clang/docs/CPlusPlus20Modules.rst:24-26
+Although the term ``modules`` has a unique meaning in C++20 Language Specification,
+when people talk about C++20 modules, they may refer to another C++20 feature:
+header units. So this document would try to cover header units too.
----------------



================
Comment at: clang/docs/CPlusPlus20Modules.rst:31-35
+This document was intended to be pure manual. But it should be helpful to
+introduce some language background here for readers who are not familiar with
+the new language feature. This document is not intended to be a language
+tutorial. The document would only introduce concepts about the the
+structure and building of the project.
----------------



================
Comment at: clang/docs/CPlusPlus20Modules.rst:49-51
+In this document, the term ``Clang Modules``/``Clang modules`` refer to Clang
+c++ modules extension. It is also known as ``Clang header modules``,
+``Clang module map modules`` or ``Clang c++ modules``.
----------------



================
Comment at: clang/docs/CPlusPlus20Modules.rst:64-66
+Things in ``[]`` means optional. The syntax of ``module_name`` and ``partition_name``
+in regex form should be ``[a-zA-Z_][a-zA-Z_0-9.]*``. The dot ``.`` in the name has
+no special meaning.
----------------
> The dot ``.`` in the name has no special meaning.

Not sure if this is intended to say that the dot in the regex is not needed, or that it has no semantic significance.

Also, when wanting to match a literal "." in a regex, I'd consider it beneficial for clarity to escape it ("\."), even though it does what's intended in the context of [].


================
Comment at: clang/docs/CPlusPlus20Modules.rst:78-80
+A primary module interface unit is a module unit whose module declaration is
+``export module module_name;``. The ``module_name`` here denotes the name of the
+module. A module should have one and only primary module interface unit.
----------------



================
Comment at: clang/docs/CPlusPlus20Modules.rst:94-98
+In this document, we call ``primary module interface unit`` and
+``module partition interface unit`` as ``module interface unit``. We call ``module
+interface unit`` and ``module partition implementation unit`` as
+``importable module unit``. We call ``module partition interface unit`` and
+``module partition implementation unit`` as ``module partition unit``.
----------------
This seems quite important for the terminology of the rest of the document, so I'd structure it in a way that stand out visually, e.g. the suggestion above.


================
Comment at: clang/docs/CPlusPlus20Modules.rst:147
+
+Then let's see a little bit more complex HelloWorld example which uses the 4 kinds of module units.
+
----------------
Missing space & inconsistent spelling compared to above.

I'd personally write it as (including the quotes): "hello world" example


================
Comment at: clang/docs/CPlusPlus20Modules.rst:189
+
+Then we could compile the example by the following command:
+
----------------



================
Comment at: clang/docs/CPlusPlus20Modules.rst:213-214
+
+Currently, C++20 Modules is enabled automatically if the language standard is ``-std=c++20`` or newer.
+The ``-fmodules-ts`` option is deprecated and is planned to be removed.
+
----------------



================
Comment at: clang/docs/CPlusPlus20Modules.rst:219
+
+We could generate a module file for an importable module unit by ``--precompile`` option.
+
----------------



================
Comment at: clang/docs/CPlusPlus20Modules.rst:224-226
+The file name of ``importable module unit`` must end with ``.cppm``
+(or ``.ccm``, ``.cxxm``, etc). The file name of ``module implementation unit``
+should end with ``.cpp`` (or ``.cc``, ``.cxx``, etc).
----------------
As a reader, I have no idea how to parse
>  must end with ``.cppm`` (or ``.ccm``, ``.cxxm``, etc)

On the one hand there's a clear restriction ("must"), on the other there's an open-ended list, whose contents are not clear to me.


================
Comment at: clang/docs/CPlusPlus20Modules.rst:228-230
+The file name of module files should end with ``.pcm``.
+The file name of the module file of a ``primary module interface unit`` should be ``module_name.pcm``.
+The file name of module files of ``module partition unit`` should be ``module_name-partition_name.pcm``.
----------------
Similarly here; can we describe the origin or the restrictions should/must - i.e. do they come from the standard or from the Clang implementation? Is there any enforcement, or do things just error otherwise?


================
Comment at: clang/docs/CPlusPlus20Modules.rst:235-236
+
+We could use ``-fprebuilt-module-interface`` to tell the compiler the path to search the dependent module files.
+``-fprebuilt-module-interface`` could occur multiple times just like ``-I``.
+
----------------



================
Comment at: clang/docs/CPlusPlus20Modules.rst:238
+
+Another way to specify the dependent module files is to use ``-fmodule-file``.
+
----------------
Is this equivalent to "-fprebuilt-module-interface". Are there relevant differences, and if so, should the reader of this document know about them?


================
Comment at: clang/docs/CPlusPlus20Modules.rst:238
+
+Another way to specify the dependent module files is to use ``-fmodule-file``.
+
----------------
h-vetinari wrote:
> Is this equivalent to "-fprebuilt-module-interface". Are there relevant differences, and if so, should the reader of this document know about them?
Moving this up from further down, but I might be getting things wrong.


================
Comment at: clang/docs/CPlusPlus20Modules.rst:240-241
+
+When we compile ``module implementation unit``, we must pass the module file of the corresponding ``primary module interface unit`` by ``-fmodule-file``.
+The ``-fmodule-file`` option could occur multiple times. For example, the command line to compile ``M.cppm`` in the above example could be rewritten into:
+
----------------



================
Comment at: clang/docs/CPlusPlus20Modules.rst:247
+
+``-fprebuilt-module-interface`` is more convenient and ``-fmodule-file`` is faster since it would save the time for file lookup.
+
----------------
 I didn't understand what would be more convenient at first, but I'm guessing that one takes a directory and the other takes a file? I made a suggestion further up how to reflect this.


================
Comment at: clang/docs/CPlusPlus20Modules.rst:252-253
+
+It is easy to forget to link module files at first since we may envision module interfaces like headers. It is not true.
+Module units are translation units. We need to compile them and link them like the example shows.
+
----------------



================
Comment at: clang/docs/CPlusPlus20Modules.rst:261-262
+
+If we envision modules as a cache to speed up compilation, then it is important to keep the cache consistency as other cache techniques.
+So **currently** Clang would do very strict check for consistency.
+
----------------



================
Comment at: clang/docs/CPlusPlus20Modules.rst:296
+
+Note that **currently** the compiler don't think it is a problem about inconsistent macro definition. For example:
+
----------------



================
Comment at: clang/docs/CPlusPlus20Modules.rst:304
+
+Currently Clang would accept the above example. But it may produce surprising result if the debugging code dependents on each other.
+
----------------



================
Comment at: clang/docs/CPlusPlus20Modules.rst:309
+
+When the compiler reads a module file, the compiler would check the consistency of the corresponding source files. For example:
+
----------------



================
Comment at: clang/docs/CPlusPlus20Modules.rst:343
+
+But it is OK to move the module file as long as the source files remained:
+
----------------



================
Comment at: clang/docs/CPlusPlus20Modules.rst:364
+Now the compiler would accept the above example.
+Important note: XClang options are intended to be used by compiler internally and its semantics are not guaranteed to preserve in future versions.
+
----------------



================
Comment at: clang/docs/CPlusPlus20Modules.rst:366-367
+
+How module speed up the compilation
+-----------------------------------
+
----------------



================
Comment at: clang/docs/CPlusPlus20Modules.rst:371-372
+if there are ``n`` headers and ``m`` source files and each header is included by each source file, then the complexity of the compilation is ``O(nm)``;
+But if there are ``n`` module interfaces and ``m`` source files, the complexity of the compilation is ``O(n+m)``. So the modules would get a big win
+when scaling. In a simpler word, we could get rid of many redundant compilations by using modules.
+
----------------



================
Comment at: clang/docs/CPlusPlus20Modules.rst:374-375
+
+Roughly, the theory is correct. But the problem is that it is too rough. Let's see what would happen actually. And it depends on the optimization level
+actually. 
+
----------------



================
Comment at: clang/docs/CPlusPlus20Modules.rst:400-403
+But the imported codes would only get involved in semantical analysis, which is mainly about name lookup, overload resolution and template instantiation.
+All of these processes are fast to the whole compilation process.
+The imported codes could save the time for the fronend code generation and the whole middle end and the backend.
+So we could get big win for the compilation time in O0.
----------------



================
Comment at: clang/docs/CPlusPlus20Modules.rst:426-428
+It is not acceptable if we get performance loss after we use modules. The main concern is that when we compile a source file, the compiler need to see the function body
+of imported module units so that the compiler could perform IPO (InterProcedural Optimization, primarily inlining in practice) to optimize functions in current source file
+by the information provided by the imported module units.
----------------
"It is not acceptable" -> "It would be very unfortunate"

Since it sounds like that's what will happen when switching on optimization?


================
Comment at: clang/docs/CPlusPlus20Modules.rst:449-450
+The linkage name of ``NS::foo()`` would be ``_ZN2NSW1M3fooEv``.
+This couldn't be demangled by low versions of debugger or demangler.
+User could use ``llvm-cxxfilt`` since 15.x to demangle this:
+
----------------



================
Comment at: clang/docs/CPlusPlus20Modules.rst:458-459
+
+The ABI implies that we can't declare something in a module unit and define it in a non-module unit (or vice-versa).
+Since it would meet linking errors.
+
----------------



================
Comment at: clang/docs/CPlusPlus20Modules.rst:464-466
+Here lists some problems of modules. You could visit https://github.com/llvm/llvm-project/labels/clang%3Amodules for more issues
+or file a new issue if you don't find an existing one. If you're going to create a new issue for C++20 modules, it is better to add
+tags ``clang:modules`` and start the title with ``[C++20] [Modules]``.
----------------



================
Comment at: clang/docs/CPlusPlus20Modules.rst:473-478
+The support for clang-scan-deps may be the most urgent problem for modules now.
+Without the support for clang-scan-deps, the build systems are hard to get involved.
+So that the users could only play modules with makefiles or even write a parser by hand.
+It blocks more uses for modules, which will block more defect reports or requirements.
+
+We could track the issue at: https://github.com/llvm/llvm-project/issues/51792.
----------------



================
Comment at: clang/docs/CPlusPlus20Modules.rst:498
+
+We could tract the issue at: https://github.com/llvm/llvm-project/issues/56916
+
----------------



================
Comment at: clang/docs/CPlusPlus20Modules.rst:506
+
+We could tract the issue at: https://github.com/llvm/llvm-project/issues/56490
+
----------------



================
Comment at: clang/docs/CPlusPlus20Modules.rst:511-513
+This is covered by P1857R3. We mentione it again here since users may abuse it before we implement it.
+
+The users may want to write codes which could be compiled by modules or non-modules. A direct idea would be use macros like:
----------------



================
Comment at: clang/docs/CPlusPlus20Modules.rst:523-528
+So this file could be triggered like a module unit or a non-module unit by the definition of the macros.
+However, the usage is forbidden by P1857R3 but we haven't implemented P1857R3.
+So it is possible to write illegal modules codes now.
+A simple suggestion would be "Don't play macro tricks with module declarations".
+
+We could tract the issue at: https://github.com/llvm/llvm-project/issues/56917
----------------



================
Comment at: clang/docs/CPlusPlus20Modules.rst:564
+
+We could use ``-fmodule-file`` to specify the module files. ``-fmodule-file`` could occur multiple times too.
+
----------------



================
Comment at: clang/docs/CPlusPlus20Modules.rst:574-575
+
+Another difference with modules is that we can't compile the module file.
+It makes sense due to the semantics of header unit are just like headers.
+
----------------



================
Comment at: clang/docs/CPlusPlus20Modules.rst:641-644
+Then here will be a direct question: why don't we implement header units by Clang header modules.
+
+Since Clang modules have more semantics like hierarchy, wrapping multiple headrs together as a big module. All of this are not part of C++20 Header units.
+We're afraid that users may abuse these features and think they are using C++20 things.
----------------



================
Comment at: clang/docs/CPlusPlus20Modules.rst:646-647
+
+Another reason is that there are proposals to introduce module mappers to the C++ standard (for example, https://wg21.link/p1184r2).
+Then if we decide to resued Clang's modulemap, we may get in problem once we need to introduce antoher module mapper.
+
----------------



================
Comment at: clang/docs/CPlusPlus20Modules.rst:649-651
+So the final answer for why don't we reuse the interface of Clang modules for header units is that
+we've see some differences between header units and Clang modules and we guess the differences may be bigger
+so that we can't accept it.
----------------



CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131062/new/

https://reviews.llvm.org/D131062



More information about the cfe-commits mailing list