<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On 17 January 2017 at 14:45, Hamza Sood <span dir="ltr">&lt;<a href="mailto:hamza_sood@me.com" target="_blank">hamza_sood@me.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Thanks for clarifying parts of the current implementation. I wasn’t sure what’s incomplete and what’s by design.<br>

<span class=""><br>

> Rather than producing two ASTs, it would be preferable to simply export less of the AST into the pcm file. As noted above, this optimization is not yet implemented (along with some of the semantics of the Modules TS).<br>

</span>Since it’s currently possible to generate a complete object file from a pcm, I assumed that such an optimisation wouldn't be possible with the current format. In fact a fully optimised pcm is pretty much what I was trying to describe here, but I wasn’t sure if being able to go from pcm -> obj is an essential part of what a pcm is.<br></blockquote><div><br></div><div>Our pcm format is not immutable; we are free to make such changes if necessary. One thing that might not be immediately obvious: in a highly parallel build, it can be beneficial to avoid blocking downstream compiles on the step that generates object code from a module interface. That is, we may want to generate a .pcm file without generating object code, and then later generate the object code from it, to improve build performance. This doesn't necessarily mean that the .pcm file must contain all function definitions -- we could generate the object file for the module interface by re-parsing the .cppm file -- but there's a tradeoff between parallelism and total CPU time in doing so.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Just writing less of the AST to a file is certainly better than producing two ASTs, and I attempted that with my original tests. However, I wasn’t able to find anything in ASTWriter that lets you control which parts of the AST are written; all I could get working is producing a second AST from the original (with modifications of course) and passing that through to ASTWriter. Is there an API that I missed?</blockquote><div><br></div><div>We have no real support for this yet, but it doesn't seem especially hard to add the ability to filter during AST emission. The interesting part will be determining what can be safely filtered out. Example: an exported template makes a call to a function with unqualified name 'foo'; can we still discard any non-exported functions named 'foo' in the module interface? Those functions might be found by argument-dependent lookup (ADL) when the template is instantiated.</div><div><br></div><div>Also note that this affects linkage: even internal, non-exported functions in the module interface might be called that way, and if so, we need some way to link the symbol references in those template instantiations to the code we emitted for the module interface.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">

> Clang's module files are explicitly not a distribution format. You are expected to ship your module interface files, not a precompiled form of them.<br>

</span>Would library developers want to ship their module interfaces considering they could potentially contain a lot of code?<br>

Microsoft for example have come up with a distributable binary format so that library developers don’t have to ship their module interface files.</blockquote><div><br></div><div>Considering that the module interface can, and often will, contain code that is in some way conditional on the environment (for instance, on the size of 'int', or on whether certain headers or functions are provided by the environment, or on certain details of their standard library implementation, and so on), it is not clear that Microsoft's approach is feasible for a non-single-vendor environment. Even trivial concerns, such as whether assert(X) is enabled in an inline function or template in a module interface, require distinct precompiled module interfaces for the same .cppm file. At this point, the idea of a redistributable binary module interface format seems misguided, but we'll have to see how usage patterns develop and whether such a format ever starts to make sense.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">

> Even with modules, large codebases will still want to maintain an interface / implementation separation discipline, in order to avoid every change to a low-level library's implementation triggering unnecessary recompilation of dependent code. (Keep in mind that a change that affects line numbers in a low-level library could affect the debug information generated for any transitive dependency, so we can't necessarily bail out of the compilation if the abstract interface of the module is unchanged.)<br>

</span>That brings up the question of how a module-based build system would look, which I don’t think I’ve seen mentioned anywhere. Should the compiler be in charge by seeking out imported modules based on search paths and automatically building them if needed? Or should it be more like the dependency file generation that occurs with headers, which leaves a tool such as GNU make in charge?</blockquote></div><br></div><div class="gmail_extra">Historically, Clang's approach has been to provide a mode that requires no changes to build systems, in order to make the transition to modules, and the sharing of code between a modules build and a non-modules build, straightforward. That approach introduces many problems, particularly with parallel and distributed builds. With the Modules TS we are already making a break with the past, so we should simply treat the act of building a module as a first-class action performed by a build. The compiler should not become a build system.</div><div class="gmail_extra"><br></div><div class="gmail_extra">This does mean that build systems will need to track interface dependencies in a way they didn't before (you need to know which module interfaces should be built before which other module interfaces), and that information will need to be either provided to or detected by the build system. If a build system wishes to automate this, it would not be dissimilar to the #include scanning that some existing build systems already perform.</div></div>
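<div class="gmail_extra"><br></div><div class="gmail_extra">To make the ordering requirement above concrete, here is a minimal sketch (C++17) of how a build system might turn a "which module imports which" table into a build order. This is only an illustration, not anything Clang provides: the ImportGraph type, the buildOrder function, and the module names in the usage note are all hypothetical, and the sketch assumes the import table has already been provided or detected by some earlier scanning step.</div>

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Hypothetical input: module name -> names of the modules it imports.
// A real build system would fill this in from user-provided metadata or
// from a scanning step; neither is shown here.
using ImportGraph = std::map<std::string, std::set<std::string>>;

// Returns an order in which module interfaces can be built so that every
// module comes after all the modules it imports (Kahn's topological sort).
// Returns std::nullopt if the imports form a cycle.
std::optional<std::vector<std::string>> buildOrder(const ImportGraph& imports) {
    std::map<std::string, int> pending;  // number of not-yet-built imports
    std::map<std::string, std::vector<std::string>> importers;  // reverse edges

    for (const auto& [mod, deps] : imports) {
        pending.try_emplace(mod, 0);
        for (const auto& dep : deps) {
            pending.try_emplace(dep, 0);  // imported modules may have no entry
            ++pending[mod];
            importers[dep].push_back(mod);
        }
    }

    // Modules with no imports can be built immediately.
    std::queue<std::string> ready;
    for (const auto& [mod, count] : pending)
        if (count == 0) ready.push(mod);

    std::vector<std::string> order;
    while (!ready.empty()) {
        std::string mod = ready.front();
        ready.pop();
        order.push_back(mod);

        auto it = importers.find(mod);
        if (it == importers.end()) continue;
        for (const auto& user : it->second) {
            assert(pending[user] > 0);
            // Once its last import is built, the importer becomes buildable.
            if (--pending[user] == 0) ready.push(user);
        }
    }

    // Any module never reached is part of (or depends on) an import cycle.
    if (order.size() != pending.size()) return std::nullopt;
    return order;
}
```

<div class="gmail_extra">For example, with hypothetical modules where app imports core and net, and net imports core, buildOrder places core before net and net before app. Modules that become ready at the same time do not depend on each other, so a parallel build could compile them concurrently; the sketch only reports one valid sequential order.</div>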