[PATCH] D119409: [C++20] [Modules] Remain dynamic initializing internal-linkage variables in module interface unit

Chuanqi Xu via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Sun Feb 20 18:48:18 PST 2022


ChuanqiXu added a comment.

In D119409#3332313 <https://reviews.llvm.org/D119409#3332313>, @dblaikie wrote:

> (maybe relevant: For what it's worth: I originally implemented inline function homing in modules codegen for Clang Header Modules - the results I got for object file size in an -O0 build were marginal - a /slight/ win in object file size, but not as much as we might've expected. Part of the reason might be that there can be inline functions that are never called, or at higher optimization levels, inline functions that always get inlined (via "available externally" definitions) - in that case, providing "homed" definitions creates inline function definitions that are unused during linking/a waste of space. It's possible the workload I was dealing with (common Google internal programs) skewed compared to broader C++ code - for instance heavy use of protobufs could be leading to a lot of generated code/inline functions that are mostly unused. I didn't iterate further to tweak/implement heuristics about which inline functions should be homed. I'm not sure if Richard Smith made a choice about not homing inline functions in C++20 modules because of these results, or for other reasons, or just as a consequence of the implementation - but given we had the logic in Clang to do inline function homing for Clang Header Modules, I'm guessing it was an intentional choice to not use that functionality in C++20 modules when they have to have an object file anyway)

Thanks for sharing this. I hadn't considered code size before. I agree the result should depend on the patterns in the program; I'd guess code size may increase in some projects and decrease in others.
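
To make the homing idea concrete, here is a minimal sketch in a named module (the module and function names are made up):

  // m.cppm -- module interface unit
  export module m;

  // Without homing, every importing TU that uses f() emits its own
  // linkonce_odr copy of the body.  With homing, the object file for
  // this interface unit would carry the one strong definition, and
  // importers would keep only an available_externally copy for the
  // inliner, discarded before they emit their own object files.
  export inline int f(int x) { return x * 2; }

If f() is never called, or is always inlined at higher optimization levels, the homed definition ends up unreferenced at link time, which matches the marginal object-size win described above.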

> Richard and I discussed taking advantage of this kind of new home location, certainly for key-less polymorphic classes. I was against it as it was more work :) Blame me.

From my experience, it depends on how well we want to implement it. A plain implementation is relatively easy, and it would speed up compilation significantly at **O0**. But it doesn't work well with optimization turned on, since we must import all the functions as `available_externally` definitions to enable complete optimization. In that case (with optimization), we only save the time spent in the preprocessor, the parser, semantic analysis, and the backend, while the biggest part of compilation happens in the middle end, and we additionally pay for serialization and deserialization. My experiment shows only about a 5% improvement in compilation time with named modules when optimization is turned on. (We could offer an option to compile fast at **On** by not emitting functions from other module units, but that would obviously hurt runtime performance.)
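
For illustration, the importer side of the sketch above behaves differently at **O0** and with optimization enabled (same made-up names):

  // user.cpp
  import m;

  int g(int x) {
    // At O0 this can remain a plain external call into m's object
    // file, so this TU never materializes f()'s body and compiles
    // quickly.  With optimization on, f() has to be deserialized here
    // as an available_externally definition so the inliner can see it;
    // that is where the deserialization cost comes from.
    return f(x) + 1;
  }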

A good implementation might attach optimized IR to the PCM files. Since the standard says nothing about the CMI/BMI (PCM), we are free to run the middle-end optimizations and attach the optimized IR to the PCM files. Then, when we import an optimized PCM, we could extract the optimized functions on demand. We could mark such functions with a special attribute (like 'optimized_available_externally'?) that tells the compiler not to optimize them and to delete them once the middle-end optimization is done; such functions would be available only for inlining (or any other IPO). I think this wouldn't hurt runtime performance, and we could get a win in compilation speed. But I agree this is not easy to implement.
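
A sketch of what an importer would see under that scheme; both the attribute name and the mechanism are hypothetical:

  // user2.cpp
  import m;

  int use(int x) {
    // Today: f()'s body is deserialized as unoptimized IR with
    // available_externally linkage, re-optimized in this TU's own
    // middle-end run, inlined if profitable, and then dropped before
    // codegen.
    // Under the sketch above: f() would be extracted from the PCM
    // already optimized and marked (hypothetically)
    // 'optimized_available_externally', so the pipeline considers it
    // only for inlining/IPO, skips re-optimizing the body, and still
    // deletes it once the middle end finishes.
    return f(x) * 3;
  }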


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119409/new/

https://reviews.llvm.org/D119409


