[PATCH] D69825: [Clang][Driver] Re-use the calling process instead of creating a new process for the cc1 invocation

Alexandre Ganea via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Dec 2 15:21:22 PST 2019


aganea added a comment.

You're right indeed Russell, my bad, this gain is not as important as I initially claimed -- however there's a gain.

Several factors came into play:

1. The latest ninja 1.9.0 doesn't use all cpu sockets on some Windows systems (see this <https://github.com/ninja-build/ninja/pull/1674>), and as it uses +2 cores in addition to the detected # of cores, this possibly induces some unwanted side-effects (such as the speed-up seen in the summary).
2. I was using Clang 9.0 **RC4,** which is compiled with -DLLVM_ENABLE_ASSERTIONS=ON. When @hans sent the email for 9.0.1 I realized this flag is turned off only for the final release :-(
3. I was using all sorts of options (/MT /GS- /arch:AVX) which actually improved build times, when compared with the baseline Clang release (see graphs below)

We indeed have a security Windows service running on our machines which we //cannot// disable.

I have re-run the tests on 6 different configurations (see graph for specs)
Overall, this patch seems to reduce build times by about **30 sec** on many-cores machines, and about **15-20 sec** on 6-cores. The 28-core machine did not see any improvements, I cannot explain why.
The variability is also reduced when this patch is applied. This seems to be consistent across all machines.

WARNING: the X coordinate (time) does not start at zero

F10949529: cc1_4-core.png <https://reviews.llvm.org/F10949529>

F10949531: cc1_6-core_w7.png <https://reviews.llvm.org/F10949531>

F10949533: cc1_6-core_w10.png <https://reviews.llvm.org/F10949533>

F10949535: cc1_28-core_ws2016.png <https://reviews.llvm.org/F10949535>

F10949537: cc1_36-core-ws2016.png <https://reviews.llvm.org/F10949537>

F10949539: cc1_36-core-w10.png <https://reviews.llvm.org/F10949539>

In my sense, the patch remains valid overall:

- It is easier to debug with just one process.
- Later on, I'd like to put forward a patch which (optionally) replaces the CRT allocator with rpmalloc, and not having this current patch would have less impact.
- I have an alternate proposal for D52193 <https://reviews.llvm.org/D52193> and D69582 <https://reviews.llvm.org/D69582> which uses a thread pool instead of a process pool, and this patch is needed for that purpose (however that requires reviving & solving the thread-safe cl::opt RFC).


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69825/new/

https://reviews.llvm.org/D69825





More information about the cfe-commits mailing list