> Ignoring FE time, which can be fully parallelized, and assuming 10% of
> compile time is spent in serial module passes and 25% in the CGSCC
> pass, the maximum speedup that can be gained from function-level
> parallelism is less than 3x.  Even adding support for parallel
> compilation of the leaves of the CG in the CGSCC pass won't help much:
> leaf functions make up < 30% of the functions in the large apps I have seen.

Can you clarify what you're basing these assumptions on or how you derived
your data?
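For reference, the 3x figure is consistent with Amdahl's law if both the serial module passes and the whole CGSCC pass are treated as serial, so a fraction s = 0.35 of compile time cannot be parallelized. The second line assumes, hypothetically, that the < 30% share of leaf functions translates into roughly the same share of CGSCC time becoming parallelizable; that mapping from function count to time is not established by the numbers above.

```latex
S_{\max} = \frac{1}{s}, \qquad
s = 0.10 + 0.25 = 0.35 \;\Rightarrow\; S_{\max} = \frac{1}{0.35} \approx 2.86 < 3

s_{\text{leaves}} = 0.10 + 0.25\,(1 - 0.30) = 0.275
\;\Rightarrow\; S_{\max} = \frac{1}{0.275} \approx 3.64
```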

> Module-based parallelism, as proposed by Shuxin, has a max speedup of 10x,
> assuming body cloning does not add much overhead and a build farm with
> hundreds or thousands of nodes is used.

Body cloning does add some overhead, so that actually needs to be measured.
Also, many don't have such a build farm.
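A quick Amdahl's-law sketch shows how much of the 10x bound might survive without a large farm. It assumes the quoted 10% serial module-pass figure is the only serial part. The 8-worker count and the 20% cloning overhead below are hypothetical inputs, not measurements, and modeling cloning overhead as proportional extra parallel work is itself an assumption.

```python
import math


def amdahl_speedup(serial: float, workers: float = math.inf,
                   overhead: float = 0.0) -> float:
    """Amdahl's-law speedup bound.

    serial   -- fraction of the original compile time that stays serial
    workers  -- number of parallel workers (math.inf = unbounded build farm)
    overhead -- hypothetical extra parallel work (e.g. from body cloning),
                as a fraction of the parallel portion; treating it as
                proportional is an assumption, not a measured model
    """
    parallel = (1.0 - serial) * (1.0 + overhead)
    return 1.0 / (serial + parallel / workers)


# Unbounded farm, no cloning overhead: the 10x ceiling.
print(round(amdahl_speedup(0.10), 2))                             # 10.0
# A single 8-core machine instead of a farm.
print(round(amdahl_speedup(0.10, workers=8), 2))                  # 4.71
# Same machine with a hypothetical 20% cloning overhead.
print(round(amdahl_speedup(0.10, workers=8, overhead=0.20), 2))   # 4.26
```

With these made-up inputs, the ceiling drops from 10x to under 5x on one machine, which is why measuring the cloning overhead matters.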
