[cfe-dev] Why clang needs to fork into itself?

Wed Jan 29 10:04:26 PST 2014

On Wed Jan 29 2014 at 12:38:27 AM, Manuel Klimek <klimek at google.com> wrote:

> On Wed, Jan 29, 2014 at 8:58 AM, Richard Smith <richard at metafoo.co.uk> wrote:
>
> On Tue, Jan 28, 2014 at 11:40 PM, Manuel Klimek <klimek at google.com> wrote:
>
> On Wed, Jan 29, 2014 at 8:05 AM, Ted Kremenek <kremenek at apple.com> wrote:
>
> If this can be done, then great.
>
> Yury Gribov's point about stack smashing is a good one.  We implemented
> such crash recovery mechanisms in libclang and libclang still takes down
> Xcode sometimes because of stack overflows due to unbounded recursion, etc.
> We've also noticed that when libclang "crashes" (and recovers) the overall
> process can be in an undefined state.  Our experience is that such
> histrionics can provide an 80% solution, but we've never been all that
> satisfied with them.  It may, however, be good enough for generating crash
> reports, but it seems like a lot of work to replace something we already
> have that works very well in practice.
>
>
> One thing I'd be curious about:
> What would be the downside of using the parent clang *just* as crash
> reporter - and push all driver logic into the spawned process. The parent
> clang would then forward all command line arguments literally, and then
> just sit there, waiting for a crash, and if there is one, scrape up all
> it needs for the bug report.
> That way, we could easily turn off forking and have equivalent behavior
> minus crash reporting.
> Is there a downside to that that I'm missing (apart from the time needed
> to implement it)?
>
>
> We still lose all the simplicity benefits of having the driver run one
> subprocess per compilation, and we'd need to turn off -disable-free in the
> multiple-files case, and we'd need to worry about having a large VMSize if
> we fork to spawn another process (for instance, the linker). Having a
> separate process for the compiler from the driver is a much cleaner model.
>
> That said, most users of the compiler don't actually *want* a driver. They
> want just a .c* -> .o compiler, not a multi-source-file cross-language
> compiler + assembler + linker + kitchen sink, and it makes sense to me to
> optimize for the single-source, -c case.
>
>
> Reiterating my question from earlier: how does that model match with the
> recursive compilation step for modules in CompilerInstance.cpp? Will the
> -disable-free basically kill our memory if we are in a larger module
> dependency chain that needs to be rebuilt? Do we somehow disable recursive
> modules compilation in that mode? Or do you think nobody will hit that
> case? (I actually hit it "by chance" the first time I was playing around
> with modules, without actively wanting it, and was quite surprised by what
> happened ;)
>

Sorry I missed this the first time around!

I think this is a really interesting question. With a bit of handwaving, we
can try to argue that the modules-based build "should" never compile much
more source than the same build would without modules, so the increase in
memory usage may be fine, but that seems like an unsatisfying answer. I
don't think we currently do anything about this. I wouldn't be surprised
if we find (when deploying modules on very large codebases) that we want to
perform some module builds in a separate process for some reason,
especially for C++ modules, where the amount of source code in header files
tends to be much larger.

One issue I've seen before is that Linux (in some modes) will fail a
'fork()' call from a process with too high a VMSize. So modules alone might
not cause a problem (because the frontend doesn't fork) where modules +
compilation-in-driver could.
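
To make that failure mode concrete, here is a minimal sketch in C of the
"run the job in a child, let the parent watch for a crash" pattern, with a
fallback for the case where fork() itself fails. This is not clang's actual
code: run_isolated, job_fn, and the job_result codes are hypothetical names
invented for illustration, and the assumption that fork() reports a too-large
parent as an error (for example ENOMEM under strict overcommit settings) is
mine.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical outcome codes for this sketch; not part of clang. */
enum job_result { JOB_OK, JOB_FAILED, JOB_CRASHED, JOB_RAN_INLINE };

typedef int (*job_fn)(void);

/*
 * Run `job` in a forked child so a crash there cannot take down the parent.
 * If fork() fails (assumed here to be how an oversized parent shows up),
 * fall back to running the job in-process, losing crash isolation.
 */
static enum job_result run_isolated(job_fn job, int *exit_code)
{
    assert(job != NULL && exit_code != NULL);

    fflush(NULL); /* avoid duplicating buffered output in the child */
    pid_t pid = fork();
    if (pid < 0) {
        fprintf(stderr, "fork failed (%s); running in-process\n",
                strerror(errno));
        *exit_code = job();
        return JOB_RAN_INLINE;
    }
    if (pid == 0)
        _exit(job()); /* child: exit without running atexit handlers */

    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) { /* retry only when interrupted by a signal */
            *exit_code = -1;
            return JOB_FAILED;
        }
    }
    if (WIFSIGNALED(status)) { /* child died from a signal: a "crash" */
        *exit_code = WTERMSIG(status);
        return JOB_CRASHED;
    }
    *exit_code = WEXITSTATUS(status);
    return *exit_code == 0 ? JOB_OK : JOB_FAILED;
}

/* Two toy jobs standing in for a compilation. */
static int job_succeeds(void) { return 0; }
static int job_crashes(void) { abort(); }
```

The parent only learns that the child crashed and with which signal;
anything it needs for a bug report has to be gathered separately. The
in-process fallback keeps the build going when fork() is refused, at the
cost of the isolation the thread is discussing.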

> I agree that most people nowadays just want .c* -> .o compiler, but if I
> look at the complexity of what Java grew as part of their module system
> (where many people also then basically dumb it down to a simple .java ->
> .class compiler), I'd be interested to see a plan of how we imagine this to
> look down the road.
>
>
>
>  On Jan 28, 2014, at 10:28 PM, Yuri <yuri at rawbw.com> wrote:
>
> > On 01/28/2014 22:04, Yury Gribov wrote:
> >>
> >> Makes sense but what if some important bits (say argv) are trashed by
> >> stack overflow
> >
> > All information needed for crash reporting should be copied into the
> > fixed memory area, and it should be made read-only for the duration of
> > the run.
> >
> > Yuri
> > _______________________________________________
> > cfe-dev mailing list
> > cfe-dev at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev