[cfe-dev] Why clang needs to fork into itself?

Wed Jan 29 00:39:09 PST 2014

On Tue, Jan 28, 2014 at 11:58 PM, Richard Smith <richard at metafoo.co.uk> wrote:

> On Tue, Jan 28, 2014 at 11:40 PM, Manuel Klimek <klimek at google.com> wrote:
>
>> On Wed, Jan 29, 2014 at 8:05 AM, Ted Kremenek <kremenek at apple.com> wrote:
>>
>>> If this can be done, then great.
>>>
>>> Yury Gribov’s point about stack smashing is a good one.  We implemented
>>> such crash recovery mechanisms in libclang and libclang still takes down
>>> Xcode sometimes because of stack overflows due to unbounded recursion, etc.
>>>  We’ve also noticed that when libclang “crashes” (and recovers) the overall
>>> process can be in an undefined state.  Our experience is that such
>>> histrionics can provide an 80% solution, but we’ve never been all that
>>> satisfied with them.  It may, however, be good enough for generating crash
>>> reports, but it seems like a lot of work to replace something we already
>>> have that works very well in practice.
>>>
>>
>> One thing I'd be curious about:
>> What would be the downside of using the parent clang *just* as crash
>> reporter - and push all driver logic into the spawned process. The parent
>> clang would then forward all command line arguments literally, and then
>> just sit there, waiting for the crash, and if there is a crash scrape up all
>> it needs for the bug report.
>> That way, we could easily turn off forking and have equivalent behavior
>> minus crash reporting.
>> Is there a downside to that that I'm missing (apart from the time needed
>> to implement it)?
>>
>
> We still lose all the simplicity benefits of having the driver run one
> subprocess per compilation, and we'd need to turn off -disable-free in the
> multiple-files case, and we'd need to worry about having a large VMSize if
> we fork to spawn another process (for instance, the linker). Having a
> separate process for the compiler from the driver is a much cleaner model.
>

There is another advantage. More and more we are lifting the host-specific
behavior into the driver rather than the compiler proper. The internal
compiler invocation thus has a canonical set of flags rather than
platform-specific flags, and it captures *numerous* behavioral reflections
of the host system. This too is very useful in capturing bug reports
accurately. While we might be able to successfully extract the exact
internal flag state necessary to reproduce things and serialize it,
exclusively using that serialization does help ensure this always holds
true.

The problem I really have with all of this is that we are acting like we
won't have bugs and problems with this clever signal-based crash handling
solution. My experience with signal handling across a wide variety of
Unix-like operating systems is that this assumption is absolutely not true. I think
the maintenance cost of complex and subtle signal management will be
extremely high, and to me it doesn't (yet) seem likely to be worth the
cost. Currently, compiling C++ (or even moderately complex C, ObjC, etc.)
is still massively slower than a subprocess invocation even on Windows. I
don't think we should even consider crossing this bridge until that changes.
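
To make the comparison concrete, here is a minimal sketch of the
subprocess model, where the parent learns about a crash from the child's
exit status instead of from signal handlers inside the crashing process.
It is written in Python for brevity and assumes a POSIX host; it is not
the clang driver's actual code. The names run_compile_job and
crashing_child_argv are hypothetical, and the "compiler" is a stand-in
child that sends itself SIGSEGV.

```python
import signal
import subprocess
import sys


def run_compile_job(argv):
    """Run one job as a child process and classify how it ended.

    Returns ("ok", 0), ("error", exit_code), or ("crash", signal_name).
    Hypothetical helper for illustration only.
    """
    result = subprocess.run(argv)
    code = result.returncode
    if code < 0:
        # On POSIX, subprocess reports a child killed by signal N as -N.
        # The parent's own state is untouched by the child's crash.
        return ("crash", signal.Signals(-code).name)
    if code != 0:
        return ("error", code)
    return ("ok", 0)


def crashing_child_argv():
    # Stand-in for a crashing compiler job (assumption for the demo):
    # a child interpreter that delivers SIGSEGV to itself.
    script = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"
    return [sys.executable, "-c", script]


if __name__ == "__main__":
    print(run_compile_job(crashing_child_argv()))
```

The parent needs no signal handlers of its own here: the kernel reports
how the child died, so a stack overflow or corrupted heap in the child
cannot leave the parent in an undefined state.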