[llvm-dev] [RFC] Generating LLD reproducers on crashes

Petr Hosek via llvm-dev llvm-dev at lists.llvm.org
Thu Apr 15 01:37:27 PDT 2021


lld crashes are rarer, but they do happen. For example, we see lld
segfaulting occasionally on our bots. I'd like to fix it, but we have never
managed to reproduce the issue locally. This is primarily where the
motivation for this feature came from.
In the case of Clang, we already configure our build to generate
reproducers in a dedicated directory, and at the end of the build we upload
its contents to a dedicated (short-lived) storage bucket. We would like to
do the same with lld, and if this feature existed, we would use it in our
build.

The size of the reproducers is not really an issue; even if they are a few
gigabytes, they're still dwarfed by the size of the debug info, at least in
our build.

Passing -Wl,--reproduce is something a compiler engineer can do when
debugging an issue locally, but it's not something a bot can do. Even most
developers on our team wouldn't know how to do it, which is why the
automatic crash reproducer generation in Clang is so valuable: all that
developers need to do is follow the instructions, without having to
modify the build. We've had great success with it in the case of Clang.

I'm leaning towards the second option, that is, implementing this feature
directly in lld. The reason is that we most often see lld crashes when
linking Rust code. If we implemented this feature in the Clang driver, we
would also need to do the same inside the Rust driver (and any other
compiler driver that supports lld). If we implement it in lld, we only need
to do it once, so it's more universal.
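To make the in-lld option concrete, here is a minimal Python model of what generating a reproducer inside the linker involves. It is a sketch under stated assumptions, not lld's code: `CrashReproRecorder`, `link_with_crash_reproducer` and the `do_link` callback are hypothetical names, a crash is modeled as a Python exception (a real segfault would have to be caught by a signal handler, which this does not model), and only the `response.txt` entry is modeled on lld's `--reproduce` archives. What it illustrates is the build-race advantage listed in the proposal: the archive holds the bytes the failing link actually read, not whatever is on disk afterwards.

```python
import io
import os
import shlex
import tarfile
import tempfile


class CrashReproRecorder:
    """Hypothetical in-linker recorder: keeps every input the link reads so
    the inputs can be archived if the link fails."""

    def __init__(self, argv):
        self.argv = list(argv)
        self.inputs = []  # list of (absolute path, bytes actually read)

    def read_input(self, path):
        with open(path, "rb") as f:
            data = f.read()
        # Keep the bytes we linked against; re-reading the path at crash
        # time could pick up a file the build has since rewritten.
        self.inputs.append((os.path.abspath(path), data))
        return data

    def write_reproducer(self, crash_dir):
        os.makedirs(crash_dir, exist_ok=True)
        fd, tar_path = tempfile.mkstemp(prefix="lld-crash-", suffix=".tar", dir=crash_dir)
        os.close(fd)
        root = os.path.basename(tar_path)[: -len(".tar")]
        # response.txt (the command line) is modeled on --reproduce archives;
        # the rest of this layout is an illustrative assumption.
        members = [("response.txt", (shlex.join(self.argv) + "\n").encode())]
        members += [(path.lstrip("/\\"), data) for path, data in self.inputs]
        with tarfile.open(tar_path, "w") as tar:
            for name, data in members:
                info = tarfile.TarInfo(f"{root}/{name}")
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))
        return tar_path


def link_with_crash_reproducer(argv, inputs, do_link, crash_dir):
    """Run the hypothetical link step `do_link`; on any failure, write a
    reproducer archive into crash_dir and re-raise the original error."""
    recorder = CrashReproRecorder(argv)
    try:
        return do_link(recorder, inputs)
    except Exception:
        path = recorder.write_reproducer(crash_dir)
        print(f"linker crashed; reproducer written to {path}")
        raise
```

Nothing is written when the link succeeds, so the cost is paid only on a crash. A real implementation inside lld would have to do this from C++ crash-handling code, where writing a multi-gigabyte archive is far less straightforward than in this model.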

On Wed, Apr 14, 2021 at 3:40 PM Fāng-ruì Sòng via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> On Wed, Apr 14, 2021 at 3:27 PM Haowei Wu <haowei at google.com> wrote:
> >
> > > I am skeptical that users will want to have this behavior by default.
> > > If this behavior is guarded by an option, it might be fine.
> >
> > That's a good point. If the reproducer can be more than a few hundred
> > MiB, it is definitely not suitable to be enabled by default. I agree it's
> > better to guard it with an option flag such as `--gen-lld-crash-reproducer`.
> >
> > On Wed, Apr 14, 2021 at 2:40 PM Fangrui Song <maskray at google.com> wrote:
> >>
> >>
> >> On 2021-04-14, Haowei Wu via llvm-dev wrote:
> >> >*Background / Motivation*
> >> >
> >> >Both clang and lld have the ability to generate a reproducer (an archive
> >> >with input files and an invocation script to reproduce the clang/lld
> >> >build). While clang generates a reproducer archive when a crash happens,
> >> >lld only generates a reproducer when the '--reproduce' flag is explicitly
> >> >set (this is equivalent to Clang's -gen-reproducer flag). This is not very
> >> >helpful for debugging lld bugs, particularly when the crash happens while
> >> >building big projects, since it is unrealistic to set reproducer flags to
> >> >generate reproducer archives for every lld invocation. This design also
> >> >causes trouble when the crash happens only on bots, as in most cases
> >> >developers do not have access to the file system of these bots. It would
> >> >be great to improve lld reproducer generation for easier debugging in
> >> >these scenarios.
> >> >
> >> >*Proposal*
> >> >
> >> >Given the use cases and current state of clang and lld, I think there are
> >> >two possible solutions.
> >> >
> >> >*Extend Clang driver*
> >> >In most cases, lld is invoked by the clang driver rather than directly by
> >> >the build system. Therefore, the clang driver can be changed to re-invoke
> >> >lld with the '--reproduce' flag when it detects that the lld subprocess
> >> >has crashed.
> >> >
> >> >Advantages:
> >> >    * It probably does not require any changes to lld and might be easier
> >> >      than handling the crash directly in lld.
> >> >
> >> >Disadvantages:
> >> >    * If there is a race condition in the build system, the input files
> >> >      might have changed between the first lld crash and the second lld
> >> >      run with the '--reproduce' flag. In this case, the generated lld
> >> >      reproducer archive might not trigger the crash, making it less
> >> >      useful.
> >> >
> >> >*Improve lld reproducer*
> >> >Another way would be to make lld generate a reproducer archive when it
> >> >crashes, just as clang does.
> >> >
> >> >Advantages:
> >> >    * It will work regardless of whether lld is invoked by Clang or by the
> >> >      build system.
> >> >    * It will capture the exact input files in case the crash is caused by
> >> >      build races.
> >> >
> >> >Disadvantages:
> >> >    * It might need a lot of work if lld does not already have a
> >> >      sophisticated crash handler. It might still need some plumbing
> >> >      changes in the clang driver so that lld can honor the
> >> >      '-fcrash-diagnostics-dir' flag.
> >> >
> >> >*Comments?*
> >> >Which approach do you prefer? Feel free to share your opinions.
> >>
> >> There is a difference in resource usage between clang -gen-reproducer
> >> (or the environment variable "FORCE_CLANG_DIAGNOSTICS_CRASH") and ld.lld
> >> --reproduce.
> >>
> >> clang -gen-reproducer produces a source file and a .sh file for a
> >> single translation unit, so the space consumption is low.
> >> ld.lld --reproduce can potentially pack a large list of files, which may
> >> take hundreds of megabytes or several gigabytes.
> >>
> >> I am skeptical that users will want to have this behavior by default.
> >> If this behavior is guarded by an option, it might be fine.
>
> I'll retract my words about an option. This behavior looks like it
> needs a fair bit of customization and is build-system dependent.
> You can replace the proposed option with a shell script wrapper, which
> is more convenient than implementing the restartable action in the
> clang driver.
> When dealing with linker problems (I doubt there are many nowadays;
> when there are problems, most are LTO problems), I usually change
> compiler/linker options a bit.
> If you work this way, you would only add the proposed option after all
> of that is done, but at that point it is only a very small extra step
> to invoke the link again with -Wl,--reproduce.
>
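For comparison, the "Extend Clang driver" option, or the shell-script wrapper suggested above, comes down to roughly the following. This is a hedged Python sketch: `run_linker_with_repro_on_crash` is a hypothetical name, it assumes a linker crash shows up as termination by a signal (a negative return code from `subprocess`), and it assumes the command is a direct `ld.lld` invocation that accepts `--reproduce=<file>`. It also makes the race-condition drawback from the proposal visible: the second run re-reads its inputs from disk.

```python
import os
import subprocess
import sys
import tempfile


def run_linker_with_repro_on_crash(argv, crash_dir):
    """Hypothetical wrapper: run the link once and, if the linker was killed
    by a signal, rerun the identical command with --reproduce."""
    first = subprocess.run(argv)
    # On POSIX, subprocess reports "killed by signal N" as returncode -N.
    # Ordinary link errors exit with a positive code and are passed through.
    if first.returncode >= 0:
        return first.returncode, None
    os.makedirs(crash_dir, exist_ok=True)
    repro = os.path.join(tempfile.mkdtemp(prefix="lld-repro-", dir=crash_dir), "repro.tar")
    # Caveat from the thread: if the build has rewritten any inputs since the
    # first run, this second link may not reproduce the crash at all.
    subprocess.run(argv + [f"--reproduce={repro}"])
    print(f"linker crashed (signal {-first.returncode}); reproducer: {repro}", file=sys.stderr)
    return first.returncode, repro
```

A call would look like `run_linker_with_repro_on_crash(["ld.lld", "main.o", "-o", "app"], "lld-crashes")`. When the link goes through the clang driver instead, the extra argument has to be spelled `-Wl,--reproduce=...`, which is part of why each driver would need its own copy of this logic.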