[llvm-dev] [RFC] Generating LLD reproducers on crashes

Thu Apr 15 21:06:17 PDT 2021

LLD reproducers are something we'd like to have in Chrome OS as well; see
bug https://bugs.chromium.org/p/chromium/issues/detail?id=1134940 (no
activity yet).
Our plan is to create a shell wrapper and re-exec LLD if needed with
--reproduce. Obviously, if LLD supports creating reproducers natively,
that'd be great!
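
A minimal sketch of what such a wrapper could look like, written in Python
rather than shell so the logic is easy to follow. The REAL_LLD and
LLD_REPRO_DIR environment variables and all function names are hypothetical;
the only lld feature assumed is the existing --reproduce=<tarfile> option. A
negative return code (death by signal on POSIX) is treated as a crash:

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical: path to the real linker that this wrapper stands in for.
REAL_LLD = os.environ.get("REAL_LLD", "ld.lld")


def crashed(returncode):
    """On POSIX, subprocess reports death-by-signal as a negative code."""
    return returncode < 0


def reproduce_command(cmd, tar_path):
    """Return the same link command with lld's --reproduce option appended."""
    return list(cmd) + ["--reproduce=" + tar_path]


def link(args, repro_dir, run=subprocess.call):
    """Run the linker once; if it crashed, run it again with --reproduce."""
    cmd = [REAL_LLD] + list(args)
    rc = run(cmd)
    if not crashed(rc):
        return rc  # success or an ordinary link error: nothing more to do
    # Reserve a unique tar file name inside the reproducer directory.
    fd, tar_path = tempfile.mkstemp(prefix="lld-crash-", suffix=".tar",
                                    dir=repro_dir)
    os.close(fd)
    run(reproduce_command(cmd, tar_path))
    print("lld crashed; requested reproducer at " + tar_path, file=sys.stderr)
    return rc  # report the original failure, not the result of the rerun


def main(argv):
    # Hypothetical variable naming the directory that the bot uploads.
    repro_dir = os.environ.get("LLD_REPRO_DIR", tempfile.gettempdir())
    rc = link(argv[1:], repro_dir)
    # Mirror the shell convention of 128 + signal number for a crash.
    return rc if rc >= 0 else 128 - rc
```

A real wrapper would end with sys.exit(main(sys.argv)). Since the link runs
twice, the inputs may have changed between the two runs, so the reproducer is
not guaranteed to trigger the same crash.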

-Manoj

On Thu, Apr 15, 2021 at 11:23 AM David Blaikie via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> On Thu, Apr 15, 2021 at 1:37 AM Petr Hosek via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> >
> > lld crashes are rarer, but they do happen. For example, we see lld
> > segfaulting occasionally on our bots. I'd like to fix it, but we have
> > never managed to reproduce the crash locally. This is primarily where the
> > motivation for this feature came from. In the case of Clang, we already
> > configure our build to generate reproducers in a dedicated directory, and
> > at the end of the build we upload its contents to a dedicated
> > (short-lived) storage bucket. We would like to do the same with lld, and
> > if this feature existed, we would use it in our build.
> >
> > The size of the reproducers is not really an issue; even if they are a
> > few gigabytes, they're still dwarfed by the size of the debug info, at
> > least in our build.
> >
> > Passing -Wl,--reproduce is something a compiler engineer can do when
> > debugging an issue locally, but it's not something a bot can do. Even most
> > developers on our team wouldn't know how to do it, which is why the
> > automatic crash reproducer generation in Clang is so valuable: all that
> > developers need to do is follow the instructions, without having to
> > modify the build. We've had great success with it in the case of Clang.
>
> For this part at least (i.e., for users who don't have the newly proposed
> feature enabled), it would probably help, if this isn't done already, for
> lld's crash reporter to print the command line to run with the extra
> flag ("to reproduce this, run <this command>") for discoverability.
>
> (not to derail the primary discussion on this thread, which I don't
> have much opinion on)
>
> > I'm leaning towards the second option, that is, implementing this feature
> > directly in lld. The reason is that we most often see lld crashes when
> > linking Rust code. If we implemented this feature in the Clang driver, we
> > would also need to do the same inside the Rust driver (and any other
> > compiler driver that supports lld). If we implement it in lld, we only need
> > to do it once, so it's more universal.
> >
> > On Wed, Apr 14, 2021 at 3:40 PM Fāng-ruì Sòng via llvm-dev
> > <llvm-dev at lists.llvm.org> wrote:
> >>
> >> On Wed, Apr 14, 2021 at 3:27 PM Haowei Wu <haowei at google.com> wrote:
> >> >
> >> > > I am skeptical that users will want to have this behavior by default.
> >> > > If this behavior is guarded by an option, it might be fine.
> >> >
> >> > That's a good point. If the reproducer will be more than a few
> >> > hundred MiB, it is definitely not suitable to be enabled by default. I
> >> > agree it's better to guard it with an option flag such as
> >> > `--gen-lld-crash-reproducer`.
> >> >
> >> > On Wed, Apr 14, 2021 at 2:40 PM Fangrui Song <maskray at google.com> wrote:
> >> >>
> >> >>
> >> >> On 2021-04-14, Haowei Wu via llvm-dev wrote:
> >> >> >*Background / Motivation*
> >> >> >
> >> >> >Both clang and lld have the ability to generate a reproducer (an archive
> >> >> >with input files and an invoker script to reproduce the clang/lld build).
> >> >> >While clang will generate a reproducer archive when a crash happens, lld
> >> >> >only generates a reproducer when the '--reproduce' flag is explicitly set
> >> >> >(this is equivalent to Clang's -gen-reproducer flag). This is not very
> >> >> >helpful for debugging lld bugs, particularly when the crash happens while
> >> >> >building big projects, since it is unrealistic to set reproducer flags to
> >> >> >generate reproducer archives for every lld invocation. This design also
> >> >> >causes trouble when the crash happens only on bots, as in most cases
> >> >> >developers do not have access to the file system of these bots. It would
> >> >> >be great to improve the lld reproducer generation for easier debugging in
> >> >> >these scenarios.
> >> >> >
> >> >> >*Proposal*
> >> >> >
> >> >> >Given the use cases and status of clang and lld, I think there are 2
> >> >> >possible solutions.
> >> >> >
> >> >> >*Extend Clang driver*
> >> >> >In most cases, lld is invoked by the clang driver instead of being invoked
> >> >> >by the build system directly. Therefore, the clang driver can be changed to
> >> >> >re-invoke lld with the '--reproduce' flag when it detects that the lld
> >> >> >subprocess has crashed.
> >> >> >
> >> >> >Advantages:
> >> >> >    * It probably does not require any changes to lld and might be
> >> >> >easier than handling the crash directly in lld.
> >> >> >
> >> >> >Disadvantages:
> >> >> >    * If there is a race condition in the build system, the input files
> >> >> >might have changed between the first lld crash and the second lld run with
> >> >> >the '--reproduce' flag. In this case, the generated lld reproducer archive
> >> >> >might not trigger the crash, which makes it less useful.
> >> >> >
> >> >> >*Improve lld reproducer*
> >> >> >Another way would be to make lld generate a reproducer archive when it
> >> >> >crashes, just like what clang is doing.
> >> >> >
> >> >> >Advantages:
> >> >> >    * It will work whether lld is invoked from Clang or from the build
> >> >> >system.
> >> >> >    * It will capture the input files as they were at the time of the
> >> >> >crash, even if the crash is caused by build races.
> >> >> >
> >> >> >Disadvantages:
> >> >> >    * It might need a lot of work if lld does not already have a
> >> >> >sophisticated crash handler. It might still need some plumbing changes in
> >> >> >the clang driver so lld can honor the '-fcrash-diagnostics-dir' flag.
> >> >> >
> >> >> >*Comments?*
> >> >> >Which approach do you prefer? Feel free to share your opinions.
> >> >>
> >> >> There is a resource difference between clang -gen-reproducer /
> >> >> environment variable "FORCE_CLANG_DIAGNOSTICS_CRASH" and ld.lld --reproduce.
> >> >>
> >> >> clang -gen-reproducer produces a source file and a .sh file for one
> >> >> single translation unit, so the space consumption is low.
> >> >> ld.lld --reproduce can potentially pack a large list of files, which may
> >> >> take hundreds of megabytes or several gigabytes.
> >> >>
> >> >> I am skeptical that users will want to have this behavior by default.
> >> >> If this behavior is guarded by an option, it might be fine.
> >>
> >> I'll retract my words about an option. This behavior looks like it
> >> needs a fair bit of customization and is build system dependent.
> >> You can replace the proposed option with a shell script wrapper, which
> >> is more convenient than implementing the restartable action in the
> >> clang driver.
> >> When dealing with linker problems (I doubt there are many nowadays;
> >> when there are problems, they are mostly LTO problems), I will usually
> >> change compiler/linker options a bit.
> >> If you do this, you would specify the proposed option only after all of
> >> that has been done, but then it is only a very small extra step to
> >> invoke the link again with -Wl,--reproduce.
> >> _______________________________________________
> >> LLVM Developers mailing list
> >> llvm-dev at lists.llvm.org
> >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev