[llvm-dev] [RFC] Generating LLD reproducers on crashes
pawel k. via llvm-dev
llvm-dev at lists.llvm.org
Fri Apr 16 00:36:45 PDT 2021
Reusing existing one sounds even better.
Best regards,
Pk
pt., 16.04.2021, 08:14 użytkownik Petr Hosek via llvm-dev <
llvm-dev at lists.llvm.org> napisał:
> On Thu, Apr 15, 2021 at 9:30 PM Fangrui Song via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> On 2021-04-15, Manoj Gupta via llvm-dev wrote:
>> >LLD reproducers is something we'd like to have in Chrome OS as well, see
>> >bug https://bugs.chromium.org/p/chromium/issues/detail?id=1134940 (no
>> >activity yet).
>> >Our plan is to create a shell wrapper and re-exec LLD if needed with
>> >--reproduce. Obviously, if LLD supports creating reproducers natively,
>> >that'd be great!
>> >
>> >-Manoj
>>
>> The crash report can be easily implemented via a shell script, but is
>> difficult
>> to implementat reliably in the process itself. When a process crashes,
>> naturally not everything can work very robustly. The process wants to
>> recover
>> some state and starts a .tar writer, collects every touched file and
>> places
>> them in the .tar writer. There are many steps things can go afoul. I am
>> worrying about the robustness. Of course, this may be solved by a
>> multiprocess
>> architecture, but I am not sure we want to pay the complexity in the LLD
>> entrypoint itself.
>>
>
> I'm hoping that we could reuse llvm::CrashRecoveryContext just like Clang
> does without needing multi-process architecture. Furthermore, we might be
> able to extract some of the common infrastructure from Clang to LLVM and
> then use it in lld. Clang has already solved this so there's no need to
> duplicate the effort.
>
>
>> (Crashing LLD is not the idea I hear a lot. For some groups it has been
>> very stable.
>> The crashes are more frequently from some optimizations triggered by
>> llvm/lib/LTO.
>> The nature of the crashes is useful, if Fuchsia/ChromeOS folks would like
>> to provide.)
>>
>> On the other hand, this task seems to require a fair amount of
>> customization to
>> me. First we have the tarball size problem. Then say there is a common
>> crash
>> and 100 links of a similar kind crash at the same time, do we write 100
>> tarballs? In a controlled environment, for example when there is some
>> deduplicater or throttling this may be feasible. The output filename may
>> want
>> customization as well, and different groups may have different opinions.
>> It
>> feels to me that a script, whether or not LLD has the built-in crash
>> reporting
>> feature, is indispensable. Then the built-in C++ crash reporter code in
>> LLD
>> does not convince me.
>>
>
> I cannot speak for others, but at least in our case we are usually limited
> by the available memory and on the machines we have we can only run single
> to low double digit of link jobs in parallel. We also never had this issue
> with Clang and we usually run hundreds of parallel jobs.
>
> Regarding the output name, I'd again take an inspiration from Clang and
> simply use a temporary filename generated
> by llvm::sys::fs::createTemporaryFile. That way we don't need to worry
> about two failing link jobs trying to write to the same file. We never
> needed the customization for Clang and I doubt we will for lld.
>
>
>> >On Thu, Apr 15, 2021 at 11:23 AM David Blaikie via llvm-dev <
>> >llvm-dev at lists.llvm.org> wrote:
>> >
>> >> On Thu, Apr 15, 2021 at 1:37 AM Petr Hosek via llvm-dev
>> >> <llvm-dev at lists.llvm.org> wrote:
>> >> >
>> >> > lld crashes are more rare, but they do happen. For example, we see
>> lld
>> >> segfaulting occasionally on our bots. I'd like to fix it, but I don't
>> know
>> >> how to reproduce this issue because we never managed to reproduce it
>> >> locally. This is primarily where the motivation for this feature came
>> from.
>> >> In the case of Clang, we already configure our build to generate
>> >> reproducers in a dedicated directory and at the end of the build we
>> upload
>> >> its content to a dedicated (short lived) storage bucket. We would like
>> to
>> >> do the same with lld and if this feature existed, we would use it in
>> our
>> >> build.
>> >> >
>> >> > The size of the reproducers is not really an issue; even if they are
>> a
>> >> few gigabytes, they're still dwarfed by the size of the debug info, at
>> >> least in our build.
>> >> >
>> >> > Passing -Wl,--reproduce is something a compiler engineer can do when
>> >> debugging an issue locally, but it's not something a bot can do. Even
>> most
>> >> developers on our team wouldn't know how to do it which is why the
>> >> automatic crash reproducer generation in Clang is so valuable, all that
>> >> developers need to do is to follow the instructions without having to
>> >> modify the build and we've had great success with it in the case of
>> Clang.
>> >>
>> >> Probably would help (if this isn't done already) this part at least
>> >> (ie: users who don't have this newly proposed feature enabled) if
>> >> lld's crash reporter printed the command line to run with the extra
>> >> flag "to reproduce this run <this command>" for discoverability?
>> >>
>> >> (not to derail the primary discussion on this thread, which I don't
>> >> have much opinion on)
>> >>
>> >> > I'm leaning towards the second option, that is implementing this
>> feature
>> >> directly in lld. The reason is that we most often see lld crashes when
>> >> linking Rust code. If we implemented this feature in the Clang driver,
>> we
>> >> would also need to do the same inside the Rust driver (and any other
>> >> compiler driver that supports lld). If we implement it in lld, we only
>> need
>> >> to do it once, so it's more universal.
>> >> >
>> >> > On Wed, Apr 14, 2021 at 3:40 PM Fāng-ruì Sòng via llvm-dev <
>> >> llvm-dev at lists.llvm.org> wrote:
>> >> >>
>> >> >> On Wed, Apr 14, 2021 at 3:27 PM Haowei Wu <haowei at google.com>
>> wrote:
>> >> >> >
>> >> >> > > I am skeptical that users will want to have this behavior by
>> >> default.
>> >> >> > > If this behavior is guarded by an option, it might be fine.
>> >> >> >
>> >> >> > That's a good point. If the reproducer will be more than a few
>> >> hundreds MiBs, it is definitely not suitable to be enabled by default.
>> I
>> >> agree it's better to be guarded by an option flag such as
>> >> `--gen-lld-crash-reproducer`.
>> >> >> >
>> >> >> > On Wed, Apr 14, 2021 at 2:40 PM Fangrui Song <maskray at google.com>
>> >> wrote:
>> >> >> >>
>> >> >> >>
>> >> >> >> On 2021-04-14, Haowei Wu via llvm-dev wrote:
>> >> >> >> >*Background / Motivation*
>> >> >> >> >
>> >> >> >> >Both clang and lld have the ability to generate a reproducer (an
>> >> archive
>> >> >> >> >with input files and invoker script to reproduce the clang/lld
>> >> build).
>> >> >> >> >While clang will generate a reproducer archive when a crash
>> >> happens, lld
>> >> >> >> >only generates a reproducer when '--reproduce' flag is
>> explicitly
>> >> set (this
>> >> >> >> >is equivalent to Clang's -gen-reproducer flag). This is not very
>> >> helpful
>> >> >> >> >for debugging lld bugs, particularly when the crash happens in
>> >> building big
>> >> >> >> >projects, since it will be unrealistic to set reproducer flags
>> to
>> >> generate
>> >> >> >> >reproducer archives for every lld invocation. This design also
>> >> causes
>> >> >> >> >troubles when the crash happens on bots only, as in most cases,
>> >> developers
>> >> >> >> >do not have access to the file system of these bots. It would be
>> >> great to
>> >> >> >> >improve the lld reproducer generation for easier debugging in
>> these
>> >> >> >> >scenarios.
>> >> >> >> >
>> >> >> >> >*Proposal*
>> >> >> >> >
>> >> >> >> >Given the use cases and status of clang and lld. I think there
>> are 2
>> >> >> >> >possible solutions.
>> >> >> >> >
>> >> >> >> >*Extend Clang driver*
>> >> >> >> >In most cases, lld is invoked by the clang driver instead of
>> being
>> >> invoked
>> >> >> >> >by the build system directly. Therefore, the clang driver can be
>> >> changed to
>> >> >> >> >re-invoke lld with '--reproduce' flags when it detects the lld
>> >> subprocess
>> >> >> >> >is crashed.
>> >> >> >> >
>> >> >> >> >Advantages:
>> >> >> >> > * It probably does not require any changes to the lld and
>> might
>> >> be
>> >> >> >> >easier than handling the crash directly in lld.
>> >> >> >> >
>> >> >> >> >Disadvantages:
>> >> >> >> > * In case when there is a racing condition in the build
>> system,
>> >> the
>> >> >> >> >input files might have changed between 1st lld crash and 2nd lld
>> >> rerun with
>> >> >> >> >'--reproduce' flag. In this case, the generated lld reproducer
>> >> archive
>> >> >> >> >might not be able to trigger a crash, makes it less useful.
>> >> >> >> >
>> >> >> >> >*Improve lld reproducer*
>> >> >> >> >Another way would be to make lld generate a reproducer archive
>> when
>> >> it
>> >> >> >> >crashes, just like what clang is doing.
>> >> >> >> >
>> >> >> >> >Advantages:
>> >> >> >> > * It will work no matter if lld is invoked from Clang or
>> from
>> >> the build
>> >> >> >> >system.
>> >> >> >> > * It will catch the input file in case the crash is caused
>> by
>> >> build
>> >> >> >> >races.
>> >> >> >> >
>> >> >> >> >Disadvantages:
>> >> >> >> > * It might need a lot of work if lld does not already have a
>> >> >> >> >sophisticated crash handler. It might still need some plumbing
>> >> changes in
>> >> >> >> >clang driver so lld can honor the '-fcrash-diagnostic-dir' flag.
>> >> >> >> >
>> >> >> >> >*Comments?*
>> >> >> >> >Which approach do you prefer? Feel free to share your opinions.
>> >> >> >>
>> >> >> >> There is a resource difference between clang -gen-reproducer /
>> >> >> >> environment variable "FORCE_CLANG_DIAGNOSTICS_CRASH" and ld.lld
>> >> --reproduce.
>> >> >> >>
>> >> >> >> clang -gen-reproducer produces a source file and a .sh file for
>> one
>> >> >> >> single translation unit, the space consumption is low.
>> >> >> >> ld.lld --reproduce can potentially pack a large list of files,
>> which
>> >> may
>> >> >> >> take hundreds of megabytes or several gigabytes.
>> >> >> >>
>> >> >> >> I am skeptical that users will want to have this behavior by
>> default.
>> >> >> >> If this behavior is guarded by an option, it might be fine.
>> >> >>
>> >> >> I'll retract my words about an option. This behavior looks like it
>> >> >> needs a fair bit of customization and is build system dependent.
>> >> >> You can replace the proposed option with a shell script wrapper,
>> which
>> >> >> is more convenient than implementing the restartable action in the
>> >> >> clang driver.
>> >> >> When dealing with linker problems, (I doubt there are many nowadays;
>> >> >> when there are problems, mostly are LTO problems), I will usually
>> >> >> change compiler/linker options a bit.
>> >> >> If you do this, you may only specify the proposed option when all
>> the
>> >> >> stuff has been done, but then it is only a very small extra step to
>> >> >> invoke the link again with -Wl,--reproduce.
>> >> >> _______________________________________________
>> >> >> LLVM Developers mailing list
>> >> >> llvm-dev at lists.llvm.org
>> >> >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> >> >
>> >> > _______________________________________________
>> >> > LLVM Developers mailing list
>> >> > llvm-dev at lists.llvm.org
>> >> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> >> _______________________________________________
>> >> LLVM Developers mailing list
>> >> llvm-dev at lists.llvm.org
>> >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> >>
>>
>> >_______________________________________________
>> >LLVM Developers mailing list
>> >llvm-dev at lists.llvm.org
>> >https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210416/be9194ae/attachment.html>
More information about the llvm-dev
mailing list