[PATCH] D109400: [LLD] Add archive Name to relocaiton overflow printout

Thu Nov 4 22:43:45 PDT 2021

MaskRay added a comment.

In D109400#3110659 <https://reviews.llvm.org/D109400#3110659>, @wenlei wrote:

>> The description is not what this patch does. Do you mean D94141 <https://reviews.llvm.org/D94141>?
>
> Yes, I was referring to that patch. It is all related to improving diagnostics on error.

Then please move move the reply there. I'll copy my response there and delete this message.

>> It is also not much to ask the user to re-link with an option.
>
> I don't agree. 1) When builds are done on build agents, it's often not easy to just do "re-link". Then for people to repro this, a full rebuild may be needed which could take hours; 2) as I said above, it's too much to ask for an average non-compiler developer to deal with all of this, to know the `-M` + `sed` trick; 3) toolchain maintainers often don't have the setup to redo those builds locally.
>
> Let's use an analogy - you're a user of a product, say a phone app, and when the app crashed, wouldn't it be nice if there's an diagnostic report you can send to app developers right way? Isn't it a bad user experience if after the crash, what users have to do is to go to phone settings, do some trick, and repeat all you've done again to repro the crash? Ofc, it's doable. But isn't it better to make things easier for everyone? Toolchain is also a product that serves its users. I'm yet to be convinced why such a harmless diagnostic improvements is not acceptable, given it certainly helps some users.

The two scenarios are quite different. For an app crash, the crash involves user input and some non-determinism. There is often no good way reproducing every trace of it, so a coredump can be helpful.

A linker out-of-range error is different. It's expected that all input files don't immediate disappear so you can re-run the link.
The linker is deterministic (there could be LTO nondeterminism bugs but in an ideal world it is deterministic) so re-invoking can reproduce the issue.
Auto output section dump assumes that the output section sizes are important - but I am not sure.
To fix the issue you probably want more detailed information. That may require some "zoom-in" & "zoom-out" manual steps.
I don't think from just an output section dump you can identify the problem, except that you can tell the user "your output is too large".
Some domain specific knowledge may be involved like "some sections are known to cause more problems".
All information like this is too specific information which may serve your users well but may not be generic enough.

I want to highlight some points you may have neglected but are important to me:

- We want option orthogonality. -M is kept as opt-in. Making it enabled in some diagnostics breaks orthogonality.
- The design principle of diagnostics is to be sufficiently useful but not too verbose.
- -M output may be too huge. On some systems there may be a limit on output size. Printing too much may cause hide other diagnostics. We all know that template errors can sometimes lead to pages of errors which are difficult to comprehend, right?

>> As I said, I do not think auto -M or section output behavior is useful.
>
> Well, for the reasons I mentioned above, evidently it's proven to be quite useful for us. It helped us to triage some known issues quickly without needing to rebuild and point people to proper solutions very efficiently. It routinely saves many people hours of extra work from extra build for repro. Also in case there's confusion, this is not unconditional auto `-M`, this is just better diagnostics when error actually happens.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109400/new/

https://reviews.llvm.org/D109400