[llvm] [symbolizer] Empty string is not an error (PR #92660)

Wed May 22 00:50:59 PDT 2024

jh7370 wrote:

> > Honestly, I feel like the requested "check that llvm-symbolizer is responding to input by sending it an arbitrary string" is a bit of a hack too: what if llvm-symbolizer were to crash immediately after printing its response?
> 
> > the use case is invalid as a good response doesn't mean that more responses will come from llvm-symbolizer (see above re. crash after a response for one possible example)
> 
> A good response means that the client program found the llvm-symbolizer executable and it reached its main loop, which is a useful thing to know on its own regardless of whether later requests fail. A bad response is a strong signal (not a guarantee) that there was a user error and the client program should tell the user to e.g. check their $PATH. A good response rules out this type of user error and means that if we see a crash later it means that there is a bug (either in the client program or llvm-symbolizer) and the client program can respond by telling the user to file a bug report. This is the case whether the crash was caused by responding to the `\n` or to the first request.

Thanks. Would my suggestion 1 above (printing something immediately before the loop starts) be sufficient from your point of view? Alternatively, I think suggestion 2 would satisfy this, though I'd prefer the input string that triggers a response to be a little less magic, e.g. introduce an "ECHO" directive, a bit like we have `CODE`, `DATA` etc., which just prints everything else in the same input line.
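
To make the "ECHO" idea concrete, here is a rough toy model of how a single input line might be handled. This is only a sketch: `ECHO` is a proposed directive that does not exist in llvm-symbolizer today, and `respond` and the placeholder return value are made up for illustration rather than taken from the real parser.

```python
# Toy model of handling one input line. This is NOT llvm-symbolizer's parser.
ECHO_PREFIX = "ECHO "  # hypothetical directive, by analogy with CODE/DATA


def respond(line: str) -> str:
    """Return the reply for one input line (without its trailing newline)."""
    if line.startswith(ECHO_PREFIX):
        # Print everything else on the same input line, unchanged.
        return line[len(ECHO_PREFIX):]
    # Real address handling (CODE, DATA, bare addresses) is out of scope here.
    return "<symbolization output>"
```

With this model, `respond("ECHO ready")` returns `"ready"`, so the client chooses the string it expects back and the purpose of the request is visible from the input itself.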

> > What does binutils addr2line do in this case?
> 
> Looks like it prints `??:0\n` without any output to stderr. We might want to emulate that in the addr2line emulation mode but since llvm-symbolizer has historically responded to `\n` with `\n` I reckon that's what llvm-symbolizer should do by default.

I'm largely ambivalent about whether we match GNU addr2line in GNU output mode, so if there's a preference for that, I'm okay with it (though if it makes the code significantly more complex, then that's a different story). As for the LLVM output mode, llvm-symbolizer also historically responded to "arglefargle" with "arglefargle": "\n" wasn't a special case (as far as I understand it); the tool simply echoed any input it didn't recognise as a valid address. I'm reluctant to introduce special behaviour for "\n" because it isn't obvious why this special behaviour should exist (assuming no knowledge of specific use cases, of course). Hence my preference for one of the two other suggestions I made: in both, a carefully chosen output/magic string implies how the feature might be useful, without further context needed.
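
For illustration, a client-side startup check built on suggestion 2 might look roughly like the sketch below. It assumes a hypothetical `ECHO <text>` request that is answered with `<text>`; llvm-symbolizer does not support this today, so `STAND_IN` is a tiny stand-in process used in place of the real tool, and `probe` is a made-up helper name.

```python
import subprocess
import sys

# Stand-in for a symbolizer that understands the hypothetical ECHO directive.
STAND_IN = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    if line.startswith('ECHO '):\n"
    "        sys.stdout.write(line[5:])\n"
    "        sys.stdout.flush()\n"
)


def probe(argv, token="symbolizer-ready"):
    """Start argv, send 'ECHO <token>', and return the process if it replies."""
    try:
        proc = subprocess.Popen(
            argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
        )
    except OSError:
        return None  # executable missing or not runnable: likely a user error
    try:
        proc.stdin.write(f"ECHO {token}\n")
        proc.stdin.flush()
        reply = proc.stdout.readline()
    except OSError:
        reply = ""  # e.g. broken pipe because the process already exited
    if reply.rstrip("\n") != token:
        proc.kill()
        proc.wait()
        return None
    return proc  # the main loop was reached; later failures suggest a bug
```

A caller could run `probe([sys.executable, "-u", "-c", STAND_IN])` to try it out. A `None` result is the "check your $PATH" case, while a returned process means any later crash is more likely a bug worth reporting.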

https://github.com/llvm/llvm-project/pull/92660