[PATCH] D80803: [llvm-dwarfdump] Flush output before warning and errors are reported

Tue Jun 2 09:52:43 PDT 2020

JDevlieghere added a comment.

In D80803#2068152 <https://reviews.llvm.org/D80803#2068152>, @labath wrote:

> So, combining stdout and stderr is an issue that a lot of people considered important, and indeed there are a lot of tools out there which get it "right".

I hope I didn't give the impression that I don't think it's important. I think it's really useful to have `stdout` and `stderr` for exactly the reason that James mentioned. Everything I said was weighing those benefits agains us having to maintain flushes in `dwarfdump`.

> llvm streams have no "tied stream" concept, and I would stipulate that this is the real reason for why we're running into the problems here. The simplest and least obtrusive way this concept could be retrofitted into the existing llvm streams is to have a new class -- let's call it `tied_raw_fd_ostream`. This class would inherit from `raw_fd_ostream` and it would take an additional constructor argument -- the "tied" stream. The only special thing it would do is that in its `write_impl` implementation, it would call `TiedStream.flush()` before doing the actual write. `llvm::errs()` would be of type `tied_raw_fd_ostream`, tied to `outs()`, everything else would remain unchanged.
> 
> Implementing this idea should be very easy. The result won't be as flexible as the std::iostream solution, but that might actually be a good thing. And as long as an application writes only/mostly to `outs()`, the performance impact should be zero. If an application combines stdout and stderr a lot, then it will suffer some performance degradation, but hopefully that is not the case we need to optimize for.

I really like this idea because it takes the tools out of the business of flushing and provides a structural solution to the problem. My main issue with the original patch was that it's very tricky to cover all cases and you can't sprinkle flushes until it works. Having a `tied_raw_fd_ostream` would solve that issue completely.

In D80803#2068317 <https://reviews.llvm.org/D80803#2068317>, @jhenderson wrote:

> I'm happy to go for a tied stream approach. That would seem like a more scalable solution for clients that want to keep things in sync. In practice, not that much prints to stderr, so performance is probably not so critical, in my opinion.
>
> I hear @JDevlieghere's concerns about messages being self-descriptive enough to not need worrying about syncing, and I agree with the principle. Indeed, I often push for more context to be included in error messages. However, my take on it is that if error messages are near related output when the two are combined, it makes it much easier to associate one with the other, regardless of how much context there is in the message. I have often found cases in build systems etc which produce huge amounts of output for whatever reason, where the error messages end up miles away from the actual related output. Often, the stdout might say something like "Build Failed" but the related stderr error messages are several hundred lines earlier, and therefore quite hard to find. Admittedly, this is as much to do with there being too much output as it is to do with flushing, but the principle still more or less applies.

As I said above I completely agree with all of this!

> In terms of the testing, I have a patch that I hope to put up for review today, to reposition the testing of the error messages in debug_line_invalid.test to be alongside their corresponding tables. The information in most (all?) of the messages is sufficient to already associate them correctly, since they contain the file offsets, but that doesn't make the test script easy to read or follow from my experience, and that's despite having written most of this test myself! Whilst I could continue to keep the errors in a single file and test that indepdendently, but with the FileCheck lines interleaved appropriately, I feel like this may get a little confusing at times.

I strongly believe that maintainability of the tests is very important. My point was more about making sure we don't start relying on the order. But you're right, we've been doing a great job and nothing indicates that will change, so it was more of a concern than an actual problem.

> From a purely llvm-dwarfdump perspective, I think there are likely three common usages:
> 
> 1. Both stream printed in a console window - in this case, stream interleaving isn't great, as it can mean the error/warnings don't appear near corresponding stdout, meaning a warning might be missed/thought to be related to the wrong thing.
> 2. Both streams redirected to the same file - same as above.
> 3. One stream redirected to a file/both redirected to different files - error messages need enough context (which I think they have already for the most part).
> 
>   In the first two cases, this patch improves the issue, whilst in the third, the only thing that happens is a slight degradation in performance in the event of a warning/error being emitted. Since these should be uncommon, I think that's harmless. Testing typically will use either the first or third version (currently uses the third, but I'd like to move to the first).

In D80803#2068799 <https://reviews.llvm.org/D80803#2068799>, @jhenderson wrote:

> I've tried getting the `tied_raw_fd_ostream` approach to work, by tying `errs()` to `outs()` as suggested, but it doesn't work when it comes to testing D80989 <https://reviews.llvm.org/D80989> (and possibly others) in lit. The problem is that llvm-dwarfdump uses a `ToolOutputFile`, which contains its own `raw_fd_ostream` pointing to stdout, with its own buffer independent of the version that is provided by `outs()`. Consequently, the flush triggered by writing to `errs()` flushes the empty `outs()` buffer, before writing the error message, rather than the real output buffer provided by the `ToolOutputFile`. I see three possible ways forward: 1) Go ahead with my existing approach as suggested in the current patch (my preference), 2) abandon the whole idea of keeping the two in sync (I don't like this idea for reasons already outlined), or 3) have `raw_fd_ostream` instances contructed with `-` (the stdout variant) actually use the `outs()` stream, rather than its own one - this will prevent potential weird situations where we have two independent streams with their own buffers both writing to stdout, and therefore potentially getting very weird interleaving, but I'm not sure of the knock-on effects of this yet.
>
> Thoughts?

My vote goes to (3) as well, though like Pavel I don't think it should block this patch. I can live with (1) for now and making (3) happen when similar issues pop up down the road.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80803/new/

https://reviews.llvm.org/D80803