[llvm] Add thread-local overrides for `llvm::errs()` and `llvm::outs()`. (PR #90374)

Wed May 8 13:04:00 PDT 2024

dwblaikie wrote:

> > I think it would also be important for `llvm::CrashRecoveryContext` to persist the stream overrides into the inner thread.
> 
> Good idea, done and tested that this works.
> 
> > I worry that a tool like this would disincentivize fixing/improving existing libraries that could/should take a stream as a parameter... - so I'm weakly not in favor of this, in that I'd rather see the library improvements... but no doubt there are some hard cases.
> > Are the cases you have in mind in-tree/could you give a rough overview of the challenges/why this is the better path?
> 
> Yeah, TBH, this isn't my favorite approach either. But I don't really know how to achieve the other approaches in a reasonable timeframe.
> 
> The code I specifically am hitting this in is Clang's Driver and Frontend library which have a bit over 60 uses of `llvm::errs()` at a quick count. If it were just updating all of those cases, maaaaaybe I could manage to do it, but the real issue is that all of these are behind layers and layers of APIs across the driver and frontend library that would need to be updated to thread a stream parameter through. In some places, there is a diagnostic engine available, but the uses of `llvm::errs()` don't seem to really map well into that abstraction, and doing so would likely break users expecting the output in its exact form.

Yeah, some of these should probably be updated to diagnostics, but even those that aren't - it seems like there's perhaps relatively few data structures that'd need to be taught about it. Easier in the Driver, perhaps, where adding an `Errs` member to the Driver covers most of it/is accessible from most of the places that are trying to print. (a few in the OffloadBundler have a Verbose and a `PrintExternalCommands` bool flags, that could instead be pointers to a stream - non-null means use the stream, null means don't print anything, which is fairly straightforward/doesn't involve a lot of extra plumbing).

One that's more subtle is probably clang parts using LLVM's timer library - that no doubt prints to errs as well without a parameter... oh, it's trickier. It creates new/distinct streams that would print to stderr and stdout, even with this proposed override in place https://github.com/llvm/llvm-project/blob/e3938f4d71493673033f6190454e7e19d5411ea7/llvm/lib/Support/Timer.cpp#L96-101

> But this is a pretty pervasive challenge when using both Clang and LLVM as libraries. For example, Clang has 500 some uses of `llvm::errs()` through out its `lib` directory, some in almost every library. A quick skim didn't seem to show many that were really utility code or outside of what a library user might encounter. 

Huh, figure a fair few would be in data structure dump() functions, like [`DiagnosticsEngine::DiagStateMap::dump`](https://github.com/llvm/llvm-project/blob/e3938f4d71493673033f6190454e7e19d5411ea7/clang/lib/Basic/Diagnostic.cpp#L257), [`MigrationContext::dumpGCAttrs`](https://github.com/llvm/llvm-project/blob/e3938f4d71493673033f6190454e7e19d5411ea7/clang/lib/ARCMigrate/TransGCAttrs.cpp#L330)

> LLVM itself has over 100 uses in `lib/Transforms`, where similarly it would be nice for a library user to be able to avoid directly writing to stderr, and it again seems difficult to imagine plumbing streams through all the APIs.

I would've thought having one on LLVMContext would cover a fair few of these uses (& similarly on clang with CompilerInstance, perhaps). 

Though, I think another idiom that shows up here is errs() under !NDEBUG or in code only intended to be called from within assertions - eg: the 27 errs() calls in VPlanVerifier all bubble up to verifyVPlanIsValid which is only used within a couple of assertions. Some appear within LLVM_DEBUG macros, others in functions only called from assertions, in dump functions, etc.

> Long term, the right thing is definitely to try to avoid using these in code paths that are intended to be library-used, and instead to build some better mechanism. But I think that will be the work of years, and I'd somewhat like to have the flexibility of controlling this in the mean time.
> 
> Dunno, curious what others think here.

Yeah, love to hear other opinions here, to be sure.

I think my take on it is I'm not saying a very strong no on this, but it seems like if the need is only for the driver/frontend, the problem might be tractable plumbing through relatively few abstractions/attaching the stream to some of the nearly-ubiquitous data structures there. And that might help set a precedent/a good direction when the next person needs only a little bit more, etc.

But I get the convenience of such an abstraction... :/

Be nice to find a way to differentiate the assert-only/dump/etc code from all this. I guess if we `#ifdef`'d it all well enough, maybe had a `llvm::debug_errs` that isn't defined in non-asserts builds, then its use couldn't leak out. Though it'd be hard to enforce - perhaps it'd make it easier to notice if there were no legitimate uses of direct `errs()` in the library source trees, only in the drivers.

https://github.com/llvm/llvm-project/pull/90374