[PATCH] D141451: [clang] report inlining decisions with -Wattribute-{warning|error}

Mon May 1 12:46:47 PDT 2023

aaron.ballman added inline comments.

================
Comment at: clang/test/Frontend/backend-attribute-error-warning-optimize.c:12
   foo(); // expected-error {{call to 'foo' declared with 'error' attribute: oh no foo}}
+         // expected-note@* {{In function 'baz'}}
   if (x())
----------------
nickdesaulniers wrote:
> cjdb wrote:
> > aaron.ballman wrote:
> > > nickdesaulniers wrote:
> > > > cjdb wrote:
> > > > > aaron.ballman wrote:
> > > > > > nickdesaulniers wrote:
> > > > > > > nickdesaulniers wrote:
> > > > > > > > aaron.ballman wrote:
> > > > > > > > > Instead of allowing this note to appear anywhere in the file, I think it's better to use "bookmark" comments. e.g.,
> > > > > > > > > ```
> > > > > > > > > void baz() { // #baz_defn
> > > > > > > > > }
> > > > > > > > > 
> > > > > > > > > void foo() {
> > > > > > > > >   baz(); // expected-note@#baz_defn {{whatever note text}}
> > > > > > > > > }
> > > > > > > > > ```
> > > > > > > > > so that we're testing where the diagnostic is emitted. (This is especially important given that the changes are about location tracking.)
> > > > > > > > oh, that's new (to me)! will do
> > > > > > > It looks like the "bookmarks" don't work because the notes do not line+col info. They follow the warning/error diagnostic which does, on the bottom most call site.
> > > > > > > 
> > > > > > > The warning is supposed to be emitted on the callsite, not the definition.
> > > > > > I still don't think this is reasonable for test coverage because this means we'll accept the note *anywhere* in the file. This makes it much too easy to regress the behavior accidentally (e.g., accidentally emit all the notes on line 1 of the file). I think we need *some* way to nail down where these notes are emitted in the test. However, I might be asking you to solve "please track location information through the backend" for that, so perhaps this isn't a reasonable request?
> > > > > > 
> > > > > > Also, CC @cjdb for awareness of how this kind of diagnostic is produced (Chris is working on an idea for how we emit diagnostics so we get better structured information from them, so knowing about this novel approach might impact that idea).
> > > > > Very interesting, thanks for the heads up! Cross-phase diagnostics are definitely something I hadn't considered, and it **does** impact the design (but not in a derailing way).
> > > > I think we're back at "please track location information through the backend".
> > > > 
> > > > That is the tradeoff with this approach; not measurably regressing performance when these attributes aren't used in source in exchange for Note diagnostics that lack precise line+col info, but in practice provide enough info for the user to understand what's going wrong where.
> > > > 
> > > > Your call if the tradeoff is worth it.
> > > Here's my thinking, please correct any misunderstandings I have:
> > > 
> > > * This functionality is primarily for use with `_FORTIFY_SOURCE` (but not solely for that purpose), and that's an important security feature used by glibc and the Linux kernel.
> > > * Security of C and C++ source code is Very Important, especially now that various gov't agencies [1][2] are suggesting to not use C or C++ for new projects specifically because the security posture of the languages and their tools.
> > > * Therefore, the ergonomics of this functionality are quite important for the intended use cases and the ecosystem as a whole.
> > > * Emitting only the function name but without location information does not give a good user experience, especially in circumstances where the function has multiple overloads, because it pushes the burden onto the programmer to figure out which functions are actually involved in the call chain. This issue is also present in C because of support for `__attribute__((overloadable))` and is not just a C++ concern.
> > > * The compile-time performance costs of tracking this location information are roughly 1%, and there are no runtime or binary size overhead costs unless other functionality is enabled (such as emitting debug info).
> > > * We can determine whether we need to enable this location tracking while building the AST when we see one of these two attributes being used, so we could enable this functionality selectively without burdening the user with enabling it manually via a flag. (We could still consider giving the user a flag to never track this even in the presence of the attributes if a user had a compelling use case for it.)
> > > 
> > > If this is all reasonably accurate, I think it's worth seriously considering accepting the costs. I don't want to increase our compile time overhead, but at the same time, if ~1% is a "make or break" amount of compile-time performance to lose, we should be addressing *that* issue rather than hamstringing a user-facing diagnostic feature related to security. It's hard to argue "that CVE was totally worth it because we could then compile 1% faster". That said, perhaps others have a strong contrary opinion they'd like to argue in favor of?
> > > 
> > > [1] https://media.defense.gov/2022/Nov/10/2003112742/-1/-1/0/CSI_SOFTWARE_MEMORY_SAFETY.PDF
> > > [2] https://doi.org/10.6028/NIST.IR.8397
> > > 
> > My attitude is that we should prioritise ergonomics and usability over speed. Yes, having to wait a little longer for stuff to build sucks, but there's a clear ergonomics issue, which is motivation enough for me in this case (adding the CVE argument is the cherry on top). Folks either aren't going to use a feature if it's difficult to get right, or they're going to use it with large amounts of stress (and I don't think this is fair).
> > 
> > Moreover, I think there may be some opportunities we can look into to cancel out this pessimisation.
> > The compile-time performance costs of tracking this location information are roughly 1%, and there are no runtime or binary size overhead costs unless other functionality is enabled (such as emitting debug info).
> 
> (Just a note for passers-by: https://reviews.llvm.org/D141451 is not a 1% performance regression; @aaron.ballman is referring to https://discourse.llvm.org/t/rfc-improving-clangs-middle-and-back-end-diagnostics/69261?u=nickdesaulniers Solution 2 where would could track all location info unconditionally)
> 
> That 1% measurement was for Linux kernel builds, as reported https://discourse.llvm.org/t/rfc-improving-clangs-middle-and-back-end-diagnostics/69261?u=nickdesaulniers, and it gets worse for other projects. The ones tracked on llvm compile time tracker were much worse `geomean slowdown of 4.77% of wall time to build various open source projects (ranging from 1.95% to 6.9%). ` http://llvm-compile-time-tracker.com/compare.php?from=c1125ae5b05f5e23c7e61e85afb359e095444d18&to=059eb5565a580f7ea37c2d62dcffdb7401e12e55&stat=wall-time
> 
> I did try tracking //only// source locations of //calls//.
> https://discourse.llvm.org/t/rfc-improving-clangs-middle-and-back-end-diagnostics/69261/11?u=nickdesaulniers
> That was faster, but I only measured Linux kernel build times, and not llvm compile time tracker times.  I could do a similar experiment (test on llvm compile time tracker) if that helps?
> 
> > We can determine whether we need to enable this location tracking while building the AST when we see one of these two attributes being used
> 
> I had looked into that; I wasn't able to get something like that working. IIRC, it's hard to modify codegen during parsing, or the decision on what kind of DICompileUnit to emit had already been made.  Or maybe I was trying to do it from CodeGen after we already started producing IR, I don't remember now.  Are we guaranteed to never start CodeGen of IR until after Sema is complete?  I could look into this again/try harder though.
> 
> > We could still consider giving the user a flag to never track this even in the presence of the attributes if a user had a compelling use case for it.
> 
> Yeah, I'm concerned with the interaction between <whatever ships> and `-g0`. If <whatever ships> == location tracking via debug info, then the presence of `-g0` may change our diagnostics!
> 
> > That said, perhaps others have a strong contrary opinion they'd like to argue in favor of?
> 
> @efriedma had some such feedback on the RFC: https://discourse.llvm.org/t/rfc-improving-clangs-middle-and-back-end-diagnostics/69261/2?u=nickdesaulniers
> That 1% measurement was for Linux kernel builds, as reported https://discourse.llvm.org/t/rfc-improving-clangs-middle-and-back-end-diagnostics/69261?u=nickdesaulniers, and it gets worse for other projects. The ones tracked on llvm compile time tracker were much worse geomean slowdown of 4.77% of wall time to build various open source projects (ranging from 1.95% to 6.9%).  http://llvm-compile-time-tracker.com/compare.php?from=c1125ae5b05f5e23c7e61e85afb359e095444d18&to=059eb5565a580f7ea37c2d62dcffdb7401e12e55&stat=wall-time

Oh, that makes me sad. A 5% slowdown is quite the hit for this.

> I did try tracking only source locations of calls.
https://discourse.llvm.org/t/rfc-improving-clangs-middle-and-back-end-diagnostics/69261/11?u=nickdesaulniers
That was faster, but I only measured Linux kernel build times, and not llvm compile time tracker times. I could do a similar experiment (test on llvm compile time tracker) if that helps?

I don't want to send you on a wild goose chase, but if we can find a solution that gets us closer to 1% (or less!), that would make it easier to decide what to do here. I think 5% is... probably too much, but then I wonder "how do we make Fortify work?".

> I had looked into that; I wasn't able to get something like that working. IIRC, it's hard to modify codegen during parsing, 

Ah, that I'd believe (I forgot, language options are basically const by the time we start parsing). However, could we squirrel this away on `ASTContext` instead?

>> That said, perhaps others have a strong contrary opinion they'd like to argue in favor of?
> @efriedma had some such feedback on the RFC: https://discourse.llvm.org/t/rfc-improving-clangs-middle-and-back-end-diagnostics/69261/2?u=nickdesaulniers

Ah, thank you for reminding me of that! However, I disagree with this:

> (Most code doesn’t normally contain either inline asm or warning/error attributes, and I’m not really convinced by other potential diagnostics.)

because this attribute is also used by glibc: https://codebrowser.dev/glibc/glibc/misc/sys/cdefs.h.html#203 and that macro does get used: https://codebrowser.dev/glibc/glibc/stdlib/bits/stdlib.h.html#33 or https://codebrowser.dev/glibc/glibc/posix/bits/unistd.h.html#32 (etc), so I think this could potentially be significantly wider than just the Linux kernel. Does that change your opinion at all @efriedma?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141451/new/

https://reviews.llvm.org/D141451