[PATCH] D124446: [clang-tidy] Add the misc-discarded-return-value check

Tue May 17 05:54:36 PDT 2022

aaron.ballman added inline comments.

================
Comment at: clang-tools-extra/clang-tidy/misc/DiscardedReturnValueCheck.cpp:83-86
+void DiscardedReturnValueCheck::onStartOfTranslationUnit() {
+  ConsumedCalls.clear();
+  CallMap.clear();
+}
----------------
whisperity wrote:
> aaron.ballman wrote:
> > Is this code necessary?
> Yes, if you have checks that store TU-specific data, and you run `clang-tidy a.cpp b.cpp` then if you do not clear the data structures, dangling pointers into already destroyed ASTs will remain.
Ah, thank you!

================
Comment at: clang-tools-extra/clang-tidy/misc/DiscardedReturnValueCheck.cpp:181
+
+  static const auto Decltype = decltypeType(hasUnderlyingExpr(Call));
+  static const auto TemplateArg =
----------------
whisperity wrote:
> whisperity wrote:
> > whisperity wrote:
> > > aaron.ballman wrote:
> > > > So, I'm not super keen on this approach of having to try to identify every single place in which an expression is considered to be "used" -- this is going to be fragile because we'll miss places and it's going to be a maintenance burden because new places will be added as the languages evolve.
> > > > 
> > > > For example, if we're handling `decltype` as a use, why not `noexcept`? Or conditional `explicit`? What about a `co_return` statement?
> > > > 
> > > > I'm not certain what we can do to improve this, but I think it's worth trying to explore options to see if we can generalize what constitutes a use so that we can write a few custom matchers to do the heavy lifting instead of trying to play whack-a-mole.
> > > I've been having other thoughts about this `decltype` here... Actually, neither `decltype` nor `noexcept` should be handled as a //"use"// at all, while `co_return` should be the same as a `return` -- however, I think it was due to lack of projects where such could be meaningfully measured as a missed case was why implementing that failed.
> > > 
> > > For `decltype`, `typedef`, and `noexcept` (and perhaps several others), the good solution would be having a third route: calls that //should not be counted//. Neither as a "consumed call", nor as a "bare call". Ignored, from both calculations. Maybe even for template arguments below.
> > As for better matching... Unfortunately, types in the AST are so varied and `hasDescendant` is too generic to express something like `stmt(anyOf(ifStmt(), forStmt(), switchStmt()), hasDescendant(Call))` to express in a single expression matching uses... The conditions are not always direct children of the outer node, while `hasDescendant` will match not just the condition but the entire tree... resulting in things like //both// functions in
> > 
> > ```lang=cpp
> > if (foo())
> >   bar()
> > ```
> > 
> > matching.
> > 
> > Well... generalisation... I can throw in a formal fluke:
> > 
> > > A **use** is a //context// for a specific `CallExpr C` in which we can reasonably assume that the value produced by evaluating `C` is loaded by another expression.
> > 
> > Now what I found is `-Wunused-result`, aka `SemaDiagnostics::warn_unused_expr`, which is triggered in the function `ExprResult Sema::ActOnFinishFullExpr(Expr* FE, SourceLocation CC, bool DiscardedValue, bool IsConstexpr);`. Now this function itself does //some// heuristics inside (with a **lot** of `FIXME`s as of rGdab5e10ea5dbc2e6314e0e7ce54a9c51fbcb44bd), but notably, `DiscardedValue` is a parameter. According to a quick search, this function (and its overloads) have **82** callsites within `Sema`, with many of them just tougher to decipher than others. Some of the other ways this function is called, e.g. `ActOnStmtExprResult`, have codes like this:
> > 
> > ```lang=cpp
> > IsStmtExprResult = GetLookAheadToken(LookAhead).is(tok::r_brace) && GetLookAheadToken(LookAhead + 1).is(tok::r_paren);
> > ```
> > 
> > So I would say most of the logic there is **very** parsing specific, and requires information that is only available during the parsing descent, and not later when someone tries to consume a `const AST`.
> @aaron.ballman There is a `bugprone-unused-return-value` since mid 2018, in which the matched function set is configurable with a hardcoded default, and the matching logic is also... verbose.
> 
> [[ http://github.com/llvm/llvm-project/blob/c1a9d14982f887355da1959eba3a47b952fc6e7a/clang-tools-extra/clang-tidy/bugprone/UnusedReturnValueCheck.cpp#L144-L165 | Source ]]
> 
> ```lang=cpp
> auto UnusedInIfStmt =
>       ifStmt(eachOf(hasThen(MatchedCallExpr), hasElse(MatchedCallExpr)));
>   auto UnusedInWhileStmt = whileStmt(hasBody(MatchedCallExpr));
>   auto UnusedInDoStmt = doStmt(hasBody(MatchedCallExpr));
> ```
> 
> Agreed, this is seemingly a subset of the inverse match.
> For decltype, typedef, and noexcept (and perhaps several others), the good solution would be having a third route: calls that should not be counted. Neither as a "consumed call", nor as a "bare call". Ignored, from both calculations. Maybe even for template arguments below.

I agree, these cases seem like they're not a use in the sense of discarding the return value.

================
Comment at: clang-tools-extra/clang-tidy/misc/DiscardedReturnValueCheck.cpp:181
+
+  static const auto Decltype = decltypeType(hasUnderlyingExpr(Call));
+  static const auto TemplateArg =
----------------
aaron.ballman wrote:
> whisperity wrote:
> > whisperity wrote:
> > > whisperity wrote:
> > > > aaron.ballman wrote:
> > > > > So, I'm not super keen on this approach of having to try to identify every single place in which an expression is considered to be "used" -- this is going to be fragile because we'll miss places and it's going to be a maintenance burden because new places will be added as the languages evolve.
> > > > > 
> > > > > For example, if we're handling `decltype` as a use, why not `noexcept`? Or conditional `explicit`? What about a `co_return` statement?
> > > > > 
> > > > > I'm not certain what we can do to improve this, but I think it's worth trying to explore options to see if we can generalize what constitutes a use so that we can write a few custom matchers to do the heavy lifting instead of trying to play whack-a-mole.
> > > > I've been having other thoughts about this `decltype` here... Actually, neither `decltype` nor `noexcept` should be handled as a //"use"// at all, while `co_return` should be the same as a `return` -- however, I think it was due to lack of projects where such could be meaningfully measured as a missed case was why implementing that failed.
> > > > 
> > > > For `decltype`, `typedef`, and `noexcept` (and perhaps several others), the good solution would be having a third route: calls that //should not be counted//. Neither as a "consumed call", nor as a "bare call". Ignored, from both calculations. Maybe even for template arguments below.
> > > As for better matching... Unfortunately, types in the AST are so varied and `hasDescendant` is too generic to express something like `stmt(anyOf(ifStmt(), forStmt(), switchStmt()), hasDescendant(Call))` to express in a single expression matching uses... The conditions are not always direct children of the outer node, while `hasDescendant` will match not just the condition but the entire tree... resulting in things like //both// functions in
> > > 
> > > ```lang=cpp
> > > if (foo())
> > >   bar()
> > > ```
> > > 
> > > matching.
> > > 
> > > Well... generalisation... I can throw in a formal fluke:
> > > 
> > > > A **use** is a //context// for a specific `CallExpr C` in which we can reasonably assume that the value produced by evaluating `C` is loaded by another expression.
> > > 
> > > Now what I found is `-Wunused-result`, aka `SemaDiagnostics::warn_unused_expr`, which is triggered in the function `ExprResult Sema::ActOnFinishFullExpr(Expr* FE, SourceLocation CC, bool DiscardedValue, bool IsConstexpr);`. Now this function itself does //some// heuristics inside (with a **lot** of `FIXME`s as of rGdab5e10ea5dbc2e6314e0e7ce54a9c51fbcb44bd), but notably, `DiscardedValue` is a parameter. According to a quick search, this function (and its overloads) have **82** callsites within `Sema`, with many of them just tougher to decipher than others. Some of the other ways this function is called, e.g. `ActOnStmtExprResult`, have codes like this:
> > > 
> > > ```lang=cpp
> > > IsStmtExprResult = GetLookAheadToken(LookAhead).is(tok::r_brace) && GetLookAheadToken(LookAhead + 1).is(tok::r_paren);
> > > ```
> > > 
> > > So I would say most of the logic there is **very** parsing specific, and requires information that is only available during the parsing descent, and not later when someone tries to consume a `const AST`.
> > @aaron.ballman There is a `bugprone-unused-return-value` since mid 2018, in which the matched function set is configurable with a hardcoded default, and the matching logic is also... verbose.
> > 
> > [[ http://github.com/llvm/llvm-project/blob/c1a9d14982f887355da1959eba3a47b952fc6e7a/clang-tools-extra/clang-tidy/bugprone/UnusedReturnValueCheck.cpp#L144-L165 | Source ]]
> > 
> > ```lang=cpp
> > auto UnusedInIfStmt =
> >       ifStmt(eachOf(hasThen(MatchedCallExpr), hasElse(MatchedCallExpr)));
> >   auto UnusedInWhileStmt = whileStmt(hasBody(MatchedCallExpr));
> >   auto UnusedInDoStmt = doStmt(hasBody(MatchedCallExpr));
> > ```
> > 
> > Agreed, this is seemingly a subset of the inverse match.
> > For decltype, typedef, and noexcept (and perhaps several others), the good solution would be having a third route: calls that should not be counted. Neither as a "consumed call", nor as a "bare call". Ignored, from both calculations. Maybe even for template arguments below.
> 
> I agree, these cases seem like they're not a use in the sense of discarding the return value.
> @aaron.ballman There is a bugprone-unused-return-value since mid 2018, in which the matched function set is configurable with a hardcoded default, and the matching logic is also... verbose.

Yeah, but this one is even more verbose than that one.

That said, I'm really not seeing a better approach to this problem. I'd be very curious how the performance of this check compares to the performance of the `bugprone-unused-return-value` check compared to running no checks. Maybe the performance spread isn't nearly as bad as I'm worried?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124446/new/

https://reviews.llvm.org/D124446