[PATCH] D124446: [clang-tidy] Add the misc-discarded-return-value check

Wed May 18 02:49:45 PDT 2022

whisperity added inline comments.

================
Comment at: clang-tools-extra/clang-tidy/misc/DiscardedReturnValueCheck.cpp:181
+
+  static const auto Decltype = decltypeType(hasUnderlyingExpr(Call));
+  static const auto TemplateArg =
----------------
aaron.ballman wrote:
> aaron.ballman wrote:
> > whisperity wrote:
> > > whisperity wrote:
> > > > whisperity wrote:
> > > > > aaron.ballman wrote:
> > > > > > So, I'm not super keen on this approach of having to try to identify every single place in which an expression is considered to be "used" -- this is going to be fragile because we'll miss places and it's going to be a maintenance burden because new places will be added as the languages evolve.
> > > > > > 
> > > > > > For example, if we're handling `decltype` as a use, why not `noexcept`? Or conditional `explicit`? What about a `co_return` statement?
> > > > > > 
> > > > > > I'm not certain what we can do to improve this, but I think it's worth trying to explore options to see if we can generalize what constitutes a use so that we can write a few custom matchers to do the heavy lifting instead of trying to play whack-a-mole.
> > > > > I've been having other thoughts about this `decltype` here... Actually, neither `decltype` nor `noexcept` should be handled as a //"use"// at all, while `co_return` should be the same as a `return` -- however, I think it was due to lack of projects where such could be meaningfully measured as a missed case was why implementing that failed.
> > > > > 
> > > > > For `decltype`, `typedef`, and `noexcept` (and perhaps several others), the good solution would be having a third route: calls that //should not be counted//. Neither as a "consumed call", nor as a "bare call". Ignored, from both calculations. Maybe even for template arguments below.
> > > > As for better matching... Unfortunately, types in the AST are so varied and `hasDescendant` is too generic to express something like `stmt(anyOf(ifStmt(), forStmt(), switchStmt()), hasDescendant(Call))` to express in a single expression matching uses... The conditions are not always direct children of the outer node, while `hasDescendant` will match not just the condition but the entire tree... resulting in things like //both// functions in
> > > > 
> > > > ```lang=cpp
> > > > if (foo())
> > > >   bar()
> > > > ```
> > > > 
> > > > matching.
> > > > 
> > > > Well... generalisation... I can throw in a formal fluke:
> > > > 
> > > > > A **use** is a //context// for a specific `CallExpr C` in which we can reasonably assume that the value produced by evaluating `C` is loaded by another expression.
> > > > 
> > > > Now what I found is `-Wunused-result`, aka `SemaDiagnostics::warn_unused_expr`, which is triggered in the function `ExprResult Sema::ActOnFinishFullExpr(Expr* FE, SourceLocation CC, bool DiscardedValue, bool IsConstexpr);`. Now this function itself does //some// heuristics inside (with a **lot** of `FIXME`s as of rGdab5e10ea5dbc2e6314e0e7ce54a9c51fbcb44bd), but notably, `DiscardedValue` is a parameter. According to a quick search, this function (and its overloads) have **82** callsites within `Sema`, with many of them just tougher to decipher than others. Some of the other ways this function is called, e.g. `ActOnStmtExprResult`, have codes like this:
> > > > 
> > > > ```lang=cpp
> > > > IsStmtExprResult = GetLookAheadToken(LookAhead).is(tok::r_brace) && GetLookAheadToken(LookAhead + 1).is(tok::r_paren);
> > > > ```
> > > > 
> > > > So I would say most of the logic there is **very** parsing specific, and requires information that is only available during the parsing descent, and not later when someone tries to consume a `const AST`.
> > > @aaron.ballman There is a `bugprone-unused-return-value` since mid 2018, in which the matched function set is configurable with a hardcoded default, and the matching logic is also... verbose.
> > > 
> > > [[ http://github.com/llvm/llvm-project/blob/c1a9d14982f887355da1959eba3a47b952fc6e7a/clang-tools-extra/clang-tidy/bugprone/UnusedReturnValueCheck.cpp#L144-L165 | Source ]]
> > > 
> > > ```lang=cpp
> > > auto UnusedInIfStmt =
> > >       ifStmt(eachOf(hasThen(MatchedCallExpr), hasElse(MatchedCallExpr)));
> > >   auto UnusedInWhileStmt = whileStmt(hasBody(MatchedCallExpr));
> > >   auto UnusedInDoStmt = doStmt(hasBody(MatchedCallExpr));
> > > ```
> > > 
> > > Agreed, this is seemingly a subset of the inverse match.
> > > For decltype, typedef, and noexcept (and perhaps several others), the good solution would be having a third route: calls that should not be counted. Neither as a "consumed call", nor as a "bare call". Ignored, from both calculations. Maybe even for template arguments below.
> > 
> > I agree, these cases seem like they're not a use in the sense of discarding the return value.
> > @aaron.ballman There is a bugprone-unused-return-value since mid 2018, in which the matched function set is configurable with a hardcoded default, and the matching logic is also... verbose.
> 
> Yeah, but this one is even more verbose than that one.
> 
> That said, I'm really not seeing a better approach to this problem. I'd be very curious how the performance of this check compares to the performance of the `bugprone-unused-return-value` check compared to running no checks. Maybe the performance spread isn't nearly as bad as I'm worried?
While I am not familiar with all the intricacies, I know Clang-Tidy does a lot of memoisation for matches. At some point I really need to dig into it to understand it, but it might be reasonable to say that there are some caching for the matcher expressions.

I do not have a direct comparison ready against that check, but for @bahramib's thesis work (which he is putting together right now, hence why I'm responding 🙂) I have done the usual CI runs to gather data, and the run times for the analysis (at `-j 16`) are. (The `50%` and `80%` are the threshold values.)

```
bitcoin_v0.20.1 50% (single) : 0:01:02 - 140 reports
bitcoin_v0.20.1 80% (single) : 0:00:29 - 25 reports
codechecker_v6.17.0 50% (single) : 0:00:00 - 2 reports
codechecker_v6.17.0 80% (single) : 0:00:00 - 0 reports
contour_v0.2.0.173 50% (single) : 0:00:41 - 38 reports
contour_v0.2.0.173 80% (single) : 0:00:26 - 10 reports
curl_curl-7 50% (single) : 0:00:09 - 55 reports
curl_curl-7 80% (single) : 0:00:09 - 6 reports
ffmpeg_n4.3.1 50% (single) : 0:00:28 - 904 reports
ffmpeg_n4.3.1 80% (single) : 0:00:27 - 148 reports
libwebm_libwebm-1.0.0.27 50% (single) : 0:00:00 - 9 reports
libwebm_libwebm-1.0.0.27 80% (single) : 0:00:00 - 0 reports
llvm-project_llvmorg-12.0.0 50% (single) : 0:15:16 - 4615 reports
llvm-project_llvmorg-12.0.0 80% (single) : 0:14:34 - 1077 reports
memcached_1.6.8 50% (single) : 0:00:00 - 32 reports
memcached_1.6.8 80% (single) : 0:00:00 - 5 reports
mongo_r4.4.6 50% (single) : 0:20:23 - 1671 reports
mongo_r4.4.6 80% (single) : 0:20:13 - 370 reports
openssl_openssl-3.0.0-alpha7 50% (single) : 0:01:28 - 427 reports
openssl_openssl-3.0.0-alpha7 80% (single) : 0:00:41 - 56 reports
postgres_REL 50% (single) : 0:00:29 - 799 reports
postgres_REL 80% (single) : 0:00:21 - 62 reports
protobuf_v3.13.0-CSA-patched 50% (single) : 0:00:30 - 51 reports
protobuf_v3.13.0-CSA-patched 80% (single) : 0:00:20 - 20 reports
qtbase_v6.2.0 50% (single) : 0:04:12 - 1195 reports
qtbase_v6.2.0 80% (single) : 0:04:15 - 200 reports
sqlite_version-3.33.0 50% (single) : 0:00:05 - 284 reports
sqlite_version-3.33.0 80% (single) : 0:00:02 - 42 reports
tmux_2.6 50% (single) : 0:00:02 - 35 reports
tmux_2.6 80% (single) : 0:00:01 - 1 reports
twin_v0.8.1 50% (single) : 0:00:01 - 54 reports
twin_v0.8.1 80% (single) : 0:00:01 - 9 reports
vim_v8.2.1920 50% (single) : 0:00:02 - 227 reports
vim_v8.2.1920 80% (single) : 0:00:03 - 32 reports
xerces_v3.2.3 50% (single) : 0:00:08 - 125 reports
xerces_v3.2.3 80% (single) : 0:00:08 - 9 reports
```

-----

For completeness, here is the same report for the "multi TU" or "project level" mode. This measurement **only** includes the time it took for the `diagnose` phase to match, load the persistence, and emit diagnostics, and **DOES NOT** include the "collect" phase! However, it must be noted that the diagnose phase of this check only deals with printing diagnostics for matches, and no longer "counts", hence sometimes the numbers are lower than in the single-TU mode.

```
bitcoin_v0.20.1 50% (multi) : 0:00:50 - 320 reports
bitcoin_v0.20.1 80% (multi) : 0:00:31 - 92 reports
codechecker_v6.17.0 50% (multi) : 0:00:00 - 5 reports
codechecker_v6.17.0 80% (multi) : 0:00:00 - 0 reports
contour_v0.2.0.173 50% (multi) : 0:00:27 - 107 reports
contour_v0.2.0.173 80% (multi) : 0:00:27 - 8 reports
curl_curl-7 50% (multi) : 0:00:09 - 210 reports
curl_curl-7 80% (multi) : 0:00:10 - 104 reports
ffmpeg_n4.3.1 50% (multi) : 0:00:32 - 1790 reports
ffmpeg_n4.3.1 80% (multi) : 0:00:31 - 613 reports
libwebm_libwebm-1.0.0.27 50% (multi) : 0:00:00 - 10 reports
libwebm_libwebm-1.0.0.27 80% (multi) : 0:00:01 - 1 reports
llvm-project_llvmorg-12.0.0 50% (multi) : 0:16:58 - 6837 reports
llvm-project_llvmorg-12.0.0 80% (multi) : 0:16:54 - 2150 reports
memcached_1.6.8 50% (multi) : 0:00:00 - 49 reports
memcached_1.6.8 80% (multi) : 0:00:01 - 16 reports
mongo_r4.4.6 50% (multi) : 0:21:58 - 2463 reports
mongo_r4.4.6 80% (multi) : 0:22:26 - 1223 reports
openssl_openssl-3.0.0-alpha7 50% (multi) : 0:01:33 - 1060 reports
openssl_openssl-3.0.0-alpha7 80% (multi) : 0:01:05 - 244 reports
postgres_REL 50% (multi) : 0:00:26 - 644 reports
postgres_REL 80% (multi) : 0:00:31 - 215 reports
protobuf_v3.13.0-CSA-patched 50% (multi) : 0:00:21 - 122 reports
protobuf_v3.13.0-CSA-patched 80% (multi) : 0:00:20 - 75 reports
qtbase_v6.2.0 50% (multi) : 0:04:27 - 2764 reports
qtbase_v6.2.0 80% (multi) : 0:04:09 - 1093 reports
sqlite_version-3.33.0 50% (multi) : 0:00:05 - 288 reports
sqlite_version-3.33.0 80% (multi) : 0:00:03 - 48 reports
tmux_2.6 50% (multi) : 0:00:01 - 48 reports
tmux_2.6 80% (multi) : 0:00:05 - 1 reports
twin_v0.8.1 50% (multi) : 0:00:01 - 69 reports
twin_v0.8.1 80% (multi) : 0:00:03 - 22 reports
vim_v8.2.1920 50% (multi) : 0:00:03 - 383 reports
vim_v8.2.1920 80% (multi) : 0:00:06 - 54 reports
xerces_v3.2.3 50% (multi) : 0:00:09 - 197 reports
xerces_v3.2.3 80% (multi) : 0:00:23 - 38 reports
```

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124446/new/

https://reviews.llvm.org/D124446