[PATCH] D33672: [analyzer] INT50-CPP. Do not cast to an out-of-range enumeration checker

Aaron Ballman via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Nov 5 07:48:31 PST 2018


aaron.ballman added inline comments.


================
Comment at: lib/StaticAnalyzer/Checkers/EnumCastOutOfRangeChecker.cpp:96
+  // Get the value of the expression to cast.
+  const auto ValueToCastOptional =
+      C.getSVal(CE->getSubExpr()).getAs<DefinedOrUnknownSVal>();
----------------
NoQ wrote:
> aaron.ballman wrote:
> > NoQ wrote:
> > > aaron.ballman wrote:
> > > > Szelethus wrote:
> > > > > Szelethus wrote:
> > > > > > ZaMaZaN4iK wrote:
> > > > > > > lebedev.ri wrote:
> > > > > > > > ZaMaZaN4iK wrote:
> > > > > > > > > lebedev.ri wrote:
> > > > > > > > > > ZaMaZaN4iK wrote:
> > > > > > > > > > > lebedev.ri wrote:
> > > > > > > > > > > > ZaMaZaN4iK wrote:
> > > > > > > > > > > > > aaron.ballman wrote:
> > > > > > > > > > > > > > `const auto *`
> > > > > > > > > > > > > Why do we need this change here? If I understand correctly, with `const auto*` we also need change initializer to `C.getSVal(CE->getSubExpr()).getAs<DefinedOrUnknownSVal>().getPointer()`. But I don't understand why we need this.
> > > > > > > > > > > > Is `ValueToCastOptional` a pointer, a reference, or just an actual `DefinedOrUnknownSVal`? I can't tell.
> > > > > > > > > > > > (sidenote: would be great to have a clang-tidy check for this.)
> > > > > > > > > > > `ValueToCastOptional` is `llvm::Optional<DefinedOrUnknownSVal>`
> > > > > > > > > > See, all my guesses were wrong. That is why it should not be `auto` at all.
> > > > > > > > > I don't agree with you for this case. Honestly it's like a yet another holywar question. If we are talking only about this case - here you can see `getAs<DefinedOrUnknownSVal>` part of the expression. this means clearly (at least for me) that we get something like `DefinedOrUnknownSVal`. What we get? I just press hotkey in my favourite IDE/text editor and see that `getAs` returns `llvm::Optional<DefinedOrUnknownSVal>`. From my point of view it's clear enough here.
> > > > > > > > > 
> > > > > > > > > If we are talking more generally about question "When should we use `auto` at all? " - we can talk, but not here, I think :)
> > > > > > > > https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable comes to mind.
> > > > > > > > > What we get? I just press hotkey in my favourite IDE/text editor and see that getAs returns llvm::Optional<DefinedOrUnknownSVal>
> > > > > > > > Which hotkey do i need to press to see this here, in the phabricator?
> > > > > > > > 
> > > > > > > > This really shouldn't be `auto`, if you have to explain that in the variable's name, justify it in review comments.
> > > > > > > Ok, didn't know about such LLVM coding standard. Of course, with this information I will fix using `auto` here. Thank you.
> > > > > > Actually, I disagree. In the Static Analyzer we use `auto` if the return type is in the name of the expression, and `getAs` may fail, so it returns an `Optional`. In the case where a nullptr may be returned to signal failure, `auto *` is used, so I believe that `auto` is appropriate here.
> > > > > But don't change it back now, it doesn't matter a whole lot :D
> > > > > Actually, I disagree. In the Static Analyzer we use auto if the return type is in the name of the expression, and getAs may fail, so it returns an Optional. 
> > > > 
> > > > The static analyzer should be following the same coding standard as the rest of the project. This is new code, so it's not matching inappropriate styles from older code, so there's really no reason to not match the coding standard.
> > > > 
> > > > > In the case where a nullptr may be returned to signal failure, auto * is used, so I believe that auto is appropriate here.
> > > > 
> > > > I disagree that `auto` is appropriate here. The amount of confusion about the type demonstrates why.
> > > I confirm the strong local tradition of preferring `auto` for optional `SVal`s in new code, and i believe it's well-justified from the overall coding guideline's point of view. Even if you guys didn't get it right from the start, the amount of Analyzer-specific experience required to understand that it's an optional is very minimal and people get used to it very quickly when they start actively working on the codebase.
> > > 
> > > On one hand, being a class that combines LLVM custom RTTI with pass-by-value semantics (i.e., it's like `QualType` when it comes to passing it around and like `Type` when it comes to casting), optionals are inevitable for representing `SVal` dynamic cast results.
> > > 
> > > On the other hand, `SVal` is one of the most common classes of objects in the Static Analyzer (like, maybe, `Stmt` in Clang; i think a lot of people who are interested in Static Analyzer more than in the rest of Clang learn about `SVal`s earlier than about `Stmt`s), and in particular `SVal` casts are extremely common (around 3-4 `SVal::getAs<T>()` casts per a path-sensitive checker, not counting `castAs`, ~2x more common than arithmetic operations over `SVal`s, only ~3x less common than all sorts of `dyn_cast` in all checkers, including path-insensitive checkers), so it's something you get used to really quickly.
> > > 
> > > Writing `Optional<DefinedOrUnknownSVal> DV = V.getAs<DefinedOrUnknownSVal>();` in a lot of checker callbacks is annoying for pretty much all checker developers from day 1. This clutters the most important parts of the checker's logic: transfer functions and the definition of the error state. Most bugs are in *that* part of the code, and those bugs are *not* due to using `auto` for casts. Not using `auto` almost doubles the amount of code you need to write to perform a simple run-time type check. With a more verbose variable name or a bit more code around it, it also causes a line break.
> > > 
> > > And it only provides information that most of the readers either already know ("`SVal` is a value-type, so it uses optionals"), or memorize immediately after encountering this pattern once on their first day, or barely even care (optionals are used like pointers anyway). Being finally allowed to replace it with just `auto` after migration to C++11 was divine.
> > > 
> > > So i'm still strongly in favor of keeping this pattern included in the list of //"places where the type is already obvious from the context"//. This is the top-1 source of annoying boilerplate in Static Analyzer, and it's as easy to remember as it is to learn that `dyn_cast<T>(S)` returns a pointer (or a reference? - you need to look up the definition of `S` to figure this out, unlike `V.getAs<T>()` that is always an optional for `SVal`s).
> > > 
> > > P.S. Though i admit that coming up with a less annoying API is also a good idea :)
> > > P.P.S. I guess some sort of `Optional<auto>` would have made everybody happy. But it's not a thing unfortunately :(
> > Thank you for the explanation. However, I'm still not convinced this is supported by the coding style guideline. The point to "places where the type is already obvious from the context" has (in my estimation) only covered places where the type truly is obvious. i.e., it's either spelled out explicitly (`dyn_cast<>`, `getAs<>`, etc) or it's entirely immaterial for understanding (iterators). Whether a value is optional strikes me as highly pertinent information for code reviewers or maintainers to understand because it conveys information about the validity of a value. To me, this is just as important as the distinction between `auto`, `auto *`, and `auto &`, which we make the user spell out in the case of pointers and references. From what I can tell, we also spell out `Optional` pretty consistently elsewhere in the product.
> > 
> > The rationale that this is a common idiom makes the push-back understandable, but at the same time, use of `auto` where the type is not immediately obvious makes the static analyzer more hostile to get into, especially for people who only stray into this part of the code base periodically (which is one major motivation for having a style guide in the first place).
> > 
> > I also don't do a ton of work in the static analyzer, so take this feedback with a grain of salt. I certainly don't expect to change the static analyzer's decision here. However, this is also the second time this week I've run into "we don't do that in the static analyzer" and both times have been in response to common review feedback that applies elsewhere in the product and both times have consumed review time from multiple people in order to resolve.
> > 
> > Ultimately, a style guide is just that -- a guide, not a mandate. It may also be that deviation from the guide here is reasonable, but it does come at a cost. That said, this ship may have already sailed; churning the code to remove use of `auto` comes with its own costs that may not be worth paying.
> Like, i mean, until now i always thought of using `auto` for casts, including casts of value-types, as of something that the coding guideline explicitly asks for, and didn't even consider that it might be a deviation, and that's the first time i hear this concern raised. Though now that you point this out, the problem with highlighting that it's //the// `llvm::Optional` is indeed an unclear situation.
> 
> For me as a maintainer, pure `auto` is definitely much easier to read and review when used with `getAs`. For a casual reader - i don't know, i accept that it might be harder than i imagine.
> 
> As usual, i'm happy to be proven wrong and/or accept any decision on that matter. To me coding guidelines are definitely above personal preferences.
> 
> > conveys information about the validity of a value
> 
> This is conveyed via the choice between `getAs` and `castAs`.
> Though now that you point this out, the problem with highlighting that it's the llvm::Optional is indeed an unclear situation.

Yup!

> This is conveyed via the choice between getAs and castAs.

That's a fair point, but in the rest of the code base, `getAs<>` returns a pointer. What started off this discussion was me asking for `const auto *` to clarify that it was a pointer to constant data when it really wasn't one (at least directly). So while it does convey the semantics, it still is a bit confusing.

However, I think you're right. The difference between `getAs<>` and `castAs<>` may be sufficient once you know the static analyzer a bit better.


https://reviews.llvm.org/D33672





More information about the cfe-commits mailing list