[PATCH] D66014: [analyzer] Avoid unnecessary enum range check on LValueToRValue casts

Tue Aug 13 05:27:26 PDT 2019

chrish_ericsson_atx marked an inline comment as done.
chrish_ericsson_atx added inline comments.

================
Comment at: clang/lib/StaticAnalyzer/Checkers/EnumCastOutOfRangeChecker.cpp:94-98
+  // If cast is implicit LValueToRValue, no conversion is taking place,
+  // and therefore no range check is needed.  Don't analyze further.
+  if (CE->getCastKind() == CK_LValueToRValue)
+    return;
+
----------------
NoQ wrote:
> chrish_ericsson_atx wrote:
> > NoQ wrote:
> > > chrish_ericsson_atx wrote:
> > > > NoQ wrote:
> > > > > If you look at the list of cast kinds, you'll be shocked to see ridiculously weird cornercases. Even though lvalue-to-rvalue cast definitely stands out (because it's the only one that has very little to do with actual casting), i'd still be much more comfortable if we *whitelist* the casts that we check, rather than blacklist them.
> > > > > 
> > > > > Can you take a look at the list and find which casts are we actually talking about and hardcode them here?
> > > > I'm happy to cross-check a list; however, I'm not aware of the list you are referring to.  Would you mind providing a bit more detail?
> > > See **[[ https://github.com/llvm-mirror/clang/blob/release_90/include/clang/AST/OperationKinds.def | include/clang/AST/OperationKinds.def ]]**.
> > Ah.   Okay -- That is a list I've seen before, of course...  :)   
> > 
> > Hmm...  regarding whitelisting versus blacklisting...  In this case, it seems to me that switching to whitelisting casts that we know we should be checking increases the risk that we would introduce a new bug -- specifically that we'd accidentally leave out cast type that should have been included, which would disable a legitimate check (that some users might already be relying on).  By contrast, blacklisting the one cast type we know should *not* be checked adds zero new risk and still fixes this bug.
> > 
> > The sense of risk for me comes partly from being fairly new to working on clang AST code -- this is my first change.   It strikes me that if I were to feel confident about having the "correct" whitelist, I would have to understand every cast type fairly deeply.  Given that I see several cast kinds in that list that I know nothing about, I'm concerned with the level of effort required, and that I still won't be able to fully mitigate the risk.
> > 
> > Can you help me understand your rationale for preferring a whitelist in this case?  Or, do you feel I've overstated the risk here?  I'm not emotionally invested in being proven correct-- just want to learn!  :)
> > 
> Generally, if you're not sure, try to not to warn. False positives are worse than false negatives. It is natural to start with a small set of cases in which you're sure, and then slowly cover more and more cases as you investigate further. Leave a comment if you're unsure if you covered all cases, so that people knew that the code is potentially incomplete and it's a good place to look for potential improvements.
> 
> When you're actively developing the checker, you should rely on your own experiments with the real-world code that you plan to analyze. When the checker is still in alpha, it's fine for it to be more aggressive than necessary, because "you're still experimenting with it", but this will need to be addressed when the checker is moved out of alpha.
> 
> In particular, you have the following tricks at your disposal:
> - Take a large codebase, run your checker on it (you'll need to do this regularly anyway) and include the cast kind in the warning text. This will give you a nice list of casts that you want to support (and probably also a list of casts that you //don't// want to support).
> - If you want to be notified of false negatives by your power-users, instead of a whitelist put an assertion (immediately before emitting the report) that the cast kind is one of the known cast kinds. This is a bit evil, of course, but it only affects the users that run analysis with assertions on, which means custom-built clang, which means power-users that actively want to report these bugs.
> - If you don't know what code does a specific AST node kind represent, as a last resort you can always assert that this kind of node never gets constructed and see which clang tests fail.
> - If you want to have some protection against newly introduced cast kinds, make a `switch (getCastKind())` and don't add a `default:` case into it. This way anybody who introduces a new cast kind would get a compiler warning ("unhandled case in switch") and will be forced to take a look at how this code should handle the new cast kind.
Okay.  Very fair points.  I've been focused too much on released production code lately-- the mantra there is to be a surgical as possible, and minimize risk of any changes other than the one we are specifically trying to correct.  Of course, that doesn't preclude wider-scope change, but it requires a great deal more time and effort to validate.  In this case, where the code is alpha, and where missing a valid warning is better than issuing a spurious one, it makes sense to do as you've described.  I'll work on that.   Thanks for taking the time to explain in more detail.

Logistical question now (since this is my first change and review for LLVM/clang code) -- does this review stall until I have updated code (which may not happen quickly due to other work on my plate)?  Or do we greenlight this change, and I just follow it up later with the whitelist version in a separate review?

Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66014/new/

https://reviews.llvm.org/D66014