[PATCH] D77150: [Analyzer] New Option for ContainerModeling: AggressiveEraseModeling

Artem Dergachev via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Jul 7 09:34:48 PDT 2020


NoQ added inline comments.


================
Comment at: clang/include/clang/StaticAnalyzer/Checkers/Checkers.td:647-653
     CmdLineOption<Boolean,
                   "AggressiveStdFindModeling",
                   "Enables exploration of the failure branch in std::find-like "
                   "functions.",
                   "false",
                   Released>
   ]>,
----------------
baloghadamsoftware wrote:
> Szelethus wrote:
> > Szelethus wrote:
> > > NoQ wrote:
> > > > baloghadamsoftware wrote:
> > > > > Szelethus wrote:
> > > > > > Ah, okay, I see which one you refer to. We should totally make this non-user facing as well. 
> > > > > The option is not about a state split! It is for choosing between the (default) conservative approach and a more aggressive one. It is absolutely for the user to set. Some users prefer fewer false positives at the price of losing true positives, while other users prefer more true positives at the price of additional false positives. This is why we have checker options: to be able to serve both groups.
> > > > > 
> > > > > This feature was explicitly requested by our users, who first disabled the iterator checkers because of the too many false positives but later re-enabled them because they ran into a bug in a production system that could have been prevented by them. However, they then ran into another bug that our checker did not find because of its conservative behavior. They requested a more aggressive mode, but we must not enable it for everyone. The concept of the Analyzer is that we apply the conservative approach by default, and the aggressive one can be enabled via analyzer and checker options.
> > > > These options are unlikely to be on by default in vanilla clang because they come with a fundamentally unfixable class of false positives. Honestly, I've never seen users who are actually OK with false positives. I've regularly seen users say they are OK with false positives when they wanted their false negatives fixed, but it always turned out that they didn't understand what they were asking for, and they quickly changed their mind when they saw the result. But you basically vend your own tool to your own customers and bear your own responsibility, so I can't tell you what to do in your realm, and I'm OK with having options that allow you to disagree with my choices.
> > > > 
> > > > inb4, "-analyzer-config is not user-facing!!!11"
> > > >inb4, "-analyzer-config is not user-facing!!!11"
> > > I was starting to breathe really heavily and type really fast :D
> > > 
> > > Jokes aside, I'll spill the beans: we don't have a really convenient way to tweak these options through CodeChecker just yet, so for the time being they really aren't, but maybe one day.
> > > 
> > > With that said, I also agree with you regarding options that overapproximate the analysis (greater code coverage with more false and true positives). User-facing options should be the ones where the nature of the bug itself is subjective, like whether we want to check the initializedness of pointees. We could just as well regard such bugs as unwanted or as totally fine, so an option is a great compromise.
> > > 
> > > >[...] it always turned out that they don't understand what they're asking for [...]
> > > 
> > > This is why I think this shouldn't be user-facing. You need to be quite knowledgeable about the analyzer, and about IteratorChecker in particular, to make a good judgement about enabling the option added in this patch. That doesn't mean you shouldn't experiment with it, but I'd prefer not to expose it too much.
> > > 
> > > I just don't see this feature being desired by many if they knew the costs, even if one user really wants it. We could, however, in that case say:
> > > "Hey, there is this option you can enable that might help you hunt down more bugs. It does change the actual analysis itself, and experience showed that it might result in more false positives and could impact performance, so we opted not to make it user-facing, but it is there if you feel adventurous."
> > Granted, I might be demonizing the cons of these options too much, but the description doesn't help much with understanding them either O:)
> So you are both saying that I should tell the user that we cannot help: the community does not support catching this bug in an early phase with the analyzer, and they must catch it in the testing phase at a higher cost. This is not the best way to build confidence in the analyzer.
> 
> In my perception, options are used to serve one user's needs without harming others. I never understood why there is so much resistance to introducing an option that does not change the default behavior at all.
1. That's right: implementing every single suggestion users have is a terrible idea. Not everything is possible, and not everything is feasible. Populism should not be put ahead of common sense. As an extreme example, imagine you're in the '80s and a user complains that they need a voice recognition feature in their computer ("Why do I have to type all this text, it's so slow???"). Would you be ready to inform them that it'll take 30 years and a few revolutionary breakthroughs in computer science to implement? Would you be ready to tell them that learning how to touch-type will make their typing faster than anything they could ever achieve with dictation? Because it's your job as a programmer to be assertive and to be able to say "no" or "maybe later" or "you're not using the program correctly" when user demands don't make sense, are impossible to implement, or when the user doesn't know what they're asking for. Developers of every single program that has any users at all constantly struggle with this.

2. Supporting multiple configurations is much harder than supporting a single configuration. I wish I could say that it will be entirely up to you to maintain these configurations, including the situations when it doesn't cause problems yet but obstructs future development. However, I can't say that, due to 3.

3. You can't control who uses your option once it's landed in open-source. By committing the option to the upstream LLVM, you're not shipping this option to just one user, you're shipping it to the entire world.

4. You can't simply say "I know it may sometimes work incorrectly, but I tested it and it works for me". The option may behave fine on your company's codebases but completely ruin the reputation of the static analyzer by producing a lot of false positives on a famous open-source project that you didn't test it on. This is why we care a lot about correctness and about not introducing new sources of false positives.

5. You may say that the users know what they're doing when they turn the option on, and that they're willing to accept some false positives. That's not true though; you don't personally brief every user of your option. They may try it simply because they heard about it from someone else. They're not always willing to read documentation. One user may enable the option for all other developers and they may never learn that this was an option to begin with.

6. In particular, I believe that false positives destroy confidence in the analyzer much more than false negatives do. I also don't trust users when they say they don't care about false positives. I have heard this many times from a variety of people, but when I gave them what they wanted, they immediately started complaining about false positives every single time.

7. Static analysis is not a one-size-fits-all solution to bug finding. Instead of trying to make static analysis find every kind of bug, you should be trying to find the best tool for each job. There are areas in which static analysis excels, like finding exotic execution paths not covered by tests, but it's not good for everything.

8. Static analysis is not necessarily cheaper than testing or code review or dynamic analysis. Understanding a single analyzer report may take hours.

9. False positives reduce the cost-effectiveness of static analysis dramatically. They may cause the user to spend a lot of time trying to understand an incorrect warning, or to introduce a new bug when they try to silence the warning without understanding it and/or the surrounding code.

10. Last but not least, honestly, we have significantly more important problems to fix. If you claim that adding an option makes a few users happy without changing anything for anybody else, then you must also admit that you're doing a lot of work for just a small portion of users, when you could spend your time (and our time) benefiting *every* user instead.

So, like, options aren't too bad on their own but they're definitely not as free as you describe. You still need to think about *all* users who can potentially access the option, not just the one who's in front of you. "The analyzer produces false positives under an option" still means that the analyzer produces false positives in one of the supported configurations. Marking features as opt-in doesn't give you an excuse for shipping bad features.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77150/new/

https://reviews.llvm.org/D77150




