[PATCH] Modeling APIs in the Static Analyzer

Fri Aug 1 01:44:05 PDT 2014

Hi Ted,

Thank you for the review.

On 1 August 2014 07:25, Ted Kremenek <kremenek at apple.com> wrote:

> Hi Gábor,
>
> This is looking good to me.  Some minor nits/comments:
>
> - Please add doxygen comments for the CodeInjector class.
>

Done.

> - For the BugReporter patch, are there tests for that functionality
> change?  I saw tests in the other patch, but not that one.  It's fine to
> separate the review of that change before the primary change goes in, but I
> was curious.
>

Well, it may be a bit complicated. I deleted some code in BugReporter to
not to discard bug reports that are in a model file, and the plist part of
the test case only pass if that patch is applied (if the patch is not
applied the nullpointer dereference warning that has the position in the
modelfile will be discarded. In the long term it would be better to report
these errors elsewhere but it is not supported yet by the bugreporter
patch). I can move the plist check into a separate testcase and add that
case to the BugReporter patch instead. The division by zero test should
work without the BugReporter patch.

> - As for breaking code in the 'extra' repository, LLVM-internal API is not
> sacrosanct.  If we break the 'extra' projects we just need to update them,
> but I'm not certain if that is possible in this case.
>

As far as I can remember it would be a straightforward fix in the extra
repository. Clang-tidy calls CreateAnalysisConsumer.

> - For comments, please consistently use sentence casing and end with
> periods, and for type names use the appropriate casing.  For example:
>
> +  // modules create a separate compilerinstance for parsing modules,
> maybe it is
> +  // for reason so I mimic this behavior
> +  CompilerInstance Instance;
> ...
>
> This comment looks a bit suspect, since it seems like a question to
> yourself.  Here you use the word "I"; who is "I" in the context of this
> code?  The comment also seems like an unanswered question.  Is this a stale
> comment?
>
>
Done, the comment was improved.

> Another example is this comment:
>
> +  // FIXME: double memoization is redundant. Here and in bodyfarm.
> +  llvm::StringMap<Stmt *> Bodies;
>
> This can be made slightly cleaner.  For example:
>
> +  // FIXME: Double memorization is redundant, with
> +  /// memoization both here and in BodyFarm.
> +  llvm::StringMap<Stmt *> Bodies;
>

Done.

- Only use doxygen comments for documentation.  For example:
>
> +  if (notzero_notmodeled(p)) {
> +   /// There is no information about the value of p, because
> +   /// notzero_notmodeled is not modeled and the function definition
> +   /// is not available.
> +    int j = 5 / p; // expected-warning {{Division by zero}}
> +  }
>
> In this case we should use '//', not '///'.  The former are true comments,
> and the latter are candidates to be extracted for documentation.
>
>
Done.

> Overall, however, this is getting really close.
>
>
It is great.

Thanks,
Gábor

> Cheers,
> Ted
>
> On Jul 30, 2014, at 3:29 AM, Gábor Horváth <xazax.hun at gmail.com> wrote:
>
> Hi Ted,
>
> Thank you for the review.
>
>
> On 30 July 2014 08:18, Ted Kremenek <kremenek at apple.com> wrote:
>
>> Hi Gábor,
>>
>> Thanks for making progress on this very promising enhancement to the
>> analyzer.  I have an assortment of comments, in no particular order:
>>
>> - ModelInjector.h and ModelConsumer.h
>>
>> There is a comment at the top of these files, but I think a bit more
>> explanation is needed.  For example:
>>
>>   MetaConsumer.cpp:
>>
>>     +// "Meta" ASTConsumer for consuming model files.
>>
>> That doesn't really explain anything.  What does "Meta" in quotes mean?
>>  I think an explanation here on what this does is helpful when someone
>> discovers this code for the first time.
>>
>> Similarly, we should add some high-level comments for CodeInjector.h and
>> ModelInjector.h.  We have a good start in ModelInjector.h:
>>
>> +/// \file
>> +/// \brief Defines the clang::ento::ModelInjector class which implements
>> the
>> +/// clang::CodeInjector interface. This class is responsible for
>> injecting
>> +/// function definitions that were synthetized from model files.
>> +///
>>
>> Let's consider expanding it:
>>
>>  /// \brief This file defines the clang::ento::ModelInjector class which
>> implements the
>>  /// clang::CodeInjector interface. This class is responsible for
>> injecting
>>  /// function definitions that were synthesized from model files.
>>
>>  /// Model files allow definitions of functions to be lazily constituted
>> for functions
>>  /// which lack bodies in the original source code.  This allows the
>> analyzer
>>  /// to more precisely analyze code that calls such functions, analyzing
>> the
>>  /// artificial definitions (which typically approximate the semantics of
>> the
>>  /// called function) when called by client code.  These definitions are
>>  /// reconstituted lazily, on-demand, by the static analyzer engine.
>>
>> CodeInjector.h provides some information, but it is a bit vague:
>>
>> +///
>> +/// \file
>> +/// \brief Defines the clang::CodeInjector interface which is
>> responsible for
>> +/// injecting AST of function definitions from external source.
>> +///
>>
>> It's a bit unclear how this gets used.  I think a bit of prose here would
>> help clarify its role in the static analyzer.  I also think the
>> CodeInjector interface is also more abstract than the prose describes.
>>  There's nothing about CodeInjector's interface that requires the injected
>> definitions to come from an external source.  That's an implementation
>> detail of a concrete subclass.  Instead, all CodeInjector does is provide
>> an interface that lazily provides definitions for functions and methods
>> that may not be present in the original source.
>>
>
> I have added some further documentation to address these issues.
>
>
>
>>
>> I'm also looking at the change to
>> StaticAnalyzer/Frontend/FrontendActions.cpp, and wonder if we can simplify
>> things:
>>
>> +++ lib/StaticAnalyzer/Frontend/FrontendActions.cpp (working copy)
>> @@ -7,9 +7,11 @@
>>  //
>>
>>  //===----------------------------------------------------------------------===//
>>
>> +#include "clang/Frontend/CompilerInstance.h"
>>  #include "clang/StaticAnalyzer/Frontend/FrontendActions.h"
>> -#include "clang/Frontend/CompilerInstance.h"
>>  #include "clang/StaticAnalyzer/Frontend/AnalysisConsumer.h"
>> +#include "clang/StaticAnalyzer/Frontend/ModelConsumer.h"
>> +#include "ModelInjector.h"
>>  using namespace clang;
>>  using namespace ento;
>>
>> @@ -18,6 +20,14 @@
>>    return CreateAnalysisConsumer(CI.getPreprocessor(),
>>                                  CI.getFrontendOpts().OutputFile,
>>                                  CI.getAnalyzerOpts(),
>> -                                CI.getFrontendOpts().Plugins);
>> +                                CI.getFrontendOpts().Plugins,
>> +                                new ModelInjector(CI));
>>  }
>>
>>
>>
>> It looks like CreateAnalysisConsumer just continues to grow more
>> arguments, all which derive from using 'CI'.  This seems silly, since this
>> function is called in one place.  Instead of intro ducting a dependency on
>> ModelInjector.h in this file, we can just sink these arguments into
>> CreateAnalysisConsumer() itself, resulting in:
>>
>>   return CreateAnalysisConsumer(CI);
>>
>> and let CreateAnalysisConsumer() do all that boilerplate.
>>
>
> That was my original idea as well but it broke the compilation of some
> code in extra repository and I wasn't sure if it is ok to break the API
> with this patch. But I find it cleaner this way so I modified it in this
> iteration.
>
>
>> Next, let's look at the change to FrontendAction:
>>
>>  class FrontendAction {
>> +  /// Is this action invoked on a model file? Model files are incomplete
>> +  /// translation units that relies on type information from another
>> translation
>> +  /// unit. Check ParseModelFileAction for details.
>> +  bool ModelFile;
>>
>>
>> Perhaps "IsModelFile"?  "ModelFile" sounds like it should be a reference
>> to the file itself.
>>
>>    FrontendInputFile CurrentInput;
>>    std::unique_ptr<ASTUnit> CurrentASTUnit;
>>    CompilerInstance *Instance;
>> @@ -105,7 +109,11 @@
>>    /// @}
>>
>>  public:
>> -  FrontendAction();
>> +  /// \brief Constructor
>> +  ///
>> +  /// \param modelFile determines whether the source files this action
>> invoked
>> +  /// on should be treated as a model file. Defaults to false.
>> +  FrontendAction(bool modelFile = false);
>>
>>
>> It seems suboptimal to modify the interface of FrontendAction just for
>> this one edge case.  Instead of modifying the constructor arguments, we
>> could default initialize "IsModelFile" to false, and have a setter to
>> change it.  For example:
>>
>>   ParseModelFileAction::ParseModelFileAction(llvm::StringMap<Stmt *>
>> &Bodies)
>>     : ASTFrontendAction(/*ModelFile=*/true), Bodies(Bodies) {}
>>
>> becomes:
>>
>>   ParseModelFileAction::ParseModelFileAction(llvm::StringMap<Stmt *>
>> &Bodies)
>>     : Bodies(Bodies)  {
>>     IsModelFile = true;
>>   }
>>
>> Looking at this more, I wonder if we should modify FrontendAction at all.
>>  The only place where isModelParsingAction() is called is in one spot in
>> CompilerInstance.cpp:
>>
>>    if (hasSourceManager() && !Act.isModelParsingAction())
>>
>> It *might* be cleaner to just have a virtual member function in
>> FrontendAction, which defaults to returning false, but is generic for all
>> subclasses to override.  Then we don't need the "IsModelFile" field in
>> FrontendAction at all, and we just have ParseModelFileAction override that
>> single member function.  We could then name that method to be something a
>> bit more generic.  That would allow us to not touch FrontendAction at all
>> except for providing that single virtual method that can be overridden in
>> subclasses.  I somewhat prefer this approach because it provides a cleaner
>> separation of concerns between FrontendAction (which is defined
>> libFrontend) and the static analyzer.  That would also allow you to get rid
>> of isModelParsingAction() entirely (replacing it with something more
>> generic).
>>
>>
> You are right, it is much cleaner to use a virtual function, so I modified
> the patch to use that approach. The new virtual function has the same name
> because I have yet to find any better and more general name yet. Do you
> have an idea for a better name?
>
>
>> As for the test case:
>>
>> +typedef int* intptr;
>> +
>> +void modelled(intptr p);
>> +
>> +int main() {
>> + modelled(0);
>> + return 0;
>> +}
>>
>>
>> Please add some comments in this test file explaining what is happening.
>>  Also, it would be great if this both used FileCheck (which it does now)
>> but also verified the diagnostics so we get cross-checking of the output
>> (we see this in some analyzer tests).  It also makes it easier to
>> understand the test.
>>
>> Also, is there a reason to break up the tests
>> between model-suppress-falsepos.cpp and model-file.cpp?  It seems like one
>> test file will do fine; just clearly comment on what is happening for each
>> test.  I also recommend called the modeled function "modeledFunction"
>> instead of "modelled" (which according to my spell checker has an
>> additional 'l').
>>
>
> I have merged the test files and also added some commets to explain what
> is going on. I have fixed the misspelling as well. The nullpointer
> dereference is only checked through plist because the point where the
> comment with the expected warning should be added is inside the model file
> and it did not work for me if the comment was in a separate file. If there
> is a different way to verify the warnings that are in a separate file and I
> did not find it, please let me know.
>
>
>>
>> As for the model files themselves:
>>
>> Index: test/Analysis/modelled.model
>> ===================================================================
>> --- test/Analysis/modelled.model  (revision 0)
>> +++ test/Analysis/modelled.model  (working copy)
>> @@ -0,0 +1,3 @@
>> +void modelled(intptr p) {
>> + ++*p;
>> +}
>> \ No newline at end of file
>> Index: test/Analysis/notzero.model
>> ===================================================================
>> --- test/Analysis/notzero.model (revision 0)
>> +++ test/Analysis/notzero.model (working copy)
>>
>>
>> Let's put these in a separate subdirectory, for example, "models",
>> instead of mixing them with the tests.  This way they really serve as
>> "inputs" to the analyzer.
>>
>
> I have moved the model files to tests/Inputs/Models.
>
>
>>
>> Overall this is looking good.  I think the explanatory comments will
>> really help people understand what this is doing, and I think changing how
>> we thread the information through FrontendAction will help not introduce an
>> artificial tainting of FrontendAction with concepts specific to the static
>> analyzer.
>>
>> Cheers,
>> Ted
>>
>>
> Thanks,
> Gábor
>
>
>>
>> On Jul 16, 2014, at 2:45 AM, Gábor Horváth <xazax.hun at gmail.com> wrote:
>>
>>
>>
>>
>> On 14 July 2014 19:32, Anna Zaks <ganna at apple.com> wrote:
>>
>>>
>>> On Jul 13, 2014, at 6:11 AM, Gábor Horváth <xazax.hun at gmail.com> wrote:
>>>
>>> Hi Anna,
>>>
>>> Thank you for the review. I have tweaked the test, so it no longer
>>> requires the error reporting tweak that is not done yet to pass. I have
>>> also added some high level comments to some files, if you think some
>>> information is lacking I will add them in the next iteration as well. The
>>> BugReporter patch is now separated into a different patch.
>>>
>>>
>>> On 11 July 2014 18:02, Anna Zaks <ganna at apple.com> wrote:
>>>
>>>>
>>>> For example, modeling functions should allow you to find bugs and
>>>> suppress false positives outside of those functions. I would suggest adding
>>>> a few of those tests first.
>>>>
>>>>
>>> How are the false positives suppressed? I did not find any resource on
>>> that. Found some analyzer attributes but I did not find them suitable for
>>> this purpuse at the first glance. But I think once the locations that are
>>> in a model file are omitted from the report path, the regular methods for
>>> suppressing false positives should work (and I will definitely add test
>>> case to ensure this once it is done).
>>>
>>>
>>> What I meant is that it is possible to construct a test where ability to
>>> model a function would eliminate a false positive. This would be another
>>> way to test your patch without worrying about BugReporter.
>>>
>>
>> I got it now, thansk. I have updated the patch with a test case where a
>> false positive case is eliminated by a model file.
>>
>> Thanks,
>> Gábor
>>
>>
>>> Thanks,
>>> Gábor
>>> <api_modeling.patch><bugreporter.patch>
>>>
>>>
>>>
>> <api_modeling.patch><bugreporter.patch>
>>
>>
>>
> <api_modeling.patch><bugreporter.patch>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20140801/080b133e/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: api_modeling.patch
Type: text/x-patch
Size: 44638 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20140801/080b133e/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bugreporter.patch
Type: text/x-patch
Size: 1731 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20140801/080b133e/attachment-0001.bin>