[PATCH] Modeling APIs in the Static Analyzer

Wed Aug 13 23:35:23 PDT 2014

Hi Ted,

The ModelInjector is no longer instantiated when the modelpath is not set.

Thanks,
Gábor

On 13 August 2014 07:11, Ted Kremenek <kremenek at apple.com> wrote:

> Thanks Gábor.  This looks great.
>
> One last thing that occurred to me is that we should not create a
> ModelInjector at all unless the model-path is specified.  This allows us to
> preserve the existing behavior in the analyzer while we continue to evolve
> this new functionality.
>
> Specifically:
>
>  std::unique_ptr<AnalysisASTConsumer>
> -ento::CreateAnalysisConsumer(const Preprocessor &pp, const std::string
> &outDir,
> -                             AnalyzerOptionsRef opts,
> -                             ArrayRef<std::string> plugins) {
> +ento::CreateAnalysisConsumer(CompilerInstance &CI) {
>    // Disable the effects of '-Werror' when using the AnalysisConsumer.
> -  pp.getDiagnostics().setWarningsAsErrors(false);
> +  CI.getPreprocessor().getDiagnostics().setWarningsAsErrors(false);
>
> -  return llvm::make_unique<AnalysisConsumer>(pp, outDir, opts, plugins);
> +  return llvm::make_unique<AnalysisConsumer>(
> +      CI.getPreprocessor(), CI.getFrontendOpts().OutputFile,
> +      CI.getAnalyzerOpts(), CI.getFrontendOpts().Plugins,
> +      new ModelInjector(CI));
>  }
>
>
> We can query 'opts' to see if model-path is empty; if it is we can pass
> nullptr instead of 'new ModelInjector(CI)'.
>
> On Aug 12, 2014, at 1:22 AM, Gábor Horváth <xazax.hun at gmail.com> wrote:
>
> Hi Ted,
>
> Thank you for the review, I have altered the patch accordingly, and also
> added the patch to follow up the API change in clang tidy.
>
>
> On 12 August 2014 08:44, Ted Kremenek <kremenek at apple.com> wrote:
>
>>
>> On Aug 1, 2014, at 1:44 AM, Gábor Horváth <xazax.hun at gmail.com> wrote:
>>
>> Hi Ted,
>>
>> Thank you for the review.
>>
>> On 1 August 2014 07:25, Ted Kremenek <kremenek at apple.com> wrote:
>>
>>> Hi Gábor,
>>>
>>> This is looking good to me.  Some minor nits/comments:
>>>
>>> - Please add doxygen comments for the CodeInjector class.
>>>
>>
>> Done.
>>
>>
>>> - For the BugReporter patch, are there tests for that functionality
>>> change?  I saw tests in the other patch, but not that one.  It's fine to
>>> separate the review of that change before the primary change goes in, but I
>>> was curious.
>>>
>>
>> Well, it may be a bit complicated. I deleted some code in BugReporter to
>> not to discard bug reports that are in a model file, and the plist part of
>> the test case only pass if that patch is applied (if the patch is not
>> applied the nullpointer dereference warning that has the position in the
>> modelfile will be discarded. In the long term it would be better to report
>> these errors elsewhere but it is not supported yet by the bugreporter
>> patch). I can move the plist check into a separate testcase and add that
>> case to the BugReporter patch instead. The division by zero test should
>> work without the BugReporter patch.
>>
>>
>> Ok, this make sense.  Can you clarify what you mean by "better to report
>> these errors elsewhere"?
>>
>>
> It might be confusing for the user, if the execution path of the bug
> report contains a location that is inside a model file so the report
> contains codes and files that are not present in the analysed project. So
> it might be more user friendly if the locations that are inside a model
> file would be excluded from the reported execution path, however if this is
> not an issue I am ok with reporting those locations as well.
>
>
>>
>>> - As for breaking code in the 'extra' repository, LLVM-internal API is
>>> not sacrosanct.  If we break the 'extra' projects we just need to update
>>> them, but I'm not certain if that is possible in this case.
>>>
>>
>> As far as I can remember it would be a straightforward fix in the extra
>> repository. Clang-tidy calls CreateAnalysisConsumer.
>>
>>
>> Sounds good.  Let's get the right API and just fix up clang-tidy.
>>
>>
>>
>>> - For comments, please consistently use sentence casing and end with
>>> periods, and for type names use the appropriate casing.  For example:
>>>
>>> +  // modules create a separate compilerinstance for parsing modules,
>>> maybe it is
>>> +  // for reason so I mimic this behavior
>>> +  CompilerInstance Instance;
>>> ...
>>>
>>> This comment looks a bit suspect, since it seems like a question to
>>> yourself.  Here you use the word "I"; who is "I" in the context of this
>>> code?  The comment also seems like an unanswered question.  Is this a stale
>>> comment?
>>>
>>>
>> Done, the comment was improved.
>>
>>
>>> Another example is this comment:
>>>
>>> +  // FIXME: double memoization is redundant. Here and in bodyfarm.
>>> +  llvm::StringMap<Stmt *> Bodies;
>>>
>>> This can be made slightly cleaner.  For example:
>>>
>>> +  // FIXME: Double memorization is redundant, with
>>> +  /// memoization both here and in BodyFarm.
>>> +  llvm::StringMap<Stmt *> Bodies;
>>>
>>
>> Done.
>>
>> - Only use doxygen comments for documentation.  For example:
>>>
>>> +  if (notzero_notmodeled(p)) {
>>> +   /// There is no information about the value of p, because
>>> +   /// notzero_notmodeled is not modeled and the function definition
>>> +   /// is not available.
>>> +    int j = 5 / p; // expected-warning {{Division by zero}}
>>> +  }
>>>
>>> In this case we should use '//', not '///'.  The former are true
>>> comments, and the latter are candidates to be extracted for documentation.
>>>
>>>
>> Done.
>>
>>
>>> Overall, however, this is getting really close.
>>>
>>>
>> It is great.
>>
>> Thanks,
>> Gábor
>>
>>
>> Wonderful.  The rest of my comments are minor:
>>
>> +/// \brief CodeInjector is an interface which is responsible
>> forinjecting AST of
>> +/// function definitions that may not be available in the original
>> source.
>> +///
>> +/// The getBody function will be called each time the static analyzer
>> examines a
>> +/// function call that has no definition available in the current
>> translation
>> +/// unit. If the returned statement is not a nullpointer, it is assumed
>> to be
>> +/// the body of a function which will be used for the analysis. The
>> source of
>> +/// the body can be arbitrary, but it is advised to use memoization to
>> avoid
>> +/// unnecessary reparsing of the external source that provides the body
>> of the
>> +/// functions.
>>
>>
>>   "forinjecting" -> "for injecting"
>>   "nullpointer" -> "null pointer"
>>
>> +++ include/clang/StaticAnalyzer/Frontend/FrontendActions.h (working copy)
>> @@ -10,10 +10,16 @@
>>  #ifndef LLVM_CLANG_GR_FRONTENDACTIONS_H
>>  #define LLVM_CLANG_GR_FRONTENDACTIONS_H
>>
>> +#include <map>
>> +
>>
>>
>> This "#include" of <map> doesn't seem needed.  Neither is the one in
>> ModelConsumer.h
>>
>> +++ lib/StaticAnalyzer/Frontend/ModelConsumer.cpp (working copy)
>> @@ -0,0 +1,42 @@
>> +//===--- ModelConsumer.cpp - ASTConsumer for consuming model files
>> --------===//
>> +//
>> +//                     The LLVM Compiler Infrastructure
>> +//
>> +// This file is distributed under the University of Illinois Open Source
>> +// License. See LICENSE.TXT for details.
>> +//
>>
>> +//===----------------------------------------------------------------------===//
>> +///
>> +/// \file
>> +/// \brief This file implements an ASTConsumer for consuming model files.
>> +///
>> +/// This ASTConsumer handles the AST of a parsed model file. All top
>> level
>> +/// function definitions will be collected from that model file for later
>> +/// retrieval during the static analyzis. The body of these functions
>> will not
>> +/// be injected into the ASTUnit of the analyzed translation unit. It
>> will be
>> +/// available through the BodyFarm which is utilized by the
>> AnalysisDeclContext
>> +/// class.
>> +///
>>
>>
>>   "analyzis" -> "analysis"
>>
>> +  // The instance wants to take ownership, however disablefree frontend
>> option
>> +  // is set to true to avoid double free issues
>>
>>
>> Use  the actual casing for the option for technical precision:
>>
>>   DisableFree
>>
>> +  /// \brief Synthetize a body for a declaration
>> +  ///
>> +  /// This method first looks up the appropriate model file based on the
>> +  /// model-path configuration option and the name of the declaration
>> that is
>> +  /// looked up. If no model were synthetized yet for a function with
>> that name
>> +  /// it will create a new compiler instance to parse the model file
>> using the
>> +  /// ASTContext, Preprocessor, SourceManager of the original compiler
>> instance.
>> +  /// The former resources are shared between the two compiler instance,
>> so the
>> +  /// newly created instance have to "leak" these objects, since they
>> are owned
>> +  /// by the original instance.
>>
>>
>>    Synthetize -> Synthesize
>>   synthetized -> synthesized
>>
>> +  std::vector<std::unique_ptr<ASTUnit> > ModelAsts;
>>
>>
>> I'd prefer this to be "ModelASTs", as 'AST' is an acronym.
>>
>>
> It was an unused member (from an earlier implementation) that I forgot to
> remove, but done now.
>
>
>> Otherwise, this all looks great to me.
>>
>>
>>
>>> Cheers,
>>> Ted
>>>
>>> On Jul 30, 2014, at 3:29 AM, Gábor Horváth <xazax.hun at gmail.com> wrote:
>>>
>>> Hi Ted,
>>>
>>> Thank you for the review.
>>>
>>>
>>> On 30 July 2014 08:18, Ted Kremenek <kremenek at apple.com> wrote:
>>>
>>>> Hi Gábor,
>>>>
>>>> Thanks for making progress on this very promising enhancement to the
>>>> analyzer.  I have an assortment of comments, in no particular order:
>>>>
>>>> - ModelInjector.h and ModelConsumer.h
>>>>
>>>> There is a comment at the top of these files, but I think a bit more
>>>> explanation is needed.  For example:
>>>>
>>>>   MetaConsumer.cpp:
>>>>
>>>>     +// "Meta" ASTConsumer for consuming model files.
>>>>
>>>> That doesn't really explain anything.  What does "Meta" in quotes mean?
>>>>  I think an explanation here on what this does is helpful when someone
>>>> discovers this code for the first time.
>>>>
>>>> Similarly, we should add some high-level comments for CodeInjector.h
>>>> and ModelInjector.h.  We have a good start in ModelInjector.h:
>>>>
>>>> +/// \file
>>>> +/// \brief Defines the clang::ento::ModelInjector class which
>>>> implements the
>>>> +/// clang::CodeInjector interface. This class is responsible for
>>>> injecting
>>>> +/// function definitions that were synthetized from model files.
>>>> +///
>>>>
>>>> Let's consider expanding it:
>>>>
>>>>  /// \brief This file defines the clang::ento::ModelInjector class
>>>> which implements the
>>>>  /// clang::CodeInjector interface. This class is responsible for
>>>> injecting
>>>>  /// function definitions that were synthesized from model files.
>>>>
>>>>  /// Model files allow definitions of functions to be lazily
>>>> constituted for functions
>>>>  /// which lack bodies in the original source code.  This allows the
>>>> analyzer
>>>>  /// to more precisely analyze code that calls such functions,
>>>> analyzing the
>>>>  /// artificial definitions (which typically approximate the semantics
>>>> of the
>>>>  /// called function) when called by client code.  These definitions are
>>>>  /// reconstituted lazily, on-demand, by the static analyzer engine.
>>>>
>>>> CodeInjector.h provides some information, but it is a bit vague:
>>>>
>>>> +///
>>>> +/// \file
>>>> +/// \brief Defines the clang::CodeInjector interface which is
>>>> responsible for
>>>> +/// injecting AST of function definitions from external source.
>>>> +///
>>>>
>>>> It's a bit unclear how this gets used.  I think a bit of prose here
>>>> would help clarify its role in the static analyzer.  I also think the
>>>> CodeInjector interface is also more abstract than the prose describes.
>>>>  There's nothing about CodeInjector's interface that requires the injected
>>>> definitions to come from an external source.  That's an implementation
>>>> detail of a concrete subclass.  Instead, all CodeInjector does is provide
>>>> an interface that lazily provides definitions for functions and methods
>>>> that may not be present in the original source.
>>>>
>>>
>>> I have added some further documentation to address these issues.
>>>
>>>
>>>
>>>>
>>>> I'm also looking at the change to
>>>> StaticAnalyzer/Frontend/FrontendActions.cpp, and wonder if we can simplify
>>>> things:
>>>>
>>>> +++ lib/StaticAnalyzer/Frontend/FrontendActions.cpp (working copy)
>>>> @@ -7,9 +7,11 @@
>>>>  //
>>>>
>>>>  //===----------------------------------------------------------------------===//
>>>>
>>>> +#include "clang/Frontend/CompilerInstance.h"
>>>>  #include "clang/StaticAnalyzer/Frontend/FrontendActions.h"
>>>> -#include "clang/Frontend/CompilerInstance.h"
>>>>  #include "clang/StaticAnalyzer/Frontend/AnalysisConsumer.h"
>>>> +#include "clang/StaticAnalyzer/Frontend/ModelConsumer.h"
>>>> +#include "ModelInjector.h"
>>>>  using namespace clang;
>>>>  using namespace ento;
>>>>
>>>> @@ -18,6 +20,14 @@
>>>>    return CreateAnalysisConsumer(CI.getPreprocessor(),
>>>>                                  CI.getFrontendOpts().OutputFile,
>>>>                                  CI.getAnalyzerOpts(),
>>>> -                                CI.getFrontendOpts().Plugins);
>>>> +                                CI.getFrontendOpts().Plugins,
>>>> +                                new ModelInjector(CI));
>>>>  }
>>>>
>>>>
>>>>
>>>> It looks like CreateAnalysisConsumer just continues to grow more
>>>> arguments, all which derive from using 'CI'.  This seems silly, since this
>>>> function is called in one place.  Instead of intro ducting a dependency on
>>>> ModelInjector.h in this file, we can just sink these arguments into
>>>> CreateAnalysisConsumer() itself, resulting in:
>>>>
>>>>   return CreateAnalysisConsumer(CI);
>>>>
>>>> and let CreateAnalysisConsumer() do all that boilerplate.
>>>>
>>>
>>> That was my original idea as well but it broke the compilation of some
>>> code in extra repository and I wasn't sure if it is ok to break the API
>>> with this patch. But I find it cleaner this way so I modified it in this
>>> iteration.
>>>
>>>
>>>> Next, let's look at the change to FrontendAction:
>>>>
>>>>  class FrontendAction {
>>>> +  /// Is this action invoked on a model file? Model files are
>>>> incomplete
>>>> +  /// translation units that relies on type information from another
>>>> translation
>>>> +  /// unit. Check ParseModelFileAction for details.
>>>> +  bool ModelFile;
>>>>
>>>>
>>>> Perhaps "IsModelFile"?  "ModelFile" sounds like it should be a
>>>> reference to the file itself.
>>>>
>>>>    FrontendInputFile CurrentInput;
>>>>    std::unique_ptr<ASTUnit> CurrentASTUnit;
>>>>    CompilerInstance *Instance;
>>>> @@ -105,7 +109,11 @@
>>>>    /// @}
>>>>
>>>>  public:
>>>> -  FrontendAction();
>>>> +  /// \brief Constructor
>>>> +  ///
>>>> +  /// \param modelFile determines whether the source files this action
>>>> invoked
>>>> +  /// on should be treated as a model file. Defaults to false.
>>>> +  FrontendAction(bool modelFile = false);
>>>>
>>>>
>>>> It seems suboptimal to modify the interface of FrontendAction just for
>>>> this one edge case.  Instead of modifying the constructor arguments, we
>>>> could default initialize "IsModelFile" to false, and have a setter to
>>>> change it.  For example:
>>>>
>>>>   ParseModelFileAction::ParseModelFileAction(llvm::StringMap<Stmt *>
>>>> &Bodies)
>>>>     : ASTFrontendAction(/*ModelFile=*/true), Bodies(Bodies) {}
>>>>
>>>> becomes:
>>>>
>>>>   ParseModelFileAction::ParseModelFileAction(llvm::StringMap<Stmt *>
>>>> &Bodies)
>>>>     : Bodies(Bodies)  {
>>>>     IsModelFile = true;
>>>>   }
>>>>
>>>> Looking at this more, I wonder if we should modify FrontendAction at
>>>> all.  The only place where isModelParsingAction() is called is in one spot
>>>> in CompilerInstance.cpp:
>>>>
>>>>    if (hasSourceManager() && !Act.isModelParsingAction())
>>>>
>>>> It *might* be cleaner to just have a virtual member function in
>>>> FrontendAction, which defaults to returning false, but is generic for all
>>>> subclasses to override.  Then we don't need the "IsModelFile" field in
>>>> FrontendAction at all, and we just have ParseModelFileAction override that
>>>> single member function.  We could then name that method to be something a
>>>> bit more generic.  That would allow us to not touch FrontendAction at all
>>>> except for providing that single virtual method that can be overridden in
>>>> subclasses.  I somewhat prefer this approach because it provides a cleaner
>>>> separation of concerns between FrontendAction (which is defined
>>>> libFrontend) and the static analyzer.  That would also allow you to get rid
>>>> of isModelParsingAction() entirely (replacing it with something more
>>>> generic).
>>>>
>>>>
>>> You are right, it is much cleaner to use a virtual function, so I
>>> modified the patch to use that approach. The new virtual function has the
>>> same name because I have yet to find any better and more general name yet.
>>> Do you have an idea for a better name?
>>>
>>>
>>>> As for the test case:
>>>>
>>>> +typedef int* intptr;
>>>> +
>>>> +void modelled(intptr p);
>>>> +
>>>> +int main() {
>>>> + modelled(0);
>>>> + return 0;
>>>> +}
>>>>
>>>>
>>>> Please add some comments in this test file explaining what is
>>>> happening.  Also, it would be great if this both used FileCheck (which it
>>>> does now) but also verified the diagnostics so we get cross-checking of the
>>>> output (we see this in some analyzer tests).  It also makes it easier to
>>>> understand the test.
>>>>
>>>> Also, is there a reason to break up the tests
>>>> between model-suppress-falsepos.cpp and model-file.cpp?  It seems like one
>>>> test file will do fine; just clearly comment on what is happening for each
>>>> test.  I also recommend called the modeled function "modeledFunction"
>>>> instead of "modelled" (which according to my spell checker has an
>>>> additional 'l').
>>>>
>>>
>>> I have merged the test files and also added some commets to explain what
>>> is going on. I have fixed the misspelling as well. The nullpointer
>>> dereference is only checked through plist because the point where the
>>> comment with the expected warning should be added is inside the model file
>>> and it did not work for me if the comment was in a separate file. If there
>>> is a different way to verify the warnings that are in a separate file and I
>>> did not find it, please let me know.
>>>
>>>
>>>>
>>>> As for the model files themselves:
>>>>
>>>> Index: test/Analysis/modelled.model
>>>> ===================================================================
>>>> --- test/Analysis/modelled.model  (revision 0)
>>>> +++ test/Analysis/modelled.model  (working copy)
>>>> @@ -0,0 +1,3 @@
>>>> +void modelled(intptr p) {
>>>> + ++*p;
>>>> +}
>>>> \ No newline at end of file
>>>> Index: test/Analysis/notzero.model
>>>> ===================================================================
>>>> --- test/Analysis/notzero.model (revision 0)
>>>> +++ test/Analysis/notzero.model (working copy)
>>>>
>>>>
>>>> Let's put these in a separate subdirectory, for example, "models",
>>>> instead of mixing them with the tests.  This way they really serve as
>>>> "inputs" to the analyzer.
>>>>
>>>
>>> I have moved the model files to tests/Inputs/Models.
>>>
>>>
>>>>
>>>> Overall this is looking good.  I think the explanatory comments will
>>>> really help people understand what this is doing, and I think changing how
>>>> we thread the information through FrontendAction will help not introduce an
>>>> artificial tainting of FrontendAction with concepts specific to the static
>>>> analyzer.
>>>>
>>>> Cheers,
>>>> Ted
>>>>
>>>>
>>> Thanks,
>>> Gábor
>>>
>>>
>>>>
>>>> On Jul 16, 2014, at 2:45 AM, Gábor Horváth <xazax.hun at gmail.com> wrote:
>>>>
>>>>
>>>>
>>>>
>>>> On 14 July 2014 19:32, Anna Zaks <ganna at apple.com> wrote:
>>>>
>>>>>
>>>>> On Jul 13, 2014, at 6:11 AM, Gábor Horváth <xazax.hun at gmail.com>
>>>>> wrote:
>>>>>
>>>>> Hi Anna,
>>>>>
>>>>> Thank you for the review. I have tweaked the test, so it no longer
>>>>> requires the error reporting tweak that is not done yet to pass. I have
>>>>> also added some high level comments to some files, if you think some
>>>>> information is lacking I will add them in the next iteration as well. The
>>>>> BugReporter patch is now separated into a different patch.
>>>>>
>>>>>
>>>>> On 11 July 2014 18:02, Anna Zaks <ganna at apple.com> wrote:
>>>>>
>>>>>>
>>>>>> For example, modeling functions should allow you to find bugs and
>>>>>> suppress false positives outside of those functions. I would suggest adding
>>>>>> a few of those tests first.
>>>>>>
>>>>>>
>>>>> How are the false positives suppressed? I did not find any resource on
>>>>> that. Found some analyzer attributes but I did not find them suitable for
>>>>> this purpuse at the first glance. But I think once the locations that are
>>>>> in a model file are omitted from the report path, the regular methods for
>>>>> suppressing false positives should work (and I will definitely add test
>>>>> case to ensure this once it is done).
>>>>>
>>>>>
>>>>> What I meant is that it is possible to construct a test where ability
>>>>> to model a function would eliminate a false positive. This would be another
>>>>> way to test your patch without worrying about BugReporter.
>>>>>
>>>>
>>>> I got it now, thansk. I have updated the patch with a test case where a
>>>> false positive case is eliminated by a model file.
>>>>
>>>> Thanks,
>>>> Gábor
>>>>
>>>>
>>>>> Thanks,
>>>>> Gábor
>>>>> <api_modeling.patch><bugreporter.patch>
>>>>>
>>>>>
>>>>>
>>>> <api_modeling.patch><bugreporter.patch>
>>>>
>>>>
>>>>
>>> <api_modeling.patch><bugreporter.patch>
>>>
>>>
>>>
>> <api_modeling.patch><bugreporter.patch>
>>
>>
>>
>
> Thanks,
> Gábor
> <api_modeling.patch><bugreporter.patch><clangTidy.patch>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20140814/00bacfc3/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: api_modeling.patch
Type: text/x-patch
Size: 44431 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20140814/00bacfc3/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bugreporter.patch
Type: text/x-patch
Size: 1728 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20140814/00bacfc3/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: clangTidy.patch
Type: text/x-patch
Size: 775 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20140814/00bacfc3/attachment-0002.bin>