[cfe-dev] [GSOC] Static Analyzer infrastructure improvement, BodyFarm

Wed Mar 12 04:43:31 PDT 2014

Hi Ted,

I am glad that you could mentor me. I agree that the first option may be
too ambitious for this summer, so I would like to work on the second one.
I like that approach, where one could write the model as source code.
However, I think that in the long term a textual representation of models
may not be appropriate if we would like to use this feature as a
foundation for summary-based analysis. I think it would be very useful to
have an API to serialize some parts of the AST of a translation unit to a
model file that can be loaded later. That API could be utilized by a
future project that implements two-phased analysis.
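
To make the "model as source code" idea concrete, here is a rough sketch of
what such a model might look like. It is only an illustration: the function
name and the plain int types are assumptions, and nothing here is an existing
BodyFarm input format. The body follows the compare-and-swap semantics that,
as far as I can tell, BodyFarm currently builds by hand for
OSAtomicCompareAndSwap.

```cpp
// Hypothetical model written as ordinary source code. A loader could attach
// the AST of this body to a declaration whose real definition is unavailable.
// The name and the int parameter types are assumptions for illustration.
bool model_OSAtomicCompareAndSwap(int oldValue, int newValue, int *theValue) {
  // Swap only when the stored value matches the expected old value.
  if (oldValue == *theValue) {
    *theValue = newValue;
    return true;
  }
  return false;
}
```

A model like this leaves out atomicity and memory barriers, which the real
implementation needs but which only add complexity for the analyzer.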

Should we discuss the possible milestones and the scope privately, or is it
better to discuss them here on the mailing list? Am I supposed to define
those milestones myself? Sorry if the answers are obvious; this is the
first time I am taking part in GSoC.

Thanks,
Gábor

On 12 March 2014 08:09, Ted Kremenek <kremenek at apple.com> wrote:

> Hi Gábor,
>
> Extending BodyFarm to model a wide variety of APIs would be quite useful.
> Even apart from the lack of cross-translation-unit analysis, there will
> also be a set of core APIs whose source is unavailable when analyzing a
> project.  Having good models for those APIs could be invaluable for
> specific contexts.  Moreover, the synthesized body can be optimized for the
> task of static analysis rather than mirroring the actual implementation
> details, which can add complexity for the analyzer to reason about.
>
> With the two-phase analysis, what you are basically suggesting is that we
> implement summary-based analysis using BodyFarm.  That's an interesting
> approach, and it's one we have privately discussed in the past.  One
> advantage of that approach is that you could possibly generate models from
> one codebase (e.g., some library providing an API) and then use that model
> for analyzing other codebases.  Summary-based analysis runs into possible
> limitations here when you need to iterate on a fixed point to generate the
> summary, or the summary needs to be context-sensitive, but those are
> refinements we can explore over time.
>
> More generally, having a good way to get models into BodyFarm without hand
> coding AST construction logic would be useful.  Thus I see two possible
> (complementary) projects:
>
> 1. Implement a two-phased analysis, like you suggest, where models are
> created automatically from analysis and fed into subsequent analysis
> passes.  This would require defining a fair amount of infrastructure to
> generate models, save and read them from disk, and the tooling support
> needed to do the two-phase analysis.
>
> 2. Provide an easy way for people to author models.  A natural approach
> would be that someone could write the model in source code and its AST gets
> turned into a pre-baked model.  This could be something as simple as
> writing a dumper that translates AST elements into the current AST
> construction logic in BodyFarm, or some other canned representation we can
> load from disk (pre-baked ASTs are a little tricky to just load within an
> existing AST for the translation unit we are analyzing).
>
> Both of these are fantastic projects.  I'm a bit concerned that #1 may be
> a bit ambitious for a single GSoC project, and #2 provides much of the
> infrastructure needed for #1 but can be broken down into smaller pieces
> that are incremental and general goodness.  One attractive thing about #2
> is that it could possibly allow us to remove all the existing BodyFarm
> models that are hand-coded in Clang itself and just marshal them in from
> model files.
>
> I'd be happy to mentor you on this project.  The key, however, would be to
> identify incremental milestones and scope the project out so that it could
> be feasible to achieve reasonable progress over a GSoC period.
>
> Cheers,
> Ted
>
> On Mar 11, 2014, at 12:25 AM, Gábor Horváth <xazax.hun at gmail.com> wrote:
>
> > Hello clang-devel,
> >
> > I am a student, and I would like to participate in Google Summer of
> > Code 2014. I am mainly interested in the Static Analyzer and I would
> > like to make infrastructural improvements. I have some experience in
> > writing checkers and other Clang-based tools. Right now I am an intern
> > at a company where my job is to implement checkers that verify whether
> > their code follows their design rules.
> >
> > In my work, one of my biggest obstacles was that the static analyzer
> > lacks cross-translation-unit analysis. This is the reason why I am
> > very interested in BodyFarm.
> >
> > There is an open project to model standard library functions using
> > BodyFarm to make the analysis more precise. I would like to help get
> > BodyFarm working.
> >
> > Furthermore, I would like to make BodyFarm more general. Coverity
> > does its analysis in two steps: first it builds a model, and then it
> > uses that model when it does the analysis. I want to make it possible
> > for checker writers to do a preliminary run to collect some
> > definitions that can be used during the analysis. This would provide
> > checker writers with some limited cross-translation-unit support,
> > which would be a great improvement in my opinion.
> >
> > What are your opinions? Is there someone who is willing to mentor this
> > project? What are the chances it will get accepted?
> >
> > Thanks in advance,
> > Gábor Horváth
>
>