[cfe-dev] [analyzer][GSoC] Implementing a dataflow framework for the Clang Static Analyzer

Fri Mar 1 15:11:05 PST 2019

Hey!

First of all, yup, we're totally doing GSoC this year! I'll do my best
to finally force myself to update the open projects list soon with a
funny random idea that some folks around there are interested in :)
Also, I'm on my own these days since George changed jobs; Gabor, would
you be interested in and able to help out again? 'Cause they'll most
likely only let me pick one student, while we already have a few
people looking into participating :)

Data flow... Well, this one's definitely wanted, but that's a big one,
definitely bigger than any of the projects that we've tried so far :)
I guess such a project would aim at allowing data-flow checkers that
don't have to write down transfer functions for every kind of
statement in every checker, but instead rely on the engine to handle
common basic effects of statements. If done properly, it pretty much
means writing a new Static Analyzer, which is more complicated than
the old one because it also needs its state merge operations defined.
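
To make "transfer functions" and "state merge operations" a bit more
concrete, here's a minimal sketch of a forward data-flow solver. None of
these names (State, Block, join, solve, diamondExample) exist in the
Analyzer; they're made up for illustration, and the toy lattice (the set
of variables that may be uninitialized) is just an assumption to keep
the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical lattice element: variables that may be uninitialized.
using State = std::set<std::string>;

// State merge at a control-flow join: set union, because a variable may
// be uninitialized if it is uninitialized on either incoming path.
State join(const State &A, const State &B) {
  State R = A;
  R.insert(B.begin(), B.end());
  return R;
}

// Hypothetical basic block: a transfer function plus successor indices.
struct Block {
  std::function<State(const State &)> Transfer;
  std::vector<std::size_t> Succs;
};

// Forward worklist solver; returns the state at the entry of each block.
std::vector<State> solve(const std::vector<Block> &CFG, const State &Entry) {
  assert(!CFG.empty() && "block 0 is treated as the entry block");
  std::vector<State> In(CFG.size());
  std::vector<bool> Reached(CFG.size(), false);
  In[0] = Entry;
  Reached[0] = true;
  std::deque<std::size_t> Worklist{0};
  while (!Worklist.empty()) {
    std::size_t B = Worklist.front();
    Worklist.pop_front();
    State Out = CFG[B].Transfer(In[B]);
    for (std::size_t S : CFG[B].Succs) {
      assert(S < CFG.size() && "successor index out of range");
      State Merged = Reached[S] ? join(In[S], Out) : Out;
      // Revisit a successor on first reach or when its entry state grew.
      if (!Reached[S] || Merged != In[S]) {
        In[S] = std::move(Merged);
        Reached[S] = true;
        Worklist.push_back(S);
      }
    }
  }
  return In;
}

// Diamond CFG: block 0 declares x and y; block 1 initializes only x;
// block 2 initializes both; block 3 is the join point.
State diamondExample() {
  auto Declare = [](const State &S) {
    State R = S; R.insert("x"); R.insert("y"); return R;
  };
  auto InitX = [](const State &S) { State R = S; R.erase("x"); return R; };
  auto InitXY = [](const State &S) {
    State R = S; R.erase("x"); R.erase("y"); return R;
  };
  auto Use = [](const State &S) { return S; };
  std::vector<Block> CFG = {
      {Declare, {1, 2}}, {InitX, {3}}, {InitXY, {3}}, {Use, {}}};
  return solve(CFG, State{}).back(); // state at the entry of block 3
}
```

In this sketch a checker would supply only the lattice and join, while
the per-block transfer functions are what a shared engine would ideally
provide for the common statement kinds.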

There may be "poor-man's" solutions to this, which would allow us to
have a bit of progress without developing too much complex machinery.
One such solution is to introduce careful (conservative) tracking of
dropped execution paths in the Static Analyzer; it would allow finding
bugs that require analysis of all paths as long as the Analyzer is
known to have explored the function in its entirety. In practice
that'd be a lot of dropped coverage (much more than with a proper
solution), but it'd still probably find *something*, while allowing us
to re-use most of the infrastructure. This, of course, is a dead-end
in the long run - if we ever want to regain the lost coverage, we'd
have to start from scratch.
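
A rough sketch of what the "poor-man's" approach could look like,
assuming a hypothetical per-function record of whether any path was
dropped. The types and functions below (Path, Exploration, Verdict,
explore, neverReleased) are invented for illustration and are not
Analyzer APIs; the "budget" is just a stand-in for whatever makes the
Analyzer give up on a path.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical record of one path explored to the end of a function.
struct Path {
  bool ReleasesResource;
};

// Hypothetical result of exploring one function.
struct Exploration {
  std::vector<Path> Finished; // paths that reached the end of the function
  bool DroppedAny = false;    // true if any path was abandoned on the way
};

enum class Verdict { Holds, Violated, Unknown };

// Simulates exploration under a budget: paths beyond the budget are
// dropped, and the fact that something was dropped is recorded.
Exploration explore(const std::vector<Path> &AllPaths, std::size_t Budget) {
  Exploration E;
  for (std::size_t I = 0; I < AllPaths.size(); ++I) {
    if (I < Budget)
      E.Finished.push_back(AllPaths[I]);
    else
      E.DroppedAny = true;
  }
  return E;
}

// All-paths property: "the resource is released on no path".
Verdict neverReleased(const Exploration &E) {
  for (const Path &P : E.Finished)
    if (P.ReleasesResource)
      return Verdict::Violated; // one explored counterexample is conclusive
  // Without a counterexample, the claim only holds if nothing was dropped.
  return E.DroppedAny ? Verdict::Unknown : Verdict::Holds;
}
```

The Unknown verdict is where the dropped coverage shows up: any function
with even one abandoned path can't produce an all-paths report.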

So my overall feeling here is that I'd love to hear a specific proposal,
but I suspect that if you go for doing this properly, one summer would
only be enough to scratch the surface. If you want to get a feel for
what it'd take, I'd recommend poking at our existing data flow analyses.
E.g., what would it take to remove my C++ "object under construction"
liveness hacks by improving live variables analysis? You might also
want to write a few analyses of your own and see if you can generalize
something out of them. E.g., if you write an analysis that checks
whether a function is pure, we can instantly use it to improve our
conservative evaluation of pure functions (e.g., produce the same
return value symbol every time it's called with the same parameters
and skip invalidations - we may even try to do it across translation
units because such a summary is easy to serialize).
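
Here's a minimal sketch of how such a purity summary could be consumed.
Everything in it is hypothetical (CallEvaluator, SymbolID, evalCall are
not Analyzer APIs), and it assumes "pure" means the result depends only
on the arguments, so that caching by argument symbols is safe.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a symbolic value.
using SymbolID = unsigned;

class CallEvaluator {
  // Hypothetical summary: names of functions proven pure, e.g. loaded
  // from results serialized by the analysis of other translation units.
  std::set<std::string> PureFunctions;
  std::map<std::pair<std::string, std::vector<SymbolID>>, SymbolID> Cache;
  SymbolID NextSymbol = 0;
  unsigned Invalidations = 0;

public:
  explicit CallEvaluator(std::set<std::string> Pure)
      : PureFunctions(std::move(Pure)) {}

  // Returns the symbol that models the call's return value.
  SymbolID evalCall(const std::string &Callee,
                    const std::vector<SymbolID> &Args) {
    if (PureFunctions.count(Callee)) {
      // Pure callee: same arguments give the same symbol, and nothing
      // is invalidated.
      auto Key = std::make_pair(Callee, Args);
      auto It = Cache.find(Key);
      if (It != Cache.end())
        return It->second;
      SymbolID S = NextSymbol++;
      Cache.emplace(std::move(Key), S);
      return S;
    }
    // Unknown callee: count a conservative invalidation and hand out a
    // fresh symbol on every call.
    ++Invalidations;
    return NextSymbol++;
  }

  unsigned invalidations() const { return Invalidations; }
};
```

With this, two calls to a pure function with the same argument symbols
compare equal, while calls to an unknown function never do.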