[cfe-dev] [analyzer][GSoC] Implementing a dataflow framework for the Clang Static Analyzer

Gábor Horváth via cfe-dev cfe-dev at lists.llvm.org
Sat Mar 2 06:50:43 PST 2019


Hi!

I think I could have the same level of commitment I had last summer. If you
find that sufficient, I am happy to help with the mentoring!

I do agree with Artem, if you just started to familiarize yourself with the
topic, I would suggest starting with implementing some algorithms from
scratch (or tweak existing ones). While improving existing algorithms is
always nice, and it is good to get rid of some technical debt, it is
sometimes harder to write a good report in the end.

Some random ideas that I would love to see implemented:
* Points to analysis would have a number of uses like more precise loop
widening, call graph construction, ability to experiment with other
invalidation schemes
* Static backward slicing could help us shorten bug paths and filter events
* For some checks even just a set of dominators/post dominators for each
basic block would be useful, see Kristof's uninitialized value check's
"guarded member" heuristic
* Maybe some better measure of function complexity for the analyzer's
inlining decision? As far as I remember we use the number of basic blocks
now.
* Maybe some static edge frequency estimate for the CFG to be able to
experiment with other exploration strategies? Taking the most or least
frequently taken edge leading to an unvisited node sounds like a fun
experiment, but nothing guarantee that the end result will actually be
useful.

I think it is also ok to write multiple proposals, but you should aim for
quality over quantity.

Regards,
Gabor

On Sat, 2 Mar 2019 at 00:11, Artem Dergachev <noqnoqneo at gmail.com> wrote:

> Hey!
>
> First of all, yup, we're totally doing GSoC this year! I'll do my best
> to finally force myself to update the open projects list soon with a
> funny random idea that some folks around there are interested in :)
> Now, also i'm alone these days since George changed jobs; Gabor, are
> you interested in / capable of helping out again? 'Cause they'll most
> likely only let me pick one student, while we already have a few
> people looking into participating :)
>
> Data flow... Well, this one's definitely wanted, but that's a big one,
> definitely bigger than any of the projects that we've tried so far :)
> I guess such project would aim at allowing data-flow checkers that
> don't have to write down transfer functions for every kind of
> statement in every checker, but instead rely on the engine to handle
> common basic effects of statements. If done properly, it pretty much
> means writing a new Static Analyzer, which is more complicated than
> the old one because it also needs its state merge operations defined.
>
> There may be "poor-man's" solutions to this, which would allow us to
> have a bit of progress without developing too much complex machinery.
> One such solution is to introduce careful (conservative) tracking of
> dropped execution paths in the Static Analyzer; it would allow finding
> bugs that require analysis of all paths as long as the Analyzer is
> known to have explored the function in its entirety. In practice
> that'd be a lot of dropped coverage (much more than a proper
> solution), but it'd still probably find *something*, while allowing us
> to re-use most of the infrastructure. This, of course, is a dead-end
> in the long run - if we ever want to regain the lost coverage, we'd
> have to start from scratch.
>
> So my overall feel here is that i'd love to hear a specific proposal,
> but i suspect that if you go for doing this properly, one summer would
> only be enough to scratch the surface. If you want to have a feel of
> what it'd take, i'd recommend poking our existing data flow analyses.
> Eg., what would it take to remove my C++ "object under construction"
> liveness hacks by improving live variables analysis? You might also
> want to write a few analyses of your own and see if you can generalize
> something out of them. Eg., if you write an analysis that checks
> whether a function is pure, we can instantly use it to improve our
> conservative evaluation of pure functions (eg., produce the same
> return value symbol every time it's called with the same parameters
> and skip invalidations - we may even try to do it across translation
> units because such summary is easy to serialize).
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190302/11e83cac/attachment.html>


More information about the cfe-dev mailing list