<div dir="ltr"><div dir="ltr">Hi Gábor,<div><br></div><div>Thanks for your feedback.</div><div><br></div></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div>How does the content of the summaries you proposed compare to CppCheck's summaries? Is it the same or did you add/remove something?</div></div></div></blockquote><div>I consider the annotations in CppCheck valuable, so most of the proposed argument constraints are coming from there. The exception is the not-tainted argument constraint, that's new compared to CppCheck. Also, return value properties are not present in CppCheck summaries.</div><div>  </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"></div></blockquote><div>Being able to write summaries in a file that is not hardcoded into the compiler is nice. Would APINotes work for this purpose? </div></div></div></blockquote><div>My biggest concern with APINotes is that <a href="https://github.com/apple/llvm-project/blob/apple/master/clang/lib/Sema/SemaAPINotes.cpp">it modifies the AST with Sema</a> and then the whole compilation chain - including CodeGen - works on the modified AST. With summaries we'd like to describe complex information (e.g. relations between parameters and the return value) that is important only for static analysis and that may be totally irrelevant for CodeGen or other parts of the compiler. Another concern is that APINotes does not support C++.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div>What would be the distribution model for those summaries? Would clang ship them? Would the library authors ship them along with their libraries? If the latter how would clang pick it up?</div></div></div></blockquote><div>In short term, summaries for stable libs like libc, libc++ and posix can be hard-coded in the source of StdLibraryFunctionsChecker. In long term, I think we should have a specified file format (.e.g YAML) in which we can describe these summaries. Summaries for stable libs (libc, libc++, posix) should be shipped with Clang. For extensiblity, (for library authors) probably we will need an option that specifies a directory from which we could read additional summary files. Similarly to what we already have with ModelInjector's `-model-path` config option.</div><div></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div></div><div>What if multiple files have summaries for the same function? (In C it is not that hard to have a situation like this due to the lack of namespaces.)</div></div></div></blockquote><div>Option1: Give a run-time warning, so when we discover the situation we skip the summary and the analysis continues.</div><div>Option2: Give a run-time error and then stop the analysis, the user must provide unambiguous summaries.</div><div>I prefer Option1.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div>How widespread is CppCheck's format? Should we support that? </div></div></div></blockquote><div>I think it is a CppCheck internal format. So, I'd rather not support a file format that is not controlled in LLVM/Clang. I think we should support only one file format which is easily extensible.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div>While I'd really like to have summaries I think we should lower our expectations a bit here. The current constraint solver of the analyzer might not be powerful enough to handle non-trivial summaries of this form. I do believe it can provide some value but often those papers describing these summaries are using a much more powerful SMT solver as a backend.</div></div></div></blockquote><div>I agree that the range based solver is not powerful enough. But we have the more powerful Z3 there, unfortunately that is slow. If we have a summary that has preconditions then we must check in the caller whether that pre is satisfied before applying the summary, and this would require the use of Z3. There are techniques to cache and/or decrease the problem space for the solver, but probably this might not scale, I agree. Another option could be to apply the summary without checking the pre and once we found a bug only then would we check the path feasibility with Z3 (similarly to what we do today with inlining).</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div>The other concern is how to model checker specific pre- and post-conditions. Our checker API is not written in an SMT solver friendly way, so it would be fairly hard to synthesize those summaries for our checkers.</div></div></div></blockquote><div>I'd like to avoid checker specific pre/post conditions. I hope that the summaries could consist of local path constraints of the function and once the summary is applied then the checker could populate its own GDM based on that.</div><div></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div>Also, I do believe the analyzer could profit a lot from some classical dataflow analysis like mod/ref analysis. If we would annotate the arguments with the results of such an analysis it could help the analyzer to invalidate the right amount of state when evaluating a function call without a body.</div></div></div></blockquote><div>Yes, as Artem mentioned, we could get some of these annotations from the IR, e.g. by utilizing the Attributor framework.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div>I think the previous plans with automatically generated summaries would already render the analyzer into a two-phase analysis. </div></div></div></blockquote><div>Not necessarily (I hope). The methods described in Godefroid's work generates the summaries on-demand when the symbolic execution first sees the functions.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div>In case we are interested in infer style summaries I think we should first make our memory modeling based on separation logic. I have no idea how far we are from that at this point and how much work would it be. Also, I believe reasoning about such summaries requires specialized solvers that are much more powerful than the current range based one. </div></div></div></blockquote><div>Yes, indeed. That would mean developing a whole new/different kind of reasoning for summary generation and that might be too much to deal with. On the other hand, I've seen some very valuable thoughts in the paper about incorrectness logic, perhaps we could reuse some of those even in the current form of the CSA symbolic execution.</div><div><br></div><div>Gabor</div></div></div>