<div dir="ltr"><div dir="ltr">Hi Artem,<div><br></div><div>Thanks for your feedback.</div><div><br></div></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">When doing a file format for summaries, please try to somehow coordinate <br>
with the current work on taint checker yaml files. I don't want multiple <br>
incompatible formats to be circulating around :)<br></blockquote><div>Sure, I am in sync with Balazs :) </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">One of the fresher thoughts that i've had recently on automatically <br>
generated summaries was re-using The Attributor <br>
(<a href="https://llvm.org/devmtg/2019-10/talk-abstracts.html#tech24" rel="noreferrer" target="_blank">https://llvm.org/devmtg/2019-10/talk-abstracts.html#tech24</a>) to produce <br>
function summaries that the static analyzer can import and rely upon.<br></blockquote><div>Yes, absolutely. I consider this approach very valuable. I think we should continue the work towards this direction after we are done with the extension of StdLibraryFunctionsChecker and before starting to experiment with Godefroid's methods.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
If you go for automatically generated summaries, please keep in mind the <br>
following potential problems:<br>
<br>
(a) We do not want checker API to become significantly more complicated <br>
than it currently is. It is very likely that generating checker-specific <br>
summaries of a function would require very non-trivial checker-side <br>
support that'll be too hard for a typical checker developer to handle. <br>
Please talk to me once you get there; i'll try to throw in a few <br>
examples to see if your approach is lightweight and general enough.<br></blockquote><div>I'd like to avoid checker specific pre/post conditions. I hope that the summaries could consist of local path constraints of the function and once the summary is applied then the checker could populate its own GDM based on that.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
(b) Please understand the problem of "inlined defensive checks" that we <br>
have with inlining-based IPA. Basically, when modeling a function call, <br>
you need to have a good intuition about whether multiple branches within <br>
the function imply multiple branches in the caller context. Like, it's <br>
fine to warn that passing a null pointer into foo() causes a null <br>
dereference, but it's not fine to emit null dereference warnings upon <br>
dereferencing a pointer simply because that pointer was previously <br>
passed into function bar() that checked it for null but has long exited <br>
since then. Most of the time the ideal solution is to have exactly one <br>
branch in the summary. Functions like `isalpha` are a rare exception. <br>
Most of the time the state split is not justified because there's no <br>
indication on the caller side that a certain callee branch is <br>
necessarily taken, so we should not keep these assumptions in the state <br>
after the function has exited, so they shouldn't be part of the summary.<br></blockquote><div>Ok, this seems to be an interesting problem. I think if we have a summary for bar() then after we applied that summary then that becomes part of the path constraint and that constraint should be checked before we apply the summary for foo(). This implies that we should have a proper precondition for foo() as well, which is not clear for me how could we deduce that pre automatically. Also, perhaps we could apply the summary without checking the pre and once we found a bug only then would we check the path feasibility with Z3 (similarly to what we do today with inlining).</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
In other news - did you ever consider using ASTImporter for generating <br>
function body ASTs from hand-written code? Like BodyFarms but without <br>
constructing the AST by hand. I imagine that the advantage here is that <br>
the solution is more reliable than ASTImporter-based CTU because you can <br>
easily make sure that a certain hand-written code under your control is <br>
always imported correctly (as opposed to arbitrary code written by the <br>
users). This is probably not that hard to implement and that'd speed up <br>
our progress on supporting functions that generally benefit from body <br>
farms, such as atomic intrinsics. I.e.:<br>
<br>
const Stmt *CompareAndSwapBody = ASTImporter.import("""<br>
bool compare_and_swap(int *p, int old, int new) {<br>
if (*p != old)<br>
return false;<br>
*p = new;<br>
return true;<br>
}<br>
""");<br>
<br>
?<br></blockquote><div>Ok, this could be another approach, but very similar to the work that have been done by Gabor(xazax) in ModelInjector and by <a href="https://reviews.llvm.org/D13731">this patch </a>which builds on that. However, I see a few concerns: (1) This requires that summaries are syntactically valid C/C++ code, thus we must use __attributes__ or [[attributes]]. Meaning we must extend the generic Attr.td very often with analyzer exclusive data, and I think we'd have a hard time to get those changes accepted there. Also I am not sure if we could express easily everything we want to with these attributes, e.g. what about relations between two arguments? (2) In the past some developers were strictly <a href="http://lists.llvm.org/pipermail/cfe-dev/2015-December/046378.html">against this approach</a> . (3) A DeclContext can be set up to have an external source and then later this source could be queried to extend the list of the contained Decls. This mechanism is used in LLDB when we parse an expression. There we import everything into the context of the expression. But in the approach you mention, the direction of the import would be reversed, and I see some technical difficulties in that. I am not sure if ASTImporter and DeclContext is prepared at the moment to realize this unless the summary code is a self contained code (meaning it can be parsed in it's own).</div><div>So, I'd rather strive for the Attributor/IR based approach after we pimped up StdLibraryFunctionsChecker (and after we came up with a nice YAML format or something).</div><div><br></div><div>Thanks,</div><div>Gabor</div></div></div>