<div dir="ltr"><div>I'm impressed.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>ANALYZER_ASSERT(ANALYZER_EXTENT(buff) == size);</div></blockquote><div> That looks like something I've dreamed about, thank you.</div><div><br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">I don't know whether it's a good idea. Not only we have too little <br>
positive experience with array bound checking for now, but also there <br>
are other powerful tools to deal with buffer overflows, like ASan/MSan <br>
and, well, C++ containers. It's unclear to me how often would it be <br>
impossible</blockquote><div>I totally agree, and that is why I was looking into eg.: `ArrayBoundCheckerV2` and it's buggy behavior.</div><div><br></div><div>This problem raised by implementing taint warnings for the `CStringChecker`, which meant that it would warn for cases like I mentioned in my first email.<br></div><div>Which would result in false positives without any mechanism to silence them, since we can not tell the analyzer about the implicit assumptions which are not present in the code.<br><br></div><div>How should we ship this `analyzer_assert.h` with the analyzer?<br></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Artem Dergachev <<a href="mailto:noqnoqneo@gmail.com">noqnoqneo@gmail.com</a>> ezt írta (időpont: 2020. márc. 16., H, 15:41):<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Wait, so you're not trying to do this in a checker, but trying to <br>
introduce, like, a mechanism for the users to use or something? Ok, got <br>
it. Interesting, yeah.<br>
<br>
Well, you could always add a magic function like <br>
__clang_analyzer_getExtent() that would work exactly like <br>
clang_analyzer_getExtent() from the debug.ExprInspection checker but <br>
will be on by default and double-underscored so that to avoid <br>
conflicting with user code (note that defining double-underscored <br>
declarations outside of the standard library is UB). Then give them a <br>
header:<br>
<br>
// analyzer_helper.h<br>
#ifdef __clang_analyzer__<br>
size_t __clang_analyzer_getExtent(void *x);<br>
#define ANALYZER_ASSERT assert<br>
#define ANALYZER_EXTENT(x) __clang_analyzer_getExtent(x)<br>
#else<br>
#define ANALYZER_ASSERT<br>
#define ANALYZER_EXTENT(x)<br>
#endif<br>
<br>
And then tell them to write code like this:<br>
<br>
ANALYZER_ASSERT(ANALYZER_EXTENT(buff) == size);<br>
<br>
I don't know whether it's a good idea. Not only we have too little <br>
positive experience with array bound checking for now, but also there <br>
are other powerful tools to deal with buffer overflows, like ASan/MSan <br>
and, well, C++ containers. It's unclear to me how often would it be <br>
impossible<br>
<br>
Another possible approach is to introduce annotations and teach the <br>
analyzer to understand them, eg.:<br>
<br>
int g_user_function(<br>
__attribute__((buffer_ptr("buff"))) int *buff,<br>
__attribute__((buffer_size("buff"))) int size);<br>
<br>
Or like this:<br>
<br>
struct BoundedBuff {<br>
__attribute__((buffer_ptr("buff"))) int *buff;<br>
__attribute__((buffer_size("buff"))) int size);<br>
};<br>
<br>
That's slightly less flexible but also less ugly (when you hide those <br>
behind macros as well).<br>
<br>
> I think that no '/annotations/' should introduce UB even if that<br>
> assumption expressed via an annotation is turned out to be _invalid_.<br>
<br>
Well, that's a fairly common thing to happen when annotations are <br>
introduced for optimizations rather than for hardening. Say, if you <br>
violate __builtin_assume it's a UB. If you violate a `restrict` <br>
qualifier contract it's a UB. Even if you violate a `const` qualifier <br>
contract it's a UB. Interestingly, iirc in-language assertions in C++23 <br>
contracts are currently expected to introduce UB when violated. That <br>
said, i definitely agree with you in your case.<br>
<br>
<br>
On 3/16/20 5:06 PM, Balázs Benics wrote:<br>
> Hi Artem and Csaba,<br>
> Thank you for your response.<br>
><br>
> Sorry for the long description of the problem in my original email, <br>
> but I wanted to provide as much context as I could.<br>
> Provided that we have to analyze code like in that example, the <br>
> analyzer (correctly) assumes that the pointer points to an unknown <br>
> memory region.<br>
> However, the user knows that the function will be called with a valid <br>
> buffer, which would be capable of holding at least/most n elements. <br>
> For now, we can not tell this assumption to the analyzer. There is no <br>
> way in standard C++ to express such notion precisely. Asserts would <br>
> not be powerful enough, as I earlier stated.<br>
> So we have to tell this kind of assumption to the analyzer in a <br>
> different way. But in what way?<br>
><br>
> Some sort of annotation or special analyzer assert? I'm not sure, but <br>
> none of these look promising, really.<br>
><br>
> I think we should find a way to properly analyze the following two cases:<br>
><br>
> * `int f_user_function(int *points_to_at_least_5_elements);`<br>
> The user knows that this function must be called with a pointer<br>
> pointing to an array capable of holding at least 5 elements.<br>
> We should be able to tell this assumption to the analyzer, to<br>
> analyze its body according to this assumption.<br>
><br>
> * `int g_user_function(int *buff, int size);`<br>
> There is a connection between the `buff`and `size`, which denotes<br>
> similar properties that were described in the previous bullet<br>
> point, but the analyzer would not know.<br>
><br>
> Regards Balazs.<br>
><br>
> Artem Dergachev <<a href="mailto:noqnoqneo@gmail.com" target="_blank">noqnoqneo@gmail.com</a> <mailto:<a href="mailto:noqnoqneo@gmail.com" target="_blank">noqnoqneo@gmail.com</a>>> ezt <br>
> írta (időpont: 2020. márc. 16., H, 3:08):<br>
><br>
> I think you're looking for SymbolExtent. It is *the* symbol that<br>
> denotes<br>
> the [otherwise completely] unknown size of a region. The helper<br>
> function<br>
> for obtaining either a known extent or a SymbolExtent is currently<br>
> known<br>
> as getDynamicSize() (but i'd rather rename it back to "extent" - see<br>
> also <a href="https://reviews.llvm.org/D69726" rel="noreferrer" target="_blank">https://reviews.llvm.org/D69726</a>).<br>
><br>
> ArrayBoundChecker already has the code that you're looking for.<br>
><br>
> On 3/12/20 3:20 PM, Balázs Benics via cfe-dev wrote:<br>
> > Hi, checker devs<br>
> ><br>
> > TLDR:<br>
> > How to constrain the size of unknown memory regions, eg. pointed by<br>
> > 'raw' char pointers?<br>
> ><br>
> > longer version:<br>
> > Working on taint analysis I'm facing with the following problem:<br>
> ><br>
> > void strncpy_bounded_tainted_buffer(char *src, char *dst) {<br>
> > // assert(strlen(src) >= 10 && "src must have at leas 10<br>
> elements");<br>
> > int n;<br>
> > scanf("%d", &n);<br>
> > if (0 < n && n < 10) {<br>
> > strncpy(dst, src, n); // Should we warn or not?<br>
> > }<br>
> > }<br>
> ><br>
> ><br>
> ><br>
> > In this example we analyze a function taking two raw pointers,<br>
> so we<br>
> > don't know how big those arrays are.<br>
> > We will check the `strncpy` call, whether it will access /(read and<br>
> > write)/ only valid memory.<br>
> > We will check the pointers (src and dst) separately by checking if<br>
> > /`/&src[0]` and `&src[n-1]` would be in bound of the memory region<br>
> > pointed by the pointer. Since the analyzer don't know (both<br>
> states are<br>
> > non-null), we should check if the `length` parameter is tainted,<br>
> and<br>
> > if so, we should still issue a warning telling that "String copy<br>
> > function might overflow the given buffer since untrusted data is<br>
> used<br>
> > to specify the buffer size."<br>
> > Obviously, if the `length` parameter is not tainted, we will assume<br>
> > (conservatively) that the access would be valid.<br>
> ><br>
> ><br>
> > How should tell the analyzer that the array which is pointed by the<br>
> > pointer holds at least/most N elements?<br>
> > For example in the code above, express something similar via an<br>
> > assertion, like saying that `src` points to a c-string, which<br>
> has at<br>
> > least 10 + 1 element underlying storage.<br>
> > Although this assertion using `strlen` seems like a solution,<br>
> > unfortunately not applicable for example to the `dst` buffer,<br>
> which is<br>
> > most likely uninitialized - so not a c-string, in other words<br>
> calling<br>
> > `strlen` would be undefined behavior.<br>
> ><br>
> > The only (hacky) option which came in my mind was to abuse the<br>
> > standard regarding pointer arithmetic.<br>
> ><br>
> > assert(&src[n] - &src[-1]);<br>
> ><br>
> > The standard is clear about that pointer arithmetic is only<br>
> applicable<br>
> > for pointers pointing to elements of the same array OR to a<br>
> > hypothetical ONE past/before element of that array.<br>
> > <a href="http://eel.is/c++draft/expr.add#4.2" rel="noreferrer" target="_blank">http://eel.is/c++draft/expr.add#4.2</a><br>
> ><br>
> > This assertion would be undefined behavior if the size of the array<br>
> > pointed by `src` would be smaller than `n`.<br>
> ><br>
> > IMO this looks really ugly.<br>
> > I think that no '/annotations/' should introduce UB even if that<br>
> > assumption expressed via an annotation is turned out to be<br>
> _invalid_.<br>
> ><br>
> ><br>
> > What would be the right approach to guide (to constrain the size<br>
> of a<br>
> > memory region) the analyzer?<br>
> > How can the analyzer inference such constraint?<br>
> ><br>
> > Thanks Balazs.<br>
> ><br>
> > _______________________________________________<br>
> > cfe-dev mailing list<br>
> > <a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a> <mailto:<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>><br>
> > <a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a><br>
><br>
<br>
</blockquote></div>