<div dir="ltr"><div>I'm impressed.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>ANALYZER_ASSERT(ANALYZER_EXTENT(buff) == size);</div></blockquote><div> That looks like something I've dreamed about, thank you.</div><div><br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">I don't know whether it's a good idea. Not only we have too little <br>

positive experience with array bound checking for now, but also there <br>

are other powerful tools to deal with buffer overflows, like ASan/MSan <br>

and, well, C++ containers. It's unclear to me how often would it be <br>

impossible</blockquote><div>I totally agree, and that is why I was looking into eg.: `ArrayBoundCheckerV2` and it's buggy behavior.</div><div><br></div><div>This problem raised by implementing taint warnings for the `CStringChecker`, which meant that it would warn for cases like I mentioned in my first email.<br></div><div>Which would result in false positives without any mechanism to silence them, since we can not tell the analyzer about the implicit assumptions which are not present in the code.<br><br></div><div>How should we ship this `analyzer_assert.h` with the analyzer?<br></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Artem Dergachev <<a href="mailto:noqnoqneo@gmail.com">noqnoqneo@gmail.com</a>> ezt írta (időpont: 2020. márc. 16., H, 15:41):<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Wait, so you're not trying to do this in a checker, but trying to <br>

introduce, like, a mechanism for the users to use or something? Ok, got <br>

it. Interesting, yeah.<br>

<br>

Well, you could always add a magic function like <br>

__clang_analyzer_getExtent() that would work exactly like <br>

clang_analyzer_getExtent() from the debug.ExprInspection checker but <br>

will be on by default and double-underscored so that to avoid <br>

conflicting with user code (note that defining double-underscored <br>

declarations outside of the standard library is UB). Then give them a <br>

header:<br>

<br>

   // analyzer_helper.h<br>

   #ifdef __clang_analyzer__<br>

     size_t __clang_analyzer_getExtent(void *x);<br>

     #define ANALYZER_ASSERT assert<br>

     #define ANALYZER_EXTENT(x) __clang_analyzer_getExtent(x)<br>

   #else<br>

     #define ANALYZER_ASSERT<br>

     #define ANALYZER_EXTENT(x)<br>

   #endif<br>

<br>

And then tell them to write code like this:<br>

<br>

   ANALYZER_ASSERT(ANALYZER_EXTENT(buff) == size);<br>

<br>

I don't know whether it's a good idea. Not only we have too little <br>

positive experience with array bound checking for now, but also there <br>

are other powerful tools to deal with buffer overflows, like ASan/MSan <br>

and, well, C++ containers. It's unclear to me how often would it be <br>

impossible<br>

<br>

Another possible approach is to introduce annotations and teach the <br>

analyzer to understand them, eg.:<br>

<br>

   int g_user_function(<br>

     __attribute__((buffer_ptr("buff"))) int *buff,<br>

     __attribute__((buffer_size("buff"))) int size);<br>

<br>

Or like this:<br>

<br>

   struct BoundedBuff {<br>

     __attribute__((buffer_ptr("buff"))) int *buff;<br>

     __attribute__((buffer_size("buff"))) int size);<br>

   };<br>

<br>

That's slightly less flexible but also less ugly (when you hide those <br>

behind macros as well).<br>

<br>

 > I think that no '/annotations/' should introduce UB even if that<br>

 > assumption expressed via an annotation is turned out to be _invalid_.<br>

<br>

Well, that's a fairly common thing to happen when annotations are <br>

introduced for optimizations rather than for hardening. Say, if you <br>

violate __builtin_assume it's a UB. If you violate a `restrict` <br>

qualifier contract it's a UB. Even if you violate a `const` qualifier <br>

contract it's a UB. Interestingly, iirc in-language assertions in C++23 <br>

contracts are currently expected to introduce UB when violated. That <br>

said, i definitely agree with you in your case.<br>

<br>

<br>

On 3/16/20 5:06 PM, Balázs Benics wrote:<br>

> Hi Artem and Csaba,<br>

> Thank you for your response.<br>

><br>

> Sorry for the long description of the problem in my original email, <br>

> but I wanted to provide as much context as I could.<br>

> Provided that we have to analyze code like in that example, the <br>

> analyzer (correctly) assumes that the pointer points to an unknown <br>

> memory region.<br>

> However, the user knows that the function will be called with a valid <br>

> buffer, which would be capable of holding at least/most n elements. <br>

> For now, we can not tell this assumption to the analyzer. There is no <br>

> way in standard C++ to express such notion precisely. Asserts would <br>

> not be powerful enough, as I earlier stated.<br>

> So we have to tell this kind of assumption to the analyzer in a <br>

> different way. But in what way?<br>

><br>

> Some sort of annotation or special analyzer assert? I'm not sure, but <br>

> none of these look promising, really.<br>

><br>

> I think we should find a way to properly analyze the following two cases:<br>

><br>

>   * `int f_user_function(int *points_to_at_least_5_elements);`<br>

>     The user knows that this function must be called with a pointer<br>

>     pointing to an array capable of holding at least 5 elements.<br>

>     We should be able to tell this assumption to the analyzer, to<br>

>     analyze its body according to this assumption.<br>

><br>

>   * `int g_user_function(int *buff, int size);`<br>

>     There is a connection between the `buff`and `size`, which denotes<br>

>     similar properties that were described in the previous bullet<br>

>     point, but the analyzer would not know.<br>

><br>

> Regards Balazs.<br>

><br>

> Artem Dergachev <<a href="mailto:noqnoqneo@gmail.com" target="_blank">noqnoqneo@gmail.com</a> <mailto:<a href="mailto:noqnoqneo@gmail.com" target="_blank">noqnoqneo@gmail.com</a>>> ezt <br>

> írta (időpont: 2020. márc. 16., H, 3:08):<br>

><br>

>     I think you're looking for SymbolExtent. It is *the* symbol that<br>

>     denotes<br>

>     the [otherwise completely] unknown size of a region. The helper<br>

>     function<br>

>     for obtaining either a known extent or a SymbolExtent is currently<br>

>     known<br>

>     as getDynamicSize() (but i'd rather rename it back to "extent" - see<br>

>     also <a href="https://reviews.llvm.org/D69726" rel="noreferrer" target="_blank">https://reviews.llvm.org/D69726</a>).<br>

><br>

>     ArrayBoundChecker already has the code that you're looking for.<br>

><br>

>     On 3/12/20 3:20 PM, Balázs Benics via cfe-dev wrote:<br>

>     > Hi, checker devs<br>

>     ><br>

>     > TLDR:<br>

>     > How to constrain the size of unknown memory regions, eg. pointed by<br>

>     > 'raw' char pointers?<br>

>     ><br>

>     > longer version:<br>

>     > Working on taint analysis I'm facing with the following problem:<br>

>     ><br>

>     >     void strncpy_bounded_tainted_buffer(char *src, char *dst) {<br>

>     >       // assert(strlen(src) >= 10 && "src must have at leas 10<br>

>     elements");<br>

>     >       int n;<br>

>     >       scanf("%d", &n);<br>

>     >       if (0 < n && n < 10) {<br>

>     >         strncpy(dst, src, n); // Should we warn or not?<br>

>     >       }<br>

>     >     }<br>

>     ><br>

>     ><br>

>     ><br>

>     > In this example we analyze a function taking two raw pointers,<br>

>     so we<br>

>     > don't know how big those arrays are.<br>

>     > We will check the `strncpy` call, whether it will access /(read and<br>

>     > write)/ only valid memory.<br>

>     > We will check the pointers (src and dst) separately by checking if<br>

>     > /`/&src[0]` and `&src[n-1]` would be in bound of the memory region<br>

>     > pointed by the pointer. Since the analyzer don't know (both<br>

>     states are<br>

>     > non-null), we should check if the `length` parameter is tainted,<br>

>     and<br>

>     > if so, we should still issue a warning telling that "String copy<br>

>     > function might overflow the given buffer since untrusted data is<br>

>     used<br>

>     > to specify the buffer size."<br>

>     > Obviously, if the `length` parameter is not tainted, we will assume<br>

>     > (conservatively) that the access would be valid.<br>

>     ><br>

>     ><br>

>     > How should tell the analyzer that the array which is pointed by the<br>

>     > pointer holds at least/most N elements?<br>

>     > For example in the code above, express something similar via an<br>

>     > assertion, like saying that `src` points to a c-string, which<br>

>     has at<br>

>     > least 10 + 1 element underlying storage.<br>

>     > Although this assertion using `strlen` seems like a solution,<br>

>     > unfortunately not applicable for example to the `dst` buffer,<br>

>     which is<br>

>     > most likely uninitialized - so not a c-string, in other words<br>

>     calling<br>

>     > `strlen` would be undefined behavior.<br>

>     ><br>

>     > The only (hacky) option which came in my mind was to abuse the<br>

>     > standard regarding pointer arithmetic.<br>

>     ><br>

>     >     assert(&src[n] - &src[-1]);<br>

>     ><br>

>     > The standard is clear about that pointer arithmetic is only<br>

>     applicable<br>

>     > for pointers pointing to elements of the same array OR to a<br>

>     > hypothetical ONE past/before element of that array.<br>

>     > <a href="http://eel.is/c++draft/expr.add#4.2" rel="noreferrer" target="_blank">http://eel.is/c++draft/expr.add#4.2</a><br>

>     ><br>

>     > This assertion would be undefined behavior if the size of the array<br>

>     > pointed by `src` would be smaller than `n`.<br>

>     ><br>

>     > IMO this looks really ugly.<br>

>     > I think that no '/annotations/' should introduce UB even if that<br>

>     > assumption expressed via an annotation is turned out to be<br>

>     _invalid_.<br>

>     ><br>

>     ><br>

>     > What would be the right approach to guide (to constrain the size<br>

>     of a<br>

>     > memory region) the analyzer?<br>

>     > How can the analyzer inference such constraint?<br>

>     ><br>

>     > Thanks Balazs.<br>

>     ><br>

>     > _______________________________________________<br>

>     > cfe-dev mailing list<br>

>     > <a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a> <mailto:<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>><br>

>     > <a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a><br>

><br>

<br>

</blockquote></div>