[llvm-dev] [cfe-dev] [GSOC 2018] Information gathering

Wed Mar 7 19:29:38 PST 2018

On 06/03/2018 8:24 AM, Paul Semel wrote:
> Hi,
>
> Thanks for replying !
>
> On 03/02/2018 10:58 PM, Artem Dergachev wrote:
>> Hey, welcome!
>>
>> I'm curious about the unsequenced modification checker, is it 
>> something that I should have seen but missed for whatever reason? It 
>> might be useful, and I think I see why compiler warnings don't 
>> cover all cases, i.e. why the analyzer's path sensitivity would help 
>> here. But I can't answer until I see it :) e.g. on our Phabricator.
>>
>
> So.. I uploaded the checker on Phabricator !

Yay! I'll comment with my thoughts on this, so that you could polish it 
when you have time.

Note that this shouldn't necessarily have anything to do with GSoC - 
we're accepting code in all seasons :)

> Please keep in mind that it was for me a proof of concept, and I 
> didn't have in mind to propose this patch at the time I was 
> developing it (and didn't have the time to improve it for the moment, 
> as I am currently working on a structure pretty printing builtin - 
> https://reviews.llvm.org/D44093).
>
> For the moment, this checker is not able to detect all the unsequenced 
> modifications, but can detect things like this :
>
> ```c
> static int a = 0;
>
> int foo(void)
> {
>   return a++;
> }
>
> int main(void)
> {
>   int res = a++ + foo();
>   return res;
> }
> ```

This sounds like, for once, a bug that the analyzer might be really good 
at finding, and the check isn't going to be super loud, which makes me 
quite excited about this check.
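
To make the concern concrete, here is a minimal sketch (not part of the 
patch) that spells out the two orders in which the operands of 
`a++ + foo()` may be evaluated. It assumes, as far as I understand the 
standard, that the call to foo() is indeterminately sequenced with 
respect to the `a++` in main(), so either order is allowed. The names 
Operands, left_first, right_first and self_check are made up for 
illustration.

```cpp
#include <cassert>

static int a = 0;

int foo() { return a++; }

// Operand values observed for the expression `a++ + foo()`.
struct Operands {
  int lhs; // value produced by `a++` in the caller
  int rhs; // value returned by foo()
};

// Order 1: the left operand is evaluated before the call.
Operands left_first() {
  a = 0;
  int lhs = a++;   // lhs == 0, a becomes 1
  int rhs = foo(); // rhs == 1, a becomes 2
  return Operands{lhs, rhs};
}

// Order 2: the call is evaluated before the left operand.
Operands right_first() {
  a = 0;
  int rhs = foo(); // rhs == 0, a becomes 1
  int lhs = a++;   // lhs == 1, a becomes 2
  return Operands{lhs, rhs};
}

// The operands swap between the two orders; only the sum hides it.
void self_check() {
  Operands l = left_first();
  Operands r = right_first();
  assert(l.lhs + l.rhs == r.lhs + r.rhs); // `+` masks the difference
  assert(l.lhs - l.rhs != r.lhs - r.rhs); // `-` would expose it
}
```

With `+` the sum happens to be the same in both orders, but the operand 
values differ, so a non-commutative operator in the same position would 
give an order-dependent result.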

> So here is the link on Phabricator : https://reviews.llvm.org/D44154
>
>> We currently have two confirmed mentors for the Analyzer 
>> (me and George), so we'd most likely be able to mentor one 
>> student each, for two projects, and it'd likely be the two projects 
>> we proposed - unless someone proposes something really interesting. 
>> And already two fairly motivated students have shown up here in the 
>> mailing lists, but this shouldn't stop you from posting your own 
>> proposal here in cfe-dev (most of the analyzer contributors aren't 
>> actively scanning llvm-dev, as far as I know).
>>
>> I don't know much about the binutils replacement project; someone 
>> else should reply on that one.
>>
>
> Sure, I would really like to have some other info on this one ! Maybe 
> you know someone I could add in cc of this thread ? 🙂

Sorry, I'm completely out of the loop on that one. This project has two 
assigned mentors, as mentioned in 
http://llvm.org/OpenProjects.html#replace_binary_utilities - you might 
try to contact them directly in case they accidentally missed your mail.

>
>> A couple of words about the use-after-free-like checker for values 
>> managed by temporary objects (mostly strings) that go out of scope. 
>> Because internals of std::string and other similar classes are too 
>> hard for the analyzer's generic use-after-free checker to understand 
>> (mostly due to how hard it is to track STL's internal invariants, and 
>> how not all of the code is necessarily present in the header), an 
>> API-specific checker seems to be necessary. The original plan we've 
>> had in mind was to keep track of dangerous values like str.c_str() in 
>> the program state (similarly to how SimpleStreamChecker tracks file 
>> descriptors) and then see if any of them are still present in memory 
>> at the end of the original value's lifetime (similarly to how 
>> StackAddrEscape checker finds stack pointers at the end of a 
>> function's stack frame).
>>
>
> Ok I think that I understand the idea. So the idea is that this 
> checker will be an API that will allow tracking those invariants (and 
> we will use this API to track str.c_str()).
> Am I right ?

No-no, I mean that .c_str() is (part of) a certain API :) ...and we want 
to see if it's used correctly. But in order to do that, we don't want to 
understand how it works in a particular implementation of, say, the C++ 
standard library. Instead, we know how it is supposed to work, and 
encode part of this knowledge about this API into the analyzer so that 
it could find misuses of it. E.g., we don't care what exact value is 
returned by .c_str() and how exactly it is allocated or deleted. The 
only thing we care about is that we shouldn't keep it around after the 
string is destroyed. In this sense, the checker is API-specific: it 
works by knowing about a particular API, not through generic knowledge 
of the language. Similarly, SimpleStreamChecker doesn't want to know 
what it takes to open a file: it only knows that the file that was 
opened must also be closed. For SimpleStreamChecker it would be more 
realistic to fully understand how the API works internally, but it's 
still hard. Just in case, I'm mentioning SimpleStreamChecker because 
it's essentially an 
example/hello-world checker described in a very detailed manner in 
https://youtu.be/kdxlsP5QVPw (totally recommended).
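
To illustrate the bookkeeping I have in mind, here is a toy model in 
plain C++. It is only a sketch under stated assumptions: it does not use 
any analyzer APIs, and ObjectId, PointerId, DanglingPtrModel and its 
methods are hypothetical names invented for this mail. It remembers 
which string each .c_str() result came from and, when the string's 
lifetime ends, reports the pointers that are still being kept around.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Toy model only: hypothetical types, NOT Clang Static Analyzer APIs.
using ObjectId = int;  // stands in for a std::string object
using PointerId = int; // stands in for a value returned by .c_str()

class DanglingPtrModel {
  // Tracked pointer -> the string object whose buffer it points into.
  std::map<PointerId, ObjectId> tracked_;

public:
  // Models `p = s.c_str()`: start tracking p as depending on s.
  void onCStr(ObjectId str, PointerId ptr) { tracked_[ptr] = str; }

  // Models p being overwritten or going out of scope: stop tracking it.
  void onPointerDead(PointerId ptr) { tracked_.erase(ptr); }

  // Models the end of s's lifetime: every pointer still tracked for s
  // now dangles. Returns those pointers and forgets them.
  std::vector<PointerId> onStringDestroyed(ObjectId str) {
    std::vector<PointerId> dangling;
    for (auto it = tracked_.begin(); it != tracked_.end();) {
      if (it->second == str) {
        dangling.push_back(it->first);
        it = tracked_.erase(it); // erase returns the next iterator
      } else {
        ++it;
      }
    }
    return dangling;
  }
};

// Usage: one pointer taken from string 1 outlives the string.
bool demoReportsDangling() {
  DanglingPtrModel m;
  m.onCStr(/*str=*/1, /*ptr=*/10);
  std::vector<PointerId> d = m.onStringDestroyed(1);
  assert(d.size() == 1);
  return d[0] == 10;
}
```

Note that the model never looks at how the buffer is allocated or freed; 
it only encodes the rule that the pointer must not outlive the string.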

>> The unknowns here include how easy would it be to track scopes (for 
>> now we only track function scopes, but if fairly old but recently 
>> reincarnated patches [1] and [2] land any time soon, we may get a 
>> much better granularity), how easy would it be to track objects when 
>> they are moved or lifetime-extended by binding to references, which 
>> was a large problem for other C++ object checkers, but we may work 
>> our way around it to some extent (or do it properly, depending on my 
>> current work outlined in [3] and in follow-up mails in February), and 
>> also how helpful inlining would be (eg. would we be able to 
>> automagically support string_view-like classes by inlining their 
>> methods?). So the checker would need an almost indefinite amount of 
>> incremental improvements once the initial prototype is done, some of 
>> which must be fairly curious and would certainly expose you to some 
>> of the analyzer's internals.
>>
>>
>
> Wow. This project sounds really cool, it's really too bad that there 
> are already two students on this project.
>
>> On 01/03/2018 11:43 AM, Paul Semel via cfe-dev wrote:
>>> Hey,
>>>
>>> On 02/20/2018 11:51 PM, Paul Semel wrote:
>>>> Hello,
>>>>
>>>>
>>>> I'm Paul Semel, a French student in computer science. I am 
>>>> currently in my 4th year (1st year of graduate school) at EPITA and 
>>>> enrolled in the system and security laboratory of the school.
>>>>
>>>> I would be very interested in working on an LLVM project during this 
>>>> GSoC. Implementing a PoC for an unsequenced modification checker in 
>>>> CSA helped me discover LLVM. However, I would like to dive deeper 
>>>> in this project.
>>>>
>>>> I've seen some of the proposals, and I would like to ask a few 
>>>> questions about two of those.
>>>>
>>>> As you might have guessed, I have some interest in the checker for 
>>>> dangling string pointers :
>>>>
>>>> - Do you think it would help if I kept working on improving my 
>>>> unsequenced modification checker to get more familiar with Clang 
>>>> Static Analyzer ?
>>>>
>>>> I'm also interested in the command line replacements for GNU 
>>>> Binutils :
>>>>
>>>> - What tools would you like to replace in priority ?
>>>> - Does this subject involve adding options/features to some of the 
>>>> tools, or is it only about handling the command line ?
>>>>
>>>> Thank you very much,
>>>>
>>>>
>>>
>>> Adding cfe-dev..
>>>
>>> Regards,
>>>
>>
>
> By the way, if you have some free time, I would really appreciate 
> some advice on a better way to do the unsequenced modification 
> checker. 🙂
>
>
> Thanks,
>