[cfe-dev] Proposal: Integrate static analysis test suites

Thu Dec 10 13:04:49 PST 2015

On Mon, Dec 7, 2015 at 9:50 PM, <Alexander G. Riccio> via cfe-dev
<cfe-dev at lists.llvm.org> wrote:
> First time Clang contributor here,
>
> I'd like to add the "C Test Suite for Source Code Analyzer v2", a
> relatively small test suite (102 cases/flaws), some of which Clang
> doesn't yet detect*. See link at bottom.
>
> Immediate questions:
> 0. Does the Clang community/project like the idea?

I've included a few other devs (CCed) to get further opinions.

I like the idea of being able to diagnose the issues covered by the
test suite, but I don't think including the test suite by itself is
particularly useful without that goal in mind. Also, I have a question
about the licensing of the tests themselves and whether we would need
to do anything special there.

> 1. What's the procedure for including new tests? (not the technical,
> but the community/project).

Getting the discussion going about the desired goal (as you are doing)
is the right first step.

> 2. How do I include failing tests without breaking things? Some of
> these tests will fail - that's why I'm proposing their inclusion - but
> they shouldn't yet cause the regression testing system to complain.

Agreed, any test cases that are failing would have to fail gracefully.
I assume that by failure, you mean "should diagnose in some way, but
currently does not". I would probably split the tests into two types:
one set of tests that properly diagnose the issue (can be checked with
FileCheck or -verify, depending on the kind of tests we're talking
about), and one set of tests where we do not diagnose yet, but want to
someday (which can be tested with expected-no-diagnostics, for
example). This way, we can ensure test cases continue to diagnose when
we want them to, and we can be alerted when new diagnostics start to
catch previously uncaught tests. This is assuming that it makes sense
to include all of the tests at once, which may not make sense in
practice.
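
As a minimal sketch of the first kind of test, a lit test file for the
analyzer could look like the following. The function name and the flaw
are hypothetical (not taken from the NIST suite), and the RUN line
assumes the %clang_cc1 -analyze form used by existing analyzer tests;
the exact substitution and checker list may differ in your tree.

```c
// RUN: %clang_cc1 -analyze -analyzer-checker=core -verify %s

// Hypothetical test case, not taken from the NIST suite.
// The analyzer should report the dereference on the path where flag == 0.
int read_through_null(int flag) {
  int *p = 0;
  if (flag)
    return 0; // safe path: p is never dereferenced
  return *p;  // expected-warning{{Dereference of null pointer}}
}
```

Tests in the second set would live in separate files, since -verify
rejects a file that mixes expected-no-diagnostics with other expected-*
directives.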

> 3. How does Clang handle licensing of third party code? Some of these
> tests are clearly in the public domain (developed at NIST, says "in
> the public domain"), but others are less clearly licensed.

Oh look, you asked the same question I asked. ;-) If the tests are in
the public domain and clearly say so, I think we can go ahead
and include them. If the other tests are not clearly licensed, we
should try to get NIST to clarify the license of them before
inclusion. Depending on the license, we may be able to include them
under their original license. If we cannot clarify the license, I
would guess that we simply should not include those tests as part of
our test suite. Note: I could be totally wrong, IANAL. :-)

> Should the community accept that testsuite, and I successfully add
> that test suite, then I'd like to step it up a bit, and include the
> "Juliet Test Suite for C/C++". "Juliet" is a huge test suite by the
> NSA Center for Assured Software & NIST's Software Assurance Metrics
> And Tool Evaluation project, which has 25,477 test cases (!!) for 118
> CWEs. I don't think any other open source compiler could compete with
> Clang after this. There's a ton of literature on the "Juliet" suite,
> and listing it here is not necessary.
>
> This project would be my first Clang contribution :)
>
> Personally, I'm interested in static analysis, and this is the first
> step in understanding & improving Clang's static analysis
> capabilities.
>
> I have some ideas on how to detect the currently undetected bugs, and
> I'm curious to see where things lead.

Adding the tests by themselves is not necessarily interesting to the
project unless they exercise the compiler in ways it's not currently
being exercised. So just having tests for the sake of having the tests
is not too useful (IMO). However, if the goal is to have the tests
because you would like to make efforts to have the compiler diagnose
their cases properly, that's far more interesting and a good reason to
bring in the tests.

One possible approach if you are interested in having the compiler
diagnose the cases is to bring the tests in one at a time. Start with
the initial batch of "these are diagnosed properly", then move on to
"this test is diagnosed properly because of this patch." Eventually
we'll get to the stage where all of the tests are diagnosed properly.
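
To illustrate the one-at-a-time approach, here is a sketch of a test as
it might sit in the tree before the patch that teaches the analyzer to
catch it. It is hypothetical: the function is not from the NIST suite,
and it is only an assumption that the checkers enabled in the RUN line
stay silent on it.

```c
// RUN: %clang_cc1 -analyze -analyzer-checker=core -verify %s
// expected-no-diagnostics

// Hypothetical flaw, assumed to be unreported by the checkers above:
// the caller-supplied index is used without a bounds check.
int element_at(int index) {
  int table[4] = {10, 20, 30, 40};
  return table[index]; // out of bounds when index < 0 or index > 3
}
```

Once a patch makes the analyzer report the unchecked index, -verify
fails on this file because a diagnostic appeared. The same patch would
then replace the expected-no-diagnostics line with an expected-warning
annotation on the flagged line, moving the test into the "diagnosed
properly" set.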

> Secondary questions:
> 1. How should I break the new tests up into patches? Should I just
> whack the whole 102 case suite into a single patch, or a bunch of
> smaller ones?

See comments above.

> 2. How does the Clang/LLVM static analysis testing infrastructure
> work? I'm going to have to figure this out myself anyways, but where
> should I start? Any tips on adding new tests?

http://clang-analyzer.llvm.org/checker_dev_manual.html

Another good place for some of these checkers may be clang-tidy, or
the compiler frontend itself. It's likely to depend on case-by-case
code patterns.

http://clang.llvm.org/extra/clang-tidy/

Thank you for looking into this!

~Aaron

>
> *If I remember correctly,
> https://samate.nist.gov/SRD/view_testcase.php?tID=149055 passes
> analysis without complaint. I manually spot checked a very small
> number of tests.
>
> "C Test Suite for Source Code Analyzer v2" (valid code):
> https://samate.nist.gov/SRD/view.php?tsID=101
> "C Test Suite for Source Code Analyzer v2" (invalid code):
> https://samate.nist.gov/SRD/view.php?tsID=100
>
> "Juliet Test Suite for C/C++" (files):
> https://samate.nist.gov/SRD/testsuites/juliet/Juliet_Test_Suite_v1.2_for_C_Cpp.zip
> "Juliet Test Suite for C/C++" (docs):
> https://samate.nist.gov/SRD/resources/Juliet_Test_Suite_v1.2_for_C_Cpp_-_User_Guide.pdf
>
>
> Sincerely,
> Alexander Riccio