[llvm-bugs] [Bug 51036] New: [analyzer] integral casts are not modeled correctly

Fri Jul 9 01:30:50 PDT 2021

https://bugs.llvm.org/show_bug.cgi?id=51036

            Bug ID: 51036
           Summary: [analyzer] integral casts are not modeled correctly
           Product: clang
           Version: 12.0
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: Static Analyzer
          Assignee: dcoughlin at apple.com
          Reporter: balazs.benics at sigmatechnology.se
                CC: dcoughlin at apple.com, llvm-bugs at lists.llvm.org

I've triaged a couple of false-positive reports and I observed something
interesting.
Check this example: https://godbolt.org/z/vb1Y5z6GE

  #include <cstdio>

  void clang_analyzer_printState() {}

  void test_single(signed char c) {
      if ((unsigned int)c <= 200) {
          // 'c' supposed to be restricted to be non-negative.

          // The associated value range is not even close to the expected
range.
          // 'c' is assumed to be within [-128,-56]
          // but it should be within [0,127].
          clang_analyzer_printState();

          // In fact, the execution of the program prints non-negative numbers.
          printf("%d\n", (int)c);
      }
  }

  #ifdef WITH_MAIN

  int main() {
    // All possible signed char values are tested.
    for (int i = -128; i <= 127; ++i) {
      test_single(i);
    }
  }

  #endif

In the example, we can see that the runtime behavior of the code differs from
the abstract machine that we are symbolically model.
As you can see, any bugreports in that path that depend on the value 'c' are
inherently false-positives.

It can materialize in false-positives by ArrayBound V1 and V2 checkers, but
supposedly any other checkers could also suffer from this:

  buf[(unsigned int)c]

At the indexing operation, the analyzer is sure that the 'index' is *negative*,
thus accessing an element prior to the first valid element of the array.
While the user sees that the index expression is of an *unsigned* type, so they
are confused with a reason.

AFAIK all released clang versions are affected by this behavior, according to
Compiler Explorer.

I can understand that 'omitting' symbolic integral casts were a /good/ idea
when we had a /dumb/ constraint solver.
That was capable of looking up constraints for symbolic expressions in a
*verbatim* manner.
However, this is no longer true. The EQ classes and other witty stuff made our
solver *much* smarter.
And the /new/ non-fixpoint-iterative simplifier by Gabor Marton is just making
it even more so.
BTW it's fascinating to see where it goes.

I think the way forward would be to emit symbolic casts whenever necessary and
see through them in the solver.
For the record: Denys proposed several changes in this area, simplifying the
representation of widening and narrowing symbolic cast chains, but I think
that's a slightly different topic.
That wouldn't immediately arm the solver with the necessary capabilities to see
through the casts AFACT.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-bugs/attachments/20210709/c83490e1/attachment.html>