[all-commits] [llvm/llvm-project] 744745: [analyzer] Add failing test case demonstrating bug...

Balazs Benics via All-commits all-commits at lists.llvm.org
Mon Feb 14 07:57:03 PST 2022


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 744745ae195f0997e5bfd5aa2de47b9ea156b6a6
      https://github.com/llvm/llvm-project/commit/744745ae195f0997e5bfd5aa2de47b9ea156b6a6
  Author: Balazs Benics <balazs.benics at sigmatechnology.se>
  Date:   2022-02-14 (Mon, 14 Feb 2022)

  Changed paths:
    M clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
    A clang/test/Analysis/taint-checker-callback-order-has-definition.c
    A clang/test/Analysis/taint-checker-callback-order-without-definition.c

  Log Message:
  -----------
  [analyzer] Add failing test case demonstrating buggy taint propagation

Recently we uncovered a serious bug in the `GenericTaintChecker`.
It was already flawed before D116025, but that was the patch that turned
this silent bug into a crash.

It happens if the `GenericTaintChecker` has a rule for a function, which
also has a definition.

  char *fgets(char *s, int n, FILE *fp) {
    nested_call();   // no parameters!
    return (char *)0;
  }

  // Within some function:
  fgets(..., tainted_fd);

When the engine inlines the definition and finds a function call within
that, the `PostCall` event for the call will get triggered sooner than the
`PostCall` for the original function.
This mismatch violates the assumption of the `GenericTaintChecker` which
wants to propagate taint information from the `PreCall` event to the
`PostCall` event, where it can actually bind taint to the return value
**of the same call**.

Let's get back to the example and go through step-by-step.
The `GenericTaintChecker` will see the `PreCall<fgets(..., tainted_fd)>`
event, so it would 'remember' that it needs to taint the return value
and the buffer, from the `PostCall` handler, where it has access to the
return value symbol.
However, the engine will inline fgets and the `nested_call()` gets
evaluated subsequently, which produces an unimportant
`PreCall<nested_call()>`, then a `PostCall<nested_call()>` event, which is
observed by the `GenericTaintChecker`, which will unconditionally mark
tainted the 'remembered' arg indexes, trying to access a non-existing
argument, resulting in a crash.
If it doesn't crash, it will behave completely unintuitively, by marking
completely unrelated memory regions tainted, which is even worse.

The resulting assertion is something like this:
  Expr.h: const Expr *CallExpr::getArg(unsigned int) const: Assertion
          `Arg < getNumArgs() && "Arg access out of range!"' failed.

The gist of the backtrace:
  CallExpr::getArg(unsigned int) const
  SimpleFunctionCall::getArgExpr(unsigned int)
  CallEvent::getArgSVal(unsigned int) const
  GenericTaintChecker::checkPostCall(const CallEvent &, CheckerContext&) const

Prior to D116025, there was a check for the argument count before it
applied taint, however, it still suffered from the same underlying
issue/bug regarding propagation.

This path does not intend to fix the bug, rather start a discussion on
how to fix this.

---

Let me elaborate on how I see this problem.

This pre-call, post-call juggling is just a workaround.
The engine should by itself propagate taint where necessary right where
it invalidates regions.
For the tracked values, which potentially escape, we need to erase the
information we know about them; and this is exactly what is done by
invalidation.
However, in the case of taint, we basically want to approximate from the
opposite side of the spectrum.
We want to preserve taint in most cases, rather than cleansing them.

Now, we basically sanitize all escaping tainted regions implicitly,
since invalidation binds a fresh conjured symbol for the given region,
and that has not been associated with taint.

IMO this is a bad default behavior, we should be more aggressive about
preserving taint if not further spreading taint to the reachable
regions.

We have a couple of options for dealing with it (let's call it //tainting
policy//):
  1) Taint only the parameters which were tainted prior to the call.
  2) Taint the return value of the call, since it likely depends on the
     tainted input - if any arguments were tainted.
  3) Taint all escaped regions - (maybe transitively using the cluster
     algorithm) - if any arguments were tainted.
  4) Not taint anything - this is what we do right now :D

The `ExprEngine` should not deal with taint on its own. It should be done
by a checker, such as the `GenericTaintChecker`.
However, the `Pre`-`PostCall` checker callbacks are not designed for this.
`RegionChanges` would be a much better fit for modeling taint propagation.
What we would need in the `RegionChanges` callback is the `State` prior
invalidation, the `State` after the invalidation, and a `CheckerContext` in
which the checker can create transitions, where it would place `NoteTags`
for the modeled taint propagations and report errors if a taint sink
rule gets violated.
In this callback, we could query from the prior State, if the given
value was tainted; then act and taint if necessary according to the
checker's tainting policy.

By using RegionChanges for this, we would 'fix' the mentioned
propagation bug 'by-design'.

Reviewed By: Szelethus

Differential Revision: https://reviews.llvm.org/D118987


  Commit: b099e1e562555fbc67e2e0abbc15074e14a85ff1
      https://github.com/llvm/llvm-project/commit/b099e1e562555fbc67e2e0abbc15074e14a85ff1
  Author: Balazs Benics <balazs.benics at sigmatechnology.se>
  Date:   2022-02-14 (Mon, 14 Feb 2022)

  Changed paths:
    M clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
    M clang/test/Analysis/taint-checker-callback-order-has-definition.c

  Log Message:
  -----------
  [analyzer] Fix taint propagation by remembering to the location context

Fixes the issue D118987 by mapping the propagation to the callsite's
LocationContext.
This way we can keep track of the in-flight propagations.

Note that empty propagation sets won't be inserted.

Reviewed By: NoQ, Szelethus

Differential Revision: https://reviews.llvm.org/D119128


  Commit: bf5963bf19670ea58facdf57492e147c13bb650f
      https://github.com/llvm/llvm-project/commit/bf5963bf19670ea58facdf57492e147c13bb650f
  Author: Balazs Benics <balazs.benics at sigmatechnology.se>
  Date:   2022-02-14 (Mon, 14 Feb 2022)

  Changed paths:
    M clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
    M clang/test/Analysis/taint-checker-callback-order-has-definition.c
    M clang/test/Analysis/taint-checker-callback-order-without-definition.c
    M clang/test/Analysis/taint-generic.c

  Log Message:
  -----------
  [analyzer] Fix taint rule of fgets and setproctitle_init

There was a typo in the rule.
`{{0}, ReturnValueIndex}` meant that the discrete index is `0` and the
variadic index is `-1`.
What we wanted instead is that both `0` and `-1` are in the discrete index
list.

Instead of this, we wanted to express that both `0` and the
`ReturnValueIndex` is in the discrete arg list.

The manual inspection revealed that `setproctitle_init` also suffered a
probably incomplete propagation rule.

Reviewed By: Szelethus, gamesh411

Differential Revision: https://reviews.llvm.org/D119129


Compare: https://github.com/llvm/llvm-project/compare/ae8b63866d76...bf5963bf1967


More information about the All-commits mailing list