[PATCH] D120310: [clang][analyzer] Add modeling of 'errno' (work-in-progress).

Balázs Benics via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Feb 23 03:48:52 PST 2022


steakhal added inline comments.


================
Comment at: clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h:767-768
            s->getType()->isBlockPointerType());
-    assert(isa<UnknownSpaceRegion>(sreg) || isa<HeapSpaceRegion>(sreg));
+    assert(isa<UnknownSpaceRegion>(sreg) || isa<HeapSpaceRegion>(sreg) ||
+           isa<GlobalSystemSpaceRegion>(sreg));
   }
----------------
NoQ wrote:
> steakhal wrote:
> > At this point, I'm not even sure if we should assert any of these.
> I mean, ideally this should be just `UnknownSpaceRegion`. For everything else we should have made a fresh `MemRegion` class. It is absurd that the same numeric address value (represented the symbol `s`) can correspond to two (now three) different locations in memory.
>From my perspective, a symbolic region is just a region that we don't know what its parent region is - let it be part of an object or an entire memory space.
The memory space is what we use for communicating some constraints on where that symbolic region.
E.g. a symbolic region of a stack memory space should never alias with any memory region of the heap memory space.

> It is absurd that the same numeric address value (represented the symbol s) can correspond to two (now three) different locations in memory.
IMO a given `s` should indeed refer to a single location in memory, but that's not a property we can enforce here in the constructor.
And my guess is that by removing this assertion, we would still have this invariant. What we should do instead, put an assertion into the `SymbolManager`, to enforce that with the same `s` symbol, one cannot construct two `SymbolicRegions` with different memory spaces.
Alternatively, we could remove the memory space from the `Profile` to map to the same entity in the Map; which would enforce this invariant on its own. WDYT?


================
Comment at: clang/lib/StaticAnalyzer/Checkers/Errno.h:24-26
+/// Returns if modeling of 'errno' is available.
+/// If not, the other functions here should not be used.
+bool isErrnoAvailable(ProgramStateRef State);
----------------
NoQ wrote:
> steakhal wrote:
> > I don't think we need this.
> > In `getErrnoValue()` we should simply return `UnknownVal` if we don't have 'errno.
> > And in `setErrnoValue()` we should return the incoming `State` unmodified.
> > 
> > THat being said, only a top-level `Check::BeginFunction` callback could see an 'errno' uninitialized, which I don't think is a real issue.
> > All the rest of the callbacks would run after it's initialized, thus would behave as expected.
> > And in case the translation unit doesn't have nor 'errno' variable nor 'errno_location_ish' functions, ignoring these `set/get` functions is actually the expected behavior.
> > 
> > Having to check `isErrnoAvailable()` would be really an anti-pattern.
> `UnknownVal` implies that it's an actual value but we don't know which one. If the value doesn't exist we shouldn't use it. And if the user doesn't include the appropriate header then the value really doesn't exist in the translation unit. So flavor-wise I think we shouldn't use `UnknownVal` here; I'd rather have an `Optional<SVal>`.
> 
> Other than that, yeah, I think this is a good suggestion.
Yeah, probably the `Optional` is a better alternative.

However, I'd like to explain my reasoning behind my previous suggestion:
`Unknown` can/should model any values that the analyzer cannot currently reason about: e.g. floating-point numbers.
In this case, we cannot reason about the `errno`, thus I recommended it.

My point is, that we shouldn't regress our API for a marginal problem. And according to my reasoning, this seems to be a marginal issue.
Using `Optional` here is better than the current implementation, but not by much.
It would still result in a suboptimal coding pattern, where we guard each access to get/set `errno` by an `if` statement. In the cases where we would use the `getOr()` getter, I'm expecting to see the users pass the `UnknownVal()` as the fallback value in most cases anyway.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/ErrnoChecker.cpp:32
+// FIXME: Is there a system where it is not called "errno" but is a variable?
+const char *ErrnoVarName = "errno";
+// Names of functions that return a location of the "errno" value.
----------------
NoQ wrote:
> steakhal wrote:
> > Why don't you use `StringRef` here as well?
> There's [[ https://llvm.org/docs/CodingStandards.html#do-not-use-static-constructors | that rule against static constructors/destructors ]]. We should use a lot of `constexpr` in these situations and/or try to turn these structures into plain C arrays as much as possible.
Ah, I see, thanks!


================
Comment at: clang/lib/StaticAnalyzer/Checkers/ErrnoChecker.cpp:134
+    const SymbolConjured *Sym =
+        SVB.conjureSymbol(nullptr, C.getLocationContext(), ACtx.IntTy, 1);
+
----------------
NoQ wrote:
> steakhal wrote:
> > 
> This is literally the first step of the analysis. The block count is //known//.
> 
> What I really want to see here is a non-trivial custom tag (i.e., the `const void *symbolTag` parameter on some of the overloads of `conjureSymbolVal()`) to make sure this symbol isn't treated as a duplicate of any other symbol some other checker conjures initially. Like all tags, you can initialize it with some address of some checker-static variable.
I would still use the `C.blockCount()` to make it more in-line with the rest of the uses of `conjureSymbol()`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120310/new/

https://reviews.llvm.org/D120310



More information about the cfe-commits mailing list