[PATCH] D114718: [analyzer] Implement a new checker for Strict Aliasing Rule.

Denys Petrov via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Nov 30 08:31:21 PST 2021


ASDenysPetrov added inline comments.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/StrictAliasingChecker.cpp:48-49
+  //
+  // NOTE: User must provide canonical and unqualified QualType's for the
+  // correct result.
+  static bool canAccess(QualType From, QualType To, ASTContext &Ctx) {
----------------
NoQ wrote:
> `assert()` it? Or maybe canonicalize defensively so that not to duplicate code in the caller, given that there's really only one correct way to do that?
>Or maybe canonicalize defensively so that not to duplicate code in the caller,
We need canonical types in the bug report as well, so we have to get them on the caller side.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/StrictAliasingChecker.cpp:58
+      : From(From), To(To), Ctx(Ctx) {}
+  bool canAccessImpl() {
+    return isSame() || isCharOrByte() || isOppositeSign();
----------------
xazax.hun wrote:
> I'd love to see some detailed descriptions of the strict aliasing rule, what parts are we checking for and what parts we don't. E.g. it would be nice to document the differences between C and C++ aliasing rules. I do remember something about prefixes, i.e.: it would be well defined to access something with the dynamic type `struct { int x, y; };` and read `struct{ int x; };` from it. Does that not apply to C++?
>I'd love to see some detailed descriptions of the strict aliasing rule, 
I've added a description in the header.
>it would be nice to document the differences between C and C++ aliasing rules.
I need some time to gather those differences and will add them to the description as well.
> it would be well defined to access something with the dynamic type struct { int x, y; }; and read struct{ int x; }; from it. Does that not apply to C++?
AFAIK it's UB if you do any access to the object which lifetime is not started in C++.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/StrictAliasingChecker.cpp:80
+class StrictAliasingChecker : public Checker<check::Location> {
+  mutable std::unique_ptr<BugType> BT;
+
----------------
NoQ wrote:
> The modern solution is
> ```lang=c++
> BugType BT{this, "...", "..."};
> ```
Got it.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/StrictAliasingChecker.cpp:113
+  QualType getOriginalType(CheckerContext &C, SVal V, QualType T) const {
+    assert(V.getAs<Loc>() && "Location shall be a Loc.");
+    V = C.getState()->getSVal(V.castAs<Loc>(), T);
----------------
NoQ wrote:
> I suspect it might also be `UnknownVal` (?) It usually makes sense to protect against such scenarios with an early return.
I'll try to add some tests to model this.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/StrictAliasingChecker.cpp:141-145
+    auto ER = dyn_cast<ElementRegion>(R);
+    while (ER) {
+      R = ER->getSuperRegion();
+      ER = dyn_cast<ElementRegion>(R);
+    }
----------------
NoQ wrote:
> You're basically running into https://bugs.llvm.org/show_bug.cgi?id=43364 here. The element region is not necessarily a consequence of a pointer cast; it may also be a legitimate array element access, and there's no way to figure out what it really was. So I suspect that you may fail the following test case:
> ```lang=c++
> void foo() {
>   int x[10];
>   int *p = &x[1];
>   *p = 1; // int incompatible with int[10]
> }
> ```
> 
> Separately from that, ultimatetly you'll most likely have to handle //regions with symbolic base// separately. These regions are special because they are built when the information about the original type is not really unavailable. For now you're dodging this problem by only supporting `VarRegion`s with element sub-regions. But in the general case you may encounter code like this
> ```lang=c++
> void foo(void *p) {
>   int *x = p;
>   float *y = x;
>   *y = 1.0;
> }
> ```
> which may be valid if the original pointer `p` points to a `float`. But this can be delayed until later.
Oh, unfortunately I've stumbled upon these representation issues:
```
int arr[2][8]
auto p1 = (int(*)[8])arr[0]; // &Element{arr,0 S64b,int[8]}
auto p2 = (int(*)[8])arr;    // &Element{arr,0 S64b,int[8]}
auto p3 = (int(*)[7])arr[0]; // &Element{arr,0 S64b,int[7]}
auto x = *p1; // OK
auto y = *p2; // UB
auto z = *p3; // UB
```
I think we need some sort of a new region `CastRegion` that we can store the **provenance/origin** and **cast** type separately (in a canonical form), like I did for integers in  D103096. I can investigate it soon. And `ElementRegion` would be in charge of arrays only.
Or
We could modificate `ElementRegion` to be always in a canonical form which we could easily differentiate.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/StrictAliasingChecker.cpp:157
+
+    ExplodedNode *Node = C.generateNonFatalErrorNode();
+    if (!BT)
----------------
NoQ wrote:
> I suggest a fatal error node. Undefined behavior already occured, there's nothing interesting in the follow-up.
> I suggest a fatal error node. 
That's a big discussion what would be correct :-) since 98.99% of modern compilers treat e.g. `*((short*)x)` as a truncation and the following flow works as most users expect (who doesn't aware about UB). So here we could just warn about UB and explore the rest of the code for similar UB's without //sinking//. That's why I decided using non-fatal node.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/StrictAliasingChecker.cpp:178-179
+  const LangOptions &LO = CM.getLangOpts();
+  // Ideally, the condition should be `LO.CPlusPlus11 || LO.CPlusPlus2b` but
+  // implemented checks can be partially applied for C++17 and lower versions.
+  return LO.CPlusPlus;
----------------
NoQ wrote:
> We should probably keep the checker disabled in these language modes until we're sure it works correctly (in a conservative sense, i.e. doesn't emit false positives).
> 
> Can we check whether `-fstrict-aliasing` is enabled here? That's probably appropriate.
> Can we check whether -fstrict-aliasing is enabled here? That's probably appropriate.
Basically, I've already tried to do some preparations for using `-fstrict-aliasing` in D114006. But now it needs a bit modification. I'll update it.

>We should probably keep the checker disabled in these language modes until we're sure it works correctly (in a conservative sense, i.e. doesn't emit false positives).
I did it for the testing purposes. Several extisting tests help to find uncovered cases (As you can see, those, which have failed in //pre-merge checks//).
Agree. I'll change the condition to `LO.CPlusPlus20 || LO.CPlusPlus2b`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114718/new/

https://reviews.llvm.org/D114718



More information about the cfe-commits mailing list