[PATCH] D123773: [clang][analyzer][ctu] Make CTU a two phase analysis

Gabor Marton via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri May 6 13:09:18 PDT 2022


martong marked 4 inline comments as done.
martong added a comment.

Thanks for the review Gabor, I'll update soon and hope I could answer your questions.



================
Comment at: clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h:116
 
+  const bool Foreign = false; // From CTU.
+
----------------
xazax.hun wrote:
> I feel that we use different terms for the imported declarations. Sometimes we call them `new`, sometimes `imported`, sometimes `foreign`. In case all of these means the same thing, it would be nice to standardize on a single way of naming. If there is a subtle difference between them, let's document that in a comment. It would be nice if we did not need the comment after the declaration but it would be obvious from the variable name.
Yes, I agree that this should deserver some more explanation. Maybe right above this declaration?

So, `new` means that a declaration is **created** newly by the ASTImporter.
`imported` means it has been imported, but not necessarily `new`. Think about this case, we import `foo`'s definition.
```
// to.cpp
void bar() {} // from a.h
// from.cpp
void bar() {} // from a.h
void foo() {
  bar();
}
```
Then `foo` will be `new` and `imported`, `bar` will be `imported` and not `new`.  
`foreign` basically means `imported` and `new`.


================
Comment at: clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h:154
   llvm::PointerUnion<const Expr *, const Decl *> Origin;
+  mutable Optional<bool> Foreign; // From CTU.
 
----------------
xazax.hun wrote:
> Why do we need this to be mutable? Shouldn't we have this information when the CallEvent is created?
Unfortunately, we get this from the `RuntimeDefinition` which is not available during the creation of the `CallEvent`. We use `getRuntimeDefinition()` to retrieve the definition and attach that to the already existent `CallEvent`.


================
Comment at: clang/include/clang/StaticAnalyzer/Core/PathSensitive/CoreEngine.h:176
   WorkList *getWorkList() const { return WList.get(); }
+  WorkList *getCTUWorkList() const { return CTUWList.get(); }
 
----------------
xazax.hun wrote:
> I think we do not expect the STU WorkList to ever be null, maybe it is time to clean this up and return a reference. 
I'd rather do that independently, so this patch can be focused and kept minimal, if you don't mind.
(I mean `WorkList &getWorkList` would introduce irrelevant scattered changes.)


================
Comment at: clang/lib/StaticAnalyzer/Core/AnalyzerOptions.cpp:91
+               .Default(None);
+  assert(K.hasValue() && "CTU inlining mode is invalid.");
+  return K.getValue();
----------------
xazax.hun wrote:
> The `CTUPhase1InliningMode` is coming from the user, right? Unless we have some validation in place before this code get called, I think it might not be a good idea to assert fail on certain user inputs.
Yeah, I've copied this from below `assert(K.hasValue() && "IPA Mode is invalid.");`. Seems like the pattern we have everywhere. I suggest to get rid of this wrong pattern independently, but I'd not deviate from the pattern here, making it easier to do a bulk update once we do it.


================
Comment at: clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp:432
+
+bool ExprEngine::ctuBifurcate(const CallEvent &Call, const Decl *D,
+                              NodeBuilder &Bldr, ExplodedNode *Pred,
----------------
xazax.hun wrote:
> What is the meaning if the return value? It looks like the function always returns true.
Nothing. It is being used by `inlineCall` but there are no callers of `inlineCall` that would use it because `inlineCall` does return true on all paths as well. Good point, this is again a rotten legacy. I am going update this soon.


================
Comment at: clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp:446
+    }
+    const bool BState = State->contains<CTUDispatchBifurcationSet>(D);
+    if (!BState) { // This is the first time we see this foreign function.
----------------
xazax.hun wrote:
> So if we see the same foreign function called in multiple contexts, we will only queue one of the contexts for the CTU. Is this the intended design? 
> 
> So if I see:
> ```
> foreign(true);
> foreign(false);
> ```
> 
> The new CTU will only evaluate `foreign(true)` but not `foreign(false)`. 
This is intentional.
```
foreign(true);
foreign(false);
```
The new CTU will evaluate the following paths in the exploded graph:
```
foreign(true); // conservative evaluated
foreign(false); // conservative evaluated
foreign(true); // inlined
foreign(false); // inlined
```
The point is to keep bifurcation to a minimum and avoid the exponential blow up.
So, we will never have a path like this:
```
foreign(true); // conservative evaluated
foreign(false); // inlined
```

Actually, this is the same strategy that we use during the dynamic dispatch of C++ virtual calls. See `DynamicDispatchBifurcationMap`.

The conservative evaluation happens in the first phase, the inlining in the second phase (assuming the phase1 inlining option is set to none).


================
Comment at: clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp:516
+  // function in the main TU.
+  if (Engine.getCTUWorkList())
+    // Mark the decl as visited.
----------------
xazax.hun wrote:
> So `getCTUWorkList` will return `nullptr` in the second phase? Reading this code first I had the impression we will skip doing marking them visited in the first phase. I think having a helper like `IsSecondCTUPhase` or something would make this a bit less confusing.
Fair enough. I am going to update.


================
Comment at: clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp:1130
       // We are not bifurcating and we do have a Decl, so just inline.
-      if (inlineCall(*Call, D, Bldr, Pred, State))
+      if (ctuBifurcate(*Call, D, Bldr, Pred, State))
         return;
----------------
xazax.hun wrote:
> I think this is getting really confusing here. The comment saying we are not bifurcating and we call `ctuBifurcate`. While I do understand these are two different kinds of bifurcation but I think we should rethink how to name some of these functions. 
Yeah, good point, the comment is meaningless and confusing from this point, I'll just delete it.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123773/new/

https://reviews.llvm.org/D123773



More information about the cfe-commits mailing list