[clang] bbcf11f - [clang][dataflow] Weaken abstract comparison to enable loop termination.

Wed Apr 13 12:51:12 PDT 2022

Author: Yitzhak Mandelbaum
Date: 2022-04-13T19:49:50Z
New Revision: bbcf11f5af98a6e0fa008e180404cfc397f336fa

URL: https://github.com/llvm/llvm-project/commit/bbcf11f5af98a6e0fa008e180404cfc397f336fa
DIFF: https://github.com/llvm/llvm-project/commit/bbcf11f5af98a6e0fa008e180404cfc397f336fa.diff

LOG: [clang][dataflow] Weaken abstract comparison to enable loop termination.

Currently, when the framework is used with an analysis that does not override
`compareEquivalent`, it does not terminate for most loops. The root cause is the
interaction of (the default implementation of) environment comparison
(`compareEquivalent`) and the means by which locations and values are
allocated. Specifically, the creation of certain values (including: reference
and pointer values; merged values) results in allocations of fresh locations in
the environment. As a result, analysis of even trivial loop bodies produces
different (if isomorphic) environments, on identical inputs. At the same time,
the default analysis relies on strict equality (versus some relaxed notion of
equivalence). Together, when the analysis compares these isomorphic, yet
unequal, environments, to determine whether the successors of the given block
need to be (re)processed, the result is invariably "yes", thus preventing loop
analysis from reaching a fixed point.

There are many possible solutions to this problem, including equivalence that is
less than strict pointer equality (like structural equivalence) and/or the
introduction of an explicit widening operation. However, these solutions will
require care to be implemented correctly. While a high priority, it seems more
urgent that we fix the current default implentation to allow
termination. Therefore, this patch proposes, essentially, to change the default
comparison to trivally equate any two values. As a result, we can say precisely
that the analysis will process the loop exactly twice -- once to establish an
initial result state and the second to produce an updated result which will
(always) compare equal to the previous. While clearly unsound -- we are not
reaching a fix point of the transfer function, in practice, this level of
analysis will find many practical issues where a single iteration of the loop
impacts abstract program state.

Note, however, that the change to the default `merge` operation does not affect
soundness, because the framework already produces a fresh (sound) abstraction of
the value when the two values are distinct. The previous setting was overly
conservative.

Differential Revision: https://reviews.llvm.org/D123586

Added: 
    

Modified: 
    clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
    clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
    clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
    clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp

Removed: 
    


################################################################################
diff  --git a/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h b/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
index 68d6e6345f639..bf4b7e5cc5bd9 100644

--- a/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
+++ b/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
@@ -77,7 +77,11 @@ class Environment {
                                    const Environment &Env2) {
       // FIXME: Consider adding QualType to StructValue and removing the Type
       // argument here.
-      return false;
+      //
+      // FIXME: default to a sound comparison and/or expand the comparison logic
+      // built into the framework to support broader forms of equivalence than
+      // strict pointer equality.
+      return true;
     }
 
     /// Modifies `MergedVal` to approximate both `Val1` and `Val2`. This could
@@ -101,7 +105,7 @@ class Environment {
                        const Environment &Env1, const Value &Val2,
                        const Environment &Env2, Value &MergedVal,
                        Environment &MergedEnv) {
-      return false;
+      return true;
     }
   };
 

diff  --git a/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp b/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
index 5b372dde89532..75618fddae444 100644
--- a/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
+++ b/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
@@ -59,7 +59,8 @@ static bool equivalentValues(QualType Type, Value *Val1,
   if (auto *IndVal1 = dyn_cast<IndirectionValue>(Val1)) {
     auto *IndVal2 = cast<IndirectionValue>(Val2);
     assert(IndVal1->getKind() == IndVal2->getKind());
-    return &IndVal1->getPointeeLoc() == &IndVal2->getPointeeLoc();
+    if (&IndVal1->getPointeeLoc() == &IndVal2->getPointeeLoc())
+      return true;
   }
 
   return Model.compareEquivalent(Type, *Val1, Env1, *Val2, Env2);
@@ -88,6 +89,9 @@ static Value *mergeDistinctValues(QualType Type, Value *Val1, Environment &Env1,
   // depends on `FC1` and `FC2`) and modify `flowConditionImplies` to construct
   // a formula that includes the bi-conditionals for all flow condition atoms in
   // the transitive set, before invoking the solver.
+  //
+  // FIXME: Does not work for backedges, since the two (or more) paths will not
+  // have mutually exclusive conditions.
   if (auto *Expr1 = dyn_cast<BoolValue>(Val1)) {
     for (BoolValue *Constraint : Env1.getFlowConditionConstraints()) {
       Expr1 = &Env1.makeAnd(*Expr1, *Constraint);
@@ -285,9 +289,7 @@ bool Environment::equivalentTo(const Environment &Other,
   if (MemberLocToStruct != Other.MemberLocToStruct)
     return false;
 
-  if (LocToVal.size() != Other.LocToVal.size())
-    return false;
-
+  // Compare the contents for the intersection of their domains.
   for (auto &Entry : LocToVal) {
     const StorageLocation *Loc = Entry.first;
     assert(Loc != nullptr);
@@ -297,7 +299,7 @@ bool Environment::equivalentTo(const Environment &Other,
 
     auto It = Other.LocToVal.find(Loc);
     if (It == Other.LocToVal.end())
-      return false;
+      continue;
     assert(It->second != nullptr);
 
     if (!equivalentValues(Loc->getType(), Val, *this, It->second, Other, Model))
@@ -346,8 +348,7 @@ LatticeJoinEffect Environment::join(const Environment &Other,
       continue;
     assert(It->second != nullptr);
 
-    if (equivalentValues(Loc->getType(), Val, *this, It->second, Other,
-                         Model)) {
+    if (Val == It->second) {
       LocToVal.insert({Loc, Val});
       continue;
     }

diff  --git a/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp b/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
index 32beab89a18be..a0b018c3c5724 100644
--- a/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
+++ b/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
@@ -2944,4 +2944,71 @@ TEST_F(TransferTest, CorrelatedBranches) {
       });
 }
 
+TEST_F(TransferTest, LoopWithAssignmentConverges) {
+  std::string Code = R"(
+
+    bool &foo();
+
+    void target() {
+       do {
+        bool Bar = foo();
+        if (Bar) break;
+        (void)Bar;
+        /*[[p]]*/
+      } while (true);
+    }
+  )";
+  // The key property that we are verifying is implicit in `runDataflow` --
+  // namely, that the analysis succeeds, rather than hitting the maximum number
+  // of iterations.
+  runDataflow(
+      Code, [](llvm::ArrayRef<
+                   std::pair<std::string, DataflowAnalysisState<NoopLattice>>>
+                   Results,
+               ASTContext &ASTCtx) {
+        ASSERT_THAT(Results, ElementsAre(Pair("p", _)));
+        const Environment &Env = Results[0].second.Env;
+
+        const ValueDecl *BarDecl = findValueDecl(ASTCtx, "Bar");
+        ASSERT_THAT(BarDecl, NotNull());
+
+        auto &BarVal = *cast<BoolValue>(Env.getValue(*BarDecl, SkipPast::None));
+        EXPECT_TRUE(Env.flowConditionImplies(Env.makeNot(BarVal)));
+      });
+}
+
+TEST_F(TransferTest, LoopWithReferenceAssignmentConverges) {
+  std::string Code = R"(
+
+    bool &foo();
+
+    void target() {
+       do {
+        bool& Bar = foo();
+        if (Bar) break;
+        (void)Bar;
+        /*[[p]]*/
+      } while (true);
+    }
+  )";
+  // The key property that we are verifying is implicit in `runDataflow` --
+  // namely, that the analysis succeeds, rather than hitting the maximum number
+  // of iterations.
+  runDataflow(
+      Code, [](llvm::ArrayRef<
+                   std::pair<std::string, DataflowAnalysisState<NoopLattice>>>
+                   Results,
+               ASTContext &ASTCtx) {
+        ASSERT_THAT(Results, ElementsAre(Pair("p", _)));
+        const Environment &Env = Results[0].second.Env;
+
+        const ValueDecl *BarDecl = findValueDecl(ASTCtx, "Bar");
+        ASSERT_THAT(BarDecl, NotNull());
+
+        auto &BarVal =
+            *cast<BoolValue>(Env.getValue(*BarDecl, SkipPast::Reference));
+        EXPECT_TRUE(Env.flowConditionImplies(Env.makeNot(BarVal)));
+      });
+}
+
 } // namespace

diff  --git a/clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp b/clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp
index f93b3fc2e8ed8..2f5185a47e3e2 100644
--- a/clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp
+++ b/clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp
@@ -365,8 +365,10 @@ class OptionalIntAnalysis
     if (HasValue2 == nullptr)
       return false;
 
-    assert(HasValue1 != HasValue2);
-    cast<StructValue>(&MergedVal)->setProperty("has_value", HasValueTop);
+    if (HasValue1 == HasValue2)
+      cast<StructValue>(&MergedVal)->setProperty("has_value", *HasValue1);
+    else
+      cast<StructValue>(&MergedVal)->setProperty("has_value", HasValueTop);
     return true;
   }