[clang] [clang][dataflow] Add an early-out to `flowConditionImplies()` / `flowConditionAllows()`. (PR #77453)

Tue Jan 9 04:25:05 PST 2024

https://github.com/martinboehme created https://github.com/llvm/llvm-project/pull/77453

This saves having to assemble the set of constraints and run the SAT solver in
the trivial case where `F` is true.

This is a performance win on the benchmarks for the Crubit nullability checker:

```
name                              old cpu/op   new cpu/op   delta
BM_PointerAnalysisCopyPointer     64.1µs ± 5%  63.1µs ± 0%   -1.56%  (p=0.000 n=20+17)
BM_PointerAnalysisIntLoop          172µs ± 2%   171µs ± 0%     ~     (p=0.752 n=20+17)
BM_PointerAnalysisPointerLoop      408µs ± 3%   355µs ± 0%  -12.99%  (p=0.000 n=20+17)
BM_PointerAnalysisBranch           201µs ± 2%   184µs ± 0%   -8.28%  (p=0.000 n=20+19)
BM_PointerAnalysisLoopAndBranch    684µs ± 2%   613µs ± 2%  -10.38%  (p=0.000 n=20+19)
BM_PointerAnalysisTwoLoops         309µs ± 2%   308µs ± 2%     ~     (p=0.728 n=20+19)
BM_PointerAnalysisJoinFilePath    37.9ms ± 2%  37.9ms ± 2%   +0.06%  (p=0.041 n=20+19)
BM_PointerAnalysisCallInLoop      26.5ms ± 2%  26.4ms ± 4%   -0.59%  (p=0.024 n=20+20)
```

When running clang-tidy on real-world code, the results are less clear. In
three runs, averaged, on an arbitrarily chosen input file, I get 11.91 s of user
time without this patch and 11.81 s with it, though with considerable
measurement noise (I'm seeing up to 0.2 s of variation between runs).

Still, this is a very simple change, and it is a clear win in benchmarks, so I
think it is worth making.


>From c661a3d41f5132a6c4c5758a71faaae58bdb6e39 Mon Sep 17 00:00:00 2001
From: Martin Braenne <mboehme at google.com>
Date: Tue, 9 Jan 2024 12:24:02 +0000
Subject: [PATCH] [clang][dataflow] Add an early-out to
 `flowConditionImplies()` / `flowConditionAllows()`.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This saves having to assemble the set of constraints and run the SAT solver in
the trivial case where `F` is true.

This is a performance win on the benchmarks for the Crubit nullability checker:

```
name                              old cpu/op   new cpu/op   delta
BM_PointerAnalysisCopyPointer     64.1µs ± 5%  63.1µs ± 0%   -1.56%  (p=0.000 n=20+17)
BM_PointerAnalysisIntLoop          172µs ± 2%   171µs ± 0%     ~     (p=0.752 n=20+17)
BM_PointerAnalysisPointerLoop      408µs ± 3%   355µs ± 0%  -12.99%  (p=0.000 n=20+17)
BM_PointerAnalysisBranch           201µs ± 2%   184µs ± 0%   -8.28%  (p=0.000 n=20+19)
BM_PointerAnalysisLoopAndBranch    684µs ± 2%   613µs ± 2%  -10.38%  (p=0.000 n=20+19)
BM_PointerAnalysisTwoLoops         309µs ± 2%   308µs ± 2%     ~     (p=0.728 n=20+19)
BM_PointerAnalysisJoinFilePath    37.9ms ± 2%  37.9ms ± 2%   +0.06%  (p=0.041 n=20+19)
BM_PointerAnalysisCallInLoop      26.5ms ± 2%  26.4ms ± 4%   -0.59%  (p=0.024 n=20+20)
```

When running clang-tidy on real-world code, the results are less clear. In
three runs, averaged, on an arbitrarily chosen input file, I get 11.91 s of user
time without this patch and 11.81 s with it, though with considerable
measurement noise (I'm seeing up to 0.2 s of variation between runs).

Still, this is a very simple change, and it is a clear win in benchmarks, so I
think it is worth making.
---
 clang/include/clang/Analysis/FlowSensitive/Formula.h        | 4 ++++
 .../lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp  | 6 ++++++
 2 files changed, 10 insertions(+)

diff --git a/clang/include/clang/Analysis/FlowSensitive/Formula.h b/clang/include/clang/Analysis/FlowSensitive/Formula.h
index 982e400c1deff1..0e6352403a832f 100644
--- a/clang/include/clang/Analysis/FlowSensitive/Formula.h
+++ b/clang/include/clang/Analysis/FlowSensitive/Formula.h
@@ -75,6 +75,10 @@ class alignas(const Formula *) Formula {
     return static_cast<bool>(Value);
   }
 
+  bool isLiteral(bool b) const {
+    return kind() == Literal && static_cast<bool>(Value) == b;
+  }
+
   ArrayRef<const Formula *> operands() const {
     return ArrayRef(reinterpret_cast<Formula *const *>(this + 1),
                     numOperands(kind()));
diff --git a/clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp b/clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
index fa114979c8e326..500fbb39955d20 100644
--- a/clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
+++ b/clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
@@ -174,6 +174,9 @@ Solver::Result DataflowAnalysisContext::querySolver(
 
 bool DataflowAnalysisContext::flowConditionImplies(Atom Token,
                                                    const Formula &F) {
+  if (F.isLiteral(true))
+    return true;
+
   // Returns true if and only if truth assignment of the flow condition implies
   // that `F` is also true. We prove whether or not this property holds by
   // reducing the problem to satisfiability checking. In other words, we attempt
@@ -188,6 +191,9 @@ bool DataflowAnalysisContext::flowConditionImplies(Atom Token,
 
 bool DataflowAnalysisContext::flowConditionAllows(Atom Token,
                                                   const Formula &F) {
+  if (F.isLiteral(true))
+    return true;
+
   llvm::SetVector<const Formula *> Constraints;
   Constraints.insert(&arena().makeAtomRef(Token));
   Constraints.insert(&F);