[llvm] [coverage] Add option for keeping mappings order (PR #91600)

via llvm-commits llvm-commits at lists.llvm.org
Sat Jun 22 04:18:04 PDT 2024


llvmbot wrote:


<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-pgo

Author: Lambdaris (Lambdaris)

<details>
<summary>Changes</summary>

This patch fixes possible exceptions due to processing mcdc coverage code from rust with [nested decisions](https://github.com/rust-lang/rust/pull/124255).

Rust now supports nested decisions such as 
```rust
fn inner_decision(a: bool) -> bool {
    !a
}

fn foo(a: bool, b: bool, c: bool, d: bool) {
    if inner_decision(a || b || c) && d {
        println!("true");
    }
}

fn main() {
    foo(true, false, true, false);
}
```
But this example could make llvm-cov panic. We can reproduce it with:
```bash
# Ensure installed latest nightly rust
cargo +nightly rustc --bin foo -- -Cinstrument-coverage -Zcoverage-options=mcdc
LLVM_PROFILE_FILE="foo.profraw" target/debug/foo
llvm-profdata merge -sparse foo.profraw -o foo.profdata
llvm-cov show target/debug/foo -instr-profile="foo.profdata" --show-mcdc
```

For now llvm-cov could generate wrong result with such kind of code when:
* Some conditions of the nested decision are located in front of at least one condition of the outer.
* The nested decision's start location is behind of the outer decision.

It would panic when:
* The nested decision has more conditions than the outer.

Let's consider that example still. The example code generate 2 decision:
* D1: the nested decision, comes from `a || b || c`, with 3 conditions `a`, `b`, `c`.
* D2: the outer decision, comes from `inner_decision(a || b || c) && d` , with 2 conditions `inner_decision(a || b || c)`, `d`.

After the sort, the order of mappings would become:
Conditions: `inner_decision(a || b || c)`, `a`, `b`, `c`, `d`
Decisions:   D2, D1
due to the start location comparison.
Then llvm-cov reconstructs mapping between decisions and conditions:
1. Try to bind `inner_decision(a || b || c)` with D2, ok. -> right
2. Try to bind `a` with D2, but D2 already has a condition with id 1. So bind `a` with D1, ok -> right
3. Try to bind `b` with D2, clearly span of D2 dominates `b`, thus ok -> wrong, `b` should be a condition of D1.

Note that `b` has a false next condition id 3 and is taken as condition of D2. Hence when llvm-cov tries to construct the decision tree of D2, it attempts to visit the third element of a vector with length 2, leading to boom.

Though we can forbid people to write code in such style, macros in rust also could generate code like it, even something more dreadful like `if if a || b || c { true } else { false } && d`.

Another reason to persist the nested decision implementation is that rust has pattern matching and there might be some code like 
```rust
// let-chains
if let Ok(Some(IpAddr::V4(addr)) = ip && let Some(size) = val { /*...*/ }
```
Here `Ok(Some(IpAddr::V4(addr))` could generate a mcdc decision because it also means branching in CFG.

The most painless way to fix it might keep the relative order of branch mappings and decision mappins, so that llvm-cov can regroup them in same order.

---
Full diff: https://github.com/llvm/llvm-project/pull/91600.diff


3 Files Affected:

- (modified) llvm/include/llvm/ProfileData/Coverage/CoverageMappingWriter.h (+4-2) 
- (modified) llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp (+19-16) 
- (modified) llvm/unittests/ProfileData/CoverageMappingTest.cpp (+79-4) 


``````````diff
diff --git a/llvm/include/llvm/ProfileData/Coverage/CoverageMappingWriter.h b/llvm/include/llvm/ProfileData/Coverage/CoverageMappingWriter.h
index 02848deaba9db1..00363a25e8806b 100644
--- a/llvm/include/llvm/ProfileData/Coverage/CoverageMappingWriter.h
+++ b/llvm/include/llvm/ProfileData/Coverage/CoverageMappingWriter.h
@@ -42,13 +42,15 @@ class CoverageMappingWriter {
   ArrayRef<unsigned> VirtualFileMapping;
   ArrayRef<CounterExpression> Expressions;
   MutableArrayRef<CounterMappingRegion> MappingRegions;
+  bool KeepMappingOrder;
 
 public:
   CoverageMappingWriter(ArrayRef<unsigned> VirtualFileMapping,
                         ArrayRef<CounterExpression> Expressions,
-                        MutableArrayRef<CounterMappingRegion> MappingRegions)
+                        MutableArrayRef<CounterMappingRegion> MappingRegions,
+                        bool KeepMappingOrder = false)
       : VirtualFileMapping(VirtualFileMapping), Expressions(Expressions),
-        MappingRegions(MappingRegions) {}
+        MappingRegions(MappingRegions), KeepMappingOrder(KeepMappingOrder) {}
 
   /// Write encoded coverage mapping data to the given output stream.
   void write(raw_ostream &OS);
diff --git a/llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp b/llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp
index adfd22804356e1..974138de359002 100644
--- a/llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp
+++ b/llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp
@@ -161,22 +161,25 @@ void CoverageMappingWriter::write(raw_ostream &OS) {
 
   // Sort the regions in an ascending order by the file id and the starting
   // location. Sort by region kinds to ensure stable order for tests.
-  llvm::stable_sort(MappingRegions, [](const CounterMappingRegion &LHS,
-                                       const CounterMappingRegion &RHS) {
-    if (LHS.FileID != RHS.FileID)
-      return LHS.FileID < RHS.FileID;
-    if (LHS.startLoc() != RHS.startLoc())
-      return LHS.startLoc() < RHS.startLoc();
-
-    // Put `Decision` before `Expansion`.
-    auto getKindKey = [](CounterMappingRegion::RegionKind Kind) {
-      return (Kind == CounterMappingRegion::MCDCDecisionRegion
-                  ? 2 * CounterMappingRegion::ExpansionRegion - 1
-                  : 2 * Kind);
-    };
-
-    return getKindKey(LHS.Kind) < getKindKey(RHS.Kind);
-  });
+  if (!KeepMappingOrder) {
+    llvm::stable_sort(MappingRegions, [](const CounterMappingRegion &LHS,
+                                         const CounterMappingRegion &RHS) {
+      if (LHS.FileID != RHS.FileID)
+        return LHS.FileID < RHS.FileID;
+
+      if (LHS.startLoc() != RHS.startLoc())
+        return LHS.startLoc() < RHS.startLoc();
+
+      // Put `Decision` before `Expansion`.
+      auto getKindKey = [](CounterMappingRegion::RegionKind Kind) {
+        return (Kind == CounterMappingRegion::MCDCDecisionRegion
+                    ? 2 * CounterMappingRegion::ExpansionRegion - 1
+                    : 2 * Kind);
+      };
+
+      return getKindKey(LHS.Kind) < getKindKey(RHS.Kind);
+    });
+  }
 
   // Write out the fileid -> filename mapping.
   encodeULEB128(VirtualFileMapping.size(), OS);
diff --git a/llvm/unittests/ProfileData/CoverageMappingTest.cpp b/llvm/unittests/ProfileData/CoverageMappingTest.cpp
index ef147674591c51..78715a34a2495a 100644
--- a/llvm/unittests/ProfileData/CoverageMappingTest.cpp
+++ b/llvm/unittests/ProfileData/CoverageMappingTest.cpp
@@ -221,13 +221,16 @@ struct CoverageMappingTest : ::testing::TestWithParam<std::tuple<bool, bool>> {
     InputFunctions.back().Expressions.push_back(CE);
   }
 
-  std::string writeCoverageRegions(InputFunctionCoverageData &Data) {
+  std::string writeCoverageRegions(InputFunctionCoverageData &Data,
+                                   bool KeepMappingsOrder) {
     SmallVector<unsigned, 8> FileIDs(Data.ReverseVirtualFileMapping.size());
     for (const auto &E : Data.ReverseVirtualFileMapping)
       FileIDs[E.second] = E.first;
     std::string Coverage;
     llvm::raw_string_ostream OS(Coverage);
-    CoverageMappingWriter(FileIDs, Data.Expressions, Data.Regions).write(OS);
+    CoverageMappingWriter(FileIDs, Data.Expressions, Data.Regions,
+                          KeepMappingsOrder)
+        .write(OS);
     return OS.str();
   }
 
@@ -245,10 +248,12 @@ struct CoverageMappingTest : ::testing::TestWithParam<std::tuple<bool, bool>> {
     EXPECT_THAT_ERROR(Reader.read(), Succeeded());
   }
 
-  void writeAndReadCoverageRegions(bool EmitFilenames = true) {
+  void writeAndReadCoverageRegions(bool EmitFilenames = true,
+                                   bool KeepMappingsOrder = false) {
     OutputFunctions.resize(InputFunctions.size());
     for (unsigned I = 0; I < InputFunctions.size(); ++I) {
-      std::string Regions = writeCoverageRegions(InputFunctions[I]);
+      std::string Regions =
+          writeCoverageRegions(InputFunctions[I], KeepMappingsOrder);
       readCoverageRegions(Regions, OutputFunctions[I]);
       OutputFunctions[I].Name = InputFunctions[I].Name;
       OutputFunctions[I].Hash = InputFunctions[I].Hash;
@@ -316,6 +321,76 @@ TEST_P(CoverageMappingTest, basic_write_read) {
   }
 }
 
+TEST_P(CoverageMappingTest, SortMappings) {
+  startFunction("func", 0x1234);
+  addMCDCDecisionCMR(3, 2, "file", 7, 1, 7, 12);
+  addMCDCBranchCMR(Counter::getCounter(0), Counter::getCounter(1), 0, {-1, 1},
+                   "file", 7, 2, 7, 9);
+  addMCDCBranchCMR(Counter::getCounter(2), Counter::getCounter(3), 1, {-1, -1},
+                   "file", 7, 11, 7, 12);
+  addMCDCDecisionCMR(7, 3, "file", 7, 2, 7, 9);
+  addMCDCBranchCMR(Counter::getCounter(0), Counter::getCounter(5), 0, {-1, 1},
+                   "file", 7, 2, 7, 3);
+  addMCDCBranchCMR(Counter::getCounter(6), Counter::getCounter(7), 1, {-1, 2},
+                   "file", 7, 5, 7, 6);
+  addMCDCBranchCMR(Counter::getCounter(8), Counter::getCounter(9), 2, {-1, -1},
+                   "file", 7, 8, 7, 9);
+
+  const std::vector<CounterMappingRegion> MappingsBeforeSourt =
+      InputFunctions.back().Regions;
+  writeAndReadCoverageRegions(true, false);
+  ASSERT_EQ(1u, InputFunctions.size());
+  ASSERT_EQ(1u, OutputFunctions.size());
+  const auto &Input = MappingsBeforeSourt;
+  const auto &Output = OutputFunctions.back().Regions;
+
+  size_t N = ArrayRef(Input).size();
+  llvm::SmallVector<std::tuple<size_t, size_t>> SortedMap = {
+      {0, 0}, {3, 1}, {1, 2}, {4, 3}, {5, 4}, {6, 5}, {2, 6}};
+  ASSERT_EQ(N, Output.size());
+  for (const auto &[InputIdx, OutputIdx] : SortedMap) {
+    ASSERT_EQ(Input[InputIdx].Count, Output[OutputIdx].Count);
+    ASSERT_EQ(Input[InputIdx].FileID, Output[OutputIdx].FileID);
+    ASSERT_EQ(Input[InputIdx].startLoc(), Output[OutputIdx].startLoc());
+    ASSERT_EQ(Input[InputIdx].endLoc(), Output[OutputIdx].endLoc());
+    ASSERT_EQ(Input[InputIdx].Kind, Output[OutputIdx].Kind);
+  }
+}
+
+TEST_P(CoverageMappingTest, SkipMappingSorting) {
+  startFunction("func", 0x1234);
+  addMCDCDecisionCMR(3, 2, "file", 7, 1, 7, 12);
+  addMCDCBranchCMR(Counter::getCounter(0), Counter::getCounter(1), 0, {-1, 1},
+                   "file", 7, 2, 7, 9);
+  addMCDCBranchCMR(Counter::getCounter(2), Counter::getCounter(3), 1, {-1, -1},
+                   "file", 7, 11, 7, 12);
+  addMCDCDecisionCMR(7, 3, "file", 7, 2, 7, 9);
+  addMCDCBranchCMR(Counter::getCounter(4), Counter::getCounter(5), 0, {-1, 1},
+                   "file", 7, 2, 7, 3);
+  addMCDCBranchCMR(Counter::getCounter(6), Counter::getCounter(7), 1, {-1, 2},
+                   "file", 7, 5, 7, 6);
+  addMCDCBranchCMR(Counter::getCounter(8), Counter::getCounter(9), 2, {-1, -1},
+                   "file", 7, 8, 7, 9);
+
+  const std::vector<CounterMappingRegion> MappingsBeforeSourt =
+      InputFunctions.back().Regions;
+  writeAndReadCoverageRegions(true, true);
+  ASSERT_EQ(1u, InputFunctions.size());
+  ASSERT_EQ(1u, OutputFunctions.size());
+  const auto &Input = MappingsBeforeSourt;
+  const auto &Output = OutputFunctions.back().Regions;
+
+  size_t N = ArrayRef(Input).size();
+  ASSERT_EQ(N, Output.size());
+  for (size_t I = 0; I < N; ++I) {
+    ASSERT_EQ(Input[I].Count, Output[I].Count);
+    ASSERT_EQ(Input[I].FileID, Output[I].FileID);
+    ASSERT_EQ(Input[I].startLoc(), Output[I].startLoc());
+    ASSERT_EQ(Input[I].endLoc(), Output[I].endLoc());
+    ASSERT_EQ(Input[I].Kind, Output[I].Kind);
+  }
+}
+
 TEST_P(CoverageMappingTest, correct_deserialize_for_more_than_two_files) {
   const char *FileNames[] = {"bar", "baz", "foo"};
   static const unsigned N = std::size(FileNames);

``````````

</details>


https://github.com/llvm/llvm-project/pull/91600


More information about the llvm-commits mailing list