[llvm] 4ecdbf2 - llvm-reduce: Fix tsan failures

Thu Dec 1 11:41:26 PST 2022

Author: Matt Arsenault
Date: 2022-12-01T14:41:21-05:00
New Revision: 4ecdbf2e4ececec82cabb0d6e21f8397aa148b57

URL: https://github.com/llvm/llvm-project/commit/4ecdbf2e4ececec82cabb0d6e21f8397aa148b57
DIFF: https://github.com/llvm/llvm-project/commit/4ecdbf2e4ececec82cabb0d6e21f8397aa148b57.diff

LOG: llvm-reduce: Fix tsan failures

There's a data race on the UninterestingChunks set. The code seems to
be operating on the assumption that all the tasks completed, so ensure
the unused results do complete. This started showing up about 50% of
the time when running operands-skip-parallel.ll after the recent
switch to use DenseSet; previously it failed much less frequently with
std::set.

We should introduce a mechanism to early terminate unused
results. Alternatively, I've been thinking about ways to to make the
reduction order smarter. I frequently have tests that take multiple
minutes to compile and hit the failure. It may be helpful to see which
chunks took the least time and prefer those over just taking the first
result.

Added: 
    

Modified: 
    llvm/tools/llvm-reduce/deltas/Delta.cpp

Removed: 
    


################################################################################
diff  --git a/llvm/tools/llvm-reduce/deltas/Delta.cpp b/llvm/tools/llvm-reduce/deltas/Delta.cpp
index 8381d0cd680bd..da9278dcf86e7 100644

--- a/llvm/tools/llvm-reduce/deltas/Delta.cpp
+++ b/llvm/tools/llvm-reduce/deltas/Delta.cpp
@@ -210,6 +210,16 @@ static SmallString<0> ProcessChunkFromSerializedBitcode(
   return Result;
 }
 
+using SharedTaskQueue = std::deque<std::shared_future<SmallString<0>>>;
+
+static void waitAndDiscardResultsBarrier(SharedTaskQueue &TaskQueue) {
+  while (!TaskQueue.empty()) {
+    auto &Future = TaskQueue.front();
+    Future.wait();
+    TaskQueue.pop_front();
+  }
+}
+
 /// Runs the Delta Debugging algorithm, splits the code into chunks and
 /// reduces the amount of chunks that are considered interesting by the
 /// given test. The number of chunks is determined by a preliminary run of the
@@ -280,7 +290,7 @@ void llvm::runDeltaPass(TestRunner &Test, ReductionFunc ExtractChunksFromModule,
       writeBitcode(Test.getProgram(), BCOS);
     }
 
-    std::deque<std::shared_future<SmallString<0>>> TaskQueue;
+    SharedTaskQueue TaskQueue;
     for (auto I = ChunksStillConsideredInteresting.rbegin(),
               E = ChunksStillConsideredInteresting.rend();
          I != E; ++I) {
@@ -350,6 +360,14 @@ void llvm::runDeltaPass(TestRunner &Test, ReductionFunc ExtractChunksFromModule,
                       Test.getToolName());
           break;
         }
+
+        // If we broke out of the loop, we still need to wait for everything to
+        // avoid race access to the chunk set.
+        //
+        // TODO: Create a way to kill remaining items we're ignoring; they could
+        // take a long time.
+        waitAndDiscardResultsBarrier(TaskQueue);
+
         // Forward I to the last chunk processed in parallel.
         I += NumChunksProcessed - 1;
       } else {