[PATCH] D113857: [llvm-reduce] Add parallel chunk processing.

Mon Nov 15 10:56:24 PST 2021

Meinersbur added a comment.

Fixes spelling in title and summary.

In D113857#3130113 <https://reviews.llvm.org/D113857#3130113>, @fhahn wrote:

> One issue is that the text output of processing each Chunk in a job batch will get mixed & jumbled together.

[suggestion] Task printouts could be redirected to a buffer and printed in-order.

================
Comment at: llvm/tools/llvm-reduce/deltas/Delta.cpp:39-40
+    "max-chunk-threads",
+    cl::desc("Maximum number of threads to use to process chunks. Set to 1 to "
+             "disables parallelism. Maximum capped to 32."),
+    cl::init(1));
----------------
Why stop me from using 128 jobs on a 128 core machine?

Could warn about diminishing returns because or reduced results being discarded, i.e. explointing SMT not that useful.

[suggestion] Use `-j` (for "jobs") shortcut for consistency with build tools such as ninja and make.

================
Comment at: llvm/tools/llvm-reduce/deltas/Delta.cpp:227
+      // there are at least a few chunks to process.
+      if (MaxChunkThreads > 1 && WorkLeft > 4) {
+        NumTasks = std::min(WorkLeft, unsigned(MaxChunkThreads));
----------------
Why require at least 5 parallel jobs? Synchronization overhead?

================
Comment at: llvm/tools/llvm-reduce/deltas/Delta.cpp:228
+      if (MaxChunkThreads > 1 && WorkLeft > 4) {
+        NumTasks = std::min(WorkLeft, unsigned(MaxChunkThreads));
+
----------------
`NumTasks` is originally assigned `MaxChunkThreads * 2` but here it is overwritten again with a different value?

================
Comment at: llvm/tools/llvm-reduce/deltas/Delta.cpp:236
+        // corresponding Result element.
+        for (unsigned J = 0; J < NumTasks; ++J) {
+          Results[J] = "";
----------------
Why not leave the number of tasks up to `ThreadPoolStrategy::apply_thread_strategy`?

Even if `--max-chunk-threads` is larger, the current approach is limited to what `ThreadPoolStrategy::compute_thread_count()` chooses.

[suggestion] Use `ThreadPool::async` `MaxChunkThreads` times and wait for the first one in the queue. If that one finishes, add another job using `ThreadPool::async` until either one successfully reduces or we are out of chunks still considered interesting.

================
Comment at: llvm/tools/llvm-reduce/deltas/Delta.cpp:249
+
+            auto Res = CheckChunk(*(I + J), std::move(CloneMMM));
+            if (Res) {
----------------
We have `Result` and `Res` in the same scope. Better names?

================
Comment at: llvm/tools/llvm-reduce/deltas/Delta.cpp:278
+        if (!Result)
+          I += NumTasks - 1;
+      } else {
----------------
There are two places where `I` is incremented and far apart. Suggestion:
```
for (auto I = ChunksStillConsideredInteresting.rbegin(),
              E = ChunksStillConsideredInteresting.rend();
         I != E; ) {
  bool UseThreadPool = MaxChunkThreads > 1 && WorkLeft > 4;
  int WorkItemsInThisJob = UseThreadPool ? WorkLeft : 1;
  ...
  I += WorkItemsInThisJob;
```

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113857/new/

https://reviews.llvm.org/D113857