[PATCH] D113857: [llvm-reduce] Add parallel chunk processing.

Michael Kruse via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Nov 18 15:45:14 PST 2021


Meinersbur added inline comments.


================
Comment at: llvm/tools/llvm-reduce/deltas/Delta.cpp:183
+// Used to communicate when a task reduced a chunk.
+static std::atomic<bool> AnyReduced;
+
----------------
[Not a change request] To avoid global variables, did you consider declaring `AnyReduced` inside `runDeltaPassInt` and turning `ProcessChunkFromSerializedBitcode` into a lambda that captures it?
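
A minimal sketch of that alternative (purely illustrative; the driver
function, its parameters and the stand-in test below are made up, only the
`std::atomic<bool>` plus lambda-capture pattern is the point):

  #include <atomic>
  #include <vector>

  // AnyReduced lives inside the driver and is captured by the worker lambda,
  // so no file-scope global is needed.
  static void runDeltaPassSketch(const std::vector<int> &Chunks) {
    std::atomic<bool> AnyReduced{false};

    auto ProcessChunk = [&AnyReduced](int Chunk) {
      bool Reduced = (Chunk % 2 == 0); // stand-in for the interestingness test
      if (Reduced)
        AnyReduced.store(true, std::memory_order_relaxed);
    };

    for (int C : Chunks)
      ProcessChunk(C);
  }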


================
Comment at: llvm/tools/llvm-reduce/deltas/Delta.cpp:186
+template <typename T>
+SmallString<0> ProcessChunkFromSerializedBitcode(
+    Chunk &ChunkToCheckForUninterestingness, TestRunner &Test,
----------------
[Not a change request] Why prefer `SmallString<0>` over `std::string`?
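
For context on the question: `SmallString<0>` has no inline storage (so
size-wise it behaves much like `std::string`), but it plugs directly into
LLVM APIs that take a `SmallVectorImpl<char>&`, such as
`raw_svector_ostream`, a common way to serialize into an in-memory buffer.
A hedged illustration, not the patch author's stated rationale:

  #include "llvm/ADT/SmallString.h"
  #include "llvm/Support/raw_ostream.h"

  // Write into a SmallString<0> through raw_svector_ostream, which expects a
  // SmallVectorImpl<char>&.
  static llvm::SmallString<0> serializeSomethingSketch() {
    llvm::SmallString<0> Buffer;
    llvm::raw_svector_ostream OS(Buffer);
    OS << "serialized bytes would go here"; // e.g. WriteBitcodeToFile(M, OS)
    return Buffer;
  }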


================
Comment at: llvm/tools/llvm-reduce/deltas/Delta.cpp:306
+        //  * we have not reached the end of the chunk.
+        while (!TaskQueue.empty()) {
+          auto &F = TaskQueue.front();
----------------
fhahn wrote:
> Meinersbur wrote:
> > fhahn wrote:
> > > Meinersbur wrote:
> > > > This waits until the task queue is empty. Could we stop earlier, as soon as a reduced version is available, and just forget the jobs not yet finished while scheduling the next ones?
> > > The current approach waits for each task until it finds the first one that leads to a reduction.
> > > 
> > > 
> > > It intentionally does not try to stop once any job reduces a chunk, because if we pick any reduced chunk we might get different results than during serial reduction.
> > Assume the current content of TaskQueue
> > 
> > 0: job finished, no valid reduction
> > 1: job still running
> > 2: job finished, valid reduction result
> > 3: job still running
> > 
> > I understood that the intention of AnyReduced is to not queue additional jobs (assuming `NumJobs > 4`), since we know that when reaching index 2 we will have a reduced result. We still need to wait for job 1, which may also compute a valid reduction.
> > I think this is clever. I had `std::future<ResultTy>` in mind, as you proposed yourself, until I discovered that `llvm::ThreadPool` only supports a void return type :-(. However, to avoid submitting more jobs once we know that a result will be available, I think an `AnyReduced` flag would still be useful.
> > 
> > My comment was regarding index 3 in the queue. Assuming job 1 finishes and job 2 becomes the front of the queue, do we still need to wait for job 3 before submitting jobs based on the new intermediate result? It may cause issues with who becomes responsible for freeing resources, so I am not sure it's feasible.
> I updated the code to communicate the result via the `shared_future`. This depends on D114183 now.
> 
> > My comment was regarding index 3 in the queue. Assuming job 1 finishes and job 2 becomes the front of the queue, do we still need to wait for job 3 before submitting jobs based on the new intermediate result? It may cause issues with who becomes responsible for freeing resources, so I am not sure it's feasible.
> 
> Oh I see, you are thinking about already submitting jobs based on the updated state? At the moment I am not really sure what kind of issues that would bring or whether it is feasible. I think it would be good to keep the initial version as simple as possible.
Agreed. A TODO in the code could mention this.
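
For readers following the thread, a rough sketch of the queueing pattern
being discussed (keep at most NumJobs tasks in flight, wait on them in
submission order so the chosen reduction matches what a serial run would
pick, and stop enqueuing once any task reduced something). This is
illustrative only, not the code in this revision; `runTestOnChunk`, the
chunk indices, and the result handling are stand-ins:

  #include "llvm/ADT/SmallString.h"
  #include "llvm/Support/ThreadPool.h"
  #include "llvm/Support/Threading.h"
  #include <algorithm>
  #include <atomic>
  #include <deque>
  #include <future>
  #include <utility>
  #include <vector>

  using namespace llvm;

  // Stand-in for "run the interestingness test on the variant without this
  // chunk"; on success it fills ReducedOut with the serialized reduced module.
  static bool runTestOnChunk(unsigned ChunkIdx, SmallString<0> &ReducedOut) {
    if (ChunkIdx % 3 != 0)
      return false;
    ReducedOut = "reduced-module-bytes";
    return true;
  }

  static void parallelReductionSketch(unsigned NumJobs, unsigned NumChunks) {
    ThreadPool Pool(hardware_concurrency(NumJobs));
    std::atomic<bool> AnyReduced{false};

    std::vector<SmallString<0>> Results(NumChunks);
    std::deque<std::pair<unsigned, std::shared_future<void>>> TaskQueue;

    unsigned NextChunk = 0;
    auto Submit = [&](unsigned Idx) {
      TaskQueue.emplace_back(Idx, Pool.async([&Results, &AnyReduced, Idx] {
        if (runTestOnChunk(Idx, Results[Idx]))
          AnyReduced.store(true, std::memory_order_relaxed);
      }));
    };

    // Fill the queue with the initial batch of tasks.
    for (; NextChunk < std::min(NumJobs, NumChunks); ++NextChunk)
      Submit(NextChunk);

    // Wait on tasks in submission order.
    while (!TaskQueue.empty()) {
      auto &Front = TaskQueue.front();
      Front.second.wait();
      unsigned Idx = Front.first;
      TaskQueue.pop_front();

      if (!Results[Idx].empty()) {
        // First successful reduction in submission order: take it and stop.
        // TODO (per the discussion above): tasks still in flight could in
        // principle be abandoned and new ones scheduled against the updated
        // state.
        break;
      }

      // Only top up the queue while no task has found a reduction yet.
      if (!AnyReduced.load(std::memory_order_relaxed) && NextChunk < NumChunks)
        Submit(NextChunk++);
    }
    // ThreadPool's destructor waits for any tasks still running.
  }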


================
Comment at: llvm/tools/llvm-reduce/deltas/Delta.cpp:236
+        // corresponding Result element.
+        for (unsigned J = 0; J < NumTasks; ++J) {
+          Results[J] = "";
----------------
fhahn wrote:
> Meinersbur wrote:
> > fhahn wrote:
> > > Meinersbur wrote:
> > > > Why not leave the number of tasks up to `ThreadPoolStrategy::apply_thread_strategy`?
> > > > 
> > > > Even if `--max-chunk-threads` is larger, the current approach is limited to what `ThreadPoolStrategy::compute_thread_count()` chooses.
> > > > 
> > > > [suggestion] Use `ThreadPool::async` `MaxChunkThreads` times and wait for the first one in the queue. When that one finishes, add another job using `ThreadPool::async` until either one successfully reduces or we are out of chunks still considered interesting.
> > > > Why not leave the number of tasks up to ThreadPoolStrategy::apply_thread_strategy?
> > > 
> > > Do you know how to do this? It seems like the thread pool constructor expects a strategy to be passed in.
> > > 
> > > > [suggestion] Use ThreadPool::async MaxChunkThreads times and wait for the first one in the queue. If that one finishes, add another job using ThreadPool::async until either one successfully reduces or we are out of chunks still considered interesting.
> > > 
> > > That sounds like a good idea, thanks! I updated the code to work along those lines, unless I missed something. I also added a shared variable to indicate whether *any* task already reduced something, to avoid adding new jobs in that case.
> > > 
> > > This approach is slightly faster than the original patch for reducing `llvm/test/tools/llvm-reduce/operands-skip.ll` in parallel.
> > > That sounds like a good idea, thanks! I updated the code to work along those lines, unless I missed something.
> > 
> > Nice! Thanks!
> > 
> > >>Why not leave the number of tasks up to ThreadPoolStrategy::apply_thread_strategy?
> > > Do you know how to do this? It seems like the thread pool constructor expects a strategy to be passed in.
> > 
> > I thought we could then override `apply_thread_strategy` to either set `NumThreads` or, if no number of threads is specified, call the inherited method. But it's non-virtual, and your approach of `hardware_concurrency(NumJobs)` does about the same thing.
> > 
> > To get the effective number of jobs, we could call `compute_thread_count` of `ThreadPoolStrategy`. This still applies when the count is given explicitly with `hardware_concurrency(NumJobs)`, since `compute_thread_count` caps it at the number of hardware threads.
> > 
> > Currently `-j<n>` requires the user to specify a number of jobs, but if we want to have a heuristic, we could set `UseHyperThreads = false` without setting `ThreadsRequested`. This heuristically-determined number of jobs is what we might cap at 32. Since I am not sure this is a good heuristic, we may keep requiring the user to specify the number of jobs.
> > 
> > I'd remove the `+2` for `NumInitialTasks`: with `hardware_concurrency(NumJobs)`, no additional threads will be created for those two extra jobs.
> > Since I am not sure this is a good heuristic, we may keep requiring the user to specify the number of jobs.
> 
> I think for now requiring a user-specified value is easiest.
> 
> > I'd remove the +2 for NumInitialTasks: with hardware_concurrency(NumJobs), no additional threads will be created for those two extra jobs.
> 
> The main intention was to have a few more jobs available to fill threads when jobs exit early, but I think it could do more harm than good. I removed it.
Agreed (to both)
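
As a small companion to the thread-count discussion (illustrative; the 32
here just echoes the cap value mentioned above and is otherwise arbitrary,
and this assumes `ThreadPool::getThreadCount()` to query the pool):

  #include "llvm/Support/ThreadPool.h"
  #include "llvm/Support/Threading.h"
  #include "llvm/Support/raw_ostream.h"

  using namespace llvm;

  int main() {
    // Hypothetical user-provided value, e.g. from a -j/--max-chunk-threads
    // style option.
    unsigned NumJobs = 32;

    // The strategy carries the request; the effective number of worker
    // threads is what ThreadPoolStrategy::compute_thread_count() decides.
    ThreadPool Pool(hardware_concurrency(NumJobs));
    outs() << "requested " << NumJobs << " jobs, pool has "
           << Pool.getThreadCount() << " threads\n";
    return 0;
  }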


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113857/new/

https://reviews.llvm.org/D113857


