[PATCH] [Core] Add parallel infrastructure to lld.

Tue Apr 9 17:16:29 PDT 2013

================
Comment at: include/lld/Core/Parallel.h:68-69
@@ +67,4 @@
+public:
+  ThreadPoolExecutor(std::size_t threadCount =
+                         std::thread::hardware_concurrency())
+      : _stop(false), _done(threadCount) {
----------------
Michael Spencer wrote:
> Shankar Kalpathi Easwaran wrote:
> > I think size_t is pretty big for a threadcount.
> I'll change it to the return type of hardware_concurrency (unsigned).
The core count is probably not a good choice for the default, since then you risk getting blocked waiting on IO.

Because LLD uses memory-mapped IO, there's no way to do it "async": the IO is incurred when you take a major fault (which is transparent to the program). So, for tasks that require IO, you really need as many threads as the number of parallel IOs you are going to be doing. Another alternative is careful use of madvise(), but I don't think LLD is currently introspective enough about its access patterns to do that effectively at this point.

For compute-intensive tasks though, core count is probably fine. So I would discourage having this take a default. Only the client of the API really knows how much parallelism they need, and they should specify it explicitly.
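
To illustrate what "specify it explicitly" could look like on the client side, here is a minimal sketch. The `Workload` enum and the `chooseThreadCount` helper are hypothetical names made up for this example and are not part of the patch. One thing worth keeping in mind either way: std::thread::hardware_concurrency() is allowed to return 0 when the value is not computable.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Hypothetical classification of the client's work; not part of the patch.
enum class Workload { ComputeBound, IOBound };

// Hypothetical helper: the client decides the thread count explicitly.
// `parallelIOs` is the number of IOs the client expects to have in flight.
inline unsigned chooseThreadCount(Workload kind, unsigned parallelIOs) {
  // hardware_concurrency() may return 0 if the value is not computable,
  // so clamp it to at least one thread.
  unsigned cores = std::max(1u, std::thread::hardware_concurrency());
  if (kind == Workload::ComputeBound)
    return cores;
  // For IO-bound work, size the pool by the expected parallel IOs rather
  // than by the core count.
  return std::max(1u, parallelIOs);
}
```

The result would then be passed to the executor's constructor by the caller, instead of relying on a default argument.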

================
Comment at: include/lld/Core/Parallel.h:198
@@ +197,3 @@
+  // Do a sequential sort for small inputs.
+  if (std::distance(start, end) < detail::minParallelSize || depth > maxDepth) {
+    std::sort(start, end, comp);
----------------
This algorithm can go quadratic when the pivots are consistently bad. You should fall back to std::sort once the depth becomes larger than logarithmic in the length of the range being sorted, to guarantee O(n*log(n)). Also, it would make sense to fall back to std::sort once the depth exceeds log2(coreCount), to avoid increasing the parallelism beyond the number of cores.
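
A rough sketch of how the two depth limits could be combined into a single cutoff. The helper names `floorLog2` and `maxSortDepth` are hypothetical, and the factor of 2 on the logarithm is an assumption borrowed from common introsort implementations, not something the patch specifies.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// floor(log2(n)) for n >= 1; returns 0 for n == 0.
inline std::size_t floorLog2(std::size_t n) {
  std::size_t result = 0;
  while (n > 1) {
    n >>= 1;
    ++result;
  }
  return result;
}

// Hypothetical helper: the recursion depth beyond which the sort should
// stop splitting and hand the range to std::sort.
inline std::size_t maxSortDepth(std::size_t length, unsigned coreCount) {
  // Introsort-style bound that keeps the overall work at O(n*log(n)).
  // The factor of 2 is an assumption, not taken from the patch.
  std::size_t complexityLimit = 2 * floorLog2(length);
  // At depth d there are up to 2^d tasks, so stop near log2(coreCount).
  std::size_t parallelismLimit = floorLog2(std::max(1u, coreCount));
  // Falling back when either limit is exceeded means using the smaller one.
  return std::min(complexityLimit, parallelismLimit);
}
```

The top-level caller would compute this once from the full range and pass it down, since the bound should be relative to the original length rather than to each subrange.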

================
Comment at: include/lld/Core/Parallel.h:213-219
@@ +212,9 @@
+
+  // Recurse.
+  tg.spawn([=, &tg] {
+    parallel_quick_sort(start, pivot, comp, tg, depth + 1);
+  });
+  tg.spawn([=, &tg] {
+    parallel_quick_sort(pivot + 1, end, comp, tg, depth + 1);
+  });
+}
----------------
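You can "tail call" into one of these: spawn only one half as a new task and sort the other half in the current task, rather than spawning both.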
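
A sketch of the tail-call shape, under some loud assumptions: `InlineTaskGroup` is a stand-in that runs tasks inline (it is not the patch's TaskGroup), the cutoff constants are made up, and the middle-element partition is a placeholder for whatever pivot selection the patch uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <vector>

// Stand-in for the patch's TaskGroup, for illustration only: it runs each
// task inline instead of handing it to a thread pool.
struct InlineTaskGroup {
  void spawn(std::function<void()> f) { f(); }
};

template <class RandomAccessIterator, class Comp>
void quickSortTailCall(RandomAccessIterator start, RandomAccessIterator end,
                       const Comp &comp, InlineTaskGroup &tg,
                       std::size_t depth = 0) {
  typedef typename std::iterator_traits<RandomAccessIterator>::value_type T;
  const std::ptrdiff_t minParallelSize = 1024; // assumed cutoff
  const std::size_t maxDepth = 16;             // assumed cutoff
  for (;;) {
    // Do a sequential sort for small inputs or excessive depth.
    if (std::distance(start, end) < minParallelSize || depth > maxDepth) {
      std::sort(start, end, comp);
      return;
    }
    // Placeholder partition: move the middle element to the back, split
    // the rest around it, then put it into its final position.
    RandomAccessIterator last = end - 1;
    std::iter_swap(start + std::distance(start, end) / 2, last);
    RandomAccessIterator pivot = std::partition(
        start, last, [&](const T &v) { return comp(v, *last); });
    std::iter_swap(pivot, last);
    ++depth;
    // Spawn only the left half as a new task...
    tg.spawn([=, &tg] { quickSortTailCall(start, pivot, comp, tg, depth); });
    // ...and keep sorting the right half in the current task by looping.
    start = pivot + 1;
  }
}
```

This halves the number of spawned tasks and keeps the current thread busy with real work instead of returning straight to the scheduler.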

================
Comment at: include/lld/Core/Parallel.h:194-196
@@ +193,5 @@
+
+template <class RandomAccessIterator, class Comp>
+void parallel_quick_sort(RandomAccessIterator start, RandomAccessIterator end,
+                         const Comp &comp, TaskGroup &tg, size_t depth = 0) {
+  // Do a sequential sort for small inputs.
----------------
Have you benchmarked this? I'm curious what the cutoff is before this starts beating std::sort (especially libcxx's std::sort). I wouldn't be surprised if the cutoff is *very* large.
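
For what it's worth, a minimal timing harness along these lines would be enough to find the crossover. `timeSortMicros` is a hypothetical helper, and the fixed seed and uniformly random unsigned input are assumptions; real linker inputs may be distributed quite differently.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: time one run of `sorter` over n pseudo-random values
// and return the elapsed wall-clock time in microseconds.
template <class Sorter>
long long timeSortMicros(std::size_t n, Sorter sorter) {
  std::mt19937 gen(12345); // fixed seed so every sorter sees the same input
  std::vector<unsigned> data(n);
  for (auto &x : data)
    x = static_cast<unsigned>(gen());
  auto begin = std::chrono::steady_clock::now();
  sorter(data.begin(), data.end());
  auto finish = std::chrono::steady_clock::now();
  // Guard against timing a sorter that does not actually sort.
  assert(std::is_sorted(data.begin(), data.end()));
  return std::chrono::duration_cast<std::chrono::microseconds>(finish - begin)
      .count();
}
```

Running it over increasing n with std::sort and with the parallel sort as the `sorter` argument, and repeating each size several times, would show where (or whether) the parallel version pulls ahead.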

http://llvm-reviews.chandlerc.com/D649