[PATCH] D69582: Let clang driver support parallel jobs
Artem Belevich via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Tue Oct 29 14:38:53 PDT 2019
tra added a reviewer: echristo.
tra added a subscriber: echristo.
tra added a comment.
@echristo Eric, any thoughts/concerns on the overall direction for the driver?
@yaxunl One concern I have is diagnostics. When the jobs are executed in parallel, I assume all of them will go to the standard error and will be interleaved randomly across all parallel compilations. Figuring out what went wrong will be hard. Ideally we may want to collect output from individual sub-commands and print them once the job has finished, so there's no confusion about the source of the error.
> It is observed that device code compilation takes most of the compilation time when
> clang compiles CUDA/HIP programs, since device code usually contains complicated
> computation code. Oftentimes such code is highly coupled, which results in
> a few large source files that become bottlenecks for the whole project. Things become
> worse when such code is compiled for multiple gpu archs, since clang compiles for
> each gpu arch sequentially. In practice, it is common to compile for more than 5 gpu
> archs.
I think this change will only help with relatively small local builds with a few relatively large CUDA/HIP files. We did talk internally about parallelizing CUDA builds in the past and came to the conclusion that it's not very useful in practice, at least for us. We have a lot of non-CUDA files to compile, too, and that usually provides enough work for the build to hide the long CUDA compilations. Distributed builds (and I guess local ones, too) often assume one compilation per CPU, so launching multiple parallel subcompilations for each top-level job may not be that helpful in practice beyond manual compilation of one file. That said, the change will be a nice improvement for quick rebuilds where only one or a few CUDA files need to be recompiled. However, in that case being able to get comprehensible error messages would also be very important.
Overall I'm on the fence about this change. It may be more trouble than it's worth.
================
Comment at: clang/include/clang/Driver/Job.h:77
+ /// Dependent actions
+ llvm::SmallVector<const Action *, 4> DependentActions;
+
----------------
Nit: Using a pointer as a key will result in sub-compilations being done in a different order from run to run, and that may result in build results changing from run to run.
I can't think of a realistic scenario yet. One case where it may make a difference is the generation of a dependency file.
We currently leak some output file name flags to device-side compilations. E.g. `-fsyntax-only -MD -MF foo.d` will write foo.d for each compilation. At best we'll end up with the result of whichever sub-compilation finished last. At worst we'll end up with corrupt output. In this case it's the output argument leak that's the problem, but I suspect there may be other cases where execution order will be observable.
================
Comment at: clang/lib/Driver/Compilation.cpp:284-286
+ }
+ }
+ }
----------------
Indentation seems to be off. Run through clang-format?
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D69582/new/
https://reviews.llvm.org/D69582