[PATCH] D9888: [OPENMP] Driver support for OpenMP offloading

Tue Mar 22 16:50:37 PDT 2016

echristo added a comment.

First I'd like to note that the code quality here is really high, most of my comments are higher level design decisions going with the driver and the implementation here rather than that.

One meta comment: offload appears to be something that could be used for CUDA and OpenMP (and OpenACC etc) as a term. I think we should either merge these concepts or pick a different name :)

Thanks for all of your work and patience here! The rest of the comments are inline.

-eric


================
Comment at: include/clang/Driver/Driver.h:210-213
@@ +209,6 @@
+  /// owns all the ToolChain objects stored in it, and will clean them up when
+  /// torn down. We use a different cache for offloading as it is possible to
+  /// have offloading toolchains with the same triple the host has, and the
+  /// implementation has to differentiate the two in order to adjust the
+  /// commands for offloading.
+  mutable llvm::StringMap<ToolChain *> OffloadToolChains;
----------------
Example?

================
Comment at: include/clang/Driver/Driver.h:216-217
@@ +215,4 @@
+
+  /// \brief Array of the toolchains of offloading targets in the order they
+  /// were requested by the user.
+  SmallVector<const ToolChain *, 4> OrderedOffloadingToolchains;
----------------
Any reason?

================
Comment at: include/clang/Driver/Driver.h:427-435
@@ -383,10 +426,11 @@
   /// action \p A.
   void BuildJobsForAction(Compilation &C,
                           const Action *A,
                           const ToolChain *TC,
                           const char *BoundArch,
                           bool AtTopLevel,
                           bool MultipleArchs,
                           const char *LinkingOutput,
-                          InputInfo &Result) const;
+                          InputInfo &Result,
+                          OffloadingHostResultsTy &OffloadingHostResults) const;
 
----------------
This function is starting to get a little silly. Perhaps we should look into refactoring such that this doesn't need to be "the one function that rules them all". Perhaps a different ownership model for the things that are arguments here?

================
Comment at: lib/Driver/Compilation.cpp:66-67
@@ +65,4 @@
+
+    // Check if there is any offloading specific translation to do.
+    DerivedArgList *OffloadArgs = TC->TranslateOffloadArgs(*Entry, BoundArch);
+    if (OffloadArgs) {
----------------
Hmm?

================
Comment at: lib/Driver/Driver.cpp:224-225
@@ +223,4 @@
+
+/// \brief Dump the job bindings for a given action.
+static void DumpJobBindings(ArrayRef<const ToolChain *> TCs, StringRef ToolName,
+                            ArrayRef<InputInfo> Inputs,
----------------
This can probably be done separately? Can you split this out and make it generally useful?

================
Comment at: lib/Driver/Driver.cpp:2045-2051
@@ -1739,11 +2044,9 @@
     // checking the backend tool, check if the tool for the CompileJob
-    // has an integrated assembler.
-    const ActionList *BackendInputs = &(*Inputs)[0]->getInputs();
-    // Compile job may be wrapped in CudaHostAction, extract it if
-    // that's the case and update CollapsedCHA if we combine phases.
-    CudaHostAction *CHA = dyn_cast<CudaHostAction>(*BackendInputs->begin());
-    JobAction *CompileJA =
-        cast<CompileJobAction>(CHA ? *CHA->begin() : *BackendInputs->begin());
-    assert(CompileJA && "Backend job is not preceeded by compile job.");
-    const Tool *Compiler = TC->SelectTool(*CompileJA);
-    if (!Compiler)
+    // has an integrated assembler. However, if OpenMP offloading is required
+    // the backend and compile jobs have to be kept separate and an integrated
+    // assembler of the backend job will be queried instead.
+    JobAction *CurJA = cast<BackendJobAction>(*Inputs->begin());
+    const ActionList *BackendInputs = &CurJA->getInputs();
+    CudaHostAction *CHA = nullptr;
+    if (!RequiresOpenMPOffloading(TC)) {
+      // Compile job may be wrapped in CudaHostAction, extract it if
----------------
Might be time to make some specialized versions of this function. This may take it from "ridiculously confusing" to "code no one should ever look at" :)

================
Comment at: lib/Driver/Tools.cpp:6032
@@ +6031,3 @@
+  // The (un)bundling command looks like this:
+  // clang-offload-bundler -type=bc
+  //   -omptargets=host-triple,device-triple1,device-triple2
----------------
Should we get the offload bundler in first so that the interface is there and testable? (Honest question, no particular opinion here). Though the command lines there will affect how this code is written. 

================
Comment at: test/OpenMP/target_driver.c:41-47
@@ +40,9 @@
+
+// CHK-PHASES-LIB-DAG: {{.*}}: linker, {[[L0:[0-9]+]], [[A0:[0-9]+]]}, image
+// CHK-PHASES-LIB-DAG: [[A0]]: assembler, {[[A1:[0-9]+]]}, object
+// CHK-PHASES-LIB-DAG: [[A1]]: backend, {[[A2:[0-9]+]]}, assembler
+// CHK-PHASES-LIB-DAG: [[A2]]: compiler, {[[A3:[0-9]+]]}, ir
+// CHK-PHASES-LIB-DAG: [[A3]]: preprocessor, {[[I:[0-9]+]]}, cpp-output
+// CHK-PHASES-LIB-DAG: [[I]]: input, {{.*}}, c
+// CHK-PHASES-LIB-DAG: [[L0]]: input, "m", object
+
----------------
Do we really think the phases should be a DAG check?

================
Comment at: test/OpenMP/target_driver.c:54
@@ +53,3 @@
+// RUN:   echo 'bla' > %t.o
+// RUN:   %clang -ccc-print-phases -lm -fopenmp=libomp -target powerpc64-ibm-linux-gnu -omptargets=x86_64-pc-linux-gnu,powerpc64-ibm-linux-gnu %s %t.o 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-PHASES-OBJ %s
----------------
How do you pass options to individual omptargets? e.g. -mvsx or -mavx2?


http://reviews.llvm.org/D9888