[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

Wed Jul 5 09:05:50 PDT 2017

gtbercea added inline comments.

================
Comment at: lib/Driver/ToolChains/Cuda.cpp:443
+
+    // Get the compute capability from the -fopenmp-targets flag.
+    // The default compute capability is sm_20 since this is a CUDA
----------------
hfinkel wrote:
> Is this first sentence accurate?
Fixed. It should be -Xopenmp-target

================
Comment at: lib/Driver/ToolChains/Cuda.cpp:446
+    // tool chain.
+    auto OptList = Args.getAllArgValues(options::OPT_Xopenmp_target_EQ);
+
----------------
hfinkel wrote:
> Why is this logic in this function? Don't you need the same logic in Generic_GCC::TranslateArgs to handle non-CUDA offloading toolchains?
> 
I would imagine that each toolchain needs to parse the list of flags since, given a toolchain, the flag may need to be passed to more than one tool and different tools may require different flags for passing the same information.

================
Comment at: lib/Driver/ToolChains/Cuda.cpp:474
+    for (StringRef Opt : OptList) {
+      AddMArchOption(DAL, Opts, Opt);
+    }
----------------
hfinkel wrote:
> Shouldn't you be adding all of the options, not just the -march= ones?
I thought that that would be the case but there are a few issues:

1. PTXAS and NVLINK each use a different flag for specifying the arch, and, in turn, each flag is different from -march.

2. -Xopenmp-target passes a flag to the entire toolchain not to individual components of the toolchain so a translation of flags is required in some cases to adapt the flag to the actual tool. -march is one example, I'm not sure if there are others.

3. At this point in the code, in order to add a flag and its value to the DAL list, I need to be able to specify the option type (i.e. options::OPT_march_EQ). I therefore need to manually recognize the flag in the string representing the value of -Xopenmp-target or -Xopenmp-target=triple.

4. This patch handles the passing of the arch and can be extended to pass other flags (as is stands, no other flags are passed through to the CUDA toolchain). This can be extended on a flag by flag basis for flags that need translating to a particular tool's flag. If the flag doesn't need translating then the flag and it's value can be appended to the command line as they are.

https://reviews.llvm.org/D34784