[PATCH] D34784: [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading

Tue Jul 11 08:44:47 PDT 2017

hfinkel added a comment.

This is much closer to what I had in mind, thanks! Now I think we're in a position to make this work for more than just the CUDA target. It looks like the added code is now:

1. Remove -march from the translated arguments (because any existing -march would apply only for the host compilation).
2. Process the -Xopenmp-target flags and add those arguments to the list.
3. If we don't have an -march in the translated arguments, then add -march=sm_20 so that there's a suitable default (noting that this default must be higher than the regular CUDA default).

I propose the following:

(1) is good, but should be more general. It is not just the host's -march that should not apply to any arbitrary toolchain, but any of the -m<foo> options. You should remove all options that are in the m_Group options group (which, as noted in Options.td, are "Target-dependent compilation options"). I believe that you can iterate over them all using something like:

  for (const Arg *A : Args.filtered(options::OPT_m_Group)) {

and that might help. This should be in toolchain-independent code, and I'd prefer that we always remove these options whenever the host and target toolchain differ, but leave them when they're the same.

(2) is good, but, along with (1), should be in toolchain-independent code. I recommend that we add a new member function to ToolChain, called, to make a specific suggestion, TranslateOpenMPTargetArgs, and put the logic from (1) and (2) in  this function. Then, we can augment Compilation::getArgsForToolChain to do something like the following:

  const DerivedArgList &
  Compilation::getArgsForToolChain(const ToolChain *TC, StringRef BoundArch,
                                   Action::OffloadKind DeviceOffloadKind) {
    if (!TC)
      TC = &DefaultToolChain;

    DerivedArgList *&Entry = TCArgs[{TC, BoundArch, DeviceOffloadKind}];
    if (!Entry) {
      // First, translate OpenMP toolchain arguments provided via the -Xopenmp-toolchain flags.
      Entry = TranslateOpenMPTargetArgs(*TranslatedArgs, BoundArch, DeviceOffloadKind);
      if (!Entry)
        Entry = TranslatedArgs;

      Entry = TC->TranslateArgs(*Entry, BoundArch, DeviceOffloadKind);
      if (!Entry)
        Entry = TranslatedArgs;
    }

    return *Entry;
  }

And then (3) we leave as it is (where it is).

================
Comment at: lib/Driver/ToolChains/Cuda.cpp:480
+    if (MArchList.empty())
+      // Default compute capability for CUDA toolchain is sm_20.
+      DAL->AddJoinedArg(nullptr,
----------------
This default is only for OpenMP, right? Please explain in the comment why this is the default for OpenMP.

https://reviews.llvm.org/D34784