r275645 - [CUDA][OpenMP] Create generic offload action

Richard Smith via cfe-commits cfe-commits at lists.llvm.org
Mon Jul 18 16:27:06 PDT 2016


On 18 Jul 2016 4:03 p.m., "Samuel F Antao" <sfantao at us.ibm.com> wrote:
>
> Hi Richard,
>
> I agree, I don't think the second `addExtraOffloadCXXStdlibIncludeArgs`
is required. When I made this change my focus was to preserve the
functionality of the existing code. I can confirm that the existing tests
still pass with it removed. Is it possible, however, that some use case of
the existing CUDA implementation requires the C++ include paths to be added
for non-C++ input types?
>
> Art, Justin, can you confirm whether that is the case? If not, should I go
ahead and remove the duplicated code?

I don't think that's the right fix; we should presumably be adding the C
system include paths here, as the line above does for the normal toolchain.

> Thanks!
> Samuel
>
> On Mon, Jul 18, 2016 at 5:45 PM, Richard Smith via cfe-commits <
cfe-commits at lists.llvm.org> wrote:
>>
>>
>>
>> On Fri, Jul 15, 2016 at 4:13 PM, Samuel Antao via cfe-commits <
cfe-commits at lists.llvm.org> wrote:
>>>
>>> Author: sfantao
>>> Date: Fri Jul 15 18:13:27 2016
>>> New Revision: 275645
>>>
>>> URL: http://llvm.org/viewvc/llvm-project?rev=275645&view=rev
>>> Log:
>>> [CUDA][OpenMP] Create generic offload action
>>>
>>> Summary:
>>> This patch replaces the CUDA-specific action with a generic offload
action. The offload action may have multiple dependences, classified as
"host" and "device". The way this generic offloading action is used is very
similar to what the CUDA implementation does today: it is used to assign a
specific toolchain and architecture to its dependences during the
generation of jobs.
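>>>
>>> As a minimal sketch of the new interface (the calls are taken from the
Driver.cpp changes further down; `HostAction`, `FatbinAction` and `CudaTC`
stand for the host compile action, the device fatbin action and the CUDA
toolchain), a host dependence and a device dependence are combined into a
single offload action as follows:
>>> ```
>>> // Host side: the host toolchain plus the offload kinds (here CUDA) that
>>> // the host has to support.
>>> OffloadAction::HostDependence HDep(
>>>     *HostAction, *C.getSingleOffloadToolChain<Action::OFK_Host>(),
>>>     /*BoundArch=*/nullptr, Action::OFK_Cuda);
>>> // Device side: each device action is registered together with its
>>> // toolchain, bound architecture, and offload kind.
>>> OffloadAction::DeviceDependences DDep;
>>> DDep.add(*FatbinAction, *CudaTC, /*BoundArch=*/nullptr, Action::OFK_Cuda);
>>> // The resulting action holds both dependences and propagates the
>>> // offloading info to them.
>>> Action *HostAndDevice = C.MakeAction<OffloadAction>(HDep, DDep);
>>> ```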
>>>
>>> This patch also proposes propagating the offloading information through
the action graph so that this information can be easily retrieved at any
time during the generation of commands. This allows, e.g., the clang tool
to evaluate whether CUDA should be supported for the device or the host,
and ptxas to easily retrieve the target architecture.
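>>>
>>> For illustration, and assuming `JA` is the action being processed, a
tool can query the propagated information directly through the accessors
added to Action in this patch:
>>> ```
>>> // Does this action take part in CUDA offloading, on either side?
>>> bool IsCuda = JA.isOffloading(Action::OFK_Cuda);
>>> // Is this the device-side compilation, and for which architecture?
>>> bool IsDevice = JA.isDeviceOffloading(Action::OFK_Cuda);
>>> const char *Arch = JA.getOffloadingArch(); // e.g. "sm_35", may be null
>>> ```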
>>>
>>> This is an example of what the action graph looks like for the
compilation of a single CUDA file with two GPU architectures:
>>> ```
>>> 0: input, "cudatests.cu", cuda, (host-cuda)
>>> 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
>>> 2: compiler, {1}, ir, (host-cuda)
>>> 3: input, "cudatests.cu", cuda, (device-cuda, sm_35)
>>> 4: preprocessor, {3}, cuda-cpp-output, (device-cuda, sm_35)
>>> 5: compiler, {4}, ir, (device-cuda, sm_35)
>>> 6: backend, {5}, assembler, (device-cuda, sm_35)
>>> 7: assembler, {6}, object, (device-cuda, sm_35)
>>> 8: offload, "device-cuda (nvptx64-nvidia-cuda:sm_35)" {7}, object
>>> 9: offload, "device-cuda (nvptx64-nvidia-cuda:sm_35)" {6}, assembler
>>> 10: input, "cudatests.cu", cuda, (device-cuda, sm_37)
>>> 11: preprocessor, {10}, cuda-cpp-output, (device-cuda, sm_37)
>>> 12: compiler, {11}, ir, (device-cuda, sm_37)
>>> 13: backend, {12}, assembler, (device-cuda, sm_37)
>>> 14: assembler, {13}, object, (device-cuda, sm_37)
>>> 15: offload, "device-cuda (nvptx64-nvidia-cuda:sm_37)" {14}, object
>>> 16: offload, "device-cuda (nvptx64-nvidia-cuda:sm_37)" {13}, assembler
>>> 17: linker, {8, 9, 15, 16}, cuda-fatbin, (device-cuda)
>>> 18: offload, "host-cuda (powerpc64le-unknown-linux-gnu)" {2},
"device-cuda (nvptx64-nvidia-cuda)" {17}, ir
>>> 19: backend, {18}, assembler
>>> 20: assembler, {19}, object
>>> 21: input, "cuda", object
>>> 22: input, "cudart", object
>>> 23: linker, {20, 21, 22}, image
>>> ```
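>>> A graph of this shape can be reproduced with the driver's phases dump,
e.g. `clang++ cudatests.cu --cuda-gpu-arch=sm_35 --cuda-gpu-arch=sm_37
-ccc-print-phases` (file name and architectures chosen to match the example
above).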
>>> The changes in this patch pass the existing regression tests (i.e. they
keep the existing functionality), and the resulting binaries execute
correctly on a Power8+K40 machine.
>>>
>>> Reviewers: echristo, hfinkel, jlebar, ABataev, tra
>>>
>>> Subscribers: guansong, andreybokhanko, tcramer, mkuron, cfe-commits,
arpith-jacob, carlo.bertolli, caomhin
>>>
>>> Differential Revision: https://reviews.llvm.org/D18171
>>>
>>> Added:
>>>     cfe/trunk/test/Driver/cuda_phases.cu
>>> Modified:
>>>     cfe/trunk/include/clang/Driver/Action.h
>>>     cfe/trunk/include/clang/Driver/Compilation.h
>>>     cfe/trunk/include/clang/Driver/Driver.h
>>>     cfe/trunk/lib/Driver/Action.cpp
>>>     cfe/trunk/lib/Driver/Driver.cpp
>>>     cfe/trunk/lib/Driver/ToolChain.cpp
>>>     cfe/trunk/lib/Driver/Tools.cpp
>>>     cfe/trunk/lib/Driver/Tools.h
>>>     cfe/trunk/lib/Frontend/CreateInvocationFromCommandLine.cpp
>>>
>>> Modified: cfe/trunk/include/clang/Driver/Action.h
>>> URL:
http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Driver/Action.h?rev=275645&r1=275644&r2=275645&view=diff
>>>
==============================================================================
>>> --- cfe/trunk/include/clang/Driver/Action.h (original)
>>> +++ cfe/trunk/include/clang/Driver/Action.h Fri Jul 15 18:13:27 2016
>>> @@ -13,6 +13,7 @@
>>>  #include "clang/Basic/Cuda.h"
>>>  #include "clang/Driver/Types.h"
>>>  #include "clang/Driver/Util.h"
>>> +#include "llvm/ADT/STLExtras.h"
>>>  #include "llvm/ADT/SmallVector.h"
>>>
>>>  namespace llvm {
>>> @@ -27,6 +28,8 @@ namespace opt {
>>>  namespace clang {
>>>  namespace driver {
>>>
>>> +class ToolChain;
>>> +
>>>  /// Action - Represent an abstract compilation step to perform.
>>>  ///
>>>  /// An action represents an edge in the compilation graph; typically
>>> @@ -50,8 +53,7 @@ public:
>>>    enum ActionClass {
>>>      InputClass = 0,
>>>      BindArchClass,
>>> -    CudaDeviceClass,
>>> -    CudaHostClass,
>>> +    OffloadClass,
>>>      PreprocessJobClass,
>>>      PrecompileJobClass,
>>>      AnalyzeJobClass,
>>> @@ -65,17 +67,13 @@ public:
>>>      VerifyDebugInfoJobClass,
>>>      VerifyPCHJobClass,
>>>
>>> -    JobClassFirst=PreprocessJobClass,
>>> -    JobClassLast=VerifyPCHJobClass
>>> +    JobClassFirst = PreprocessJobClass,
>>> +    JobClassLast = VerifyPCHJobClass
>>>    };
>>>
>>>    // The offloading kind determines if this action is binded to a
particular
>>>    // programming model. Each entry reserves one bit. We also have a
special kind
>>>    // to designate the host offloading tool chain.
>>> -  //
>>> -  // FIXME: This is currently used to indicate that tool chains are
used in a
>>> -  // given programming, but will be used here as well once a generic
offloading
>>> -  // action is implemented.
>>>    enum OffloadKind {
>>>      OFK_None = 0x00,
>>>      // The host offloading tool chain.
>>> @@ -95,6 +93,19 @@ private:
>>>    ActionList Inputs;
>>>
>>>  protected:
>>> +  ///
>>> +  /// Offload information.
>>> +  ///
>>> +
>>> +  /// The host offloading kind - a combination of kinds encoded in a
mask.
>>> +  /// Multiple programming models may be supported simultaneously by
the same
>>> +  /// host.
>>> +  unsigned ActiveOffloadKindMask = 0u;
>>> +  /// Offloading kind of the device.
>>> +  OffloadKind OffloadingDeviceKind = OFK_None;
>>> +  /// The Offloading architecture associated with this action.
>>> +  const char *OffloadingArch = nullptr;
>>> +
>>>    Action(ActionClass Kind, types::ID Type) : Action(Kind,
ActionList(), Type) {}
>>>    Action(ActionClass Kind, Action *Input, types::ID Type)
>>>        : Action(Kind, ActionList({Input}), Type) {}
>>> @@ -124,6 +135,40 @@ public:
>>>    input_const_range inputs() const {
>>>      return input_const_range(input_begin(), input_end());
>>>    }
>>> +
>>> +  /// Return a string containing the offload kind of the action.
>>> +  std::string getOffloadingKindPrefix() const;
>>> +  /// Return a string that can be used as prefix in order to generate
unique
>>> +  /// files for each offloading kind.
>>> +  std::string getOffloadingFileNamePrefix(StringRef NormalizedTriple)
const;
>>> +
>>> +  /// Set the device offload info of this action and propagate it to
its
>>> +  /// dependences.
>>> +  void propagateDeviceOffloadInfo(OffloadKind OKind, const char
*OArch);
>>> +  /// Append the host offload info of this action and propagate it to
its
>>> +  /// dependences.
>>> +  void propagateHostOffloadInfo(unsigned OKinds, const char *OArch);
>>> +  /// Set the offload info of this action to be the same as the
provided action,
>>> +  /// and propagate it to its dependences.
>>> +  void propagateOffloadInfo(const Action *A);
>>> +
>>> +  unsigned getOffloadingHostActiveKinds() const {
>>> +    return ActiveOffloadKindMask;
>>> +  }
>>> +  OffloadKind getOffloadingDeviceKind() const { return
OffloadingDeviceKind; }
>>> +  const char *getOffloadingArch() const { return OffloadingArch; }
>>> +
>>> +  /// Check if this action have any offload kinds. Note that host
offload kinds
>>> +  /// are only set if the action is a dependence to a host offload
action.
>>> +  bool isHostOffloading(OffloadKind OKind) const {
>>> +    return ActiveOffloadKindMask & OKind;
>>> +  }
>>> +  bool isDeviceOffloading(OffloadKind OKind) const {
>>> +    return OffloadingDeviceKind == OKind;
>>> +  }
>>> +  bool isOffloading(OffloadKind OKind) const {
>>> +    return isHostOffloading(OKind) || isDeviceOffloading(OKind);
>>> +  }
>>>  };
>>>
>>>  class InputAction : public Action {
>>> @@ -156,39 +201,126 @@ public:
>>>    }
>>>  };
>>>
>>> -class CudaDeviceAction : public Action {
>>> +/// An offload action combines host or/and device actions according to
the
>>> +/// programming model implementation needs and propagates the
offloading kind to
>>> +/// its dependences.
>>> +class OffloadAction final : public Action {
>>>    virtual void anchor();
>>>
>>> -  const CudaArch GpuArch;
>>> -
>>> -  /// True when action results are not consumed by the host action
(e.g when
>>> -  /// -fsyntax-only or --cuda-device-only options are used).
>>> -  bool AtTopLevel;
>>> -
>>>  public:
>>> -  CudaDeviceAction(Action *Input, CudaArch Arch, bool AtTopLevel);
>>> +  /// Type used to communicate device actions. It associates bound
architecture,
>>> +  /// toolchain, and offload kind to each action.
>>> +  class DeviceDependences final {
>>> +  public:
>>> +    typedef SmallVector<const ToolChain *, 3> ToolChainList;
>>> +    typedef SmallVector<const char *, 3> BoundArchList;
>>> +    typedef SmallVector<OffloadKind, 3> OffloadKindList;
>>> +
>>> +  private:
>>> +    // Lists that keep the information for each dependency. All the
lists are
>>> +    // meant to be updated in sync. We are adopting separate lists
instead of a
>>> +    // list of structs, because that simplifies forwarding the actions
list to
>>> +    // initialize the inputs of the base Action class.
>>> +
>>> +    /// The dependence actions.
>>> +    ActionList DeviceActions;
>>> +    /// The offloading toolchains that should be used with the action.
>>> +    ToolChainList DeviceToolChains;
>>> +    /// The architectures that should be used with this action.
>>> +    BoundArchList DeviceBoundArchs;
>>> +    /// The offload kind of each dependence.
>>> +    OffloadKindList DeviceOffloadKinds;
>>> +
>>> +  public:
>>> +    /// Add a action along with the associated toolchain, bound arch,
and
>>> +    /// offload kind.
>>> +    void add(Action &A, const ToolChain &TC, const char *BoundArch,
>>> +             OffloadKind OKind);
>>> +
>>> +    /// Get each of the individual arrays.
>>> +    const ActionList &getActions() const { return DeviceActions; };
>>> +    const ToolChainList &getToolChains() const { return
DeviceToolChains; };
>>> +    const BoundArchList &getBoundArchs() const { return
DeviceBoundArchs; };
>>> +    const OffloadKindList &getOffloadKinds() const {
>>> +      return DeviceOffloadKinds;
>>> +    };
>>> +  };
>>>
>>> -  /// Get the CUDA GPU architecture to which this Action corresponds.
Returns
>>> -  /// UNKNOWN if this Action corresponds to multiple architectures.
>>> -  CudaArch getGpuArch() const { return GpuArch; }
>>> +  /// Type used to communicate host actions. It associates bound
architecture,
>>> +  /// toolchain, and offload kinds to the host action.
>>> +  class HostDependence final {
>>> +    /// The dependence action.
>>> +    Action &HostAction;
>>> +    /// The offloading toolchain that should be used with the action.
>>> +    const ToolChain &HostToolChain;
>>> +    /// The architectures that should be used with this action.
>>> +    const char *HostBoundArch = nullptr;
>>> +    /// The offload kind of each dependence.
>>> +    unsigned HostOffloadKinds = 0u;
>>> +
>>> +  public:
>>> +    HostDependence(Action &A, const ToolChain &TC, const char
*BoundArch,
>>> +                   const unsigned OffloadKinds)
>>> +        : HostAction(A), HostToolChain(TC), HostBoundArch(BoundArch),
>>> +          HostOffloadKinds(OffloadKinds){};
>>> +    /// Constructor version that obtains the offload kinds from the
device
>>> +    /// dependencies.
>>> +    HostDependence(Action &A, const ToolChain &TC, const char
*BoundArch,
>>> +                   const DeviceDependences &DDeps);
>>> +    Action *getAction() const { return &HostAction; };
>>> +    const ToolChain *getToolChain() const { return &HostToolChain; };
>>> +    const char *getBoundArch() const { return HostBoundArch; };
>>> +    unsigned getOffloadKinds() const { return HostOffloadKinds; };
>>> +  };
>>>
>>> -  bool isAtTopLevel() const { return AtTopLevel; }
>>> +  typedef llvm::function_ref<void(Action *, const ToolChain *, const
char *)>
>>> +      OffloadActionWorkTy;
>>>
>>> -  static bool classof(const Action *A) {
>>> -    return A->getKind() == CudaDeviceClass;
>>> -  }
>>> -};
>>> +private:
>>> +  /// The host offloading toolchain that should be used with the
action.
>>> +  const ToolChain *HostTC = nullptr;
>>>
>>> -class CudaHostAction : public Action {
>>> -  virtual void anchor();
>>> -  ActionList DeviceActions;
>>> +  /// The tool chains associated with the list of actions.
>>> +  DeviceDependences::ToolChainList DevToolChains;
>>>
>>>  public:
>>> -  CudaHostAction(Action *Input, const ActionList &DeviceActions);
>>> -
>>> -  const ActionList &getDeviceActions() const { return DeviceActions; }
>>> +  OffloadAction(const HostDependence &HDep);
>>> +  OffloadAction(const DeviceDependences &DDeps, types::ID Ty);
>>> +  OffloadAction(const HostDependence &HDep, const DeviceDependences
&DDeps);
>>> +
>>> +  /// Execute the work specified in \a Work on the host dependence.
>>> +  void doOnHostDependence(const OffloadActionWorkTy &Work) const;
>>> +
>>> +  /// Execute the work specified in \a Work on each device dependence.
>>> +  void doOnEachDeviceDependence(const OffloadActionWorkTy &Work) const;
>>> +
>>> +  /// Execute the work specified in \a Work on each dependence.
>>> +  void doOnEachDependence(const OffloadActionWorkTy &Work) const;
>>> +
>>> +  /// Execute the work specified in \a Work on each host or device
dependence if
>>> +  /// \a IsHostDependenceto is true or false, respectively.
>>> +  void doOnEachDependence(bool IsHostDependence,
>>> +                          const OffloadActionWorkTy &Work) const;
>>> +
>>> +  /// Return true if the action has a host dependence.
>>> +  bool hasHostDependence() const;
>>> +
>>> +  /// Return the host dependence of this action. This function is only
expected
>>> +  /// to be called if the host dependence exists.
>>> +  Action *getHostDependence() const;
>>> +
>>> +  /// Return true if the action has a single device dependence. If \a
>>> +  /// DoNotConsiderHostActions is set, ignore the host dependence, if
any, while
>>> +  /// accounting for the number of dependences.
>>> +  bool hasSingleDeviceDependence(bool DoNotConsiderHostActions =
false) const;
>>> +
>>> +  /// Return the single device dependence of this action. This
function is only
>>> +  /// expected to be called if a single device dependence exists. If \a
>>> +  /// DoNotConsiderHostActions is set, a host dependence is allowed.
>>> +  Action *
>>> +  getSingleDeviceDependence(bool DoNotConsiderHostActions = false)
const;
>>>
>>> -  static bool classof(const Action *A) { return A->getKind() ==
CudaHostClass; }
>>> +  static bool classof(const Action *A) { return A->getKind() ==
OffloadClass; }
>>>  };
>>>
>>>  class JobAction : public Action {
>>>
>>> Modified: cfe/trunk/include/clang/Driver/Compilation.h
>>> URL:
http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Driver/Compilation.h?rev=275645&r1=275644&r2=275645&view=diff
>>>
==============================================================================
>>> --- cfe/trunk/include/clang/Driver/Compilation.h (original)
>>> +++ cfe/trunk/include/clang/Driver/Compilation.h Fri Jul 15 18:13:27
2016
>>> @@ -98,12 +98,7 @@ public:
>>>    const Driver &getDriver() const { return TheDriver; }
>>>
>>>    const ToolChain &getDefaultToolChain() const { return
DefaultToolChain; }
>>> -  const ToolChain *getOffloadingHostToolChain() const {
>>> -    auto It = OrderedOffloadingToolchains.find(Action::OFK_Host);
>>> -    if (It != OrderedOffloadingToolchains.end())
>>> -      return It->second;
>>> -    return nullptr;
>>> -  }
>>> +
>>>    unsigned isOffloadingHostKind(Action::OffloadKind Kind) const {
>>>      return ActiveOffloadMask & Kind;
>>>    }
>>> @@ -121,8 +116,8 @@ public:
>>>      return OrderedOffloadingToolchains.equal_range(Kind);
>>>    }
>>>
>>> -  // Return an offload toolchain of the provided kind. Only one is
expected to
>>> -  // exist.
>>> +  /// Return an offload toolchain of the provided kind. Only one is
expected to
>>> +  /// exist.
>>>    template <Action::OffloadKind Kind>
>>>    const ToolChain *getSingleOffloadToolChain() const {
>>>      auto TCs = getOffloadToolChains<Kind>();
>>>
>>> Modified: cfe/trunk/include/clang/Driver/Driver.h
>>> URL:
http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Driver/Driver.h?rev=275645&r1=275644&r2=275645&view=diff
>>>
==============================================================================
>>> --- cfe/trunk/include/clang/Driver/Driver.h (original)
>>> +++ cfe/trunk/include/clang/Driver/Driver.h Fri Jul 15 18:13:27 2016
>>> @@ -394,12 +394,13 @@ public:
>>>    /// BuildJobsForAction - Construct the jobs to perform for the
action \p A and
>>>    /// return an InputInfo for the result of running \p A.  Will only
construct
>>>    /// jobs for a given (Action, ToolChain, BoundArch) tuple once.
>>> -  InputInfo BuildJobsForAction(Compilation &C, const Action *A,
>>> -                               const ToolChain *TC, const char
*BoundArch,
>>> -                               bool AtTopLevel, bool MultipleArchs,
>>> -                               const char *LinkingOutput,
>>> -                               std::map<std::pair<const Action *,
std::string>,
>>> -                                        InputInfo> &CachedResults)
const;
>>> +  InputInfo
>>> +  BuildJobsForAction(Compilation &C, const Action *A, const ToolChain
*TC,
>>> +                     const char *BoundArch, bool AtTopLevel, bool
MultipleArchs,
>>> +                     const char *LinkingOutput,
>>> +                     std::map<std::pair<const Action *, std::string>,
InputInfo>
>>> +                         &CachedResults,
>>> +                     bool BuildForOffloadDevice) const;
>>>
>>>    /// Returns the default name for linked images (e.g., "a.out").
>>>    const char *getDefaultImageName() const;
>>> @@ -415,12 +416,11 @@ public:
>>>    /// \param BoundArch - The bound architecture.
>>>    /// \param AtTopLevel - Whether this is a "top-level" action.
>>>    /// \param MultipleArchs - Whether multiple -arch options were
supplied.
>>> -  const char *GetNamedOutputPath(Compilation &C,
>>> -                                 const JobAction &JA,
>>> -                                 const char *BaseInput,
>>> -                                 const char *BoundArch,
>>> -                                 bool AtTopLevel,
>>> -                                 bool MultipleArchs) const;
>>> +  /// \param NormalizedTriple - The normalized triple of the relevant
target.
>>> +  const char *GetNamedOutputPath(Compilation &C, const JobAction &JA,
>>> +                                 const char *BaseInput, const char
*BoundArch,
>>> +                                 bool AtTopLevel, bool MultipleArchs,
>>> +                                 StringRef NormalizedTriple) const;
>>>
>>>    /// GetTemporaryPath - Return the pathname of a temporary file to use
>>>    /// as part of compilation; the file will have the given prefix and
suffix.
>>> @@ -467,7 +467,8 @@ private:
>>>        const char *BoundArch, bool AtTopLevel, bool MultipleArchs,
>>>        const char *LinkingOutput,
>>>        std::map<std::pair<const Action *, std::string>, InputInfo>
>>> -          &CachedResults) const;
>>> +          &CachedResults,
>>> +      bool BuildForOffloadDevice) const;
>>>
>>>  public:
>>>    /// GetReleaseVersion - Parse (([0-9]+)(.([0-9]+)(.([0-9]+)?))?)? and
>>>
>>> Modified: cfe/trunk/lib/Driver/Action.cpp
>>> URL:
http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Driver/Action.cpp?rev=275645&r1=275644&r2=275645&view=diff
>>>
==============================================================================
>>> --- cfe/trunk/lib/Driver/Action.cpp (original)
>>> +++ cfe/trunk/lib/Driver/Action.cpp Fri Jul 15 18:13:27 2016
>>> @@ -8,6 +8,7 @@
>>>
 //===----------------------------------------------------------------------===//
>>>
>>>  #include "clang/Driver/Action.h"
>>> +#include "clang/Driver/ToolChain.h"
>>>  #include "llvm/ADT/StringSwitch.h"
>>>  #include "llvm/Support/ErrorHandling.h"
>>>  #include "llvm/Support/Regex.h"
>>> @@ -21,8 +22,8 @@ const char *Action::getClassName(ActionC
>>>    switch (AC) {
>>>    case InputClass: return "input";
>>>    case BindArchClass: return "bind-arch";
>>> -  case CudaDeviceClass: return "cuda-device";
>>> -  case CudaHostClass: return "cuda-host";
>>> +  case OffloadClass:
>>> +    return "offload";
>>>    case PreprocessJobClass: return "preprocessor";
>>>    case PrecompileJobClass: return "precompiler";
>>>    case AnalyzeJobClass: return "analyzer";
>>> @@ -40,6 +41,82 @@ const char *Action::getClassName(ActionC
>>>    llvm_unreachable("invalid class");
>>>  }
>>>
>>> +void Action::propagateDeviceOffloadInfo(OffloadKind OKind, const char
*OArch) {
>>> +  // Offload action set its own kinds on their dependences.
>>> +  if (Kind == OffloadClass)
>>> +    return;
>>> +
>>> +  assert((OffloadingDeviceKind == OKind || OffloadingDeviceKind ==
OFK_None) &&
>>> +         "Setting device kind to a different device??");
>>> +  assert(!ActiveOffloadKindMask && "Setting a device kind in a host
action??");
>>> +  OffloadingDeviceKind = OKind;
>>> +  OffloadingArch = OArch;
>>> +
>>> +  for (auto *A : Inputs)
>>> +    A->propagateDeviceOffloadInfo(OffloadingDeviceKind, OArch);
>>> +}
>>> +
>>> +void Action::propagateHostOffloadInfo(unsigned OKinds, const char
*OArch) {
>>> +  // Offload action set its own kinds on their dependences.
>>> +  if (Kind == OffloadClass)
>>> +    return;
>>> +
>>> +  assert(OffloadingDeviceKind == OFK_None &&
>>> +         "Setting a host kind in a device action.");
>>> +  ActiveOffloadKindMask |= OKinds;
>>> +  OffloadingArch = OArch;
>>> +
>>> +  for (auto *A : Inputs)
>>> +    A->propagateHostOffloadInfo(ActiveOffloadKindMask, OArch);
>>> +}
>>> +
>>> +void Action::propagateOffloadInfo(const Action *A) {
>>> +  if (unsigned HK = A->getOffloadingHostActiveKinds())
>>> +    propagateHostOffloadInfo(HK, A->getOffloadingArch());
>>> +  else
>>> +    propagateDeviceOffloadInfo(A->getOffloadingDeviceKind(),
>>> +                               A->getOffloadingArch());
>>> +}
>>> +
>>> +std::string Action::getOffloadingKindPrefix() const {
>>> +  switch (OffloadingDeviceKind) {
>>> +  case OFK_None:
>>> +    break;
>>> +  case OFK_Host:
>>> +    llvm_unreachable("Host kind is not an offloading device kind.");
>>> +    break;
>>> +  case OFK_Cuda:
>>> +    return "device-cuda";
>>> +
>>> +    // TODO: Add other programming models here.
>>> +  }
>>> +
>>> +  if (!ActiveOffloadKindMask)
>>> +    return "";
>>> +
>>> +  std::string Res("host");
>>> +  if (ActiveOffloadKindMask & OFK_Cuda)
>>> +    Res += "-cuda";
>>> +
>>> +  // TODO: Add other programming models here.
>>> +
>>> +  return Res;
>>> +}
>>> +
>>> +std::string
>>> +Action::getOffloadingFileNamePrefix(StringRef NormalizedTriple) const {
>>> +  // A file prefix is only generated for device actions and consists
of the
>>> +  // offload kind and triple.
>>> +  if (!OffloadingDeviceKind)
>>> +    return "";
>>> +
>>> +  std::string Res("-");
>>> +  Res += getOffloadingKindPrefix();
>>> +  Res += "-";
>>> +  Res += NormalizedTriple;
>>> +  return Res;
>>> +}
>>> +
>>>  void InputAction::anchor() {}
>>>
>>>  InputAction::InputAction(const Arg &_Input, types::ID _Type)
>>> @@ -51,16 +128,138 @@ void BindArchAction::anchor() {}
>>>  BindArchAction::BindArchAction(Action *Input, const char *_ArchName)
>>>      : Action(BindArchClass, Input), ArchName(_ArchName) {}
>>>
>>> -void CudaDeviceAction::anchor() {}
>>> +void OffloadAction::anchor() {}
>>> +
>>> +OffloadAction::OffloadAction(const HostDependence &HDep)
>>> +    : Action(OffloadClass, HDep.getAction()),
HostTC(HDep.getToolChain()) {
>>> +  OffloadingArch = HDep.getBoundArch();
>>> +  ActiveOffloadKindMask = HDep.getOffloadKinds();
>>> +  HDep.getAction()->propagateHostOffloadInfo(HDep.getOffloadKinds(),
>>> +                                             HDep.getBoundArch());
>>> +};
>>> +
>>> +OffloadAction::OffloadAction(const DeviceDependences &DDeps, types::ID
Ty)
>>> +    : Action(OffloadClass, DDeps.getActions(), Ty),
>>> +      DevToolChains(DDeps.getToolChains()) {
>>> +  auto &OKinds = DDeps.getOffloadKinds();
>>> +  auto &BArchs = DDeps.getBoundArchs();
>>> +
>>> +  // If all inputs agree on the same kind, use it also for this action.
>>> +  if (llvm::all_of(OKinds, [&](OffloadKind K) { return K ==
OKinds.front(); }))
>>> +    OffloadingDeviceKind = OKinds.front();
>>> +
>>> +  // If we have a single dependency, inherit the architecture from it.
>>> +  if (OKinds.size() == 1)
>>> +    OffloadingArch = BArchs.front();
>>> +
>>> +  // Propagate info to the dependencies.
>>> +  for (unsigned i = 0, e = getInputs().size(); i != e; ++i)
>>> +    getInputs()[i]->propagateDeviceOffloadInfo(OKinds[i], BArchs[i]);
>>> +}
>>> +
>>> +OffloadAction::OffloadAction(const HostDependence &HDep,
>>> +                             const DeviceDependences &DDeps)
>>> +    : Action(OffloadClass, HDep.getAction()),
HostTC(HDep.getToolChain()),
>>> +      DevToolChains(DDeps.getToolChains()) {
>>> +  // We use the kinds of the host dependence for this action.
>>> +  OffloadingArch = HDep.getBoundArch();
>>> +  ActiveOffloadKindMask = HDep.getOffloadKinds();
>>> +  HDep.getAction()->propagateHostOffloadInfo(HDep.getOffloadKinds(),
>>> +                                             HDep.getBoundArch());
>>> +
>>> +  // Add device inputs and propagate info to the device actions. Do
work only if
>>> +  // we have dependencies.
>>> +  for (unsigned i = 0, e = DDeps.getActions().size(); i != e; ++i)
>>> +    if (auto *A = DDeps.getActions()[i]) {
>>> +      getInputs().push_back(A);
>>> +      A->propagateDeviceOffloadInfo(DDeps.getOffloadKinds()[i],
>>> +                                    DDeps.getBoundArchs()[i]);
>>> +    }
>>> +}
>>> +
>>> +void OffloadAction::doOnHostDependence(const OffloadActionWorkTy
&Work) const {
>>> +  if (!HostTC)
>>> +    return;
>>> +  assert(!getInputs().empty() && "No dependencies for offload
action??");
>>> +  auto *A = getInputs().front();
>>> +  Work(A, HostTC, A->getOffloadingArch());
>>> +}
>>>
>>> -CudaDeviceAction::CudaDeviceAction(Action *Input, clang::CudaArch Arch,
>>> -                                   bool AtTopLevel)
>>> -    : Action(CudaDeviceClass, Input), GpuArch(Arch),
AtTopLevel(AtTopLevel) {}
>>> +void OffloadAction::doOnEachDeviceDependence(
>>> +    const OffloadActionWorkTy &Work) const {
>>> +  auto I = getInputs().begin();
>>> +  auto E = getInputs().end();
>>> +  if (I == E)
>>> +    return;
>>> +
>>> +  // We expect to have the same number of input dependences and device
tool
>>> +  // chains, except if we also have a host dependence. In that case we
have one
>>> +  // more dependence than we have device tool chains.
>>> +  assert(getInputs().size() == DevToolChains.size() + (HostTC ? 1 : 0)
&&
>>> +         "Sizes of action dependences and toolchains are not
consistent!");
>>> +
>>> +  // Skip host action
>>> +  if (HostTC)
>>> +    ++I;
>>> +
>>> +  auto TI = DevToolChains.begin();
>>> +  for (; I != E; ++I, ++TI)
>>> +    Work(*I, *TI, (*I)->getOffloadingArch());
>>> +}
>>> +
>>> +void OffloadAction::doOnEachDependence(const OffloadActionWorkTy
&Work) const {
>>> +  doOnHostDependence(Work);
>>> +  doOnEachDeviceDependence(Work);
>>> +}
>>> +
>>> +void OffloadAction::doOnEachDependence(bool IsHostDependence,
>>> +                                       const OffloadActionWorkTy
&Work) const {
>>> +  if (IsHostDependence)
>>> +    doOnHostDependence(Work);
>>> +  else
>>> +    doOnEachDeviceDependence(Work);
>>> +}
>>>
>>> -void CudaHostAction::anchor() {}
>>> +bool OffloadAction::hasHostDependence() const { return HostTC !=
nullptr; }
>>>
>>> -CudaHostAction::CudaHostAction(Action *Input, const ActionList
&DeviceActions)
>>> -    : Action(CudaHostClass, Input), DeviceActions(DeviceActions) {}
>>> +Action *OffloadAction::getHostDependence() const {
>>> +  assert(hasHostDependence() && "Host dependence does not exist!");
>>> +  assert(!getInputs().empty() && "No dependencies for offload
action??");
>>> +  return HostTC ? getInputs().front() : nullptr;
>>> +}
>>> +
>>> +bool OffloadAction::hasSingleDeviceDependence(
>>> +    bool DoNotConsiderHostActions) const {
>>> +  if (DoNotConsiderHostActions)
>>> +    return getInputs().size() == (HostTC ? 2 : 1);
>>> +  return !HostTC && getInputs().size() == 1;
>>> +}
>>> +
>>> +Action *
>>> +OffloadAction::getSingleDeviceDependence(bool
DoNotConsiderHostActions) const {
>>> +  assert(hasSingleDeviceDependence(DoNotConsiderHostActions) &&
>>> +         "Single device dependence does not exist!");
>>> +  // The previous assert ensures the number of entries in getInputs()
is
>>> +  // consistent with what we are doing here.
>>> +  return HostTC ? getInputs()[1] : getInputs().front();
>>> +}
>>> +
>>> +void OffloadAction::DeviceDependences::add(Action &A, const ToolChain
&TC,
>>> +                                           const char *BoundArch,
>>> +                                           OffloadKind OKind) {
>>> +  DeviceActions.push_back(&A);
>>> +  DeviceToolChains.push_back(&TC);
>>> +  DeviceBoundArchs.push_back(BoundArch);
>>> +  DeviceOffloadKinds.push_back(OKind);
>>> +}
>>> +
>>> +OffloadAction::HostDependence::HostDependence(Action &A, const
ToolChain &TC,
>>> +                                              const char *BoundArch,
>>> +                                              const DeviceDependences
&DDeps)
>>> +    : HostAction(A), HostToolChain(TC), HostBoundArch(BoundArch) {
>>> +  for (auto K : DDeps.getOffloadKinds())
>>> +    HostOffloadKinds |= K;
>>> +}
>>>
>>>  void JobAction::anchor() {}
>>>
>>>
>>> Modified: cfe/trunk/lib/Driver/Driver.cpp
>>> URL:
http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Driver/Driver.cpp?rev=275645&r1=275644&r2=275645&view=diff
>>>
==============================================================================
>>> --- cfe/trunk/lib/Driver/Driver.cpp (original)
>>> +++ cfe/trunk/lib/Driver/Driver.cpp Fri Jul 15 18:13:27 2016
>>> @@ -435,7 +435,9 @@ void Driver::CreateOffloadingDeviceToolC
>>>        })) {
>>>      const ToolChain &TC = getToolChain(
>>>          C.getInputArgs(),
>>> -
llvm::Triple(C.getOffloadingHostToolChain()->getTriple().isArch64Bit()
>>> +        llvm::Triple(C.getSingleOffloadToolChain<Action::OFK_Host>()
>>> +                             ->getTriple()
>>> +                             .isArch64Bit()
>>>                           ? "nvptx64-nvidia-cuda"
>>>                           : "nvptx-nvidia-cuda"));
>>>      C.addOffloadDeviceToolChain(&TC, Action::OFK_Cuda);
>>> @@ -1022,19 +1024,33 @@ static unsigned PrintActions1(const Comp
>>>    } else if (BindArchAction *BIA = dyn_cast<BindArchAction>(A)) {
>>>      os << '"' << BIA->getArchName() << '"' << ", {"
>>>         << PrintActions1(C, *BIA->input_begin(), Ids) << "}";
>>> -  } else if (CudaDeviceAction *CDA = dyn_cast<CudaDeviceAction>(A)) {
>>> -    CudaArch Arch = CDA->getGpuArch();
>>> -    if (Arch != CudaArch::UNKNOWN)
>>> -      os << "'" << CudaArchToString(Arch) << "', ";
>>> -    os << "{" << PrintActions1(C, *CDA->input_begin(), Ids) << "}";
>>> +  } else if (OffloadAction *OA = dyn_cast<OffloadAction>(A)) {
>>> +    bool IsFirst = true;
>>> +    OA->doOnEachDependence(
>>> +        [&](Action *A, const ToolChain *TC, const char *BoundArch) {
>>> +          // E.g. for two CUDA device dependences whose bound arch is
sm_20 and
>>> +          // sm_35 this will generate:
>>> +          // "cuda-device" (nvptx64-nvidia-cuda:sm_20) {#ID},
"cuda-device"
>>> +          // (nvptx64-nvidia-cuda:sm_35) {#ID}
>>> +          if (!IsFirst)
>>> +            os << ", ";
>>> +          os << '"';
>>> +          if (TC)
>>> +            os << A->getOffloadingKindPrefix();
>>> +          else
>>> +            os << "host";
>>> +          os << " (";
>>> +          os << TC->getTriple().normalize();
>>> +
>>> +          if (BoundArch)
>>> +            os << ":" << BoundArch;
>>> +          os << ")";
>>> +          os << '"';
>>> +          os << " {" << PrintActions1(C, A, Ids) << "}";
>>> +          IsFirst = false;
>>> +        });
>>>    } else {
>>> -    const ActionList *AL;
>>> -    if (CudaHostAction *CHA = dyn_cast<CudaHostAction>(A)) {
>>> -      os << "{" << PrintActions1(C, *CHA->input_begin(), Ids) << "}"
>>> -         << ", gpu binaries ";
>>> -      AL = &CHA->getDeviceActions();
>>> -    } else
>>> -      AL = &A->getInputs();
>>> +    const ActionList *AL = &A->getInputs();
>>>
>>>      if (AL->size()) {
>>>        const char *Prefix = "{";
>>> @@ -1047,10 +1063,24 @@ static unsigned PrintActions1(const Comp
>>>        os << "{}";
>>>    }
>>>
>>> +  // Append offload info for all options other than the offloading
action
>>> +  // itself (e.g. (cuda-device, sm_20) or (cuda-host)).
>>> +  std::string offload_str;
>>> +  llvm::raw_string_ostream offload_os(offload_str);
>>> +  if (!isa<OffloadAction>(A)) {
>>> +    auto S = A->getOffloadingKindPrefix();
>>> +    if (!S.empty()) {
>>> +      offload_os << ", (" << S;
>>> +      if (A->getOffloadingArch())
>>> +        offload_os << ", " << A->getOffloadingArch();
>>> +      offload_os << ")";
>>> +    }
>>> +  }
>>> +
>>>    unsigned Id = Ids.size();
>>>    Ids[A] = Id;
>>>    llvm::errs() << Id << ": " << os.str() << ", "
>>> -               << types::getTypeName(A->getType()) << "\n";
>>> +               << types::getTypeName(A->getType()) << offload_os.str()
<< "\n";
>>>
>>>    return Id;
>>>  }
>>> @@ -1378,8 +1408,12 @@ static Action *buildCudaActions(Compilat
>>>        PartialCompilationArg &&
>>>
 PartialCompilationArg->getOption().matches(options::OPT_cuda_device_only);
>>>
>>> -  if (CompileHostOnly)
>>> -    return C.MakeAction<CudaHostAction>(HostAction, ActionList());
>>> +  if (CompileHostOnly) {
>>> +    OffloadAction::HostDependence HDep(
>>> +        *HostAction, *C.getSingleOffloadToolChain<Action::OFK_Host>(),
>>> +        /*BoundArch=*/nullptr, Action::OFK_Cuda);
>>> +    return C.MakeAction<OffloadAction>(HDep);
>>> +  }
>>>
>>>    // Collect all cuda_gpu_arch parameters, removing duplicates.
>>>    SmallVector<CudaArch, 4> GpuArchList;
>>> @@ -1408,8 +1442,6 @@ static Action *buildCudaActions(Compilat
>>>      CudaDeviceInputs.push_back(std::make_pair(types::TY_CUDA_DEVICE,
InputArg));
>>>
>>>    // Build actions for all device inputs.
>>> -  assert(C.getSingleOffloadToolChain<Action::OFK_Cuda>() &&
>>> -         "Missing toolchain for device-side compilation.");
>>>    ActionList CudaDeviceActions;
>>>    C.getDriver().BuildActions(C, Args, CudaDeviceInputs,
CudaDeviceActions);
>>>    assert(GpuArchList.size() == CudaDeviceActions.size() &&
>>> @@ -1421,6 +1453,8 @@ static Action *buildCudaActions(Compilat
>>>          return a->getKind() != Action::AssembleJobClass;
>>>        });
>>>
>>> +  const ToolChain *CudaTC =
C.getSingleOffloadToolChain<Action::OFK_Cuda>();
>>> +
>>>    // Figure out what to do with device actions -- pass them as inputs
to the
>>>    // host action or run each of them independently.
>>>    if (PartialCompilation || CompileDeviceOnly) {
>>> @@ -1436,10 +1470,13 @@ static Action *buildCudaActions(Compilat
>>>        return nullptr;
>>>      }
>>>
>>> -    for (unsigned I = 0, E = GpuArchList.size(); I != E; ++I)
>>> -
Actions.push_back(C.MakeAction<CudaDeviceAction>(CudaDeviceActions[I],
>>> -                                                       GpuArchList[I],
>>> -                                                       /* AtTopLevel
*/ true));
>>> +    for (unsigned I = 0, E = GpuArchList.size(); I != E; ++I) {
>>> +      OffloadAction::DeviceDependences DDep;
>>> +      DDep.add(*CudaDeviceActions[I], *CudaTC,
CudaArchToString(GpuArchList[I]),
>>> +               Action::OFK_Cuda);
>>> +      Actions.push_back(
>>> +          C.MakeAction<OffloadAction>(DDep,
CudaDeviceActions[I]->getType()));
>>> +    }
>>>      // Kill host action in case of device-only compilation.
>>>      if (CompileDeviceOnly)
>>>        return nullptr;
>>> @@ -1459,19 +1496,23 @@ static Action *buildCudaActions(Compilat
>>>      Action* BackendAction = AssembleAction->getInputs()[0];
>>>      assert(BackendAction->getType() == types::TY_PP_Asm);
>>>
>>> -    for (const auto& A : {AssembleAction, BackendAction}) {
>>> -      DeviceActions.push_back(C.MakeAction<CudaDeviceAction>(
>>> -          A, GpuArchList[I], /* AtTopLevel */ false));
>>> +    for (auto &A : {AssembleAction, BackendAction}) {
>>> +      OffloadAction::DeviceDependences DDep;
>>> +      DDep.add(*A, *CudaTC, CudaArchToString(GpuArchList[I]),
Action::OFK_Cuda);
>>> +      DeviceActions.push_back(C.MakeAction<OffloadAction>(DDep,
A->getType()));
>>>      }
>>>    }
>>> -  auto FatbinAction = C.MakeAction<CudaDeviceAction>(
>>> -      C.MakeAction<LinkJobAction>(DeviceActions,
types::TY_CUDA_FATBIN),
>>> -      CudaArch::UNKNOWN,
>>> -      /* AtTopLevel = */ false);
>>> +  auto FatbinAction =
>>> +      C.MakeAction<LinkJobAction>(DeviceActions,
types::TY_CUDA_FATBIN);
>>> +
>>>    // Return a new host action that incorporates original host action
and all
>>>    // device actions.
>>> -  return C.MakeAction<CudaHostAction>(std::move(HostAction),
>>> -                                      ActionList({FatbinAction}));
>>> +  OffloadAction::HostDependence HDep(
>>> +      *HostAction, *C.getSingleOffloadToolChain<Action::OFK_Host>(),
>>> +      /*BoundArch=*/nullptr, Action::OFK_Cuda);
>>> +  OffloadAction::DeviceDependences DDep;
>>> +  DDep.add(*FatbinAction, *CudaTC, /*BoundArch=*/nullptr,
Action::OFK_Cuda);
>>> +  return C.MakeAction<OffloadAction>(HDep, DDep);
>>>  }
>>>
>>>  void Driver::BuildActions(Compilation &C, DerivedArgList &Args,
>>> @@ -1580,6 +1621,9 @@ void Driver::BuildActions(Compilation &C
>>>      YcArg = YuArg = nullptr;
>>>    }
>>>
>>> +  // Track the host offload kinds used on this compilation.
>>> +  unsigned CompilationActiveOffloadHostKinds = 0u;
>>> +
>>>    // Construct the actions to perform.
>>>    ActionList LinkerInputs;
>>>
>>> @@ -1648,6 +1692,9 @@ void Driver::BuildActions(Compilation &C
>>>              ? phases::Compile
>>>              : FinalPhase;
>>>
>>> +    // Track the host offload kinds used on this input.
>>> +    unsigned InputActiveOffloadHostKinds = 0u;
>>> +
>>>      // Build the pipeline for this file.
>>>      Action *Current = C.MakeAction<InputAction>(*InputArg, InputType);
>>>      for (SmallVectorImpl<phases::ID>::iterator i = PL.begin(), e =
PL.end();
>>> @@ -1679,21 +1726,36 @@ void Driver::BuildActions(Compilation &C
>>>          Current = buildCudaActions(C, Args, InputArg, Current,
Actions);
>>>          if (!Current)
>>>            break;
>>> +
>>> +        // We produced a CUDA action for this input, so the host has
to support
>>> +        // CUDA.
>>> +        InputActiveOffloadHostKinds |= Action::OFK_Cuda;
>>> +        CompilationActiveOffloadHostKinds |= Action::OFK_Cuda;
>>>        }
>>>
>>>        if (Current->getType() == types::TY_Nothing)
>>>          break;
>>>      }
>>>
>>> -    // If we ended with something, add to the output list.
>>> -    if (Current)
>>> +    // If we ended with something, add to the output list. Also,
propagate the
>>> +    // offload information to the top-level host action related with
the current
>>> +    // input.
>>> +    if (Current) {
>>> +      if (InputActiveOffloadHostKinds)
>>> +        Current->propagateHostOffloadInfo(InputActiveOffloadHostKinds,
>>> +                                          /*BoundArch=*/nullptr);
>>>        Actions.push_back(Current);
>>> +    }
>>>    }
>>>
>>> -  // Add a link action if necessary.
>>> -  if (!LinkerInputs.empty())
>>> +  // Add a link action if necessary and propagate the offload
information for
>>> +  // the current compilation.
>>> +  if (!LinkerInputs.empty()) {
>>>      Actions.push_back(
>>>          C.MakeAction<LinkJobAction>(LinkerInputs, types::TY_Image));
>>> +
Actions.back()->propagateHostOffloadInfo(CompilationActiveOffloadHostKinds,
>>> +                                             /*BoundArch=*/nullptr);
>>> +  }
>>>
>>>    // If we are linking, claim any options which are obviously only
used for
>>>    // compilation.
>>> @@ -1829,7 +1891,8 @@ void Driver::BuildJobs(Compilation &C) c
>>>                         /*BoundArch*/ nullptr,
>>>                         /*AtTopLevel*/ true,
>>>                         /*MultipleArchs*/ ArchNames.size() > 1,
>>> -                       /*LinkingOutput*/ LinkingOutput, CachedResults);
>>> +                       /*LinkingOutput*/ LinkingOutput, CachedResults,
>>> +                       /*BuildForOffloadDevice*/ false);
>>>    }
>>>
>>>    // If the user passed -Qunused-arguments or there were errors, don't
warn
>>> @@ -1878,7 +1941,28 @@ void Driver::BuildJobs(Compilation &C) c
>>>      }
>>>    }
>>>  }
>>> -
>>> +/// Collapse an offloading action looking for a job of the given type.
The input
>>> +/// action is changed to the input of the collapsed sequence. If we
effectively
>>> +/// had a collapse return the corresponding offloading action,
otherwise return
>>> +/// null.
>>> +template <typename T>
>>> +static OffloadAction *collapseOffloadingAction(Action *&CurAction) {
>>> +  if (!CurAction)
>>> +    return nullptr;
>>> +  if (auto *OA = dyn_cast<OffloadAction>(CurAction)) {
>>> +    if (OA->hasHostDependence())
>>> +      if (auto *HDep = dyn_cast<T>(OA->getHostDependence())) {
>>> +        CurAction = HDep;
>>> +        return OA;
>>> +      }
>>> +    if (OA->hasSingleDeviceDependence())
>>> +      if (auto *DDep = dyn_cast<T>(OA->getSingleDeviceDependence())) {
>>> +        CurAction = DDep;
>>> +        return OA;
>>> +      }
>>> +  }
>>> +  return nullptr;
>>> +}
>>>  // Returns a Tool for a given JobAction.  In case the action and its
>>>  // predecessors can be combined, updates Inputs with the inputs of the
>>>  // first combined action. If one of the collapsed actions is a
>>> @@ -1888,34 +1972,39 @@ static const Tool *selectToolForJob(Comp
>>>                                      bool EmbedBitcode, const ToolChain
*TC,
>>>                                      const JobAction *JA,
>>>                                      const ActionList *&Inputs,
>>> -                                    const CudaHostAction
*&CollapsedCHA) {
>>> +                                    ActionList
&CollapsedOffloadAction) {
>>>    const Tool *ToolForJob = nullptr;
>>> -  CollapsedCHA = nullptr;
>>> +  CollapsedOffloadAction.clear();
>>>
>>>    // See if we should look for a compiler with an integrated
assembler. We match
>>>    // bottom up, so what we are actually looking for is an assembler
job with a
>>>    // compiler input.
>>>
>>> +  // Look through offload actions between assembler and backend
actions.
>>> +  Action *BackendJA = (isa<AssembleJobAction>(JA) && Inputs->size() ==
1)
>>> +                          ? *Inputs->begin()
>>> +                          : nullptr;
>>> +  auto *BackendOA =
collapseOffloadingAction<BackendJobAction>(BackendJA);
>>> +
>>>    if (TC->useIntegratedAs() && !SaveTemps &&
>>>        !C.getArgs().hasArg(options::OPT_via_file_asm) &&
>>>        !C.getArgs().hasArg(options::OPT__SLASH_FA) &&
>>> -      !C.getArgs().hasArg(options::OPT__SLASH_Fa) &&
>>> -      isa<AssembleJobAction>(JA) && Inputs->size() == 1 &&
>>> -      isa<BackendJobAction>(*Inputs->begin())) {
>>> +      !C.getArgs().hasArg(options::OPT__SLASH_Fa) && BackendJA &&
>>> +      isa<BackendJobAction>(BackendJA)) {
>>>      // A BackendJob is always preceded by a CompileJob, and without
-save-temps
>>>      // or -fembed-bitcode, they will always get combined together, so
instead of
>>>      // checking the backend tool, check if the tool for the CompileJob
has an
>>>      // integrated assembler. For -fembed-bitcode, CompileJob is still
used to
>>>      // look up tools for BackendJob, but they need to match before we
can split
>>>      // them.
>>> -    const ActionList *BackendInputs = &(*Inputs)[0]->getInputs();
>>> -    // Compile job may be wrapped in CudaHostAction, extract it if
>>> -    // that's the case and update CollapsedCHA if we combine phases.
>>> -    CudaHostAction *CHA =
dyn_cast<CudaHostAction>(*BackendInputs->begin());
>>> -    JobAction *CompileJA = cast<CompileJobAction>(
>>> -        CHA ? *CHA->input_begin() : *BackendInputs->begin());
>>> -    assert(CompileJA && "Backend job is not preceeded by compile
job.");
>>> -    const Tool *Compiler = TC->SelectTool(*CompileJA);
>>> +
>>> +    // Look through offload actions between backend and compile
actions.
>>> +    Action *CompileJA = *BackendJA->getInputs().begin();
>>> +    auto *CompileOA =
collapseOffloadingAction<CompileJobAction>(CompileJA);
>>> +
>>> +    assert(CompileJA && isa<CompileJobAction>(CompileJA) &&
>>> +           "Backend job is not preceeded by compile job.");
>>> +    const Tool *Compiler =
TC->SelectTool(*cast<CompileJobAction>(CompileJA));
>>>      if (!Compiler)
>>>        return nullptr;
>>>      // When using -fembed-bitcode, it is required to have the same
tool (clang)
>>> @@ -1929,7 +2018,12 @@ static const Tool *selectToolForJob(Comp
>>>      if (Compiler->hasIntegratedAssembler()) {
>>>        Inputs = &CompileJA->getInputs();
>>>        ToolForJob = Compiler;
>>> -      CollapsedCHA = CHA;
>>> +      // Save the collapsed offload actions because they may still
contain
>>> +      // device actions.
>>> +      if (CompileOA)
>>> +        CollapsedOffloadAction.push_back(CompileOA);
>>> +      if (BackendOA)
>>> +        CollapsedOffloadAction.push_back(BackendOA);
>>>      }
>>>    }
>>>
>>> @@ -1939,20 +2033,23 @@ static const Tool *selectToolForJob(Comp
>>>    if (isa<BackendJobAction>(JA)) {
>>>      // Check if the compiler supports emitting LLVM IR.
>>>      assert(Inputs->size() == 1);
>>> -    // Compile job may be wrapped in CudaHostAction, extract it if
>>> -    // that's the case and update CollapsedCHA if we combine phases.
>>> -    CudaHostAction *CHA = dyn_cast<CudaHostAction>(*Inputs->begin());
>>> -    JobAction *CompileJA =
>>> -        cast<CompileJobAction>(CHA ? *CHA->input_begin() :
*Inputs->begin());
>>> -    assert(CompileJA && "Backend job is not preceeded by compile
job.");
>>> -    const Tool *Compiler = TC->SelectTool(*CompileJA);
>>> +
>>> +    // Look through offload actions between backend and compile
actions.
>>> +    Action *CompileJA = *JA->getInputs().begin();
>>> +    auto *CompileOA =
collapseOffloadingAction<CompileJobAction>(CompileJA);
>>> +
>>> +    assert(CompileJA && isa<CompileJobAction>(CompileJA) &&
>>> +           "Backend job is not preceeded by compile job.");
>>> +    const Tool *Compiler =
TC->SelectTool(*cast<CompileJobAction>(CompileJA));
>>>      if (!Compiler)
>>>        return nullptr;
>>>      if (!Compiler->canEmitIR() ||
>>>          (!SaveTemps && !EmbedBitcode)) {
>>>        Inputs = &CompileJA->getInputs();
>>>        ToolForJob = Compiler;
>>> -      CollapsedCHA = CHA;
>>> +
>>> +      if (CompileOA)
>>> +        CollapsedOffloadAction.push_back(CompileOA);
>>>      }
>>>    }
>>>
>>> @@ -1963,12 +2060,21 @@ static const Tool *selectToolForJob(Comp
>>>    // See if we should use an integrated preprocessor. We do so when we
have
>>>    // exactly one input, since this is the only use case we care about
>>>    // (irrelevant since we don't support combine yet).
>>> -  if (Inputs->size() == 1 &&
isa<PreprocessJobAction>(*Inputs->begin()) &&
>>> +
>>> +  // Look through offload actions after preprocessing.
>>> +  Action *PreprocessJA = (Inputs->size() == 1) ? *Inputs->begin() :
nullptr;
>>> +  auto *PreprocessOA =
>>> +      collapseOffloadingAction<PreprocessJobAction>(PreprocessJA);
>>> +
>>> +  if (PreprocessJA && isa<PreprocessJobAction>(PreprocessJA) &&
>>>        !C.getArgs().hasArg(options::OPT_no_integrated_cpp) &&
>>>        !C.getArgs().hasArg(options::OPT_traditional_cpp) && !SaveTemps
&&
>>>        !C.getArgs().hasArg(options::OPT_rewrite_objc) &&
>>> -      ToolForJob->hasIntegratedCPP())
>>> -    Inputs = &(*Inputs)[0]->getInputs();
>>> +      ToolForJob->hasIntegratedCPP()) {
>>> +    Inputs = &PreprocessJA->getInputs();
>>> +    if (PreprocessOA)
>>> +      CollapsedOffloadAction.push_back(PreprocessOA);
>>> +  }
>>>
>>>    return ToolForJob;
>>>  }
>>> @@ -1976,8 +2082,8 @@ static const Tool *selectToolForJob(Comp
>>>  InputInfo Driver::BuildJobsForAction(
>>>      Compilation &C, const Action *A, const ToolChain *TC, const char
*BoundArch,
>>>      bool AtTopLevel, bool MultipleArchs, const char *LinkingOutput,
>>> -    std::map<std::pair<const Action *, std::string>, InputInfo>
&CachedResults)
>>> -    const {
>>> +    std::map<std::pair<const Action *, std::string>, InputInfo>
&CachedResults,
>>> +    bool BuildForOffloadDevice) const {
>>>    // The bound arch is not necessarily represented in the toolchain's
triple --
>>>    // for example, armv7 and armv7s both map to the same triple -- so
we need
>>>    // both in our map.
>>> @@ -1991,9 +2097,9 @@ InputInfo Driver::BuildJobsForAction(
>>>    if (CachedResult != CachedResults.end()) {
>>>      return CachedResult->second;
>>>    }
>>> -  InputInfo Result =
>>> -      BuildJobsForActionNoCache(C, A, TC, BoundArch, AtTopLevel,
MultipleArchs,
>>> -                                LinkingOutput, CachedResults);
>>> +  InputInfo Result = BuildJobsForActionNoCache(
>>> +      C, A, TC, BoundArch, AtTopLevel, MultipleArchs, LinkingOutput,
>>> +      CachedResults, BuildForOffloadDevice);
>>>    CachedResults[ActionTC] = Result;
>>>    return Result;
>>>  }
>>> @@ -2001,21 +2107,65 @@ InputInfo Driver::BuildJobsForAction(
>>>  InputInfo Driver::BuildJobsForActionNoCache(
>>>      Compilation &C, const Action *A, const ToolChain *TC, const char
*BoundArch,
>>>      bool AtTopLevel, bool MultipleArchs, const char *LinkingOutput,
>>> -    std::map<std::pair<const Action *, std::string>, InputInfo>
&CachedResults)
>>> -    const {
>>> +    std::map<std::pair<const Action *, std::string>, InputInfo>
&CachedResults,
>>> +    bool BuildForOffloadDevice) const {
>>>    llvm::PrettyStackTraceString CrashInfo("Building compilation jobs");
>>>
>>> -  InputInfoList CudaDeviceInputInfos;
>>> -  if (const CudaHostAction *CHA = dyn_cast<CudaHostAction>(A)) {
>>> -    // Append outputs of device jobs to the input list.
>>> -    for (const Action *DA : CHA->getDeviceActions()) {
>>> -      CudaDeviceInputInfos.push_back(BuildJobsForAction(
>>> -          C, DA, TC, nullptr, AtTopLevel,
>>> -          /*MultipleArchs*/ false, LinkingOutput, CachedResults));
>>> -    }
>>> -    // Override current action with a real host compile action and
continue
>>> -    // processing it.
>>> -    A = *CHA->input_begin();
>>> +  InputInfoList OffloadDependencesInputInfo;
>>> +  if (const OffloadAction *OA = dyn_cast<OffloadAction>(A)) {
>>> +    // The offload action is expected to be used in four different
situations.
>>> +    //
>>> +    // a) Set a toolchain/architecture/kind for a host action:
>>> +    //    Host Action 1 -> OffloadAction -> Host Action 2
>>> +    //
>>> +    // b) Set a toolchain/architecture/kind for a device action;
>>> +    //    Device Action 1 -> OffloadAction -> Device Action 2
>>> +    //
>>> +    // c) Specify a device dependences to a host action;
>>> +    //    Device Action 1  _
>>> +    //                      \
>>> +    //      Host Action 1  ---> OffloadAction -> Host Action 2
>>> +    //
>>> +    // d) Specify a host dependence to a device action.
>>> +    //      Host Action 1  _
>>> +    //                      \
>>> +    //    Device Action 1  ---> OffloadAction -> Device Action 2
>>> +    //
>>> +    // For a) and b), we just return the job generated for the
dependence. For
>>> +    // c) and d) we override the current action with the host/device
dependence
>>> +    // if the current toolchain is host/device and set the offload
dependences
>>> +    // info with the jobs obtained from the device/host dependence(s).
>>> +
>>> +    // If there is a single device option, just generate the job for
it.
>>> +    if (OA->hasSingleDeviceDependence()) {
>>> +      InputInfo DevA;
>>> +      OA->doOnEachDeviceDependence([&](Action *DepA, const ToolChain
*DepTC,
>>> +                                       const char *DepBoundArch) {
>>> +        DevA =
>>> +            BuildJobsForAction(C, DepA, DepTC, DepBoundArch,
AtTopLevel,
>>> +                               /*MultipleArchs*/ !!DepBoundArch,
LinkingOutput,
>>> +                               CachedResults,
/*BuildForOffloadDevice=*/true);
>>> +      });
>>> +      return DevA;
>>> +    }
>>> +
>>> +    // If 'Action 2' is host, we generate jobs for the device
dependences and
>>> +    // override the current action with the host dependence.
Otherwise, we
>>> +    // generate the host dependences and override the action with the
device
>>> +    // dependence. The dependences can't therefore be a top-level
action.
>>> +    OA->doOnEachDependence(
>>> +        /*IsHostDependence=*/BuildForOffloadDevice,
>>> +        [&](Action *DepA, const ToolChain *DepTC, const char
*DepBoundArch) {
>>> +          OffloadDependencesInputInfo.push_back(BuildJobsForAction(
>>> +              C, DepA, DepTC, DepBoundArch, /*AtTopLevel=*/false,
>>> +              /*MultipleArchs*/ !!DepBoundArch, LinkingOutput,
CachedResults,
>>> +
/*BuildForOffloadDevice=*/DepA->getOffloadingDeviceKind() !=
>>> +                  Action::OFK_None));
>>> +        });
>>> +
>>> +    A = BuildForOffloadDevice
>>> +            ?
OA->getSingleDeviceDependence(/*DoNotConsiderHostActions=*/true)
>>> +            : OA->getHostDependence();
>>>    }
>>>
>>>    if (const InputAction *IA = dyn_cast<InputAction>(A)) {
>>> @@ -2042,41 +2192,34 @@ InputInfo Driver::BuildJobsForActionNoCa
>>>        TC = &C.getDefaultToolChain();
>>>
>>>      return BuildJobsForAction(C, *BAA->input_begin(), TC, ArchName,
AtTopLevel,
>>> -                              MultipleArchs, LinkingOutput,
CachedResults);
>>> +                              MultipleArchs, LinkingOutput,
CachedResults,
>>> +                              BuildForOffloadDevice);
>>>    }
>>>
>>> -  if (const CudaDeviceAction *CDA = dyn_cast<CudaDeviceAction>(A)) {
>>> -    // Initial processing of CudaDeviceAction carries host params.
>>> -    // Call BuildJobsForAction() again, now with correct device
parameters.
>>> -    InputInfo II = BuildJobsForAction(
>>> -        C, *CDA->input_begin(),
C.getSingleOffloadToolChain<Action::OFK_Cuda>(),
>>> -        CudaArchToString(CDA->getGpuArch()), CDA->isAtTopLevel(),
>>> -        /*MultipleArchs=*/true, LinkingOutput, CachedResults);
>>> -    // Currently II's Action is *CDA->input_begin().  Set it to CDA
instead, so
>>> -    // that one can retrieve II's GPU arch.
>>> -    II.setAction(A);
>>> -    return II;
>>> -  }
>>>
>>>    const ActionList *Inputs = &A->getInputs();
>>>
>>>    const JobAction *JA = cast<JobAction>(A);
>>> -  const CudaHostAction *CollapsedCHA = nullptr;
>>> +  ActionList CollapsedOffloadActions;
>>> +
>>>    const Tool *T =
>>>        selectToolForJob(C, isSaveTempsEnabled(), embedBitcodeEnabled(),
TC, JA,
>>> -                       Inputs, CollapsedCHA);
>>> +                       Inputs, CollapsedOffloadActions);
>>>    if (!T)
>>>      return InputInfo();
>>>
>>> -  // If we've collapsed action list that contained CudaHostAction we
>>> -  // need to build jobs for device-side inputs it may have held.
>>> -  if (CollapsedCHA) {
>>> -    for (const Action *DA : CollapsedCHA->getDeviceActions()) {
>>> -      CudaDeviceInputInfos.push_back(BuildJobsForAction(
>>> -          C, DA, TC, "", AtTopLevel,
>>> -          /*MultipleArchs*/ false, LinkingOutput, CachedResults));
>>> -    }
>>> -  }
>>> +  // If we've collapsed action list that contained OffloadAction we
>>> +  // need to build jobs for host/device-side inputs it may have held.
>>> +  for (const auto *OA : CollapsedOffloadActions)
>>> +    cast<OffloadAction>(OA)->doOnEachDependence(
>>> +        /*IsHostDependence=*/BuildForOffloadDevice,
>>> +        [&](Action *DepA, const ToolChain *DepTC, const char *DepBoundArch) {
>>> +          OffloadDependencesInputInfo.push_back(BuildJobsForAction(
>>> +              C, DepA, DepTC, DepBoundArch, AtTopLevel,
>>> +              /*MultipleArchs=*/!!DepBoundArch, LinkingOutput, CachedResults,
>>> +              /*BuildForOffloadDevice=*/DepA->getOffloadingDeviceKind() !=
>>> +                  Action::OFK_None));
>>> +        });
>>>
>>>    // Only use pipes when there is exactly one input.
>>>    InputInfoList InputInfos;
>>> @@ -2086,9 +2229,9 @@ InputInfo Driver::BuildJobsForActionNoCa
>>>      // FIXME: Clean this up.
>>>      bool SubJobAtTopLevel =
>>>          AtTopLevel && (isa<DsymutilJobAction>(A) || isa<VerifyJobAction>(A));
>>> -    InputInfos.push_back(BuildJobsForAction(C, Input, TC, BoundArch,
>>> -                                            SubJobAtTopLevel, MultipleArchs,
>>> -                                            LinkingOutput, CachedResults));
>>> +    InputInfos.push_back(BuildJobsForAction(
>>> +        C, Input, TC, BoundArch, SubJobAtTopLevel, MultipleArchs, LinkingOutput,
>>> +        CachedResults, BuildForOffloadDevice));
>>>    }
>>>
>>>    // Always use the first input as the base input.
>>> @@ -2099,9 +2242,10 @@ InputInfo Driver::BuildJobsForActionNoCa
>>>    if (JA->getType() == types::TY_dSYM)
>>>      BaseInput = InputInfos[0].getFilename();
>>>
>>> -  // Append outputs of cuda device jobs to the input list
>>> -  if (CudaDeviceInputInfos.size())
>>> -    InputInfos.append(CudaDeviceInputInfos.begin(), CudaDeviceInputInfos.end());
>>> +  // Append outputs of offload device jobs to the input list
>>> +  if (!OffloadDependencesInputInfo.empty())
>>> +    InputInfos.append(OffloadDependencesInputInfo.begin(),
>>> +                      OffloadDependencesInputInfo.end());
>>>
>>>    // Determine the place to write output to, if any.
>>>    InputInfo Result;
>>> @@ -2109,7 +2253,8 @@ InputInfo Driver::BuildJobsForActionNoCa
>>>      Result = InputInfo(A, BaseInput);
>>>    else
>>>      Result = InputInfo(A, GetNamedOutputPath(C, *JA, BaseInput, BoundArch,
>>> -                                             AtTopLevel, MultipleArchs),
>>> +                                             AtTopLevel, MultipleArchs,
>>> +                                             TC->getTriple().normalize()),
>>>                         BaseInput);
>>>
>>>    if (CCCPrintBindings && !CCGenDiagnostics) {
>>> @@ -2169,7 +2314,8 @@ static const char *MakeCLOutputFilename(
>>>  const char *Driver::GetNamedOutputPath(Compilation &C, const JobAction &JA,
>>>                                         const char *BaseInput,
>>>                                         const char *BoundArch, bool AtTopLevel,
>>> -                                       bool MultipleArchs) const {
>>> +                                       bool MultipleArchs,
>>> +                                       StringRef NormalizedTriple) const {
>>>    llvm::PrettyStackTraceString CrashInfo("Computing output path");
>>>    // Output to a user requested destination?
>>>    if (AtTopLevel && !isa<DsymutilJobAction>(JA) && !isa<VerifyJobAction>(JA)) {
>>> @@ -2255,6 +2401,7 @@ const char *Driver::GetNamedOutputPath(C
>>>            MakeCLOutputFilename(C.getArgs(), "", BaseName, types::TY_Image);
>>>      } else if (MultipleArchs && BoundArch) {
>>>        SmallString<128> Output(getDefaultImageName());
>>> +      Output += JA.getOffloadingFileNamePrefix(NormalizedTriple);
>>>        Output += "-";
>>>        Output.append(BoundArch);
>>>        NamedOutput = C.getArgs().MakeArgString(Output.c_str());
>>> @@ -2271,6 +2418,7 @@ const char *Driver::GetNamedOutputPath(C
>>>      if (!types::appendSuffixForType(JA.getType()))
>>>        End = BaseName.rfind('.');
>>>      SmallString<128> Suffixed(BaseName.substr(0, End));
>>> +    Suffixed += JA.getOffloadingFileNamePrefix(NormalizedTriple);
>>>      if (MultipleArchs && BoundArch) {
>>>        Suffixed += "-";
>>>        Suffixed.append(BoundArch);
>>>
>>> Modified: cfe/trunk/lib/Driver/ToolChain.cpp
>>> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Driver/ToolChain.cpp?rev=275645&r1=275644&r2=275645&view=diff
>>>
==============================================================================
>>> --- cfe/trunk/lib/Driver/ToolChain.cpp (original)
>>> +++ cfe/trunk/lib/Driver/ToolChain.cpp Fri Jul 15 18:13:27 2016
>>> @@ -248,8 +248,7 @@ Tool *ToolChain::getTool(Action::ActionC
>>>
>>>    case Action::InputClass:
>>>    case Action::BindArchClass:
>>> -  case Action::CudaDeviceClass:
>>> -  case Action::CudaHostClass:
>>> +  case Action::OffloadClass:
>>>    case Action::LipoJobClass:
>>>    case Action::DsymutilJobClass:
>>>    case Action::VerifyDebugInfoJobClass:
>>>
>>> Modified: cfe/trunk/lib/Driver/Tools.cpp
>>> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Driver/Tools.cpp?rev=275645&r1=275644&r2=275645&view=diff
>>>
==============================================================================
>>> --- cfe/trunk/lib/Driver/Tools.cpp (original)
>>> +++ cfe/trunk/lib/Driver/Tools.cpp Fri Jul 15 18:13:27 2016
>>> @@ -296,12 +296,45 @@ static bool forwardToGCC(const Option &O
>>>           !O.hasFlag(options::DriverOption) && !O.hasFlag(options::LinkerInput);
>>>  }
>>>
>>> +/// Add the C++ include args of other offloading toolchains. If this is a host
>>> +/// job, the device toolchains are added. If this is a device job, the host
>>> +/// toolchains will be added.
>>> +static void addExtraOffloadCXXStdlibIncludeArgs(Compilation &C,
>>> +                                                const JobAction &JA,
>>> +                                                const ArgList &Args,
>>> +                                                ArgStringList &CmdArgs) {
>>> +
>>> +  if (JA.isHostOffloading(Action::OFK_Cuda))
>>> +    C.getSingleOffloadToolChain<Action::OFK_Cuda>()
>>> +        ->AddClangCXXStdlibIncludeArgs(Args, CmdArgs);
>>> +  else if (JA.isDeviceOffloading(Action::OFK_Cuda))
>>> +    C.getSingleOffloadToolChain<Action::OFK_Host>()
>>> +        ->AddClangCXXStdlibIncludeArgs(Args, CmdArgs);
>>> +
>>> +  // TODO: Add support for other programming models here.
>>> +}
>>> +
>>> +/// Add the include args that are specific to each offloading programming model.
>>> +static void addExtraOffloadSpecificIncludeArgs(Compilation &C,
>>> +                                               const JobAction &JA,
>>> +                                               const ArgList &Args,
>>> +                                               ArgStringList &CmdArgs) {
>>> +
>>> +  if (JA.isHostOffloading(Action::OFK_Cuda))
>>> +    C.getSingleOffloadToolChain<Action::OFK_Host>()->AddCudaIncludeArgs(
>>> +        Args, CmdArgs);
>>> +  else if (JA.isDeviceOffloading(Action::OFK_Cuda))
>>> +    C.getSingleOffloadToolChain<Action::OFK_Cuda>()->AddCudaIncludeArgs(
>>> +        Args, CmdArgs);
>>> +
>>> +  // TODO: Add support for other programming models here.
>>> +}
>>> +
>>>  void Clang::AddPreprocessingOptions(Compilation &C, const JobAction &JA,
>>>                                      const Driver &D, const ArgList &Args,
>>>                                      ArgStringList &CmdArgs,
>>>                                      const InputInfo &Output,
>>> -                                    const InputInfoList &Inputs,
>>> -                                    const ToolChain *AuxToolChain) const {
>>> +                                    const InputInfoList &Inputs) const {
>>>    Arg *A;
>>>    const bool IsIAMCU = getToolChain().getTriple().isOSIAMCU();
>>>
>>> @@ -566,31 +599,27 @@ void Clang::AddPreprocessingOptions(Comp
>>>    // OBJCPLUS_INCLUDE_PATH - system includes enabled when compiling ObjC++.
>>>    addDirectoryList(Args, CmdArgs, "-objcxx-isystem", "OBJCPLUS_INCLUDE_PATH");
>>>
>>> -  // Optional AuxToolChain indicates that we need to include headers
>>> -  // for more than one target. If that's the case, add include paths
>>> -  // from AuxToolChain right after include paths of the same kind for
>>> -  // the current target.
>>> +  // While adding the include arguments, we also attempt to retrieve the
>>> +  // arguments of related offloading toolchains or arguments that are specific
>>> +  // to an offloading programming model.
>>>
>>>    // Add C++ include arguments, if needed.
>>>    if (types::isCXX(Inputs[0].getType())) {
>>>      getToolChain().AddClangCXXStdlibIncludeArgs(Args, CmdArgs);
>>> -    if (AuxToolChain)
>>> -      AuxToolChain->AddClangCXXStdlibIncludeArgs(Args, CmdArgs);
>>> +    addExtraOffloadCXXStdlibIncludeArgs(C, JA, Args, CmdArgs);
>>>    }
>>>
>>>    // Add system include arguments for all targets but IAMCU.
>>>    if (!IsIAMCU) {
>>>      getToolChain().AddClangSystemIncludeArgs(Args, CmdArgs);
>>> -    if (AuxToolChain)
>>> -      AuxToolChain->AddClangCXXStdlibIncludeArgs(Args, CmdArgs);
>>> +    addExtraOffloadCXXStdlibIncludeArgs(C, JA, Args, CmdArgs);
>>
>>
>> This doesn't make much sense to me: we already added the C++ stdlib
>> includes a few lines above for C++ compiles. Should this be adding the
>> (non-C++) system include args instead?
>>
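If the second call is indeed meant to mirror the normal-toolchain line above it, a minimal sketch of that fix could look like the helper below. This is not part of the committed patch and the helper name is hypothetical; it would sit next to the existing helpers in Tools.cpp and simply forward the plain system include args of the counterpart toolchain through the existing AddClangSystemIncludeArgs hook:

```
// Hypothetical sketch, not in r275645: mirror addExtraOffloadCXXStdlibIncludeArgs,
// but forward the (non-C++) system include args of the other offloading
// toolchain, matching what the normal-toolchain call above does for the
// current target.
static void addExtraOffloadSystemIncludeArgs(Compilation &C, const JobAction &JA,
                                             const ArgList &Args,
                                             ArgStringList &CmdArgs) {
  if (JA.isHostOffloading(Action::OFK_Cuda))
    C.getSingleOffloadToolChain<Action::OFK_Cuda>()
        ->AddClangSystemIncludeArgs(Args, CmdArgs);
  else if (JA.isDeviceOffloading(Action::OFK_Cuda))
    C.getSingleOffloadToolChain<Action::OFK_Host>()
        ->AddClangSystemIncludeArgs(Args, CmdArgs);

  // TODO: Add support for other programming models here.
}
```

The non-IAMCU branch below would then call this helper instead of the duplicated addExtraOffloadCXXStdlibIncludeArgs call.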
>>>
>>>    } else {
>>>      // For IAMCU add special include arguments.
>>>      getToolChain().AddIAMCUIncludeArgs(Args, CmdArgs);
>>>    }
>>>
>>> -  // Add CUDA include arguments, if needed.
>>> -  if (types::isCuda(Inputs[0].getType()))
>>> -    getToolChain().AddCudaIncludeArgs(Args, CmdArgs);
>>> +  // Add offload include arguments, if needed.
>>> +  addExtraOffloadSpecificIncludeArgs(C, JA, Args, CmdArgs);
>>>  }
>>>
>>>  // FIXME: Move to target hook.
>>> @@ -3799,7 +3828,7 @@ void Clang::ConstructJob(Compilation &C,
>>>    // CUDA compilation may have multiple inputs (source file + results of
>>>    // device-side compilations). All other jobs are expected to have exactly one
>>>    // input.
>>> -  bool IsCuda = types::isCuda(Input.getType());
>>> +  bool IsCuda = JA.isOffloading(Action::OFK_Cuda);
>>>    assert((IsCuda || Inputs.size() == 1) && "Unable to handle multiple inputs.");
>>>
>>>    // C++ is not supported for IAMCU.
>>> @@ -3815,21 +3844,21 @@ void Clang::ConstructJob(Compilation &C,
>>>    CmdArgs.push_back("-triple");
>>>    CmdArgs.push_back(Args.MakeArgString(TripleStr));
>>>
>>> -  const ToolChain *AuxToolChain = nullptr;
>>>    if (IsCuda) {
>>> -    // FIXME: We need a (better) way to pass information about
>>> -    // particular compilation pass we're constructing here. For now we
>>> -    // can check which toolchain we're using and pick the other one to
>>> -    // extract the triple.
>>> -    if (&getToolChain() == C.getSingleOffloadToolChain<Action::OFK_Cuda>())
>>> -      AuxToolChain = C.getOffloadingHostToolChain();
>>> -    else if (&getToolChain() == C.getOffloadingHostToolChain())
>>> -      AuxToolChain = C.getSingleOffloadToolChain<Action::OFK_Cuda>();
>>> -    else
>>> -      llvm_unreachable("Can't figure out CUDA compilation mode.");
>>> -    assert(AuxToolChain != nullptr && "No aux toolchain.");
>>> +    // We have to pass the triple of the host if compiling for a CUDA device and
>>> +    // vice-versa.
>>> +    StringRef NormalizedTriple;
>>> +    if (JA.isDeviceOffloading(Action::OFK_Cuda))
>>> +      NormalizedTriple = C.getSingleOffloadToolChain<Action::OFK_Host>()
>>> +                             ->getTriple()
>>> +                             .normalize();
>>> +    else
>>> +      NormalizedTriple = C.getSingleOffloadToolChain<Action::OFK_Cuda>()
>>> +                             ->getTriple()
>>> +                             .normalize();
>>> +
>>>      CmdArgs.push_back("-aux-triple");
>>> -    CmdArgs.push_back(Args.MakeArgString(AuxToolChain->getTriple().str()));
>>> +    CmdArgs.push_back(Args.MakeArgString(NormalizedTriple));
>>>    }
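As an aside, the selection above can be read as one symmetric rule. The sketch below is only an equivalent reformulation of those lines (the helper name is made up), included to make the host/device symmetry obvious:

```
// Equivalent reformulation of the -aux-triple selection above; the helper
// name is hypothetical. For a device-side CUDA job the aux triple is the host
// triple, and for a host-side job it is the CUDA device triple.
static std::string getOffloadAuxTriple(Compilation &C, const JobAction &JA) {
  const ToolChain *AuxTC =
      JA.isDeviceOffloading(Action::OFK_Cuda)
          ? C.getSingleOffloadToolChain<Action::OFK_Host>()
          : C.getSingleOffloadToolChain<Action::OFK_Cuda>();
  return AuxTC->getTriple().normalize();
}
```

Either way, the cc1 invocation ends up carrying -aux-triple with the other side's normalized triple next to the usual -triple.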
>>>
>>>    if (Triple.isOSWindows() && (Triple.getArch() == llvm::Triple::arm ||
>>> @@ -4718,8 +4747,7 @@ void Clang::ConstructJob(Compilation &C,
>>>    //
>>>    // FIXME: Support -fpreprocessed
>>>    if (types::getPreprocessedType(InputType) != types::TY_INVALID)
>>> -    AddPreprocessingOptions(C, JA, D, Args, CmdArgs, Output, Inputs,
>>> -                            AuxToolChain);
>>> +    AddPreprocessingOptions(C, JA, D, Args, CmdArgs, Output, Inputs);
>>>
>>>    // Don't warn about "clang -c -DPIC -fPIC test.i" because libtool.m4 assumes
>>>    // that "The compiler can only warn and ignore the option if not recognized".
>>> @@ -11193,15 +11221,14 @@ void NVPTX::Assembler::ConstructJob(Comp
>>>        static_cast<const toolchains::CudaToolChain &>(getToolChain());
>>>    assert(TC.getTriple().isNVPTX() && "Wrong platform");
>>>
>>> -  std::vector<std::string> gpu_archs =
>>> -      Args.getAllArgValues(options::OPT_march_EQ);
>>> -  assert(gpu_archs.size() == 1 && "Exactly one GPU Arch required for ptxas.");
>>> -  const std::string& gpu_arch = gpu_archs[0];
>>> +  // Obtain architecture from the action.
>>> +  CudaArch gpu_arch = StringToCudaArch(JA.getOffloadingArch());
>>> +  assert(gpu_arch != CudaArch::UNKNOWN &&
>>> +         "Device action expected to have an architecture.");
>>>
>>>    // Check that our installation's ptxas supports gpu_arch.
>>>    if (!Args.hasArg(options::OPT_no_cuda_version_check)) {
>>> -    TC.cudaInstallation().CheckCudaVersionSupportsArch(
>>> -        StringToCudaArch(gpu_arch));
>>> +    TC.cudaInstallation().CheckCudaVersionSupportsArch(gpu_arch);
>>>    }
>>>
>>>    ArgStringList CmdArgs;
>>> @@ -11245,7 +11272,7 @@ void NVPTX::Assembler::ConstructJob(Comp
>>>    }
>>>
>>>    CmdArgs.push_back("--gpu-name");
>>> -  CmdArgs.push_back(Args.MakeArgString(gpu_arch));
>>> +  CmdArgs.push_back(Args.MakeArgString(CudaArchToString(gpu_arch)));
>>>    CmdArgs.push_back("--output-file");
>>>    CmdArgs.push_back(Args.MakeArgString(Output.getFilename()));
>>>    for (const auto& II : Inputs)
>>> @@ -11277,13 +11304,20 @@ void NVPTX::Linker::ConstructJob(Compila
>>>    CmdArgs.push_back(Args.MakeArgString(Output.getFilename()));
>>>
>>>    for (const auto& II : Inputs) {
>>> -    auto* A = cast<const CudaDeviceAction>(II.getAction());
>>> +    auto *A = II.getAction();
>>> +    assert(A->getInputs().size() == 1 &&
>>> +           "Device offload action is expected to have a single input");
>>> +    const char *gpu_arch_str = A->getOffloadingArch();
>>> +    assert(gpu_arch_str &&
>>> +           "Device action expected to have associated a GPU architecture!");
>>> +    CudaArch gpu_arch = StringToCudaArch(gpu_arch_str);
>>> +
>>>      // We need to pass an Arch of the form "sm_XX" for cubin files and
>>>      // "compute_XX" for ptx.
>>>      const char *Arch =
>>>          (II.getType() == types::TY_PP_Asm)
>>> -            ? CudaVirtualArchToString(VirtualArchForCudaArch(A->getGpuArch()))
>>> -            : CudaArchToString(A->getGpuArch());
>>> +            ? CudaVirtualArchToString(VirtualArchForCudaArch(gpu_arch))
>>> +            : gpu_arch_str;
>>>
 CmdArgs.push_back(Args.MakeArgString(llvm::Twine("--image=profile=") +
>>>                                           Arch + ",file=" +
II.getFilename()));
>>>    }
>>>
>>> Modified: cfe/trunk/lib/Driver/Tools.h
>>> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Driver/Tools.h?rev=275645&r1=275644&r2=275645&view=diff
>>>
==============================================================================
>>> --- cfe/trunk/lib/Driver/Tools.h (original)
>>> +++ cfe/trunk/lib/Driver/Tools.h Fri Jul 15 18:13:27 2016
>>> @@ -57,8 +57,7 @@ private:
>>>                                 const Driver &D, const llvm::opt::ArgList &Args,
>>>                                 llvm::opt::ArgStringList &CmdArgs,
>>>                                 const InputInfo &Output,
>>> -                               const InputInfoList &Inputs,
>>> -                               const ToolChain *AuxToolChain) const;
>>> +                               const InputInfoList &Inputs) const;
>>>
>>>    void AddAArch64TargetArgs(const llvm::opt::ArgList &Args,
>>>                              llvm::opt::ArgStringList &CmdArgs) const;
>>>
>>> Modified: cfe/trunk/lib/Frontend/CreateInvocationFromCommandLine.cpp
>>> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Frontend/CreateInvocationFromCommandLine.cpp?rev=275645&r1=275644&r2=275645&view=diff
>>>
==============================================================================
>>> --- cfe/trunk/lib/Frontend/CreateInvocationFromCommandLine.cpp (original)
>>> +++ cfe/trunk/lib/Frontend/CreateInvocationFromCommandLine.cpp Fri Jul 15 18:13:27 2016
>>> @@ -60,25 +60,25 @@ clang::createInvocationFromCommandLine(A
>>>    }
>>>
>>>    // We expect to get back exactly one command job, if we didn't something
>>> -  // failed. CUDA compilation is an exception as it creates multiple jobs. If
>>> -  // that's the case, we proceed with the first job. If caller needs particular
>>> -  // CUDA job, it should be controlled via --cuda-{host|device}-only option
>>> -  // passed to the driver.
>>> +  // failed. Offload compilation is an exception as it creates multiple jobs. If
>>> +  // that's the case, we proceed with the first job. If caller needs a
>>> +  // particular job, it should be controlled via options (e.g.
>>> +  // --cuda-{host|device}-only for CUDA) passed to the driver.
>>>    const driver::JobList &Jobs = C->getJobs();
>>> -  bool CudaCompilation = false;
>>> +  bool OffloadCompilation = false;
>>>    if (Jobs.size() > 1) {
>>>      for (auto &A : C->getActions()){
>>>        // On MacOSX real actions may end up being wrapped in BindArchAction
>>>        if (isa<driver::BindArchAction>(A))
>>>          A = *A->input_begin();
>>> -      if (isa<driver::CudaDeviceAction>(A)) {
>>> -        CudaCompilation = true;
>>> +      if (isa<driver::OffloadAction>(A)) {
>>> +        OffloadCompilation = true;
>>>          break;
>>>        }
>>>      }
>>>    }
>>>    if (Jobs.size() == 0 || !isa<driver::Command>(*Jobs.begin()) ||
>>> -      (Jobs.size() > 1 && !CudaCompilation)) {
>>> +      (Jobs.size() > 1 && !OffloadCompilation)) {
>>>      SmallString<256> Msg;
>>>      llvm::raw_svector_ostream OS(Msg);
>>>      Jobs.Print(OS, "; ", true);
>>>
>>> Added: cfe/trunk/test/Driver/cuda_phases.cu
>>> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/test/Driver/cuda_phases.cu?rev=275645&view=auto
>>>
==============================================================================
>>> --- cfe/trunk/test/Driver/cuda_phases.cu (added)
>>> +++ cfe/trunk/test/Driver/cuda_phases.cu Fri Jul 15 18:13:27 2016
>>> @@ -0,0 +1,206 @@
>>> +// Tests the phases generated for a CUDA offloading target for different
>>> +// combinations of:
>>> +// - Number of gpu architectures;
>>> +// - Host/device-only compilation;
>>> +// - User-requested final phase - binary or assembly.
>>> +
>>> +// REQUIRES: clang-driver
>>> +// REQUIRES: powerpc-registered-target
>>> +// REQUIRES: nvptx-registered-target
>>> +
>>> +//
>>> +// Test single gpu architecture with complete compilation.
>>> +//
>>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 %s 2>&1 \
>>> +// RUN: | FileCheck -check-prefix=BIN %s
>>> +// BIN: 0: input, "{{.*}}cuda_phases.cu", cuda, (host-cuda)
>>> +// BIN: 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
>>> +// BIN: 2: compiler, {1}, ir, (host-cuda)
>>> +// BIN: 3: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_30)
>>> +// BIN: 4: preprocessor, {3}, cuda-cpp-output, (device-cuda, sm_30)
>>> +// BIN: 5: compiler, {4}, ir, (device-cuda, sm_30)
>>> +// BIN: 6: backend, {5}, assembler, (device-cuda, sm_30)
>>> +// BIN: 7: assembler, {6}, object, (device-cuda, sm_30)
>>> +// BIN: 8: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {7}, object
>>> +// BIN: 9: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {6}, assembler
>>> +// BIN: 10: linker, {8, 9}, cuda-fatbin, (device-cuda)
>>> +// BIN: 11: offload, "host-cuda (powerpc64le-ibm-linux-gnu)" {2}, "device-cuda (nvptx64-nvidia-cuda)" {10}, ir
>>> +// BIN: 12: backend, {11}, assembler, (host-cuda)
>>> +// BIN: 13: assembler, {12}, object, (host-cuda)
>>> +// BIN: 14: linker, {13}, image, (host-cuda)
>>> +
>>> +//
>>> +// Test single gpu architecture up to the assemble phase.
>>> +//
>>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 %s -S 2>&1 \
>>> +// RUN: | FileCheck -check-prefix=ASM %s
>>> +// ASM: 0: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_30)
>>> +// ASM: 1: preprocessor, {0}, cuda-cpp-output, (device-cuda, sm_30)
>>> +// ASM: 2: compiler, {1}, ir, (device-cuda, sm_30)
>>> +// ASM: 3: backend, {2}, assembler, (device-cuda, sm_30)
>>> +// ASM: 4: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {3}, assembler
>>> +// ASM: 5: input, "{{.*}}cuda_phases.cu", cuda, (host-cuda)
>>> +// ASM: 6: preprocessor, {5}, cuda-cpp-output, (host-cuda)
>>> +// ASM: 7: compiler, {6}, ir, (host-cuda)
>>> +// ASM: 8: backend, {7}, assembler, (host-cuda)
>>> +
>>> +//
>>> +// Test two gpu architectures with complete compilation.
>>> +//
>>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_35 %s 2>&1 \
>>> +// RUN: | FileCheck -check-prefix=BIN2 %s
>>> +// BIN2: 0: input, "{{.*}}cuda_phases.cu", cuda, (host-cuda)
>>> +// BIN2: 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
>>> +// BIN2: 2: compiler, {1}, ir, (host-cuda)
>>> +// BIN2: 3: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_30)
>>> +// BIN2: 4: preprocessor, {3}, cuda-cpp-output, (device-cuda, sm_30)
>>> +// BIN2: 5: compiler, {4}, ir, (device-cuda, sm_30)
>>> +// BIN2: 6: backend, {5}, assembler, (device-cuda, sm_30)
>>> +// BIN2: 7: assembler, {6}, object, (device-cuda, sm_30)
>>> +// BIN2: 8: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {7}, object
>>> +// BIN2: 9: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {6}, assembler
>>> +// BIN2: 10: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_35)
>>> +// BIN2: 11: preprocessor, {10}, cuda-cpp-output, (device-cuda, sm_35)
>>> +// BIN2: 12: compiler, {11}, ir, (device-cuda, sm_35)
>>> +// BIN2: 13: backend, {12}, assembler, (device-cuda, sm_35)
>>> +// BIN2: 14: assembler, {13}, object, (device-cuda, sm_35)
>>> +// BIN2: 15: offload, "device-cuda (nvptx64-nvidia-cuda:sm_35)" {14}, object
>>> +// BIN2: 16: offload, "device-cuda (nvptx64-nvidia-cuda:sm_35)" {13}, assembler
>>> +// BIN2: 17: linker, {8, 9, 15, 16}, cuda-fatbin, (device-cuda)
>>> +// BIN2: 18: offload, "host-cuda (powerpc64le-ibm-linux-gnu)" {2}, "device-cuda (nvptx64-nvidia-cuda)" {17}, ir
>>> +// BIN2: 19: backend, {18}, assembler, (host-cuda)
>>> +// BIN2: 20: assembler, {19}, object, (host-cuda)
>>> +// BIN2: 21: linker, {20}, image, (host-cuda)
>>> +
>>> +//
>>> +// Test two gpu architectures up to the assemble phase.
>>> +//
>>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_35 %s -S 2>&1 \
>>> +// RUN: | FileCheck -check-prefix=ASM2 %s
>>> +// ASM2: 0: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_30)
>>> +// ASM2: 1: preprocessor, {0}, cuda-cpp-output, (device-cuda, sm_30)
>>> +// ASM2: 2: compiler, {1}, ir, (device-cuda, sm_30)
>>> +// ASM2: 3: backend, {2}, assembler, (device-cuda, sm_30)
>>> +// ASM2: 4: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {3}, assembler
>>> +// ASM2: 5: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_35)
>>> +// ASM2: 6: preprocessor, {5}, cuda-cpp-output, (device-cuda, sm_35)
>>> +// ASM2: 7: compiler, {6}, ir, (device-cuda, sm_35)
>>> +// ASM2: 8: backend, {7}, assembler, (device-cuda, sm_35)
>>> +// ASM2: 9: offload, "device-cuda (nvptx64-nvidia-cuda:sm_35)" {8}, assembler
>>> +// ASM2: 10: input, "{{.*}}cuda_phases.cu", cuda, (host-cuda)
>>> +// ASM2: 11: preprocessor, {10}, cuda-cpp-output, (host-cuda)
>>> +// ASM2: 12: compiler, {11}, ir, (host-cuda)
>>> +// ASM2: 13: backend, {12}, assembler, (host-cuda)
>>> +
>>> +//
>>> +// Test single gpu architecture with complete compilation in host-only
>>> +// compilation mode.
>>> +//
>>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 %s --cuda-host-only 2>&1 \
>>> +// RUN: | FileCheck -check-prefix=HBIN %s
>>> +// HBIN: 0: input, "{{.*}}cuda_phases.cu", cuda, (host-cuda)
>>> +// HBIN: 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
>>> +// HBIN: 2: compiler, {1}, ir, (host-cuda)
>>> +// HBIN: 3: offload, "host-cuda (powerpc64le-ibm-linux-gnu)" {2}, ir
>>> +// HBIN: 4: backend, {3}, assembler, (host-cuda)
>>> +// HBIN: 5: assembler, {4}, object, (host-cuda)
>>> +// HBIN: 6: linker, {5}, image, (host-cuda)
>>> +
>>> +//
>>> +// Test single gpu architecture up to the assemble phase in host-only
>>> +// compilation mode.
>>> +//
>>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 %s --cuda-host-only -S 2>&1 \
>>> +// RUN: | FileCheck -check-prefix=HASM %s
>>> +// HASM: 0: input, "{{.*}}cuda_phases.cu", cuda, (host-cuda)
>>> +// HASM: 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
>>> +// HASM: 2: compiler, {1}, ir, (host-cuda)
>>> +// HASM: 3: offload, "host-cuda (powerpc64le-ibm-linux-gnu)" {2}, ir
>>> +// HASM: 4: backend, {3}, assembler, (host-cuda)
>>> +
>>> +//
>>> +// Test two gpu architectures with complete compilation in host-only
>>> +// compilation mode.
>>> +//
>>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_35 %s --cuda-host-only 2>&1 \
>>> +// RUN: | FileCheck -check-prefix=HBIN2 %s
>>> +// HBIN2: 0: input, "{{.*}}cuda_phases.cu", cuda, (host-cuda)
>>> +// HBIN2: 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
>>> +// HBIN2: 2: compiler, {1}, ir, (host-cuda)
>>> +// HBIN2: 3: offload, "host-cuda (powerpc64le-ibm-linux-gnu)" {2}, ir
>>> +// HBIN2: 4: backend, {3}, assembler, (host-cuda)
>>> +// HBIN2: 5: assembler, {4}, object, (host-cuda)
>>> +// HBIN2: 6: linker, {5}, image, (host-cuda)
>>> +
>>> +//
>>> +// Test two gpu architectures up to the assemble phase in host-only
>>> +// compilation mode.
>>> +//
>>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_35 %s --cuda-host-only -S 2>&1 \
>>> +// RUN: | FileCheck -check-prefix=HASM2 %s
>>> +// HASM2: 0: input, "{{.*}}cuda_phases.cu", cuda, (host-cuda)
>>> +// HASM2: 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
>>> +// HASM2: 2: compiler, {1}, ir, (host-cuda)
>>> +// HASM2: 3: offload, "host-cuda (powerpc64le-ibm-linux-gnu)" {2}, ir
>>> +// HASM2: 4: backend, {3}, assembler, (host-cuda)
>>> +
>>> +//
>>> +// Test single gpu architecture with complete compilation in device-only
>>> +// compilation mode.
>>> +//
>>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 %s --cuda-device-only 2>&1 \
>>> +// RUN: | FileCheck -check-prefix=DBIN %s
>>> +// DBIN: 0: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_30)
>>> +// DBIN: 1: preprocessor, {0}, cuda-cpp-output, (device-cuda, sm_30)
>>> +// DBIN: 2: compiler, {1}, ir, (device-cuda, sm_30)
>>> +// DBIN: 3: backend, {2}, assembler, (device-cuda, sm_30)
>>> +// DBIN: 4: assembler, {3}, object, (device-cuda, sm_30)
>>> +// DBIN: 5: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {4}, object
>>> +
>>> +//
>>> +// Test single gpu architecture up to the assemble phase in device-only
>>> +// compilation mode.
>>> +//
>>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 %s --cuda-device-only -S 2>&1 \
>>> +// RUN: | FileCheck -check-prefix=DASM %s
>>> +// DASM: 0: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_30)
>>> +// DASM: 1: preprocessor, {0}, cuda-cpp-output, (device-cuda, sm_30)
>>> +// DASM: 2: compiler, {1}, ir, (device-cuda, sm_30)
>>> +// DASM: 3: backend, {2}, assembler, (device-cuda, sm_30)
>>> +// DASM: 4: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {3}, assembler
>>> +
>>> +//
>>> +// Test two gpu architectures with complete compilation in device-only
>>> +// compilation mode.
>>> +//
>>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_35 %s --cuda-device-only 2>&1 \
>>> +// RUN: | FileCheck -check-prefix=DBIN2 %s
>>> +// DBIN2: 0: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_30)
>>> +// DBIN2: 1: preprocessor, {0}, cuda-cpp-output, (device-cuda, sm_30)
>>> +// DBIN2: 2: compiler, {1}, ir, (device-cuda, sm_30)
>>> +// DBIN2: 3: backend, {2}, assembler, (device-cuda, sm_30)
>>> +// DBIN2: 4: assembler, {3}, object, (device-cuda, sm_30)
>>> +// DBIN2: 5: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {4}, object
>>> +// DBIN2: 6: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_35)
>>> +// DBIN2: 7: preprocessor, {6}, cuda-cpp-output, (device-cuda, sm_35)
>>> +// DBIN2: 8: compiler, {7}, ir, (device-cuda, sm_35)
>>> +// DBIN2: 9: backend, {8}, assembler, (device-cuda, sm_35)
>>> +// DBIN2: 10: assembler, {9}, object, (device-cuda, sm_35)
>>> +// DBIN2: 11: offload, "device-cuda (nvptx64-nvidia-cuda:sm_35)" {10}, object
>>> +
>>> +//
>>> +// Test two gpu architectures up to the assemble phase in device-only
>>> +// compilation mode.
>>> +//
>>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_35 %s --cuda-device-only -S 2>&1 \
>>> +// RUN: | FileCheck -check-prefix=DASM2 %s
>>> +// DASM2: 0: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_30)
>>> +// DASM2: 1: preprocessor, {0}, cuda-cpp-output, (device-cuda, sm_30)
>>> +// DASM2: 2: compiler, {1}, ir, (device-cuda, sm_30)
>>> +// DASM2: 3: backend, {2}, assembler, (device-cuda, sm_30)
>>> +// DASM2: 4: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {3}, assembler
>>> +// DASM2: 5: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_35)
>>> +// DASM2: 6: preprocessor, {5}, cuda-cpp-output, (device-cuda, sm_35)
>>> +// DASM2: 7: compiler, {6}, ir, (device-cuda, sm_35)
>>> +// DASM2: 8: backend, {7}, assembler, (device-cuda, sm_35)
>>> +// DASM2: 9: offload, "device-cuda (nvptx64-nvidia-cuda:sm_35)" {8}, assembler
>>>
>>>
>>> _______________________________________________
>>> cfe-commits mailing list
>>> cfe-commits at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits