r275645 - [CUDA][OpenMP] Create generic offload action

Mon Jul 18 16:38:32 PDT 2016

On Mon, Jul 18, 2016 at 4:28 PM, Artem Belevich <tra at google.com> wrote:

> CXX headers had to be added twice because we needed them for both host and
> device side of compilation, but only *host* toolchain knew where to find
> them. That's the part under "Add C++ include arguments."
>
> The second copy under "isIAMCU" below was added in r272883 and should
> indeed be removed.
>

Correction - r272883 is *not* the culprit. My commit r253386 is the source
of the mess.
I believe it was a copy/paste error and the intent was to
call AuxToolChain->AddClangSystemIncludeArgs(Args, CmdArgs) for non-C++
files.

--Artem

>
> On Mon, Jul 18, 2016 at 4:03 PM, Samuel F Antao <sfantao at us.ibm.com>
> wrote:
>
>> Hi Richard,
>>
>> I agree, I don't think the second `addExtraOffloadCXXStdlibIncludeArgs`
>> is required. When I did this change my focus was to maintain functionality
>> of the existing code. I can confirm that the existing tests still pass
>> after removing it. It is possible, however, that some use case in the
>> existing CUDA implementation requires C++ include paths to be included
>> for non-C++ input types?
>>
>> Art, Justin can you confirm that is the case? If not, should I go ahead
>> and remove the duplicated code?
>>
>> Thanks!
>> Samuel
>>
>> On Mon, Jul 18, 2016 at 5:45 PM, Richard Smith via cfe-commits <
>> cfe-commits at lists.llvm.org> wrote:
>>
>>>
>>>
>>> On Fri, Jul 15, 2016 at 4:13 PM, Samuel Antao via cfe-commits <
>>> cfe-commits at lists.llvm.org> wrote:
>>>
>>>> Author: sfantao
>>>> Date: Fri Jul 15 18:13:27 2016
>>>> New Revision: 275645
>>>>
>>>> URL: http://llvm.org/viewvc/llvm-project?rev=275645&view=rev
>>>> Log:
>>>> [CUDA][OpenMP] Create generic offload action
>>>>
>>>> Summary:
>>>> This patch replaces the CUDA-specific action with a generic offload
>>>> action. The offload action may have multiple dependences, classified as “host”
>>>> and “device”. The way this generic offloading action is used is very
>>>> similar to what is done today by the CUDA implementation: it is used to set
>>>> a specific toolchain and architecture to its dependences during the
>>>> generation of jobs.
>>>>
>>>> This patch also proposes propagating the offloading information through
>>>> the action graph so that the information can be easily retrieved at any
>>>> time during the generation of commands. This allows, e.g., the "clang tool"
>>>> to evaluate whether CUDA should be supported for the device or host, and
>>>> ptxas to easily retrieve the target architecture.
>>>>
>>>> This is an example of what the action graph would look like
>>>> (compilation of a single CUDA file with two GPU architectures)
>>>> ```
>>>> 0: input, "cudatests.cu", cuda, (host-cuda)
>>>> 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
>>>> 2: compiler, {1}, ir, (host-cuda)
>>>> 3: input, "cudatests.cu", cuda, (device-cuda, sm_35)
>>>> 4: preprocessor, {3}, cuda-cpp-output, (device-cuda, sm_35)
>>>> 5: compiler, {4}, ir, (device-cuda, sm_35)
>>>> 6: backend, {5}, assembler, (device-cuda, sm_35)
>>>> 7: assembler, {6}, object, (device-cuda, sm_35)
>>>> 8: offload, "device-cuda (nvptx64-nvidia-cuda:sm_35)" {7}, object
>>>> 9: offload, "device-cuda (nvptx64-nvidia-cuda:sm_35)" {6}, assembler
>>>> 10: input, "cudatests.cu", cuda, (device-cuda, sm_37)
>>>> 11: preprocessor, {10}, cuda-cpp-output, (device-cuda, sm_37)
>>>> 12: compiler, {11}, ir, (device-cuda, sm_37)
>>>> 13: backend, {12}, assembler, (device-cuda, sm_37)
>>>> 14: assembler, {13}, object, (device-cuda, sm_37)
>>>> 15: offload, "device-cuda (nvptx64-nvidia-cuda:sm_37)" {14}, object
>>>> 16: offload, "device-cuda (nvptx64-nvidia-cuda:sm_37)" {13}, assembler
>>>> 17: linker, {8, 9, 15, 16}, cuda-fatbin, (device-cuda)
>>>> 18: offload, "host-cuda (powerpc64le-unknown-linux-gnu)" {2},
>>>> "device-cuda (nvptx64-nvidia-cuda)" {17}, ir
>>>> 19: backend, {18}, assembler
>>>> 20: assembler, {19}, object
>>>> 21: input, "cuda", object
>>>> 22: input, "cudart", object
>>>> 23: linker, {20, 21, 22}, image
>>>> ```
>>>> The changes in this patch pass the existing regression tests (keeping the
>>>> existing functionality), and the resulting binaries execute correctly on a
>>>> Power8+K40 machine.
>>>>
>>>> Reviewers: echristo, hfinkel, jlebar, ABataev, tra
>>>>
>>>> Subscribers: guansong, andreybokhanko, tcramer, mkuron, cfe-commits,
>>>> arpith-jacob, carlo.bertolli, caomhin
>>>>
>>>> Differential Revision: https://reviews.llvm.org/D18171
>>>>
>>>> Added:
>>>>     cfe/trunk/test/Driver/cuda_phases.cu
>>>> Modified:
>>>>     cfe/trunk/include/clang/Driver/Action.h
>>>>     cfe/trunk/include/clang/Driver/Compilation.h
>>>>     cfe/trunk/include/clang/Driver/Driver.h
>>>>     cfe/trunk/lib/Driver/Action.cpp
>>>>     cfe/trunk/lib/Driver/Driver.cpp
>>>>     cfe/trunk/lib/Driver/ToolChain.cpp
>>>>     cfe/trunk/lib/Driver/Tools.cpp
>>>>     cfe/trunk/lib/Driver/Tools.h
>>>>     cfe/trunk/lib/Frontend/CreateInvocationFromCommandLine.cpp
>>>>
>>>> Modified: cfe/trunk/include/clang/Driver/Action.h
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Driver/Action.h?rev=275645&r1=275644&r2=275645&view=diff
>>>>
>>>> ==============================================================================
>>>> --- cfe/trunk/include/clang/Driver/Action.h (original)
>>>> +++ cfe/trunk/include/clang/Driver/Action.h Fri Jul 15 18:13:27 2016
>>>> @@ -13,6 +13,7 @@
>>>>  #include "clang/Basic/Cuda.h"
>>>>  #include "clang/Driver/Types.h"
>>>>  #include "clang/Driver/Util.h"
>>>> +#include "llvm/ADT/STLExtras.h"
>>>>  #include "llvm/ADT/SmallVector.h"
>>>>
>>>>  namespace llvm {
>>>> @@ -27,6 +28,8 @@ namespace opt {
>>>>  namespace clang {
>>>>  namespace driver {
>>>>
>>>> +class ToolChain;
>>>> +
>>>>  /// Action - Represent an abstract compilation step to perform.
>>>>  ///
>>>>  /// An action represents an edge in the compilation graph; typically
>>>> @@ -50,8 +53,7 @@ public:
>>>>    enum ActionClass {
>>>>      InputClass = 0,
>>>>      BindArchClass,
>>>> -    CudaDeviceClass,
>>>> -    CudaHostClass,
>>>> +    OffloadClass,
>>>>      PreprocessJobClass,
>>>>      PrecompileJobClass,
>>>>      AnalyzeJobClass,
>>>> @@ -65,17 +67,13 @@ public:
>>>>      VerifyDebugInfoJobClass,
>>>>      VerifyPCHJobClass,
>>>>
>>>> -    JobClassFirst=PreprocessJobClass,
>>>> -    JobClassLast=VerifyPCHJobClass
>>>> +    JobClassFirst = PreprocessJobClass,
>>>> +    JobClassLast = VerifyPCHJobClass
>>>>    };
>>>>
>>>>    // The offloading kind determines if this action is binded to a
>>>> particular
>>>>    // programming model. Each entry reserves one bit. We also have a
>>>> special kind
>>>>    // to designate the host offloading tool chain.
>>>> -  //
>>>> -  // FIXME: This is currently used to indicate that tool chains are
>>>> used in a
>>>> -  // given programming, but will be used here as well once a generic
>>>> offloading
>>>> -  // action is implemented.
>>>>    enum OffloadKind {
>>>>      OFK_None = 0x00,
>>>>      // The host offloading tool chain.
>>>> @@ -95,6 +93,19 @@ private:
>>>>    ActionList Inputs;
>>>>
>>>>  protected:
>>>> +  ///
>>>> +  /// Offload information.
>>>> +  ///
>>>> +
>>>> +  /// The host offloading kind - a combination of kinds encoded in a
>>>> mask.
>>>> +  /// Multiple programming models may be supported simultaneously by
>>>> the same
>>>> +  /// host.
>>>> +  unsigned ActiveOffloadKindMask = 0u;
>>>> +  /// Offloading kind of the device.
>>>> +  OffloadKind OffloadingDeviceKind = OFK_None;
>>>> +  /// The Offloading architecture associated with this action.
>>>> +  const char *OffloadingArch = nullptr;
>>>> +
>>>>    Action(ActionClass Kind, types::ID Type) : Action(Kind,
>>>> ActionList(), Type) {}
>>>>    Action(ActionClass Kind, Action *Input, types::ID Type)
>>>>        : Action(Kind, ActionList({Input}), Type) {}
>>>> @@ -124,6 +135,40 @@ public:
>>>>    input_const_range inputs() const {
>>>>      return input_const_range(input_begin(), input_end());
>>>>    }
>>>> +
>>>> +  /// Return a string containing the offload kind of the action.
>>>> +  std::string getOffloadingKindPrefix() const;
>>>> +  /// Return a string that can be used as prefix in order to generate
>>>> unique
>>>> +  /// files for each offloading kind.
>>>> +  std::string getOffloadingFileNamePrefix(StringRef NormalizedTriple)
>>>> const;
>>>> +
>>>> +  /// Set the device offload info of this action and propagate it to
>>>> its
>>>> +  /// dependences.
>>>> +  void propagateDeviceOffloadInfo(OffloadKind OKind, const char
>>>> *OArch);
>>>> +  /// Append the host offload info of this action and propagate it to
>>>> its
>>>> +  /// dependences.
>>>> +  void propagateHostOffloadInfo(unsigned OKinds, const char *OArch);
>>>> +  /// Set the offload info of this action to be the same as the
>>>> provided action,
>>>> +  /// and propagate it to its dependences.
>>>> +  void propagateOffloadInfo(const Action *A);
>>>> +
>>>> +  unsigned getOffloadingHostActiveKinds() const {
>>>> +    return ActiveOffloadKindMask;
>>>> +  }
>>>> +  OffloadKind getOffloadingDeviceKind() const { return
>>>> OffloadingDeviceKind; }
>>>> +  const char *getOffloadingArch() const { return OffloadingArch; }
>>>> +
>>>> +  /// Check if this action have any offload kinds. Note that host
>>>> offload kinds
>>>> +  /// are only set if the action is a dependence to a host offload
>>>> action.
>>>> +  bool isHostOffloading(OffloadKind OKind) const {
>>>> +    return ActiveOffloadKindMask & OKind;
>>>> +  }
>>>> +  bool isDeviceOffloading(OffloadKind OKind) const {
>>>> +    return OffloadingDeviceKind == OKind;
>>>> +  }
>>>> +  bool isOffloading(OffloadKind OKind) const {
>>>> +    return isHostOffloading(OKind) || isDeviceOffloading(OKind);
>>>> +  }
>>>>  };
>>>>
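
For readers skimming the hunk above, here is a minimal sketch of how the host mask and the device kind differ. It is not the patch's code: `OffloadState` is a hypothetical stand-in for the new `Action` members, and the values of `OFK_Host` and `OFK_Cuda` are assumptions (only `OFK_None = 0x00` and the "one bit per entry" rule are visible in the quoted diff).

```cpp
#include <cassert>

// Hypothetical stand-in for Action::OffloadKind. Only OFK_None = 0x00 is
// visible in the quoted hunk; the other values are assumed, one bit each.
enum OffloadKind : unsigned {
  OFK_None = 0x00,
  OFK_Host = 0x01, // assumed value
  OFK_Cuda = 0x02, // assumed value
};

// Simplified holder for the two pieces of state the patch adds to Action.
struct OffloadState {
  unsigned ActiveOffloadKindMask = 0u;         // host side: a set of kinds
  OffloadKind OffloadingDeviceKind = OFK_None; // device side: a single kind

  // A host action may serve several programming models at once, so this is
  // a bit test against the mask.
  bool isHostOffloading(OffloadKind K) const {
    return (ActiveOffloadKindMask & K) != 0;
  }
  // A device action belongs to exactly one kind, so this is an equality test.
  bool isDeviceOffloading(OffloadKind K) const {
    return OffloadingDeviceKind == K;
  }
  bool isOffloading(OffloadKind K) const {
    return isHostOffloading(K) || isDeviceOffloading(K);
  }
};
```

The asymmetry is the point: the host side accumulates kinds with `|=`, while the device side holds one kind and is compared directly.
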
>>>>  class InputAction : public Action {
>>>> @@ -156,39 +201,126 @@ public:
>>>>    }
>>>>  };
>>>>
>>>> -class CudaDeviceAction : public Action {
>>>> +/// An offload action combines host or/and device actions according to
>>>> the
>>>> +/// programming model implementation needs and propagates the
>>>> offloading kind to
>>>> +/// its dependences.
>>>> +class OffloadAction final : public Action {
>>>>    virtual void anchor();
>>>>
>>>> -  const CudaArch GpuArch;
>>>> -
>>>> -  /// True when action results are not consumed by the host action
>>>> (e.g when
>>>> -  /// -fsyntax-only or --cuda-device-only options are used).
>>>> -  bool AtTopLevel;
>>>> -
>>>>  public:
>>>> -  CudaDeviceAction(Action *Input, CudaArch Arch, bool AtTopLevel);
>>>> +  /// Type used to communicate device actions. It associates bound
>>>> architecture,
>>>> +  /// toolchain, and offload kind to each action.
>>>> +  class DeviceDependences final {
>>>> +  public:
>>>> +    typedef SmallVector<const ToolChain *, 3> ToolChainList;
>>>> +    typedef SmallVector<const char *, 3> BoundArchList;
>>>> +    typedef SmallVector<OffloadKind, 3> OffloadKindList;
>>>> +
>>>> +  private:
>>>> +    // Lists that keep the information for each dependency. All the
>>>> lists are
>>>> +    // meant to be updated in sync. We are adopting separate lists
>>>> instead of a
>>>> +    // list of structs, because that simplifies forwarding the actions
>>>> list to
>>>> +    // initialize the inputs of the base Action class.
>>>> +
>>>> +    /// The dependence actions.
>>>> +    ActionList DeviceActions;
>>>> +    /// The offloading toolchains that should be used with the action.
>>>> +    ToolChainList DeviceToolChains;
>>>> +    /// The architectures that should be used with this action.
>>>> +    BoundArchList DeviceBoundArchs;
>>>> +    /// The offload kind of each dependence.
>>>> +    OffloadKindList DeviceOffloadKinds;
>>>> +
>>>> +  public:
>>>> +    /// Add a action along with the associated toolchain, bound arch,
>>>> and
>>>> +    /// offload kind.
>>>> +    void add(Action &A, const ToolChain &TC, const char *BoundArch,
>>>> +             OffloadKind OKind);
>>>> +
>>>> +    /// Get each of the individual arrays.
>>>> +    const ActionList &getActions() const { return DeviceActions; };
>>>> +    const ToolChainList &getToolChains() const { return
>>>> DeviceToolChains; };
>>>> +    const BoundArchList &getBoundArchs() const { return
>>>> DeviceBoundArchs; };
>>>> +    const OffloadKindList &getOffloadKinds() const {
>>>> +      return DeviceOffloadKinds;
>>>> +    };
>>>> +  };
>>>>
>>>> -  /// Get the CUDA GPU architecture to which this Action corresponds.
>>>> Returns
>>>> -  /// UNKNOWN if this Action corresponds to multiple architectures.
>>>> -  CudaArch getGpuArch() const { return GpuArch; }
>>>> +  /// Type used to communicate host actions. It associates bound
>>>> architecture,
>>>> +  /// toolchain, and offload kinds to the host action.
>>>> +  class HostDependence final {
>>>> +    /// The dependence action.
>>>> +    Action &HostAction;
>>>> +    /// The offloading toolchain that should be used with the action.
>>>> +    const ToolChain &HostToolChain;
>>>> +    /// The architectures that should be used with this action.
>>>> +    const char *HostBoundArch = nullptr;
>>>> +    /// The offload kind of each dependence.
>>>> +    unsigned HostOffloadKinds = 0u;
>>>> +
>>>> +  public:
>>>> +    HostDependence(Action &A, const ToolChain &TC, const char
>>>> *BoundArch,
>>>> +                   const unsigned OffloadKinds)
>>>> +        : HostAction(A), HostToolChain(TC), HostBoundArch(BoundArch),
>>>> +          HostOffloadKinds(OffloadKinds){};
>>>> +    /// Constructor version that obtains the offload kinds from the
>>>> device
>>>> +    /// dependencies.
>>>> +    HostDependence(Action &A, const ToolChain &TC, const char
>>>> *BoundArch,
>>>> +                   const DeviceDependences &DDeps);
>>>> +    Action *getAction() const { return &HostAction; };
>>>> +    const ToolChain *getToolChain() const { return &HostToolChain; };
>>>> +    const char *getBoundArch() const { return HostBoundArch; };
>>>> +    unsigned getOffloadKinds() const { return HostOffloadKinds; };
>>>> +  };
>>>>
>>>> -  bool isAtTopLevel() const { return AtTopLevel; }
>>>> +  typedef llvm::function_ref<void(Action *, const ToolChain *, const
>>>> char *)>
>>>> +      OffloadActionWorkTy;
>>>>
>>>> -  static bool classof(const Action *A) {
>>>> -    return A->getKind() == CudaDeviceClass;
>>>> -  }
>>>> -};
>>>> +private:
>>>> +  /// The host offloading toolchain that should be used with the
>>>> action.
>>>> +  const ToolChain *HostTC = nullptr;
>>>>
>>>> -class CudaHostAction : public Action {
>>>> -  virtual void anchor();
>>>> -  ActionList DeviceActions;
>>>> +  /// The tool chains associated with the list of actions.
>>>> +  DeviceDependences::ToolChainList DevToolChains;
>>>>
>>>>  public:
>>>> -  CudaHostAction(Action *Input, const ActionList &DeviceActions);
>>>> -
>>>> -  const ActionList &getDeviceActions() const { return DeviceActions; }
>>>> +  OffloadAction(const HostDependence &HDep);
>>>> +  OffloadAction(const DeviceDependences &DDeps, types::ID Ty);
>>>> +  OffloadAction(const HostDependence &HDep, const DeviceDependences
>>>> &DDeps);
>>>> +
>>>> +  /// Execute the work specified in \a Work on the host dependence.
>>>> +  void doOnHostDependence(const OffloadActionWorkTy &Work) const;
>>>> +
>>>> +  /// Execute the work specified in \a Work on each device dependence.
>>>> +  void doOnEachDeviceDependence(const OffloadActionWorkTy &Work) const;
>>>> +
>>>> +  /// Execute the work specified in \a Work on each dependence.
>>>> +  void doOnEachDependence(const OffloadActionWorkTy &Work) const;
>>>> +
>>>> +  /// Execute the work specified in \a Work on each host or device
>>>> dependence if
>>>> +  /// \a IsHostDependenceto is true or false, respectively.
>>>> +  void doOnEachDependence(bool IsHostDependence,
>>>> +                          const OffloadActionWorkTy &Work) const;
>>>> +
>>>> +  /// Return true if the action has a host dependence.
>>>> +  bool hasHostDependence() const;
>>>> +
>>>> +  /// Return the host dependence of this action. This function is only
>>>> expected
>>>> +  /// to be called if the host dependence exists.
>>>> +  Action *getHostDependence() const;
>>>> +
>>>> +  /// Return true if the action has a single device dependence. If \a
>>>> +  /// DoNotConsiderHostActions is set, ignore the host dependence, if
>>>> any, while
>>>> +  /// accounting for the number of dependences.
>>>> +  bool hasSingleDeviceDependence(bool DoNotConsiderHostActions =
>>>> false) const;
>>>> +
>>>> +  /// Return the single device dependence of this action. This
>>>> function is only
>>>> +  /// expected to be called if a single device dependence exists. If \a
>>>> +  /// DoNotConsiderHostActions is set, a host dependence is allowed.
>>>> +  Action *
>>>> +  getSingleDeviceDependence(bool DoNotConsiderHostActions = false)
>>>> const;
>>>>
>>>> -  static bool classof(const Action *A) { return A->getKind() ==
>>>> CudaHostClass; }
>>>> +  static bool classof(const Action *A) { return A->getKind() ==
>>>> OffloadClass; }
>>>>  };
>>>>
>>>>  class JobAction : public Action {
>>>>
>>>> Modified: cfe/trunk/include/clang/Driver/Compilation.h
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Driver/Compilation.h?rev=275645&r1=275644&r2=275645&view=diff
>>>>
>>>> ==============================================================================
>>>> --- cfe/trunk/include/clang/Driver/Compilation.h (original)
>>>> +++ cfe/trunk/include/clang/Driver/Compilation.h Fri Jul 15 18:13:27
>>>> 2016
>>>> @@ -98,12 +98,7 @@ public:
>>>>    const Driver &getDriver() const { return TheDriver; }
>>>>
>>>>    const ToolChain &getDefaultToolChain() const { return
>>>> DefaultToolChain; }
>>>> -  const ToolChain *getOffloadingHostToolChain() const {
>>>> -    auto It = OrderedOffloadingToolchains.find(Action::OFK_Host);
>>>> -    if (It != OrderedOffloadingToolchains.end())
>>>> -      return It->second;
>>>> -    return nullptr;
>>>> -  }
>>>> +
>>>>    unsigned isOffloadingHostKind(Action::OffloadKind Kind) const {
>>>>      return ActiveOffloadMask & Kind;
>>>>    }
>>>> @@ -121,8 +116,8 @@ public:
>>>>      return OrderedOffloadingToolchains.equal_range(Kind);
>>>>    }
>>>>
>>>> -  // Return an offload toolchain of the provided kind. Only one is
>>>> expected to
>>>> -  // exist.
>>>> +  /// Return an offload toolchain of the provided kind. Only one is
>>>> expected to
>>>> +  /// exist.
>>>>    template <Action::OffloadKind Kind>
>>>>    const ToolChain *getSingleOffloadToolChain() const {
>>>>      auto TCs = getOffloadToolChains<Kind>();
>>>>
>>>> Modified: cfe/trunk/include/clang/Driver/Driver.h
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Driver/Driver.h?rev=275645&r1=275644&r2=275645&view=diff
>>>>
>>>> ==============================================================================
>>>> --- cfe/trunk/include/clang/Driver/Driver.h (original)
>>>> +++ cfe/trunk/include/clang/Driver/Driver.h Fri Jul 15 18:13:27 2016
>>>> @@ -394,12 +394,13 @@ public:
>>>>    /// BuildJobsForAction - Construct the jobs to perform for the
>>>> action \p A and
>>>>    /// return an InputInfo for the result of running \p A.  Will only
>>>> construct
>>>>    /// jobs for a given (Action, ToolChain, BoundArch) tuple once.
>>>> -  InputInfo BuildJobsForAction(Compilation &C, const Action *A,
>>>> -                               const ToolChain *TC, const char
>>>> *BoundArch,
>>>> -                               bool AtTopLevel, bool MultipleArchs,
>>>> -                               const char *LinkingOutput,
>>>> -                               std::map<std::pair<const Action *,
>>>> std::string>,
>>>> -                                        InputInfo> &CachedResults)
>>>> const;
>>>> +  InputInfo
>>>> +  BuildJobsForAction(Compilation &C, const Action *A, const ToolChain
>>>> *TC,
>>>> +                     const char *BoundArch, bool AtTopLevel, bool
>>>> MultipleArchs,
>>>> +                     const char *LinkingOutput,
>>>> +                     std::map<std::pair<const Action *, std::string>,
>>>> InputInfo>
>>>> +                         &CachedResults,
>>>> +                     bool BuildForOffloadDevice) const;
>>>>
>>>>    /// Returns the default name for linked images (e.g., "a.out").
>>>>    const char *getDefaultImageName() const;
>>>> @@ -415,12 +416,11 @@ public:
>>>>    /// \param BoundArch - The bound architecture.
>>>>    /// \param AtTopLevel - Whether this is a "top-level" action.
>>>>    /// \param MultipleArchs - Whether multiple -arch options were
>>>> supplied.
>>>> -  const char *GetNamedOutputPath(Compilation &C,
>>>> -                                 const JobAction &JA,
>>>> -                                 const char *BaseInput,
>>>> -                                 const char *BoundArch,
>>>> -                                 bool AtTopLevel,
>>>> -                                 bool MultipleArchs) const;
>>>> +  /// \param NormalizedTriple - The normalized triple of the relevant
>>>> target.
>>>> +  const char *GetNamedOutputPath(Compilation &C, const JobAction &JA,
>>>> +                                 const char *BaseInput, const char
>>>> *BoundArch,
>>>> +                                 bool AtTopLevel, bool MultipleArchs,
>>>> +                                 StringRef NormalizedTriple) const;
>>>>
>>>>    /// GetTemporaryPath - Return the pathname of a temporary file to use
>>>>    /// as part of compilation; the file will have the given prefix and
>>>> suffix.
>>>> @@ -467,7 +467,8 @@ private:
>>>>        const char *BoundArch, bool AtTopLevel, bool MultipleArchs,
>>>>        const char *LinkingOutput,
>>>>        std::map<std::pair<const Action *, std::string>, InputInfo>
>>>> -          &CachedResults) const;
>>>> +          &CachedResults,
>>>> +      bool BuildForOffloadDevice) const;
>>>>
>>>>  public:
>>>>    /// GetReleaseVersion - Parse (([0-9]+)(.([0-9]+)(.([0-9]+)?))?)? and
>>>>
>>>> Modified: cfe/trunk/lib/Driver/Action.cpp
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Driver/Action.cpp?rev=275645&r1=275644&r2=275645&view=diff
>>>>
>>>> ==============================================================================
>>>> --- cfe/trunk/lib/Driver/Action.cpp (original)
>>>> +++ cfe/trunk/lib/Driver/Action.cpp Fri Jul 15 18:13:27 2016
>>>> @@ -8,6 +8,7 @@
>>>>
>>>>  //===----------------------------------------------------------------------===//
>>>>
>>>>  #include "clang/Driver/Action.h"
>>>> +#include "clang/Driver/ToolChain.h"
>>>>  #include "llvm/ADT/StringSwitch.h"
>>>>  #include "llvm/Support/ErrorHandling.h"
>>>>  #include "llvm/Support/Regex.h"
>>>> @@ -21,8 +22,8 @@ const char *Action::getClassName(ActionC
>>>>    switch (AC) {
>>>>    case InputClass: return "input";
>>>>    case BindArchClass: return "bind-arch";
>>>> -  case CudaDeviceClass: return "cuda-device";
>>>> -  case CudaHostClass: return "cuda-host";
>>>> +  case OffloadClass:
>>>> +    return "offload";
>>>>    case PreprocessJobClass: return "preprocessor";
>>>>    case PrecompileJobClass: return "precompiler";
>>>>    case AnalyzeJobClass: return "analyzer";
>>>> @@ -40,6 +41,82 @@ const char *Action::getClassName(ActionC
>>>>    llvm_unreachable("invalid class");
>>>>  }
>>>>
>>>> +void Action::propagateDeviceOffloadInfo(OffloadKind OKind, const char
>>>> *OArch) {
>>>> +  // Offload action set its own kinds on their dependences.
>>>> +  if (Kind == OffloadClass)
>>>> +    return;
>>>> +
>>>> +  assert((OffloadingDeviceKind == OKind || OffloadingDeviceKind ==
>>>> OFK_None) &&
>>>> +         "Setting device kind to a different device??");
>>>> +  assert(!ActiveOffloadKindMask && "Setting a device kind in a host
>>>> action??");
>>>> +  OffloadingDeviceKind = OKind;
>>>> +  OffloadingArch = OArch;
>>>> +
>>>> +  for (auto *A : Inputs)
>>>> +    A->propagateDeviceOffloadInfo(OffloadingDeviceKind, OArch);
>>>> +}
>>>> +
>>>> +void Action::propagateHostOffloadInfo(unsigned OKinds, const char
>>>> *OArch) {
>>>> +  // Offload action set its own kinds on their dependences.
>>>> +  if (Kind == OffloadClass)
>>>> +    return;
>>>> +
>>>> +  assert(OffloadingDeviceKind == OFK_None &&
>>>> +         "Setting a host kind in a device action.");
>>>> +  ActiveOffloadKindMask |= OKinds;
>>>> +  OffloadingArch = OArch;
>>>> +
>>>> +  for (auto *A : Inputs)
>>>> +    A->propagateHostOffloadInfo(ActiveOffloadKindMask, OArch);
>>>> +}
>>>> +
>>>> +void Action::propagateOffloadInfo(const Action *A) {
>>>> +  if (unsigned HK = A->getOffloadingHostActiveKinds())
>>>> +    propagateHostOffloadInfo(HK, A->getOffloadingArch());
>>>> +  else
>>>> +    propagateDeviceOffloadInfo(A->getOffloadingDeviceKind(),
>>>> +                               A->getOffloadingArch());
>>>> +}
>>>> +
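
The recursion in `propagateDeviceOffloadInfo` can be sketched on a toy graph. This is a simplified illustration under assumptions: `Node` and `propagateDevice` are hypothetical names, the `OFK_Cuda` value is assumed, and the sketch leaves out both the early return for nested offload actions and the host-mask assert that the real function has.

```cpp
#include <cassert>
#include <string>
#include <vector>

// OFK_Cuda's value is an assumption; the quoted diff only shows OFK_None.
enum OffloadKind : unsigned { OFK_None = 0x00, OFK_Cuda = 0x02 };

// Hypothetical, stripped-down action node: just inputs plus device info.
struct Node {
  std::vector<Node *> Inputs;
  OffloadKind DeviceKind = OFK_None;
  std::string Arch;

  // Tag this node, then push the same tag down to everything it depends on.
  void propagateDevice(OffloadKind K, const std::string &A) {
    assert((DeviceKind == K || DeviceKind == OFK_None) &&
           "node already tagged with a different device kind");
    DeviceKind = K;
    Arch = A;
    for (Node *In : Inputs)
      In->propagateDevice(K, A);
  }
};
```

Tagging the top of a chain such as input, preprocessor, compiler is enough for every node below it to report the same kind and architecture, which is what lets later stages query the information without walking back up the graph.
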
>>>> +std::string Action::getOffloadingKindPrefix() const {
>>>> +  switch (OffloadingDeviceKind) {
>>>> +  case OFK_None:
>>>> +    break;
>>>> +  case OFK_Host:
>>>> +    llvm_unreachable("Host kind is not an offloading device kind.");
>>>> +    break;
>>>> +  case OFK_Cuda:
>>>> +    return "device-cuda";
>>>> +
>>>> +    // TODO: Add other programming models here.
>>>> +  }
>>>> +
>>>> +  if (!ActiveOffloadKindMask)
>>>> +    return "";
>>>> +
>>>> +  std::string Res("host");
>>>> +  if (ActiveOffloadKindMask & OFK_Cuda)
>>>> +    Res += "-cuda";
>>>> +
>>>> +  // TODO: Add other programming models here.
>>>> +
>>>> +  return Res;
>>>> +}
>>>> +
>>>> +std::string
>>>> +Action::getOffloadingFileNamePrefix(StringRef NormalizedTriple) const {
>>>> +  // A file prefix is only generated for device actions and consists
>>>> of the
>>>> +  // offload kind and triple.
>>>> +  if (!OffloadingDeviceKind)
>>>> +    return "";
>>>> +
>>>> +  std::string Res("-");
>>>> +  Res += getOffloadingKindPrefix();
>>>> +  Res += "-";
>>>> +  Res += NormalizedTriple;
>>>> +  return Res;
>>>> +}
>>>> +
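
As a quick illustration of the prefix that `getOffloadingFileNamePrefix` builds, here is a free-function sketch. The name `offloadingFileNamePrefix` and the convention of passing an empty kind prefix for "not a device action" are assumptions made for the example; the string layout follows the quoted code.

```cpp
#include <cassert>
#include <string>

// Hypothetical free-function version of the member function above. An empty
// DeviceKindPrefix stands for an action with no device offloading kind.
std::string offloadingFileNamePrefix(const std::string &DeviceKindPrefix,
                                     const std::string &NormalizedTriple) {
  // Host and non-offloading actions get no prefix at all.
  if (DeviceKindPrefix.empty())
    return "";
  // Layout: "-" + kind prefix + "-" + normalized triple.
  return "-" + DeviceKindPrefix + "-" + NormalizedTriple;
}
```

With the CUDA kind prefix `device-cuda` and the triple `nvptx64-nvidia-cuda`, this yields `-device-cuda-nvptx64-nvidia-cuda`, which is what keeps device outputs from colliding with host outputs of the same base name.
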
>>>>  void InputAction::anchor() {}
>>>>
>>>>  InputAction::InputAction(const Arg &_Input, types::ID _Type)
>>>> @@ -51,16 +128,138 @@ void BindArchAction::anchor() {}
>>>>  BindArchAction::BindArchAction(Action *Input, const char *_ArchName)
>>>>      : Action(BindArchClass, Input), ArchName(_ArchName) {}
>>>>
>>>> -void CudaDeviceAction::anchor() {}
>>>> +void OffloadAction::anchor() {}
>>>> +
>>>> +OffloadAction::OffloadAction(const HostDependence &HDep)
>>>> +    : Action(OffloadClass, HDep.getAction()),
>>>> HostTC(HDep.getToolChain()) {
>>>> +  OffloadingArch = HDep.getBoundArch();
>>>> +  ActiveOffloadKindMask = HDep.getOffloadKinds();
>>>> +  HDep.getAction()->propagateHostOffloadInfo(HDep.getOffloadKinds(),
>>>> +                                             HDep.getBoundArch());
>>>> +};
>>>> +
>>>> +OffloadAction::OffloadAction(const DeviceDependences &DDeps, types::ID
>>>> Ty)
>>>> +    : Action(OffloadClass, DDeps.getActions(), Ty),
>>>> +      DevToolChains(DDeps.getToolChains()) {
>>>> +  auto &OKinds = DDeps.getOffloadKinds();
>>>> +  auto &BArchs = DDeps.getBoundArchs();
>>>> +
>>>> +  // If all inputs agree on the same kind, use it also for this action.
>>>> +  if (llvm::all_of(OKinds, [&](OffloadKind K) { return K ==
>>>> OKinds.front(); }))
>>>> +    OffloadingDeviceKind = OKinds.front();
>>>> +
>>>> +  // If we have a single dependency, inherit the architecture from it.
>>>> +  if (OKinds.size() == 1)
>>>> +    OffloadingArch = BArchs.front();
>>>> +
>>>> +  // Propagate info to the dependencies.
>>>> +  for (unsigned i = 0, e = getInputs().size(); i != e; ++i)
>>>> +    getInputs()[i]->propagateDeviceOffloadInfo(OKinds[i], BArchs[i]);
>>>> +}
>>>> +
>>>> +OffloadAction::OffloadAction(const HostDependence &HDep,
>>>> +                             const DeviceDependences &DDeps)
>>>> +    : Action(OffloadClass, HDep.getAction()),
>>>> HostTC(HDep.getToolChain()),
>>>> +      DevToolChains(DDeps.getToolChains()) {
>>>> +  // We use the kinds of the host dependence for this action.
>>>> +  OffloadingArch = HDep.getBoundArch();
>>>> +  ActiveOffloadKindMask = HDep.getOffloadKinds();
>>>> +  HDep.getAction()->propagateHostOffloadInfo(HDep.getOffloadKinds(),
>>>> +                                             HDep.getBoundArch());
>>>> +
>>>> +  // Add device inputs and propagate info to the device actions. Do
>>>> work only if
>>>> +  // we have dependencies.
>>>> +  for (unsigned i = 0, e = DDeps.getActions().size(); i != e; ++i)
>>>> +    if (auto *A = DDeps.getActions()[i]) {
>>>> +      getInputs().push_back(A);
>>>> +      A->propagateDeviceOffloadInfo(DDeps.getOffloadKinds()[i],
>>>> +                                    DDeps.getBoundArchs()[i]);
>>>> +    }
>>>> +}
>>>> +
>>>> +void OffloadAction::doOnHostDependence(const OffloadActionWorkTy
>>>> &Work) const {
>>>> +  if (!HostTC)
>>>> +    return;
>>>> +  assert(!getInputs().empty() && "No dependencies for offload
>>>> action??");
>>>> +  auto *A = getInputs().front();
>>>> +  Work(A, HostTC, A->getOffloadingArch());
>>>> +}
>>>>
>>>> -CudaDeviceAction::CudaDeviceAction(Action *Input, clang::CudaArch Arch,
>>>> -                                   bool AtTopLevel)
>>>> -    : Action(CudaDeviceClass, Input), GpuArch(Arch),
>>>> AtTopLevel(AtTopLevel) {}
>>>> +void OffloadAction::doOnEachDeviceDependence(
>>>> +    const OffloadActionWorkTy &Work) const {
>>>> +  auto I = getInputs().begin();
>>>> +  auto E = getInputs().end();
>>>> +  if (I == E)
>>>> +    return;
>>>> +
>>>> +  // We expect to have the same number of input dependences and device
>>>> tool
>>>> +  // chains, except if we also have a host dependence. In that case we
>>>> have one
>>>> +  // more dependence than we have device tool chains.
>>>> +  assert(getInputs().size() == DevToolChains.size() + (HostTC ? 1 : 0)
>>>> &&
>>>> +         "Sizes of action dependences and toolchains are not
>>>> consistent!");
>>>> +
>>>> +  // Skip host action
>>>> +  if (HostTC)
>>>> +    ++I;
>>>> +
>>>> +  auto TI = DevToolChains.begin();
>>>> +  for (; I != E; ++I, ++TI)
>>>> +    Work(*I, *TI, (*I)->getOffloadingArch());
>>>> +}
>>>> +
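
The indexing convention in `doOnEachDeviceDependence` (host input first, device inputs after it, one device toolchain per device input) can be sketched as follows. `OffloadNode` and `forEachDevice` are hypothetical names, strings stand in for `Action *` and `ToolChain *`, and `std::function` stands in for `llvm::function_ref`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical simplified offload node: inputs are action names, and the
// host input, when present, always sits at index 0.
struct OffloadNode {
  bool HasHost = false;
  std::vector<std::string> Inputs;
  std::vector<std::string> DevToolChains; // one entry per device input

  using Work = std::function<void(const std::string &Action,
                                  const std::string &ToolChain)>;

  void forEachDevice(const Work &W) const {
    if (Inputs.empty())
      return;
    const std::size_t First = HasHost ? 1 : 0; // skip the host input
    assert(Inputs.size() == DevToolChains.size() + First &&
           "inputs and device toolchains out of sync");
    // Device input I pairs with toolchain I - First.
    for (std::size_t I = First; I < Inputs.size(); ++I)
      W(Inputs[I], DevToolChains[I - First]);
  }
};
```

The size check is the invariant that makes the offset arithmetic safe: with a host dependence there is exactly one more input than there are device toolchains.
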
>>>> +void OffloadAction::doOnEachDependence(const OffloadActionWorkTy
>>>> &Work) const {
>>>> +  doOnHostDependence(Work);
>>>> +  doOnEachDeviceDependence(Work);
>>>> +}
>>>> +
>>>> +void OffloadAction::doOnEachDependence(bool IsHostDependence,
>>>> +                                       const OffloadActionWorkTy
>>>> &Work) const {
>>>> +  if (IsHostDependence)
>>>> +    doOnHostDependence(Work);
>>>> +  else
>>>> +    doOnEachDeviceDependence(Work);
>>>> +}
>>>>
>>>> -void CudaHostAction::anchor() {}
>>>> +bool OffloadAction::hasHostDependence() const { return HostTC !=
>>>> nullptr; }
>>>>
>>>> -CudaHostAction::CudaHostAction(Action *Input, const ActionList
>>>> &DeviceActions)
>>>> -    : Action(CudaHostClass, Input), DeviceActions(DeviceActions) {}
>>>> +Action *OffloadAction::getHostDependence() const {
>>>> +  assert(hasHostDependence() && "Host dependence does not exist!");
>>>> +  assert(!getInputs().empty() && "No dependencies for offload
>>>> action??");
>>>> +  return HostTC ? getInputs().front() : nullptr;
>>>> +}
>>>> +
>>>> +bool OffloadAction::hasSingleDeviceDependence(
>>>> +    bool DoNotConsiderHostActions) const {
>>>> +  if (DoNotConsiderHostActions)
>>>> +    return getInputs().size() == (HostTC ? 2 : 1);
>>>> +  return !HostTC && getInputs().size() == 1;
>>>> +}
>>>> +
>>>> +Action *
>>>> +OffloadAction::getSingleDeviceDependence(bool
>>>> DoNotConsiderHostActions) const {
>>>> +  assert(hasSingleDeviceDependence(DoNotConsiderHostActions) &&
>>>> +         "Single device dependence does not exist!");
>>>> +  // The previous assert ensures the number of entries in getInputs()
>>>> is
>>>> +  // consistent with what we are doing here.
>>>> +  return HostTC ? getInputs()[1] : getInputs().front();
>>>> +}
>>>> +
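
The counting rule behind `hasSingleDeviceDependence` is compact enough to misread, so here is a free-function sketch of it. The signature is an assumption for illustration: the input count and the presence of a host toolchain are passed in explicitly instead of being read from the action.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical free-function form of the predicate above.
bool hasSingleDeviceDependence(std::size_t NumInputs, bool HasHost,
                               bool DoNotConsiderHostActions = false) {
  // When host actions are ignored, a host input may occupy one extra slot.
  if (DoNotConsiderHostActions)
    return NumInputs == (HasHost ? 2u : 1u);
  // Otherwise the action must have no host input and exactly one input.
  return !HasHost && NumInputs == 1;
}
```

So an offload action with one host input and one device input counts as "single device" only when the caller asks for host actions to be ignored.
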
>>>> +void OffloadAction::DeviceDependences::add(Action &A, const ToolChain
>>>> &TC,
>>>> +                                           const char *BoundArch,
>>>> +                                           OffloadKind OKind) {
>>>> +  DeviceActions.push_back(&A);
>>>> +  DeviceToolChains.push_back(&TC);
>>>> +  DeviceBoundArchs.push_back(BoundArch);
>>>> +  DeviceOffloadKinds.push_back(OKind);
>>>> +}
>>>> +
>>>> +OffloadAction::HostDependence::HostDependence(Action &A, const ToolChain &TC,
>>>> +                                              const char *BoundArch,
>>>> +                                              const DeviceDependences &DDeps)
>>>> +    : HostAction(A), HostToolChain(TC), HostBoundArch(BoundArch) {
>>>> +  for (auto K : DDeps.getOffloadKinds())
>>>> +    HostOffloadKinds |= K;
>>>> +}
>>>>
>>>>  void JobAction::anchor() {}
>>>>
>>>>
>>>> Modified: cfe/trunk/lib/Driver/Driver.cpp
>>>> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Driver/Driver.cpp?rev=275645&r1=275644&r2=275645&view=diff
>>>>
>>>> ==============================================================================
>>>> --- cfe/trunk/lib/Driver/Driver.cpp (original)
>>>> +++ cfe/trunk/lib/Driver/Driver.cpp Fri Jul 15 18:13:27 2016
>>>> @@ -435,7 +435,9 @@ void Driver::CreateOffloadingDeviceToolC
>>>>        })) {
>>>>      const ToolChain &TC = getToolChain(
>>>>          C.getInputArgs(),
>>>> -        llvm::Triple(C.getOffloadingHostToolChain()->getTriple().isArch64Bit()
>>>> +        llvm::Triple(C.getSingleOffloadToolChain<Action::OFK_Host>()
>>>> +                             ->getTriple()
>>>> +                             .isArch64Bit()
>>>>                           ? "nvptx64-nvidia-cuda"
>>>>                           : "nvptx-nvidia-cuda"));
>>>>      C.addOffloadDeviceToolChain(&TC, Action::OFK_Cuda);
>>>> @@ -1022,19 +1024,33 @@ static unsigned PrintActions1(const Comp
>>>>    } else if (BindArchAction *BIA = dyn_cast<BindArchAction>(A)) {
>>>>      os << '"' << BIA->getArchName() << '"' << ", {"
>>>>         << PrintActions1(C, *BIA->input_begin(), Ids) << "}";
>>>> -  } else if (CudaDeviceAction *CDA = dyn_cast<CudaDeviceAction>(A)) {
>>>> -    CudaArch Arch = CDA->getGpuArch();
>>>> -    if (Arch != CudaArch::UNKNOWN)
>>>> -      os << "'" << CudaArchToString(Arch) << "', ";
>>>> -    os << "{" << PrintActions1(C, *CDA->input_begin(), Ids) << "}";
>>>> +  } else if (OffloadAction *OA = dyn_cast<OffloadAction>(A)) {
>>>> +    bool IsFirst = true;
>>>> +    OA->doOnEachDependence(
>>>> +        [&](Action *A, const ToolChain *TC, const char *BoundArch) {
>>>> +          // E.g. for two CUDA device dependences whose bound arch is sm_20 and
>>>> +          // sm_35 this will generate:
>>>> +          // "cuda-device" (nvptx64-nvidia-cuda:sm_20) {#ID}, "cuda-device"
>>>> +          // (nvptx64-nvidia-cuda:sm_35) {#ID}
>>>> +          if (!IsFirst)
>>>> +            os << ", ";
>>>> +          os << '"';
>>>> +          if (TC)
>>>> +            os << A->getOffloadingKindPrefix();
>>>> +          else
>>>> +            os << "host";
>>>> +          os << " (";
>>>> +          os << TC->getTriple().normalize();
>>>> +
>>>> +          if (BoundArch)
>>>> +            os << ":" << BoundArch;
>>>> +          os << ")";
>>>> +          os << '"';
>>>> +          os << " {" << PrintActions1(C, A, Ids) << "}";
>>>> +          IsFirst = false;
>>>> +        });
>>>>    } else {
>>>> -    const ActionList *AL;
>>>> -    if (CudaHostAction *CHA = dyn_cast<CudaHostAction>(A)) {
>>>> -      os << "{" << PrintActions1(C, *CHA->input_begin(), Ids) << "}"
>>>> -         << ", gpu binaries ";
>>>> -      AL = &CHA->getDeviceActions();
>>>> -    } else
>>>> -      AL = &A->getInputs();
>>>> +    const ActionList *AL = &A->getInputs();
>>>>
>>>>      if (AL->size()) {
>>>>        const char *Prefix = "{";
>>>> @@ -1047,10 +1063,24 @@ static unsigned PrintActions1(const Comp
>>>>        os << "{}";
>>>>    }
>>>>
>>>> +  // Append offload info for all options other than the offloading action
>>>> +  // itself (e.g. (cuda-device, sm_20) or (cuda-host)).
>>>> +  std::string offload_str;
>>>> +  llvm::raw_string_ostream offload_os(offload_str);
>>>> +  if (!isa<OffloadAction>(A)) {
>>>> +    auto S = A->getOffloadingKindPrefix();
>>>> +    if (!S.empty()) {
>>>> +      offload_os << ", (" << S;
>>>> +      if (A->getOffloadingArch())
>>>> +        offload_os << ", " << A->getOffloadingArch();
>>>> +      offload_os << ")";
>>>> +    }
>>>> +  }
>>>> +
>>>>    unsigned Id = Ids.size();
>>>>    Ids[A] = Id;
>>>>    llvm::errs() << Id << ": " << os.str() << ", "
>>>> -               << types::getTypeName(A->getType()) << "\n";
>>>> +               << types::getTypeName(A->getType()) << offload_os.str() << "\n";
>>>>
>>>>    return Id;
>>>>  }
>>>> @@ -1378,8 +1408,12 @@ static Action *buildCudaActions(Compilat
>>>>        PartialCompilationArg &&
>>>>        PartialCompilationArg->getOption().matches(options::OPT_cuda_device_only);
>>>>
>>>> -  if (CompileHostOnly)
>>>> -    return C.MakeAction<CudaHostAction>(HostAction, ActionList());
>>>> +  if (CompileHostOnly) {
>>>> +    OffloadAction::HostDependence HDep(
>>>> +        *HostAction, *C.getSingleOffloadToolChain<Action::OFK_Host>(),
>>>> +        /*BoundArch=*/nullptr, Action::OFK_Cuda);
>>>> +    return C.MakeAction<OffloadAction>(HDep);
>>>> +  }
>>>>
>>>>    // Collect all cuda_gpu_arch parameters, removing duplicates.
>>>>    SmallVector<CudaArch, 4> GpuArchList;
>>>> @@ -1408,8 +1442,6 @@ static Action *buildCudaActions(Compilat
>>>>      CudaDeviceInputs.push_back(std::make_pair(types::TY_CUDA_DEVICE, InputArg));
>>>>
>>>>    // Build actions for all device inputs.
>>>> -  assert(C.getSingleOffloadToolChain<Action::OFK_Cuda>() &&
>>>> -         "Missing toolchain for device-side compilation.");
>>>>    ActionList CudaDeviceActions;
>>>>    C.getDriver().BuildActions(C, Args, CudaDeviceInputs, CudaDeviceActions);
>>>>    assert(GpuArchList.size() == CudaDeviceActions.size() &&
>>>> @@ -1421,6 +1453,8 @@ static Action *buildCudaActions(Compilat
>>>>          return a->getKind() != Action::AssembleJobClass;
>>>>        });
>>>>
>>>> +  const ToolChain *CudaTC = C.getSingleOffloadToolChain<Action::OFK_Cuda>();
>>>> +
>>>>    // Figure out what to do with device actions -- pass them as inputs to the
>>>>    // host action or run each of them independently.
>>>>    if (PartialCompilation || CompileDeviceOnly) {
>>>> @@ -1436,10 +1470,13 @@ static Action *buildCudaActions(Compilat
>>>>        return nullptr;
>>>>      }
>>>>
>>>> -    for (unsigned I = 0, E = GpuArchList.size(); I != E; ++I)
>>>> -      Actions.push_back(C.MakeAction<CudaDeviceAction>(CudaDeviceActions[I],
>>>> -                                                       GpuArchList[I],
>>>> -                                                       /* AtTopLevel */ true));
>>>> +    for (unsigned I = 0, E = GpuArchList.size(); I != E; ++I) {
>>>> +      OffloadAction::DeviceDependences DDep;
>>>> +      DDep.add(*CudaDeviceActions[I], *CudaTC, CudaArchToString(GpuArchList[I]),
>>>> +               Action::OFK_Cuda);
>>>> +      Actions.push_back(
>>>> +          C.MakeAction<OffloadAction>(DDep, CudaDeviceActions[I]->getType()));
>>>> +    }
>>>>      // Kill host action in case of device-only compilation.
>>>>      if (CompileDeviceOnly)
>>>>        return nullptr;
>>>> @@ -1459,19 +1496,23 @@ static Action *buildCudaActions(Compilat
>>>>      Action* BackendAction = AssembleAction->getInputs()[0];
>>>>      assert(BackendAction->getType() == types::TY_PP_Asm);
>>>>
>>>> -    for (const auto& A : {AssembleAction, BackendAction}) {
>>>> -      DeviceActions.push_back(C.MakeAction<CudaDeviceAction>(
>>>> -          A, GpuArchList[I], /* AtTopLevel */ false));
>>>> +    for (auto &A : {AssembleAction, BackendAction}) {
>>>> +      OffloadAction::DeviceDependences DDep;
>>>> +      DDep.add(*A, *CudaTC, CudaArchToString(GpuArchList[I]), Action::OFK_Cuda);
>>>> +      DeviceActions.push_back(C.MakeAction<OffloadAction>(DDep, A->getType()));
>>>>      }
>>>>    }
>>>> -  auto FatbinAction = C.MakeAction<CudaDeviceAction>(
>>>> -      C.MakeAction<LinkJobAction>(DeviceActions, types::TY_CUDA_FATBIN),
>>>> -      CudaArch::UNKNOWN,
>>>> -      /* AtTopLevel = */ false);
>>>> +  auto FatbinAction =
>>>> +      C.MakeAction<LinkJobAction>(DeviceActions, types::TY_CUDA_FATBIN);
>>>> +
>>>>    // Return a new host action that incorporates original host action and all
>>>>    // device actions.
>>>> -  return C.MakeAction<CudaHostAction>(std::move(HostAction),
>>>> -                                      ActionList({FatbinAction}));
>>>> +  OffloadAction::HostDependence HDep(
>>>> +      *HostAction, *C.getSingleOffloadToolChain<Action::OFK_Host>(),
>>>> +      /*BoundArch=*/nullptr, Action::OFK_Cuda);
>>>> +  OffloadAction::DeviceDependences DDep;
>>>> +  DDep.add(*FatbinAction, *CudaTC, /*BoundArch=*/nullptr, Action::OFK_Cuda);
>>>> +  return C.MakeAction<OffloadAction>(HDep, DDep);
>>>>  }
>>>>
>>>>  void Driver::BuildActions(Compilation &C, DerivedArgList &Args,
>>>> @@ -1580,6 +1621,9 @@ void Driver::BuildActions(Compilation &C
>>>>      YcArg = YuArg = nullptr;
>>>>    }
>>>>
>>>> +  // Track the host offload kinds used on this compilation.
>>>> +  unsigned CompilationActiveOffloadHostKinds = 0u;
>>>> +
>>>>    // Construct the actions to perform.
>>>>    ActionList LinkerInputs;
>>>>
>>>> @@ -1648,6 +1692,9 @@ void Driver::BuildActions(Compilation &C
>>>>              ? phases::Compile
>>>>              : FinalPhase;
>>>>
>>>> +    // Track the host offload kinds used on this input.
>>>> +    unsigned InputActiveOffloadHostKinds = 0u;
>>>> +
>>>>      // Build the pipeline for this file.
>>>>      Action *Current = C.MakeAction<InputAction>(*InputArg, InputType);
>>>>      for (SmallVectorImpl<phases::ID>::iterator i = PL.begin(), e = PL.end();
>>>> @@ -1679,21 +1726,36 @@ void Driver::BuildActions(Compilation &C
>>>>          Current = buildCudaActions(C, Args, InputArg, Current, Actions);
>>>>          if (!Current)
>>>>            break;
>>>> +
>>>> +        // We produced a CUDA action for this input, so the host has to support
>>>> +        // CUDA.
>>>> +        InputActiveOffloadHostKinds |= Action::OFK_Cuda;
>>>> +        CompilationActiveOffloadHostKinds |= Action::OFK_Cuda;
>>>>        }
>>>>
>>>>        if (Current->getType() == types::TY_Nothing)
>>>>          break;
>>>>      }
>>>>
>>>> -    // If we ended with something, add to the output list.
>>>> -    if (Current)
>>>> +    // If we ended with something, add to the output list. Also, propagate the
>>>> +    // offload information to the top-level host action related with the current
>>>> +    // input.
>>>> +    if (Current) {
>>>> +      if (InputActiveOffloadHostKinds)
>>>> +        Current->propagateHostOffloadInfo(InputActiveOffloadHostKinds,
>>>> +                                          /*BoundArch=*/nullptr);
>>>>        Actions.push_back(Current);
>>>> +    }
>>>>    }
>>>>
>>>> -  // Add a link action if necessary.
>>>> -  if (!LinkerInputs.empty())
>>>> +  // Add a link action if necessary and propagate the offload information for
>>>> +  // the current compilation.
>>>> +  if (!LinkerInputs.empty()) {
>>>>      Actions.push_back(
>>>>          C.MakeAction<LinkJobAction>(LinkerInputs, types::TY_Image));
>>>> +    Actions.back()->propagateHostOffloadInfo(CompilationActiveOffloadHostKinds,
>>>> +                                             /*BoundArch=*/nullptr);
>>>> +  }
>>>>
>>>>    // If we are linking, claim any options which are obviously only used for
>>>>    // compilation.
>>>> @@ -1829,7 +1891,8 @@ void Driver::BuildJobs(Compilation &C) c
>>>>                         /*BoundArch*/ nullptr,
>>>>                         /*AtTopLevel*/ true,
>>>>                         /*MultipleArchs*/ ArchNames.size() > 1,
>>>> -                       /*LinkingOutput*/ LinkingOutput, CachedResults);
>>>> +                       /*LinkingOutput*/ LinkingOutput, CachedResults,
>>>> +                       /*BuildForOffloadDevice*/ false);
>>>>    }
>>>>
>>>>    // If the user passed -Qunused-arguments or there were errors, don't warn
>>>> @@ -1878,7 +1941,28 @@ void Driver::BuildJobs(Compilation &C) c
>>>>      }
>>>>    }
>>>>  }
>>>> -
>>>> +/// Collapse an offloading action looking for a job of the given type. The input
>>>> +/// action is changed to the input of the collapsed sequence. If we effectively
>>>> +/// had a collapse return the corresponding offloading action, otherwise return
>>>> +/// null.
>>>> +template <typename T>
>>>> +static OffloadAction *collapseOffloadingAction(Action *&CurAction) {
>>>> +  if (!CurAction)
>>>> +    return nullptr;
>>>> +  if (auto *OA = dyn_cast<OffloadAction>(CurAction)) {
>>>> +    if (OA->hasHostDependence())
>>>> +      if (auto *HDep = dyn_cast<T>(OA->getHostDependence())) {
>>>> +        CurAction = HDep;
>>>> +        return OA;
>>>> +      }
>>>> +    if (OA->hasSingleDeviceDependence())
>>>> +      if (auto *DDep = dyn_cast<T>(OA->getSingleDeviceDependence())) {
>>>> +        CurAction = DDep;
>>>> +        return OA;
>>>> +      }
>>>> +  }
>>>> +  return nullptr;
>>>> +}
>>>>  // Returns a Tool for a given JobAction.  In case the action and its
>>>>  // predecessors can be combined, updates Inputs with the inputs of the
>>>>  // first combined action. If one of the collapsed actions is a
>>>> @@ -1888,34 +1972,39 @@ static const Tool *selectToolForJob(Comp
>>>>                                      bool EmbedBitcode, const ToolChain *TC,
>>>>                                      const JobAction *JA,
>>>>                                      const ActionList *&Inputs,
>>>> -                                    const CudaHostAction *&CollapsedCHA) {
>>>> +                                    ActionList &CollapsedOffloadAction) {
>>>>    const Tool *ToolForJob = nullptr;
>>>> -  CollapsedCHA = nullptr;
>>>> +  CollapsedOffloadAction.clear();
>>>>
>>>>    // See if we should look for a compiler with an integrated assembler. We match
>>>>    // bottom up, so what we are actually looking for is an assembler job with a
>>>>    // compiler input.
>>>>
>>>> +  // Look through offload actions between assembler and backend actions.
>>>> +  Action *BackendJA = (isa<AssembleJobAction>(JA) && Inputs->size() == 1)
>>>> +                          ? *Inputs->begin()
>>>> +                          : nullptr;
>>>> +  auto *BackendOA = collapseOffloadingAction<BackendJobAction>(BackendJA);
>>>> +
>>>>    if (TC->useIntegratedAs() && !SaveTemps &&
>>>>        !C.getArgs().hasArg(options::OPT_via_file_asm) &&
>>>>        !C.getArgs().hasArg(options::OPT__SLASH_FA) &&
>>>> -      !C.getArgs().hasArg(options::OPT__SLASH_Fa) &&
>>>> -      isa<AssembleJobAction>(JA) && Inputs->size() == 1 &&
>>>> -      isa<BackendJobAction>(*Inputs->begin())) {
>>>> +      !C.getArgs().hasArg(options::OPT__SLASH_Fa) && BackendJA &&
>>>> +      isa<BackendJobAction>(BackendJA)) {
>>>>      // A BackendJob is always preceded by a CompileJob, and without -save-temps
>>>>      // or -fembed-bitcode, they will always get combined together, so instead of
>>>>      // checking the backend tool, check if the tool for the CompileJob has an
>>>>      // integrated assembler. For -fembed-bitcode, CompileJob is still used to
>>>>      // look up tools for BackendJob, but they need to match before we can split
>>>>      // them.
>>>> -    const ActionList *BackendInputs = &(*Inputs)[0]->getInputs();
>>>> -    // Compile job may be wrapped in CudaHostAction, extract it if
>>>> -    // that's the case and update CollapsedCHA if we combine phases.
>>>> -    CudaHostAction *CHA = dyn_cast<CudaHostAction>(*BackendInputs->begin());
>>>> -    JobAction *CompileJA = cast<CompileJobAction>(
>>>> -        CHA ? *CHA->input_begin() : *BackendInputs->begin());
>>>> -    assert(CompileJA && "Backend job is not preceeded by compile job.");
>>>> -    const Tool *Compiler = TC->SelectTool(*CompileJA);
>>>> +
>>>> +    // Look through offload actions between backend and compile actions.
>>>> +    Action *CompileJA = *BackendJA->getInputs().begin();
>>>> +    auto *CompileOA = collapseOffloadingAction<CompileJobAction>(CompileJA);
>>>> +
>>>> +    assert(CompileJA && isa<CompileJobAction>(CompileJA) &&
>>>> +           "Backend job is not preceeded by compile job.");
>>>> +    const Tool *Compiler = TC->SelectTool(*cast<CompileJobAction>(CompileJA));
>>>>      if (!Compiler)
>>>>        return nullptr;
>>>>      // When using -fembed-bitcode, it is required to have the same tool (clang)
>>>> @@ -1929,7 +2018,12 @@ static const Tool *selectToolForJob(Comp
>>>>      if (Compiler->hasIntegratedAssembler()) {
>>>>        Inputs = &CompileJA->getInputs();
>>>>        ToolForJob = Compiler;
>>>> -      CollapsedCHA = CHA;
>>>> +      // Save the collapsed offload actions because they may still contain
>>>> +      // device actions.
>>>> +      if (CompileOA)
>>>> +        CollapsedOffloadAction.push_back(CompileOA);
>>>> +      if (BackendOA)
>>>> +        CollapsedOffloadAction.push_back(BackendOA);
>>>>      }
>>>>    }
>>>>
>>>> @@ -1939,20 +2033,23 @@ static const Tool *selectToolForJob(Comp
>>>>    if (isa<BackendJobAction>(JA)) {
>>>>      // Check if the compiler supports emitting LLVM IR.
>>>>      assert(Inputs->size() == 1);
>>>> -    // Compile job may be wrapped in CudaHostAction, extract it if
>>>> -    // that's the case and update CollapsedCHA if we combine phases.
>>>> -    CudaHostAction *CHA = dyn_cast<CudaHostAction>(*Inputs->begin());
>>>> -    JobAction *CompileJA =
>>>> -        cast<CompileJobAction>(CHA ? *CHA->input_begin() : *Inputs->begin());
>>>> -    assert(CompileJA && "Backend job is not preceeded by compile job.");
>>>> -    const Tool *Compiler = TC->SelectTool(*CompileJA);
>>>> +
>>>> +    // Look through offload actions between backend and compile actions.
>>>> +    Action *CompileJA = *JA->getInputs().begin();
>>>> +    auto *CompileOA = collapseOffloadingAction<CompileJobAction>(CompileJA);
>>>> +
>>>> +    assert(CompileJA && isa<CompileJobAction>(CompileJA) &&
>>>> +           "Backend job is not preceeded by compile job.");
>>>> +    const Tool *Compiler = TC->SelectTool(*cast<CompileJobAction>(CompileJA));
>>>>      if (!Compiler)
>>>>        return nullptr;
>>>>      if (!Compiler->canEmitIR() ||
>>>>          (!SaveTemps && !EmbedBitcode)) {
>>>>        Inputs = &CompileJA->getInputs();
>>>>        ToolForJob = Compiler;
>>>> -      CollapsedCHA = CHA;
>>>> +
>>>> +      if (CompileOA)
>>>> +        CollapsedOffloadAction.push_back(CompileOA);
>>>>      }
>>>>    }
>>>>
>>>> @@ -1963,12 +2060,21 @@ static const Tool *selectToolForJob(Comp
>>>>    // See if we should use an integrated preprocessor. We do so when we have
>>>>    // exactly one input, since this is the only use case we care about
>>>>    // (irrelevant since we don't support combine yet).
>>>> -  if (Inputs->size() == 1 && isa<PreprocessJobAction>(*Inputs->begin()) &&
>>>> +
>>>> +  // Look through offload actions after preprocessing.
>>>> +  Action *PreprocessJA = (Inputs->size() == 1) ? *Inputs->begin() : nullptr;
>>>> +  auto *PreprocessOA =
>>>> +      collapseOffloadingAction<PreprocessJobAction>(PreprocessJA);
>>>> +
>>>> +  if (PreprocessJA && isa<PreprocessJobAction>(PreprocessJA) &&
>>>>        !C.getArgs().hasArg(options::OPT_no_integrated_cpp) &&
>>>>        !C.getArgs().hasArg(options::OPT_traditional_cpp) && !SaveTemps &&
>>>>        !C.getArgs().hasArg(options::OPT_rewrite_objc) &&
>>>> -      ToolForJob->hasIntegratedCPP())
>>>> -    Inputs = &(*Inputs)[0]->getInputs();
>>>> +      ToolForJob->hasIntegratedCPP()) {
>>>> +    Inputs = &PreprocessJA->getInputs();
>>>> +    if (PreprocessOA)
>>>> +      CollapsedOffloadAction.push_back(PreprocessOA);
>>>> +  }
>>>>
>>>>    return ToolForJob;
>>>>  }
>>>> @@ -1976,8 +2082,8 @@ static const Tool *selectToolForJob(Comp
>>>>  InputInfo Driver::BuildJobsForAction(
>>>>      Compilation &C, const Action *A, const ToolChain *TC, const char *BoundArch,
>>>>      bool AtTopLevel, bool MultipleArchs, const char *LinkingOutput,
>>>> -    std::map<std::pair<const Action *, std::string>, InputInfo> &CachedResults)
>>>> -    const {
>>>> +    std::map<std::pair<const Action *, std::string>, InputInfo> &CachedResults,
>>>> +    bool BuildForOffloadDevice) const {
>>>>    // The bound arch is not necessarily represented in the toolchain's triple --
>>>>    // for example, armv7 and armv7s both map to the same triple -- so we need
>>>>    // both in our map.
>>>> @@ -1991,9 +2097,9 @@ InputInfo Driver::BuildJobsForAction(
>>>>    if (CachedResult != CachedResults.end()) {
>>>>      return CachedResult->second;
>>>>    }
>>>> -  InputInfo Result =
>>>> -      BuildJobsForActionNoCache(C, A, TC, BoundArch, AtTopLevel, MultipleArchs,
>>>> -                                LinkingOutput, CachedResults);
>>>> +  InputInfo Result = BuildJobsForActionNoCache(
>>>> +      C, A, TC, BoundArch, AtTopLevel, MultipleArchs, LinkingOutput,
>>>> +      CachedResults, BuildForOffloadDevice);
>>>>    CachedResults[ActionTC] = Result;
>>>>    return Result;
>>>>  }
>>>> @@ -2001,21 +2107,65 @@ InputInfo Driver::BuildJobsForAction(
>>>>  InputInfo Driver::BuildJobsForActionNoCache(
>>>>      Compilation &C, const Action *A, const ToolChain *TC, const char *BoundArch,
>>>>      bool AtTopLevel, bool MultipleArchs, const char *LinkingOutput,
>>>> -    std::map<std::pair<const Action *, std::string>, InputInfo> &CachedResults)
>>>> -    const {
>>>> +    std::map<std::pair<const Action *, std::string>, InputInfo> &CachedResults,
>>>> +    bool BuildForOffloadDevice) const {
>>>>    llvm::PrettyStackTraceString CrashInfo("Building compilation jobs");
>>>>
>>>> -  InputInfoList CudaDeviceInputInfos;
>>>> -  if (const CudaHostAction *CHA = dyn_cast<CudaHostAction>(A)) {
>>>> -    // Append outputs of device jobs to the input list.
>>>> -    for (const Action *DA : CHA->getDeviceActions()) {
>>>> -      CudaDeviceInputInfos.push_back(BuildJobsForAction(
>>>> -          C, DA, TC, nullptr, AtTopLevel,
>>>> -          /*MultipleArchs*/ false, LinkingOutput, CachedResults));
>>>> -    }
>>>> -    // Override current action with a real host compile action and continue
>>>> -    // processing it.
>>>> -    A = *CHA->input_begin();
>>>> +  InputInfoList OffloadDependencesInputInfo;
>>>> +  if (const OffloadAction *OA = dyn_cast<OffloadAction>(A)) {
>>>> +    // The offload action is expected to be used in four different situations.
>>>> +    //
>>>> +    // a) Set a toolchain/architecture/kind for a host action:
>>>> +    //    Host Action 1 -> OffloadAction -> Host Action 2
>>>> +    //
>>>> +    // b) Set a toolchain/architecture/kind for a device action;
>>>> +    //    Device Action 1 -> OffloadAction -> Device Action 2
>>>> +    //
>>>> +    // c) Specify a device dependences to a host action;
>>>> +    //    Device Action 1  _
>>>> +    //                      \
>>>> +    //      Host Action 1  ---> OffloadAction -> Host Action 2
>>>> +    //
>>>> +    // d) Specify a host dependence to a device action.
>>>> +    //      Host Action 1  _
>>>> +    //                      \
>>>> +    //    Device Action 1  ---> OffloadAction -> Device Action 2
>>>> +    //
>>>> +    // For a) and b), we just return the job generated for the dependence. For
>>>> +    // c) and d) we override the current action with the host/device dependence
>>>> +    // if the current toolchain is host/device and set the offload dependences
>>>> +    // info with the jobs obtained from the device/host dependence(s).
>>>> +
>>>> +    // If there is a single device option, just generate the job for it.
>>>> +    if (OA->hasSingleDeviceDependence()) {
>>>> +      InputInfo DevA;
>>>> +      OA->doOnEachDeviceDependence([&](Action *DepA, const ToolChain *DepTC,
>>>> +                                       const char *DepBoundArch) {
>>>> +        DevA =
>>>> +            BuildJobsForAction(C, DepA, DepTC, DepBoundArch, AtTopLevel,
>>>> +                               /*MultipleArchs*/ !!DepBoundArch, LinkingOutput,
>>>> +                               CachedResults, /*BuildForOffloadDevice=*/true);
>>>> +      });
>>>> +      return DevA;
>>>> +    }
>>>> +
>>>> +    // If 'Action 2' is host, we generate jobs for the device dependences and
>>>> +    // override the current action with the host dependence. Otherwise, we
>>>> +    // generate the host dependences and override the action with the device
>>>> +    // dependence. The dependences can't therefore be a top-level action.
>>>> +    OA->doOnEachDependence(
>>>> +        /*IsHostDependence=*/BuildForOffloadDevice,
>>>> +        [&](Action *DepA, const ToolChain *DepTC, const char *DepBoundArch) {
>>>> +          OffloadDependencesInputInfo.push_back(BuildJobsForAction(
>>>> +              C, DepA, DepTC, DepBoundArch, /*AtTopLevel=*/false,
>>>> +              /*MultipleArchs*/ !!DepBoundArch, LinkingOutput, CachedResults,
>>>> +              /*BuildForOffloadDevice=*/DepA->getOffloadingDeviceKind() !=
>>>> +                  Action::OFK_None));
>>>> +        });
>>>> +
>>>> +    A = BuildForOffloadDevice
>>>> +            ? OA->getSingleDeviceDependence(/*DoNotConsiderHostActions=*/true)
>>>> +            : OA->getHostDependence();
>>>>    }
>>>>
>>>>    if (const InputAction *IA = dyn_cast<InputAction>(A)) {
>>>> @@ -2042,41 +2192,34 @@ InputInfo Driver::BuildJobsForActionNoCa
>>>>        TC = &C.getDefaultToolChain();
>>>>
>>>>      return BuildJobsForAction(C, *BAA->input_begin(), TC, ArchName, AtTopLevel,
>>>> -                              MultipleArchs, LinkingOutput, CachedResults);
>>>> +                              MultipleArchs, LinkingOutput, CachedResults,
>>>> +                              BuildForOffloadDevice);
>>>>    }
>>>>
>>>> -  if (const CudaDeviceAction *CDA = dyn_cast<CudaDeviceAction>(A)) {
>>>> -    // Initial processing of CudaDeviceAction carries host params.
>>>> -    // Call BuildJobsForAction() again, now with correct device parameters.
>>>> -    InputInfo II = BuildJobsForAction(
>>>> -        C, *CDA->input_begin(), C.getSingleOffloadToolChain<Action::OFK_Cuda>(),
>>>> -        CudaArchToString(CDA->getGpuArch()), CDA->isAtTopLevel(),
>>>> -        /*MultipleArchs=*/true, LinkingOutput, CachedResults);
>>>> -    // Currently II's Action is *CDA->input_begin().  Set it to CDA instead, so
>>>> -    // that one can retrieve II's GPU arch.
>>>> -    II.setAction(A);
>>>> -    return II;
>>>> -  }
>>>>
>>>>    const ActionList *Inputs = &A->getInputs();
>>>>
>>>>    const JobAction *JA = cast<JobAction>(A);
>>>> -  const CudaHostAction *CollapsedCHA = nullptr;
>>>> +  ActionList CollapsedOffloadActions;
>>>> +
>>>>    const Tool *T =
>>>>        selectToolForJob(C, isSaveTempsEnabled(), embedBitcodeEnabled(), TC, JA,
>>>> -                       Inputs, CollapsedCHA);
>>>> +                       Inputs, CollapsedOffloadActions);
>>>>    if (!T)
>>>>      return InputInfo();
>>>>
>>>> -  // If we've collapsed action list that contained CudaHostAction we
>>>> -  // need to build jobs for device-side inputs it may have held.
>>>> -  if (CollapsedCHA) {
>>>> -    for (const Action *DA : CollapsedCHA->getDeviceActions()) {
>>>> -      CudaDeviceInputInfos.push_back(BuildJobsForAction(
>>>> -          C, DA, TC, "", AtTopLevel,
>>>> -          /*MultipleArchs*/ false, LinkingOutput, CachedResults));
>>>> -    }
>>>> -  }
>>>> +  // If we've collapsed action list that contained OffloadAction we
>>>> +  // need to build jobs for host/device-side inputs it may have held.
>>>> +  for (const auto *OA : CollapsedOffloadActions)
>>>> +    cast<OffloadAction>(OA)->doOnEachDependence(
>>>> +        /*IsHostDependence=*/BuildForOffloadDevice,
>>>> +        [&](Action *DepA, const ToolChain *DepTC, const char *DepBoundArch) {
>>>> +          OffloadDependencesInputInfo.push_back(BuildJobsForAction(
>>>> +              C, DepA, DepTC, DepBoundArch, AtTopLevel,
>>>> +              /*MultipleArchs=*/!!DepBoundArch, LinkingOutput, CachedResults,
>>>> +              /*BuildForOffloadDevice=*/DepA->getOffloadingDeviceKind() !=
>>>> +                  Action::OFK_None));
>>>> +        });
>>>>
>>>>    // Only use pipes when there is exactly one input.
>>>>    InputInfoList InputInfos;
>>>> @@ -2086,9 +2229,9 @@ InputInfo Driver::BuildJobsForActionNoCa
>>>>      // FIXME: Clean this up.
>>>>      bool SubJobAtTopLevel =
>>>>          AtTopLevel && (isa<DsymutilJobAction>(A) || isa<VerifyJobAction>(A));
>>>> -    InputInfos.push_back(BuildJobsForAction(C, Input, TC, BoundArch,
>>>> -                                            SubJobAtTopLevel, MultipleArchs,
>>>> -                                            LinkingOutput, CachedResults));
>>>> +    InputInfos.push_back(BuildJobsForAction(
>>>> +        C, Input, TC, BoundArch, SubJobAtTopLevel, MultipleArchs, LinkingOutput,
>>>> +        CachedResults, BuildForOffloadDevice));
>>>>    }
>>>>
>>>>    // Always use the first input as the base input.
>>>> @@ -2099,9 +2242,10 @@ InputInfo Driver::BuildJobsForActionNoCa
>>>>    if (JA->getType() == types::TY_dSYM)
>>>>      BaseInput = InputInfos[0].getFilename();
>>>>
>>>> -  // Append outputs of cuda device jobs to the input list
>>>> -  if (CudaDeviceInputInfos.size())
>>>> -    InputInfos.append(CudaDeviceInputInfos.begin(), CudaDeviceInputInfos.end());
>>>> +  // Append outputs of offload device jobs to the input list
>>>> +  if (!OffloadDependencesInputInfo.empty())
>>>> +    InputInfos.append(OffloadDependencesInputInfo.begin(),
>>>> +                      OffloadDependencesInputInfo.end());
>>>>
>>>>    // Determine the place to write output to, if any.
>>>>    InputInfo Result;
>>>> @@ -2109,7 +2253,8 @@ InputInfo Driver::BuildJobsForActionNoCa
>>>>      Result = InputInfo(A, BaseInput);
>>>>    else
>>>>      Result = InputInfo(A, GetNamedOutputPath(C, *JA, BaseInput, BoundArch,
>>>> -                                             AtTopLevel, MultipleArchs),
>>>> +                                             AtTopLevel, MultipleArchs,
>>>> +                                             TC->getTriple().normalize()),
>>>>                         BaseInput);
>>>>
>>>>    if (CCCPrintBindings && !CCGenDiagnostics) {
>>>> @@ -2169,7 +2314,8 @@ static const char *MakeCLOutputFilename(
>>>>  const char *Driver::GetNamedOutputPath(Compilation &C, const JobAction &JA,
>>>>                                         const char *BaseInput,
>>>>                                         const char *BoundArch, bool AtTopLevel,
>>>> -                                       bool MultipleArchs) const {
>>>> +                                       bool MultipleArchs,
>>>> +                                       StringRef NormalizedTriple) const {
>>>>    llvm::PrettyStackTraceString CrashInfo("Computing output path");
>>>>    // Output to a user requested destination?
>>>>    if (AtTopLevel && !isa<DsymutilJobAction>(JA) && !isa<VerifyJobAction>(JA)) {
>>>> @@ -2255,6 +2401,7 @@ const char *Driver::GetNamedOutputPath(C
>>>>            MakeCLOutputFilename(C.getArgs(), "", BaseName, types::TY_Image);
>>>>      } else if (MultipleArchs && BoundArch) {
>>>>        SmallString<128> Output(getDefaultImageName());
>>>> +      Output += JA.getOffloadingFileNamePrefix(NormalizedTriple);
>>>>        Output += "-";
>>>>        Output.append(BoundArch);
>>>>        NamedOutput = C.getArgs().MakeArgString(Output.c_str());
>>>> @@ -2271,6 +2418,7 @@ const char *Driver::GetNamedOutputPath(C
>>>>      if (!types::appendSuffixForType(JA.getType()))
>>>>        End = BaseName.rfind('.');
>>>>      SmallString<128> Suffixed(BaseName.substr(0, End));
>>>> +    Suffixed += JA.getOffloadingFileNamePrefix(NormalizedTriple);
>>>>      if (MultipleArchs && BoundArch) {
>>>>        Suffixed += "-";
>>>>        Suffixed.append(BoundArch);
>>>>
>>>> Modified: cfe/trunk/lib/Driver/ToolChain.cpp
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Driver/ToolChain.cpp?rev=275645&r1=275644&r2=275645&view=diff
>>>>
>>>> ==============================================================================
>>>> --- cfe/trunk/lib/Driver/ToolChain.cpp (original)
>>>> +++ cfe/trunk/lib/Driver/ToolChain.cpp Fri Jul 15 18:13:27 2016
>>>> @@ -248,8 +248,7 @@ Tool *ToolChain::getTool(Action::ActionC
>>>>
>>>>    case Action::InputClass:
>>>>    case Action::BindArchClass:
>>>> -  case Action::CudaDeviceClass:
>>>> -  case Action::CudaHostClass:
>>>> +  case Action::OffloadClass:
>>>>    case Action::LipoJobClass:
>>>>    case Action::DsymutilJobClass:
>>>>    case Action::VerifyDebugInfoJobClass:
>>>>
>>>> Modified: cfe/trunk/lib/Driver/Tools.cpp
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Driver/Tools.cpp?rev=275645&r1=275644&r2=275645&view=diff
>>>>
>>>> ==============================================================================
>>>> --- cfe/trunk/lib/Driver/Tools.cpp (original)
>>>> +++ cfe/trunk/lib/Driver/Tools.cpp Fri Jul 15 18:13:27 2016
>>>> @@ -296,12 +296,45 @@ static bool forwardToGCC(const Option &O
>>>>           !O.hasFlag(options::DriverOption) &&
>>>> !O.hasFlag(options::LinkerInput);
>>>>  }
>>>>
>>>> +/// Add the C++ include args of other offloading toolchains. If this
>>>> is a host
>>>> +/// job, the device toolchains are added. If this is a device job, the
>>>> host
>>>> +/// toolchains will be added.
>>>> +static void addExtraOffloadCXXStdlibIncludeArgs(Compilation &C,
>>>> +                                                const JobAction &JA,
>>>> +                                                const ArgList &Args,
>>>> +                                                ArgStringList
>>>> &CmdArgs) {
>>>> +
>>>> +  if (JA.isHostOffloading(Action::OFK_Cuda))
>>>> +    C.getSingleOffloadToolChain<Action::OFK_Cuda>()
>>>> +        ->AddClangCXXStdlibIncludeArgs(Args, CmdArgs);
>>>> +  else if (JA.isDeviceOffloading(Action::OFK_Cuda))
>>>> +    C.getSingleOffloadToolChain<Action::OFK_Host>()
>>>> +        ->AddClangCXXStdlibIncludeArgs(Args, CmdArgs);
>>>> +
>>>> +  // TODO: Add support for other programming models here.
>>>> +}
>>>> +
>>>> +/// Add the include args that are specific of each offloading
>>>> programming model.
>>>> +static void addExtraOffloadSpecificIncludeArgs(Compilation &C,
>>>> +                                               const JobAction &JA,
>>>> +                                               const ArgList &Args,
>>>> +                                               ArgStringList &CmdArgs)
>>>> {
>>>> +
>>>> +  if (JA.isHostOffloading(Action::OFK_Cuda))
>>>> +    C.getSingleOffloadToolChain<Action::OFK_Host>()->AddCudaIncludeArgs(
>>>> +        Args, CmdArgs);
>>>> +  else if (JA.isDeviceOffloading(Action::OFK_Cuda))
>>>> +    C.getSingleOffloadToolChain<Action::OFK_Cuda>()->AddCudaIncludeArgs(
>>>> +        Args, CmdArgs);
>>>> +
>>>> +  // TODO: Add support for other programming models here.
>>>> +}
>>>> +
>>>>  void Clang::AddPreprocessingOptions(Compilation &C, const JobAction
>>>> &JA,
>>>>                                      const Driver &D, const ArgList
>>>> &Args,
>>>>                                      ArgStringList &CmdArgs,
>>>>                                      const InputInfo &Output,
>>>> -                                    const InputInfoList &Inputs,
>>>> -                                    const ToolChain *AuxToolChain)
>>>> const {
>>>> +                                    const InputInfoList &Inputs) const
>>>> {
>>>>    Arg *A;
>>>>    const bool IsIAMCU = getToolChain().getTriple().isOSIAMCU();
>>>>
>>>> @@ -566,31 +599,27 @@ void Clang::AddPreprocessingOptions(Comp
>>>>    // OBJCPLUS_INCLUDE_PATH - system includes enabled when compiling
>>>> ObjC++.
>>>>    addDirectoryList(Args, CmdArgs, "-objcxx-isystem",
>>>> "OBJCPLUS_INCLUDE_PATH");
>>>>
>>>> -  // Optional AuxToolChain indicates that we need to include headers
>>>> -  // for more than one target. If that's the case, add include paths
>>>> -  // from AuxToolChain right after include paths of the same kind for
>>>> -  // the current target.
>>>> +  // While adding the include arguments, we also attempt to retrieve
>>>> the
>>>> +  // arguments of related offloading toolchains or arguments that are
>>>> specific
>>>> +  // of an offloading programming model.
>>>>
>>>>    // Add C++ include arguments, if needed.
>>>>    if (types::isCXX(Inputs[0].getType())) {
>>>>      getToolChain().AddClangCXXStdlibIncludeArgs(Args, CmdArgs);
>>>> -    if (AuxToolChain)
>>>> -      AuxToolChain->AddClangCXXStdlibIncludeArgs(Args, CmdArgs);
>>>> +    addExtraOffloadCXXStdlibIncludeArgs(C, JA, Args, CmdArgs);
>>>>    }
>>>>
>>>>    // Add system include arguments for all targets but IAMCU.
>>>>    if (!IsIAMCU) {
>>>>      getToolChain().AddClangSystemIncludeArgs(Args, CmdArgs);
>>>> -    if (AuxToolChain)
>>>> -      AuxToolChain->AddClangCXXStdlibIncludeArgs(Args, CmdArgs);
>>>> +    addExtraOffloadCXXStdlibIncludeArgs(C, JA, Args, CmdArgs);
>>>>
>>>
>>> This doesn't make much sense to me: we already added the C++ stdlib
>>> includes a few lines above for C++ compiles. Should this be adding the
>>> (non-C++) system include args instead?
>>>
>>>
>>>>    } else {
>>>>      // For IAMCU add special include arguments.
>>>>      getToolChain().AddIAMCUIncludeArgs(Args, CmdArgs);
>>>>    }
>>>>
>>>> -  // Add CUDA include arguments, if needed.
>>>> -  if (types::isCuda(Inputs[0].getType()))
>>>> -    getToolChain().AddCudaIncludeArgs(Args, CmdArgs);
>>>> +  // Add offload include arguments, if needed.
>>>> +  addExtraOffloadSpecificIncludeArgs(C, JA, Args, CmdArgs);
>>>>  }
>>>>
>>>>  // FIXME: Move to target hook.
>>>> @@ -3799,7 +3828,7 @@ void Clang::ConstructJob(Compilation &C,
>>>>    // CUDA compilation may have multiple inputs (source file + results
>>>> of
>>>>    // device-side compilations). All other jobs are expected to have
>>>> exactly one
>>>>    // input.
>>>> -  bool IsCuda = types::isCuda(Input.getType());
>>>> +  bool IsCuda = JA.isOffloading(Action::OFK_Cuda);
>>>>    assert((IsCuda || Inputs.size() == 1) && "Unable to handle multiple
>>>> inputs.");
>>>>
>>>>    // C++ is not supported for IAMCU.
>>>> @@ -3815,21 +3844,21 @@ void Clang::ConstructJob(Compilation &C,
>>>>    CmdArgs.push_back("-triple");
>>>>    CmdArgs.push_back(Args.MakeArgString(TripleStr));
>>>>
>>>> -  const ToolChain *AuxToolChain = nullptr;
>>>>    if (IsCuda) {
>>>> -    // FIXME: We need a (better) way to pass information about
>>>> -    // particular compilation pass we're constructing here. For now we
>>>> -    // can check which toolchain we're using and pick the other one to
>>>> -    // extract the triple.
>>>> -    if (&getToolChain() ==
>>>> C.getSingleOffloadToolChain<Action::OFK_Cuda>())
>>>> -      AuxToolChain = C.getOffloadingHostToolChain();
>>>> -    else if (&getToolChain() == C.getOffloadingHostToolChain())
>>>> -      AuxToolChain = C.getSingleOffloadToolChain<Action::OFK_Cuda>();
>>>> -    else
>>>> -      llvm_unreachable("Can't figure out CUDA compilation mode.");
>>>> -    assert(AuxToolChain != nullptr && "No aux toolchain.");
>>>> +    // We have to pass the triple of the host if compiling for a CUDA
>>>> device and
>>>> +    // vice-versa.
>>>> +    StringRef NormalizedTriple;
>>>> +    if (JA.isDeviceOffloading(Action::OFK_Cuda))
>>>> +      NormalizedTriple =
>>>> C.getSingleOffloadToolChain<Action::OFK_Host>()
>>>> +                             ->getTriple()
>>>> +                             .normalize();
>>>> +    else
>>>> +      NormalizedTriple =
>>>> C.getSingleOffloadToolChain<Action::OFK_Cuda>()
>>>> +                             ->getTriple()
>>>> +                             .normalize();
>>>> +
>>>>      CmdArgs.push_back("-aux-triple");
>>>> -    CmdArgs.push_back(Args.MakeArgString(AuxToolChain->getTriple().str()));
>>>> +    CmdArgs.push_back(Args.MakeArgString(NormalizedTriple));
>>>>    }
>>>>
>>>>    if (Triple.isOSWindows() && (Triple.getArch() == llvm::Triple::arm ||
>>>> @@ -4718,8 +4747,7 @@ void Clang::ConstructJob(Compilation &C,
>>>>    //
>>>>    // FIXME: Support -fpreprocessed
>>>>    if (types::getPreprocessedType(InputType) != types::TY_INVALID)
>>>> -    AddPreprocessingOptions(C, JA, D, Args, CmdArgs, Output, Inputs,
>>>> -                            AuxToolChain);
>>>> +    AddPreprocessingOptions(C, JA, D, Args, CmdArgs, Output, Inputs);
>>>>
>>>>    // Don't warn about "clang -c -DPIC -fPIC test.i" because libtool.m4
>>>> assumes
>>>>    // that "The compiler can only warn and ignore the option if not
>>>> recognized".
>>>> @@ -11193,15 +11221,14 @@ void NVPTX::Assembler::ConstructJob(Comp
>>>>        static_cast<const toolchains::CudaToolChain &>(getToolChain());
>>>>    assert(TC.getTriple().isNVPTX() && "Wrong platform");
>>>>
>>>> -  std::vector<std::string> gpu_archs =
>>>> -      Args.getAllArgValues(options::OPT_march_EQ);
>>>> -  assert(gpu_archs.size() == 1 && "Exactly one GPU Arch required for
>>>> ptxas.");
>>>> -  const std::string& gpu_arch = gpu_archs[0];
>>>> +  // Obtain architecture from the action.
>>>> +  CudaArch gpu_arch = StringToCudaArch(JA.getOffloadingArch());
>>>> +  assert(gpu_arch != CudaArch::UNKNOWN &&
>>>> +         "Device action expected to have an architecture.");
>>>>
>>>>    // Check that our installation's ptxas supports gpu_arch.
>>>>    if (!Args.hasArg(options::OPT_no_cuda_version_check)) {
>>>> -    TC.cudaInstallation().CheckCudaVersionSupportsArch(
>>>> -        StringToCudaArch(gpu_arch));
>>>> +    TC.cudaInstallation().CheckCudaVersionSupportsArch(gpu_arch);
>>>>    }
>>>>
>>>>    ArgStringList CmdArgs;
>>>> @@ -11245,7 +11272,7 @@ void NVPTX::Assembler::ConstructJob(Comp
>>>>    }
>>>>
>>>>    CmdArgs.push_back("--gpu-name");
>>>> -  CmdArgs.push_back(Args.MakeArgString(gpu_arch));
>>>> +  CmdArgs.push_back(Args.MakeArgString(CudaArchToString(gpu_arch)));
>>>>    CmdArgs.push_back("--output-file");
>>>>    CmdArgs.push_back(Args.MakeArgString(Output.getFilename()));
>>>>    for (const auto& II : Inputs)
>>>> @@ -11277,13 +11304,20 @@ void NVPTX::Linker::ConstructJob(Compila
>>>>    CmdArgs.push_back(Args.MakeArgString(Output.getFilename()));
>>>>
>>>>    for (const auto& II : Inputs) {
>>>> -    auto* A = cast<const CudaDeviceAction>(II.getAction());
>>>> +    auto *A = II.getAction();
>>>> +    assert(A->getInputs().size() == 1 &&
>>>> +           "Device offload action is expected to have a single input");
>>>> +    const char *gpu_arch_str = A->getOffloadingArch();
>>>> +    assert(gpu_arch_str &&
>>>> +           "Device action expected to have associated a GPU
>>>> architecture!");
>>>> +    CudaArch gpu_arch = StringToCudaArch(gpu_arch_str);
>>>> +
>>>>      // We need to pass an Arch of the form "sm_XX" for cubin files and
>>>>      // "compute_XX" for ptx.
>>>>      const char *Arch =
>>>>          (II.getType() == types::TY_PP_Asm)
>>>> -            ? CudaVirtualArchToString(VirtualArchForCudaArch(A->getGpuArch()))
>>>> -            : CudaArchToString(A->getGpuArch());
>>>> +            ? CudaVirtualArchToString(VirtualArchForCudaArch(gpu_arch))
>>>> +            : gpu_arch_str;
>>>>
>>>>  CmdArgs.push_back(Args.MakeArgString(llvm::Twine("--image=profile=") +
>>>>                                           Arch + ",file=" +
>>>> II.getFilename()));
>>>>    }
>>>>
>>>> Modified: cfe/trunk/lib/Driver/Tools.h
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Driver/Tools.h?rev=275645&r1=275644&r2=275645&view=diff
>>>>
>>>> ==============================================================================
>>>> --- cfe/trunk/lib/Driver/Tools.h (original)
>>>> +++ cfe/trunk/lib/Driver/Tools.h Fri Jul 15 18:13:27 2016
>>>> @@ -57,8 +57,7 @@ private:
>>>>                                 const Driver &D, const
>>>> llvm::opt::ArgList &Args,
>>>>                                 llvm::opt::ArgStringList &CmdArgs,
>>>>                                 const InputInfo &Output,
>>>> -                               const InputInfoList &Inputs,
>>>> -                               const ToolChain *AuxToolChain) const;
>>>> +                               const InputInfoList &Inputs) const;
>>>>
>>>>    void AddAArch64TargetArgs(const llvm::opt::ArgList &Args,
>>>>                              llvm::opt::ArgStringList &CmdArgs) const;
>>>>
>>>> Modified: cfe/trunk/lib/Frontend/CreateInvocationFromCommandLine.cpp
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Frontend/CreateInvocationFromCommandLine.cpp?rev=275645&r1=275644&r2=275645&view=diff
>>>>
>>>> ==============================================================================
>>>> --- cfe/trunk/lib/Frontend/CreateInvocationFromCommandLine.cpp
>>>> (original)
>>>> +++ cfe/trunk/lib/Frontend/CreateInvocationFromCommandLine.cpp Fri Jul
>>>> 15 18:13:27 2016
>>>> @@ -60,25 +60,25 @@ clang::createInvocationFromCommandLine(A
>>>>    }
>>>>
>>>>    // We expect to get back exactly one command job, if we didn't
>>>> something
>>>> -  // failed. CUDA compilation is an exception as it creates multiple
>>>> jobs. If
>>>> -  // that's the case, we proceed with the first job. If caller needs
>>>> particular
>>>> -  // CUDA job, it should be controlled via --cuda-{host|device}-only
>>>> option
>>>> -  // passed to the driver.
>>>> +  // failed. Offload compilation is an exception as it creates
>>>> multiple jobs. If
>>>> +  // that's the case, we proceed with the first job. If caller needs a
>>>> +  // particular job, it should be controlled via options (e.g.
>>>> +  // --cuda-{host|device}-only for CUDA) passed to the driver.
>>>>    const driver::JobList &Jobs = C->getJobs();
>>>> -  bool CudaCompilation = false;
>>>> +  bool OffloadCompilation = false;
>>>>    if (Jobs.size() > 1) {
>>>>      for (auto &A : C->getActions()){
>>>>        // On MacOSX real actions may end up being wrapped in
>>>> BindArchAction
>>>>        if (isa<driver::BindArchAction>(A))
>>>>          A = *A->input_begin();
>>>> -      if (isa<driver::CudaDeviceAction>(A)) {
>>>> -        CudaCompilation = true;
>>>> +      if (isa<driver::OffloadAction>(A)) {
>>>> +        OffloadCompilation = true;
>>>>          break;
>>>>        }
>>>>      }
>>>>    }
>>>>    if (Jobs.size() == 0 || !isa<driver::Command>(*Jobs.begin()) ||
>>>> -      (Jobs.size() > 1 && !CudaCompilation)) {
>>>> +      (Jobs.size() > 1 && !OffloadCompilation)) {
>>>>      SmallString<256> Msg;
>>>>      llvm::raw_svector_ostream OS(Msg);
>>>>      Jobs.Print(OS, "; ", true);
>>>>
>>>> Added: cfe/trunk/test/Driver/cuda_phases.cu
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/test/Driver/cuda_phases.cu?rev=275645&view=auto
>>>>
>>>> ==============================================================================
>>>> --- cfe/trunk/test/Driver/cuda_phases.cu (added)
>>>> +++ cfe/trunk/test/Driver/cuda_phases.cu Fri Jul 15 18:13:27 2016
>>>> @@ -0,0 +1,206 @@
>>>> +// Tests the phases generated for a CUDA offloading target for
>>>> different
>>>> +// combinations of:
>>>> +// - Number of gpu architectures;
>>>> +// - Host/device-only compilation;
>>>> +// - User-requested final phase - binary or assembly.
>>>> +
>>>> +// REQUIRES: clang-driver
>>>> +// REQUIRES: powerpc-registered-target
>>>> +// REQUIRES: nvptx-registered-target
>>>> +
>>>> +//
>>>> +// Test single gpu architecture with complete compilation.
>>>> +//
>>>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases
>>>> --cuda-gpu-arch=sm_30 %s 2>&1 \
>>>> +// RUN: | FileCheck -check-prefix=BIN %s
>>>> +// BIN: 0: input, "{{.*}}cuda_phases.cu", cuda, (host-cuda)
>>>> +// BIN: 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
>>>> +// BIN: 2: compiler, {1}, ir, (host-cuda)
>>>> +// BIN: 3: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_30)
>>>> +// BIN: 4: preprocessor, {3}, cuda-cpp-output, (device-cuda, sm_30)
>>>> +// BIN: 5: compiler, {4}, ir, (device-cuda, sm_30)
>>>> +// BIN: 6: backend, {5}, assembler, (device-cuda, sm_30)
>>>> +// BIN: 7: assembler, {6}, object, (device-cuda, sm_30)
>>>> +// BIN: 8: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {7},
>>>> object
>>>> +// BIN: 9: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {6},
>>>> assembler
>>>> +// BIN: 10: linker, {8, 9}, cuda-fatbin, (device-cuda)
>>>> +// BIN: 11: offload, "host-cuda (powerpc64le-ibm-linux-gnu)" {2},
>>>> "device-cuda (nvptx64-nvidia-cuda)" {10}, ir
>>>> +// BIN: 12: backend, {11}, assembler, (host-cuda)
>>>> +// BIN: 13: assembler, {12}, object, (host-cuda)
>>>> +// BIN: 14: linker, {13}, image, (host-cuda)
>>>> +
>>>> +//
>>>> +// Test single gpu architecture up to the assemble phase.
>>>> +//
>>>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases
>>>> --cuda-gpu-arch=sm_30 %s -S 2>&1 \
>>>> +// RUN: | FileCheck -check-prefix=ASM %s
>>>> +// ASM: 0: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_30)
>>>> +// ASM: 1: preprocessor, {0}, cuda-cpp-output, (device-cuda, sm_30)
>>>> +// ASM: 2: compiler, {1}, ir, (device-cuda, sm_30)
>>>> +// ASM: 3: backend, {2}, assembler, (device-cuda, sm_30)
>>>> +// ASM: 4: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {3},
>>>> assembler
>>>> +// ASM: 5: input, "{{.*}}cuda_phases.cu", cuda, (host-cuda)
>>>> +// ASM: 6: preprocessor, {5}, cuda-cpp-output, (host-cuda)
>>>> +// ASM: 7: compiler, {6}, ir, (host-cuda)
>>>> +// ASM: 8: backend, {7}, assembler, (host-cuda)
>>>> +
>>>> +//
>>>> +// Test two gpu architectures with complete compilation.
>>>> +//
>>>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases
>>>> --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_35 %s 2>&1 \
>>>> +// RUN: | FileCheck -check-prefix=BIN2 %s
>>>> +// BIN2: 0: input, "{{.*}}cuda_phases.cu", cuda, (host-cuda)
>>>> +// BIN2: 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
>>>> +// BIN2: 2: compiler, {1}, ir, (host-cuda)
>>>> +// BIN2: 3: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_30)
>>>> +// BIN2: 4: preprocessor, {3}, cuda-cpp-output, (device-cuda, sm_30)
>>>> +// BIN2: 5: compiler, {4}, ir, (device-cuda, sm_30)
>>>> +// BIN2: 6: backend, {5}, assembler, (device-cuda, sm_30)
>>>> +// BIN2: 7: assembler, {6}, object, (device-cuda, sm_30)
>>>> +// BIN2: 8: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {7},
>>>> object
>>>> +// BIN2: 9: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {6},
>>>> assembler
>>>> +// BIN2: 10: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_35)
>>>> +// BIN2: 11: preprocessor, {10}, cuda-cpp-output, (device-cuda, sm_35)
>>>> +// BIN2: 12: compiler, {11}, ir, (device-cuda, sm_35)
>>>> +// BIN2: 13: backend, {12}, assembler, (device-cuda, sm_35)
>>>> +// BIN2: 14: assembler, {13}, object, (device-cuda, sm_35)
>>>> +// BIN2: 15: offload, "device-cuda (nvptx64-nvidia-cuda:sm_35)" {14},
>>>> object
>>>> +// BIN2: 16: offload, "device-cuda (nvptx64-nvidia-cuda:sm_35)" {13},
>>>> assembler
>>>> +// BIN2: 17: linker, {8, 9, 15, 16}, cuda-fatbin, (device-cuda)
>>>> +// BIN2: 18: offload, "host-cuda (powerpc64le-ibm-linux-gnu)" {2},
>>>> "device-cuda (nvptx64-nvidia-cuda)" {17}, ir
>>>> +// BIN2: 19: backend, {18}, assembler, (host-cuda)
>>>> +// BIN2: 20: assembler, {19}, object, (host-cuda)
>>>> +// BIN2: 21: linker, {20}, image, (host-cuda)
>>>> +
>>>> +//
>>>> +// Test two gpu architectures up to the assemble phase.
>>>> +//
>>>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases
>>>> --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_35 %s -S 2>&1 \
>>>> +// RUN: | FileCheck -check-prefix=ASM2 %s
>>>> +// ASM2: 0: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_30)
>>>> +// ASM2: 1: preprocessor, {0}, cuda-cpp-output, (device-cuda, sm_30)
>>>> +// ASM2: 2: compiler, {1}, ir, (device-cuda, sm_30)
>>>> +// ASM2: 3: backend, {2}, assembler, (device-cuda, sm_30)
>>>> +// ASM2: 4: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {3},
>>>> assembler
>>>> +// ASM2: 5: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_35)
>>>> +// ASM2: 6: preprocessor, {5}, cuda-cpp-output, (device-cuda, sm_35)
>>>> +// ASM2: 7: compiler, {6}, ir, (device-cuda, sm_35)
>>>> +// ASM2: 8: backend, {7}, assembler, (device-cuda, sm_35)
>>>> +// ASM2: 9: offload, "device-cuda (nvptx64-nvidia-cuda:sm_35)" {8},
>>>> assembler
>>>> +// ASM2: 10: input, "{{.*}}cuda_phases.cu", cuda, (host-cuda)
>>>> +// ASM2: 11: preprocessor, {10}, cuda-cpp-output, (host-cuda)
>>>> +// ASM2: 12: compiler, {11}, ir, (host-cuda)
>>>> +// ASM2: 13: backend, {12}, assembler, (host-cuda)
>>>> +
>>>> +//
>>>> +// Test single gpu architecture with complete compilation in host-only
>>>> +// compilation mode.
>>>> +//
>>>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases
>>>> --cuda-gpu-arch=sm_30 %s --cuda-host-only 2>&1 \
>>>> +// RUN: | FileCheck -check-prefix=HBIN %s
>>>> +// HBIN: 0: input, "{{.*}}cuda_phases.cu", cuda, (host-cuda)
>>>> +// HBIN: 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
>>>> +// HBIN: 2: compiler, {1}, ir, (host-cuda)
>>>> +// HBIN: 3: offload, "host-cuda (powerpc64le-ibm-linux-gnu)" {2}, ir
>>>> +// HBIN: 4: backend, {3}, assembler, (host-cuda)
>>>> +// HBIN: 5: assembler, {4}, object, (host-cuda)
>>>> +// HBIN: 6: linker, {5}, image, (host-cuda)
>>>> +
>>>> +//
>>>> +// Test single gpu architecture up to the assemble phase in host-only
>>>> +// compilation mode.
>>>> +//
>>>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases
>>>> --cuda-gpu-arch=sm_30 %s --cuda-host-only -S 2>&1 \
>>>> +// RUN: | FileCheck -check-prefix=HASM %s
>>>> +// HASM: 0: input, "{{.*}}cuda_phases.cu", cuda, (host-cuda)
>>>> +// HASM: 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
>>>> +// HASM: 2: compiler, {1}, ir, (host-cuda)
>>>> +// HASM: 3: offload, "host-cuda (powerpc64le-ibm-linux-gnu)" {2}, ir
>>>> +// HASM: 4: backend, {3}, assembler, (host-cuda)
>>>> +
>>>> +//
>>>> +// Test two gpu architectures with complete compilation in host-only
>>>> +// compilation mode.
>>>> +//
>>>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases
>>>> --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_35 %s --cuda-host-only 2>&1 \
>>>> +// RUN: | FileCheck -check-prefix=HBIN2 %s
>>>> +// HBIN2: 0: input, "{{.*}}cuda_phases.cu", cuda, (host-cuda)
>>>> +// HBIN2: 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
>>>> +// HBIN2: 2: compiler, {1}, ir, (host-cuda)
>>>> +// HBIN2: 3: offload, "host-cuda (powerpc64le-ibm-linux-gnu)" {2}, ir
>>>> +// HBIN2: 4: backend, {3}, assembler, (host-cuda)
>>>> +// HBIN2: 5: assembler, {4}, object, (host-cuda)
>>>> +// HBIN2: 6: linker, {5}, image, (host-cuda)
>>>> +
>>>> +//
>>>> +// Test two gpu architectures up to the assemble phase in host-only
>>>> +// compilation mode.
>>>> +//
>>>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases
>>>> --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_35 %s --cuda-host-only -S 2>&1 \
>>>> +// RUN: | FileCheck -check-prefix=HASM2 %s
>>>> +// HASM2: 0: input, "{{.*}}cuda_phases.cu", cuda, (host-cuda)
>>>> +// HASM2: 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
>>>> +// HASM2: 2: compiler, {1}, ir, (host-cuda)
>>>> +// HASM2: 3: offload, "host-cuda (powerpc64le-ibm-linux-gnu)" {2}, ir
>>>> +// HASM2: 4: backend, {3}, assembler, (host-cuda)
>>>> +
>>>> +//
>>>> +// Test single gpu architecture with complete compilation in
>>>> device-only
>>>> +// compilation mode.
>>>> +//
>>>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases
>>>> --cuda-gpu-arch=sm_30 %s --cuda-device-only 2>&1 \
>>>> +// RUN: | FileCheck -check-prefix=DBIN %s
>>>> +// DBIN: 0: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_30)
>>>> +// DBIN: 1: preprocessor, {0}, cuda-cpp-output, (device-cuda, sm_30)
>>>> +// DBIN: 2: compiler, {1}, ir, (device-cuda, sm_30)
>>>> +// DBIN: 3: backend, {2}, assembler, (device-cuda, sm_30)
>>>> +// DBIN: 4: assembler, {3}, object, (device-cuda, sm_30)
>>>> +// DBIN: 5: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {4},
>>>> object
>>>> +
>>>> +//
>>>> +// Test single gpu architecture up to the assemble phase in device-only
>>>> +// compilation mode.
>>>> +//
>>>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases
>>>> --cuda-gpu-arch=sm_30 %s --cuda-device-only -S 2>&1 \
>>>> +// RUN: | FileCheck -check-prefix=DASM %s
>>>> +// DASM: 0: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_30)
>>>> +// DASM: 1: preprocessor, {0}, cuda-cpp-output, (device-cuda, sm_30)
>>>> +// DASM: 2: compiler, {1}, ir, (device-cuda, sm_30)
>>>> +// DASM: 3: backend, {2}, assembler, (device-cuda, sm_30)
>>>> +// DASM: 4: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {3},
>>>> assembler
>>>> +
>>>> +//
>>>> +// Test two gpu architectures with complete compilation in device-only
>>>> +// compilation mode.
>>>> +//
>>>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases
>>>> --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_35 %s --cuda-device-only 2>&1 \
>>>> +// RUN: | FileCheck -check-prefix=DBIN2 %s
>>>> +// DBIN2: 0: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_30)
>>>> +// DBIN2: 1: preprocessor, {0}, cuda-cpp-output, (device-cuda, sm_30)
>>>> +// DBIN2: 2: compiler, {1}, ir, (device-cuda, sm_30)
>>>> +// DBIN2: 3: backend, {2}, assembler, (device-cuda, sm_30)
>>>> +// DBIN2: 4: assembler, {3}, object, (device-cuda, sm_30)
>>>> +// DBIN2: 5: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {4},
>>>> object
>>>> +// DBIN2: 6: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_35)
>>>> +// DBIN2: 7: preprocessor, {6}, cuda-cpp-output, (device-cuda, sm_35)
>>>> +// DBIN2: 8: compiler, {7}, ir, (device-cuda, sm_35)
>>>> +// DBIN2: 9: backend, {8}, assembler, (device-cuda, sm_35)
>>>> +// DBIN2: 10: assembler, {9}, object, (device-cuda, sm_35)
>>>> +// DBIN2: 11: offload, "device-cuda (nvptx64-nvidia-cuda:sm_35)" {10},
>>>> object
>>>> +
>>>> +//
>>>> +// Test two gpu architectures up to the assemble phase in device-only
>>>> +// compilation mode.
>>>> +//
>>>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases
>>>> --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_35 %s --cuda-device-only -S 2>&1 \
>>>> +// RUN: | FileCheck -check-prefix=DASM2 %s
>>>> +// DASM2: 0: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_30)
>>>> +// DASM2: 1: preprocessor, {0}, cuda-cpp-output, (device-cuda, sm_30)
>>>> +// DASM2: 2: compiler, {1}, ir, (device-cuda, sm_30)
>>>> +// DASM2: 3: backend, {2}, assembler, (device-cuda, sm_30)
>>>> +// DASM2: 4: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {3},
>>>> assembler
>>>> +// DASM2: 5: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_35)
>>>> +// DASM2: 6: preprocessor, {5}, cuda-cpp-output, (device-cuda, sm_35)
>>>> +// DASM2: 7: compiler, {6}, ir, (device-cuda, sm_35)
>>>> +// DASM2: 8: backend, {7}, assembler, (device-cuda, sm_35)
>>>> +// DASM2: 9: offload, "device-cuda (nvptx64-nvidia-cuda:sm_35)" {8},
>>>> assembler
>>>>
>>>>
>>>> _______________________________________________
>>>> cfe-commits mailing list
>>>> cfe-commits at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
>>>>
>>>
>>>
>>
>
>
> --
> --Artem Belevich
>
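
For reference, a minimal sketch of the include-argument ordering discussed in this thread: C++ stdlib includes are added for both toolchains only for C++ inputs, and the non-C++ system includes are added for both toolchains on non-IAMCU targets. MockToolChain and buildIncludeArgs are hypothetical stand-ins for illustration, not the real clang driver classes, and the IAMCU-specific branch is left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a driver toolchain; NOT clang::driver::ToolChain.
// Each method records a tag instead of emitting real -internal-isystem flags.
struct MockToolChain {
  std::string Name;

  void AddClangCXXStdlibIncludeArgs(std::vector<std::string> &CmdArgs) const {
    CmdArgs.push_back(Name + ":cxx-stdlib");
  }
  void AddClangSystemIncludeArgs(std::vector<std::string> &CmdArgs) const {
    CmdArgs.push_back(Name + ":system");
  }
};

// Hypothetical helper mirroring the shape of the include section of
// AddPreprocessingOptions. TC is the toolchain of the current job; Aux is the
// toolchain on the other side of the offload pair (may be null).
std::vector<std::string> buildIncludeArgs(const MockToolChain &TC,
                                          const MockToolChain *Aux,
                                          bool IsCXX, bool IsIAMCU) {
  std::vector<std::string> CmdArgs;

  // C++ stdlib includes: only for C++ inputs, current toolchain first.
  if (IsCXX) {
    TC.AddClangCXXStdlibIncludeArgs(CmdArgs);
    if (Aux)
      Aux->AddClangCXXStdlibIncludeArgs(CmdArgs);
  }

  // System includes for all targets but IAMCU. The aux call here is the
  // *system* variant, so the C++ stdlib paths are not added a second time.
  if (!IsIAMCU) {
    TC.AddClangSystemIncludeArgs(CmdArgs);
    if (Aux)
      Aux->AddClangSystemIncludeArgs(CmdArgs);
  }
  // (The IAMCU-specific include branch is omitted from this sketch.)

  return CmdArgs;
}
```

With a device job and a host aux toolchain, a C++ input gets each toolchain's C++ stdlib entry exactly once, followed by each toolchain's system entry; a non-C++ input gets only the system entries.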

-- 
--Artem Belevich