r275645 - [CUDA][OpenMP] Create generic offload action
Artem Belevich via cfe-commits
cfe-commits at lists.llvm.org
Mon Jul 18 16:38:32 PDT 2016
On Mon, Jul 18, 2016 at 4:28 PM, Artem Belevich <tra at google.com> wrote:
> CXX headers had to be added twice because we needed them for both the host
> and device sides of the compilation, but only the *host* toolchain knew where
> to find them. That's the part under "Add C++ include arguments."
>
> The second copy under "isIAMCU" below was added in r272883 and should
> indeed be removed.
>
Correction - r272883 is *not* the culprit. My commit r253386 is the source
of the mess.
I believe it was a copy/paste error and the intent was to
call AuxToolChain->AddClangSystemIncludeArgs(Args, CmdArgs) for non-C++
files.
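Something along these lines, presumably (a rough sketch for illustration only,
not the exact code in Tools.cpp; AuxToolChain, Args, and CmdArgs are the names
used in this thread):
```cpp
// Sketch of the presumed intent: the host (aux) toolchain supplies the
// include paths used by the device-side compilation.
if (types::isCXX(Inputs[0].getType()))
  AuxToolChain->AddClangCXXStdlibIncludeArgs(Args, CmdArgs); // C++ inputs
else
  AuxToolChain->AddClangSystemIncludeArgs(Args, CmdArgs);    // non-C++ inputs
```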
--Artem
>
> On Mon, Jul 18, 2016 at 4:03 PM, Samuel F Antao <sfantao at us.ibm.com>
> wrote:
>
>> Hi Richard,
>>
>> I agree, I don't think the second `addExtraOffloadCXXStdlibIncludeArgs`
>> is required. When I made this change my focus was on maintaining the
>> functionality of the existing code. I can confirm that removing it passes the
>> existing tests successfully. Is it possible, however, that there is some use
>> case in the existing CUDA implementation that requires C++ include paths to
>> be included for non-C++ input types?
>>
>> Art, Justin, can you confirm whether that is the case? If not, should I go
>> ahead and remove the duplicated code?
>>
>> Thanks!
>> Samuel
>>
>> On Mon, Jul 18, 2016 at 5:45 PM, Richard Smith via cfe-commits <
>> cfe-commits at lists.llvm.org> wrote:
>>
>>>
>>>
>>> On Fri, Jul 15, 2016 at 4:13 PM, Samuel Antao via cfe-commits <
>>> cfe-commits at lists.llvm.org> wrote:
>>>
>>>> Author: sfantao
>>>> Date: Fri Jul 15 18:13:27 2016
>>>> New Revision: 275645
>>>>
>>>> URL: http://llvm.org/viewvc/llvm-project?rev=275645&view=rev
>>>> Log:
>>>> [CUDA][OpenMP] Create generic offload action
>>>>
>>>> Summary:
>>>> This patch replaces the CUDA-specific action with a generic offload
>>>> action. The offload action may have multiple dependences, classified as
>>>> “host” and “device”. The way this generic offloading action is used is very
>>>> similar to what the CUDA implementation does today: it is used to set a
>>>> specific toolchain and architecture on its dependences during the
>>>> generation of jobs.
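>>>>
>>>> As a minimal sketch of the intended use (mirroring what buildCudaActions
>>>> further down in this patch does for CUDA; the names are the ones introduced
>>>> here), a programming model combines a host action with its device
>>>> dependences like this:
>>>> ```
>>>> // Illustrative only: bind a host action and one device action (e.g. the
>>>> // CUDA fat binary) to their toolchains and offload kind.
>>>> OffloadAction::HostDependence HDep(
>>>>     *HostAction, *C.getSingleOffloadToolChain<Action::OFK_Host>(),
>>>>     /*BoundArch=*/nullptr, Action::OFK_Cuda);
>>>> OffloadAction::DeviceDependences DDep;
>>>> DDep.add(*FatbinAction, *CudaTC, /*BoundArch=*/nullptr, Action::OFK_Cuda);
>>>> Action *Result = C.MakeAction<OffloadAction>(HDep, DDep);
>>>> ```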
>>>>
>>>> This patch also proposes propagating the offloading information through
>>>> the action graph so that it can be easily retrieved at any time during the
>>>> generation of commands. This allows, e.g., the clang tool to evaluate
>>>> whether CUDA should be supported for the device or the host, and ptxas to
>>>> easily retrieve the target architecture.
>>>>
>>>> This is an example of what the action graph looks like for the
>>>> compilation of a single CUDA file with two GPU architectures:
>>>> ```
>>>> 0: input, "cudatests.cu", cuda, (host-cuda)
>>>> 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
>>>> 2: compiler, {1}, ir, (host-cuda)
>>>> 3: input, "cudatests.cu", cuda, (device-cuda, sm_35)
>>>> 4: preprocessor, {3}, cuda-cpp-output, (device-cuda, sm_35)
>>>> 5: compiler, {4}, ir, (device-cuda, sm_35)
>>>> 6: backend, {5}, assembler, (device-cuda, sm_35)
>>>> 7: assembler, {6}, object, (device-cuda, sm_35)
>>>> 8: offload, "device-cuda (nvptx64-nvidia-cuda:sm_35)" {7}, object
>>>> 9: offload, "device-cuda (nvptx64-nvidia-cuda:sm_35)" {6}, assembler
>>>> 10: input, "cudatests.cu", cuda, (device-cuda, sm_37)
>>>> 11: preprocessor, {10}, cuda-cpp-output, (device-cuda, sm_37)
>>>> 12: compiler, {11}, ir, (device-cuda, sm_37)
>>>> 13: backend, {12}, assembler, (device-cuda, sm_37)
>>>> 14: assembler, {13}, object, (device-cuda, sm_37)
>>>> 15: offload, "device-cuda (nvptx64-nvidia-cuda:sm_37)" {14}, object
>>>> 16: offload, "device-cuda (nvptx64-nvidia-cuda:sm_37)" {13}, assembler
>>>> 17: linker, {8, 9, 15, 16}, cuda-fatbin, (device-cuda)
>>>> 18: offload, "host-cuda (powerpc64le-unknown-linux-gnu)" {2},
>>>> "device-cuda (nvptx64-nvidia-cuda)" {17}, ir
>>>> 19: backend, {18}, assembler
>>>> 20: assembler, {19}, object
>>>> 21: input, "cuda", object
>>>> 22: input, "cudart", object
>>>> 23: linker, {20, 21, 22}, image
>>>> ```
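>>>> (A graph of this shape can be dumped with the driver's phase printout,
>>>> e.g. something like `clang++ -ccc-print-phases --cuda-gpu-arch=sm_35
>>>> --cuda-gpu-arch=sm_37 cudatests.cu` on a CUDA-enabled host; the exact
>>>> command line here is illustrative.)
>>>>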
>>>> The changes in this patch pass the existing regression tests (keeping the
>>>> existing functionality), and the resulting binaries execute correctly on a
>>>> Power8+K40 machine.
>>>>
>>>> Reviewers: echristo, hfinkel, jlebar, ABataev, tra
>>>>
>>>> Subscribers: guansong, andreybokhanko, tcramer, mkuron, cfe-commits,
>>>> arpith-jacob, carlo.bertolli, caomhin
>>>>
>>>> Differential Revision: https://reviews.llvm.org/D18171
>>>>
>>>> Added:
>>>> cfe/trunk/test/Driver/cuda_phases.cu
>>>> Modified:
>>>> cfe/trunk/include/clang/Driver/Action.h
>>>> cfe/trunk/include/clang/Driver/Compilation.h
>>>> cfe/trunk/include/clang/Driver/Driver.h
>>>> cfe/trunk/lib/Driver/Action.cpp
>>>> cfe/trunk/lib/Driver/Driver.cpp
>>>> cfe/trunk/lib/Driver/ToolChain.cpp
>>>> cfe/trunk/lib/Driver/Tools.cpp
>>>> cfe/trunk/lib/Driver/Tools.h
>>>> cfe/trunk/lib/Frontend/CreateInvocationFromCommandLine.cpp
>>>>
>>>> Modified: cfe/trunk/include/clang/Driver/Action.h
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Driver/Action.h?rev=275645&r1=275644&r2=275645&view=diff
>>>>
>>>> ==============================================================================
>>>> --- cfe/trunk/include/clang/Driver/Action.h (original)
>>>> +++ cfe/trunk/include/clang/Driver/Action.h Fri Jul 15 18:13:27 2016
>>>> @@ -13,6 +13,7 @@
>>>> #include "clang/Basic/Cuda.h"
>>>> #include "clang/Driver/Types.h"
>>>> #include "clang/Driver/Util.h"
>>>> +#include "llvm/ADT/STLExtras.h"
>>>> #include "llvm/ADT/SmallVector.h"
>>>>
>>>> namespace llvm {
>>>> @@ -27,6 +28,8 @@ namespace opt {
>>>> namespace clang {
>>>> namespace driver {
>>>>
>>>> +class ToolChain;
>>>> +
>>>> /// Action - Represent an abstract compilation step to perform.
>>>> ///
>>>> /// An action represents an edge in the compilation graph; typically
>>>> @@ -50,8 +53,7 @@ public:
>>>> enum ActionClass {
>>>> InputClass = 0,
>>>> BindArchClass,
>>>> - CudaDeviceClass,
>>>> - CudaHostClass,
>>>> + OffloadClass,
>>>> PreprocessJobClass,
>>>> PrecompileJobClass,
>>>> AnalyzeJobClass,
>>>> @@ -65,17 +67,13 @@ public:
>>>> VerifyDebugInfoJobClass,
>>>> VerifyPCHJobClass,
>>>>
>>>> - JobClassFirst=PreprocessJobClass,
>>>> - JobClassLast=VerifyPCHJobClass
>>>> + JobClassFirst = PreprocessJobClass,
>>>> + JobClassLast = VerifyPCHJobClass
>>>> };
>>>>
>>>> // The offloading kind determines if this action is binded to a
>>>> particular
>>>> // programming model. Each entry reserves one bit. We also have a
>>>> special kind
>>>> // to designate the host offloading tool chain.
>>>> - //
>>>> - // FIXME: This is currently used to indicate that tool chains are
>>>> used in a
>>>> - // given programming, but will be used here as well once a generic
>>>> offloading
>>>> - // action is implemented.
>>>> enum OffloadKind {
>>>> OFK_None = 0x00,
>>>> // The host offloading tool chain.
>>>> @@ -95,6 +93,19 @@ private:
>>>> ActionList Inputs;
>>>>
>>>> protected:
>>>> + ///
>>>> + /// Offload information.
>>>> + ///
>>>> +
>>>> + /// The host offloading kind - a combination of kinds encoded in a
>>>> mask.
>>>> + /// Multiple programming models may be supported simultaneously by
>>>> the same
>>>> + /// host.
>>>> + unsigned ActiveOffloadKindMask = 0u;
>>>> + /// Offloading kind of the device.
>>>> + OffloadKind OffloadingDeviceKind = OFK_None;
>>>> + /// The Offloading architecture associated with this action.
>>>> + const char *OffloadingArch = nullptr;
>>>> +
>>>> Action(ActionClass Kind, types::ID Type) : Action(Kind,
>>>> ActionList(), Type) {}
>>>> Action(ActionClass Kind, Action *Input, types::ID Type)
>>>> : Action(Kind, ActionList({Input}), Type) {}
>>>> @@ -124,6 +135,40 @@ public:
>>>> input_const_range inputs() const {
>>>> return input_const_range(input_begin(), input_end());
>>>> }
>>>> +
>>>> + /// Return a string containing the offload kind of the action.
>>>> + std::string getOffloadingKindPrefix() const;
>>>> + /// Return a string that can be used as prefix in order to generate
>>>> unique
>>>> + /// files for each offloading kind.
>>>> + std::string getOffloadingFileNamePrefix(StringRef NormalizedTriple)
>>>> const;
>>>> +
>>>> + /// Set the device offload info of this action and propagate it to
>>>> its
>>>> + /// dependences.
>>>> + void propagateDeviceOffloadInfo(OffloadKind OKind, const char
>>>> *OArch);
>>>> + /// Append the host offload info of this action and propagate it to
>>>> its
>>>> + /// dependences.
>>>> + void propagateHostOffloadInfo(unsigned OKinds, const char *OArch);
>>>> + /// Set the offload info of this action to be the same as the
>>>> provided action,
>>>> + /// and propagate it to its dependences.
>>>> + void propagateOffloadInfo(const Action *A);
>>>> +
>>>> + unsigned getOffloadingHostActiveKinds() const {
>>>> + return ActiveOffloadKindMask;
>>>> + }
>>>> + OffloadKind getOffloadingDeviceKind() const { return
>>>> OffloadingDeviceKind; }
>>>> + const char *getOffloadingArch() const { return OffloadingArch; }
>>>> +
>>>> + /// Check if this action have any offload kinds. Note that host
>>>> offload kinds
>>>> + /// are only set if the action is a dependence to a host offload
>>>> action.
>>>> + bool isHostOffloading(OffloadKind OKind) const {
>>>> + return ActiveOffloadKindMask & OKind;
>>>> + }
>>>> + bool isDeviceOffloading(OffloadKind OKind) const {
>>>> + return OffloadingDeviceKind == OKind;
>>>> + }
>>>> + bool isOffloading(OffloadKind OKind) const {
>>>> + return isHostOffloading(OKind) || isDeviceOffloading(OKind);
>>>> + }
>>>> };
>>>>
>>>> class InputAction : public Action {
>>>> @@ -156,39 +201,126 @@ public:
>>>> }
>>>> };
>>>>
>>>> -class CudaDeviceAction : public Action {
>>>> +/// An offload action combines host or/and device actions according to
>>>> the
>>>> +/// programming model implementation needs and propagates the
>>>> offloading kind to
>>>> +/// its dependences.
>>>> +class OffloadAction final : public Action {
>>>> virtual void anchor();
>>>>
>>>> - const CudaArch GpuArch;
>>>> -
>>>> - /// True when action results are not consumed by the host action
>>>> (e.g when
>>>> - /// -fsyntax-only or --cuda-device-only options are used).
>>>> - bool AtTopLevel;
>>>> -
>>>> public:
>>>> - CudaDeviceAction(Action *Input, CudaArch Arch, bool AtTopLevel);
>>>> + /// Type used to communicate device actions. It associates bound
>>>> architecture,
>>>> + /// toolchain, and offload kind to each action.
>>>> + class DeviceDependences final {
>>>> + public:
>>>> + typedef SmallVector<const ToolChain *, 3> ToolChainList;
>>>> + typedef SmallVector<const char *, 3> BoundArchList;
>>>> + typedef SmallVector<OffloadKind, 3> OffloadKindList;
>>>> +
>>>> + private:
>>>> + // Lists that keep the information for each dependency. All the
>>>> lists are
>>>> + // meant to be updated in sync. We are adopting separate lists
>>>> instead of a
>>>> + // list of structs, because that simplifies forwarding the actions
>>>> list to
>>>> + // initialize the inputs of the base Action class.
>>>> +
>>>> + /// The dependence actions.
>>>> + ActionList DeviceActions;
>>>> + /// The offloading toolchains that should be used with the action.
>>>> + ToolChainList DeviceToolChains;
>>>> + /// The architectures that should be used with this action.
>>>> + BoundArchList DeviceBoundArchs;
>>>> + /// The offload kind of each dependence.
>>>> + OffloadKindList DeviceOffloadKinds;
>>>> +
>>>> + public:
>>>> + /// Add a action along with the associated toolchain, bound arch,
>>>> and
>>>> + /// offload kind.
>>>> + void add(Action &A, const ToolChain &TC, const char *BoundArch,
>>>> + OffloadKind OKind);
>>>> +
>>>> + /// Get each of the individual arrays.
>>>> + const ActionList &getActions() const { return DeviceActions; };
>>>> + const ToolChainList &getToolChains() const { return
>>>> DeviceToolChains; };
>>>> + const BoundArchList &getBoundArchs() const { return
>>>> DeviceBoundArchs; };
>>>> + const OffloadKindList &getOffloadKinds() const {
>>>> + return DeviceOffloadKinds;
>>>> + };
>>>> + };
>>>>
>>>> - /// Get the CUDA GPU architecture to which this Action corresponds.
>>>> Returns
>>>> - /// UNKNOWN if this Action corresponds to multiple architectures.
>>>> - CudaArch getGpuArch() const { return GpuArch; }
>>>> + /// Type used to communicate host actions. It associates bound
>>>> architecture,
>>>> + /// toolchain, and offload kinds to the host action.
>>>> + class HostDependence final {
>>>> + /// The dependence action.
>>>> + Action &HostAction;
>>>> + /// The offloading toolchain that should be used with the action.
>>>> + const ToolChain &HostToolChain;
>>>> + /// The architectures that should be used with this action.
>>>> + const char *HostBoundArch = nullptr;
>>>> + /// The offload kind of each dependence.
>>>> + unsigned HostOffloadKinds = 0u;
>>>> +
>>>> + public:
>>>> + HostDependence(Action &A, const ToolChain &TC, const char
>>>> *BoundArch,
>>>> + const unsigned OffloadKinds)
>>>> + : HostAction(A), HostToolChain(TC), HostBoundArch(BoundArch),
>>>> + HostOffloadKinds(OffloadKinds){};
>>>> + /// Constructor version that obtains the offload kinds from the
>>>> device
>>>> + /// dependencies.
>>>> + HostDependence(Action &A, const ToolChain &TC, const char
>>>> *BoundArch,
>>>> + const DeviceDependences &DDeps);
>>>> + Action *getAction() const { return &HostAction; };
>>>> + const ToolChain *getToolChain() const { return &HostToolChain; };
>>>> + const char *getBoundArch() const { return HostBoundArch; };
>>>> + unsigned getOffloadKinds() const { return HostOffloadKinds; };
>>>> + };
>>>>
>>>> - bool isAtTopLevel() const { return AtTopLevel; }
>>>> + typedef llvm::function_ref<void(Action *, const ToolChain *, const
>>>> char *)>
>>>> + OffloadActionWorkTy;
>>>>
>>>> - static bool classof(const Action *A) {
>>>> - return A->getKind() == CudaDeviceClass;
>>>> - }
>>>> -};
>>>> +private:
>>>> + /// The host offloading toolchain that should be used with the
>>>> action.
>>>> + const ToolChain *HostTC = nullptr;
>>>>
>>>> -class CudaHostAction : public Action {
>>>> - virtual void anchor();
>>>> - ActionList DeviceActions;
>>>> + /// The tool chains associated with the list of actions.
>>>> + DeviceDependences::ToolChainList DevToolChains;
>>>>
>>>> public:
>>>> - CudaHostAction(Action *Input, const ActionList &DeviceActions);
>>>> -
>>>> - const ActionList &getDeviceActions() const { return DeviceActions; }
>>>> + OffloadAction(const HostDependence &HDep);
>>>> + OffloadAction(const DeviceDependences &DDeps, types::ID Ty);
>>>> + OffloadAction(const HostDependence &HDep, const DeviceDependences
>>>> &DDeps);
>>>> +
>>>> + /// Execute the work specified in \a Work on the host dependence.
>>>> + void doOnHostDependence(const OffloadActionWorkTy &Work) const;
>>>> +
>>>> + /// Execute the work specified in \a Work on each device dependence.
>>>> + void doOnEachDeviceDependence(const OffloadActionWorkTy &Work) const;
>>>> +
>>>> + /// Execute the work specified in \a Work on each dependence.
>>>> + void doOnEachDependence(const OffloadActionWorkTy &Work) const;
>>>> +
>>>> + /// Execute the work specified in \a Work on each host or device
>>>> dependence if
>>>> + /// \a IsHostDependenceto is true or false, respectively.
>>>> + void doOnEachDependence(bool IsHostDependence,
>>>> + const OffloadActionWorkTy &Work) const;
>>>> +
>>>> + /// Return true if the action has a host dependence.
>>>> + bool hasHostDependence() const;
>>>> +
>>>> + /// Return the host dependence of this action. This function is only
>>>> expected
>>>> + /// to be called if the host dependence exists.
>>>> + Action *getHostDependence() const;
>>>> +
>>>> + /// Return true if the action has a single device dependence. If \a
>>>> + /// DoNotConsiderHostActions is set, ignore the host dependence, if
>>>> any, while
>>>> + /// accounting for the number of dependences.
>>>> + bool hasSingleDeviceDependence(bool DoNotConsiderHostActions =
>>>> false) const;
>>>> +
>>>> + /// Return the single device dependence of this action. This
>>>> function is only
>>>> + /// expected to be called if a single device dependence exists. If \a
>>>> + /// DoNotConsiderHostActions is set, a host dependence is allowed.
>>>> + Action *
>>>> + getSingleDeviceDependence(bool DoNotConsiderHostActions = false)
>>>> const;
>>>>
>>>> - static bool classof(const Action *A) { return A->getKind() ==
>>>> CudaHostClass; }
>>>> + static bool classof(const Action *A) { return A->getKind() ==
>>>> OffloadClass; }
>>>> };
>>>>
>>>> class JobAction : public Action {
>>>>
>>>> Modified: cfe/trunk/include/clang/Driver/Compilation.h
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Driver/Compilation.h?rev=275645&r1=275644&r2=275645&view=diff
>>>>
>>>> ==============================================================================
>>>> --- cfe/trunk/include/clang/Driver/Compilation.h (original)
>>>> +++ cfe/trunk/include/clang/Driver/Compilation.h Fri Jul 15 18:13:27
>>>> 2016
>>>> @@ -98,12 +98,7 @@ public:
>>>> const Driver &getDriver() const { return TheDriver; }
>>>>
>>>> const ToolChain &getDefaultToolChain() const { return
>>>> DefaultToolChain; }
>>>> - const ToolChain *getOffloadingHostToolChain() const {
>>>> - auto It = OrderedOffloadingToolchains.find(Action::OFK_Host);
>>>> - if (It != OrderedOffloadingToolchains.end())
>>>> - return It->second;
>>>> - return nullptr;
>>>> - }
>>>> +
>>>> unsigned isOffloadingHostKind(Action::OffloadKind Kind) const {
>>>> return ActiveOffloadMask & Kind;
>>>> }
>>>> @@ -121,8 +116,8 @@ public:
>>>> return OrderedOffloadingToolchains.equal_range(Kind);
>>>> }
>>>>
>>>> - // Return an offload toolchain of the provided kind. Only one is
>>>> expected to
>>>> - // exist.
>>>> + /// Return an offload toolchain of the provided kind. Only one is
>>>> expected to
>>>> + /// exist.
>>>> template <Action::OffloadKind Kind>
>>>> const ToolChain *getSingleOffloadToolChain() const {
>>>> auto TCs = getOffloadToolChains<Kind>();
>>>>
>>>> Modified: cfe/trunk/include/clang/Driver/Driver.h
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Driver/Driver.h?rev=275645&r1=275644&r2=275645&view=diff
>>>>
>>>> ==============================================================================
>>>> --- cfe/trunk/include/clang/Driver/Driver.h (original)
>>>> +++ cfe/trunk/include/clang/Driver/Driver.h Fri Jul 15 18:13:27 2016
>>>> @@ -394,12 +394,13 @@ public:
>>>> /// BuildJobsForAction - Construct the jobs to perform for the
>>>> action \p A and
>>>> /// return an InputInfo for the result of running \p A. Will only
>>>> construct
>>>> /// jobs for a given (Action, ToolChain, BoundArch) tuple once.
>>>> - InputInfo BuildJobsForAction(Compilation &C, const Action *A,
>>>> - const ToolChain *TC, const char
>>>> *BoundArch,
>>>> - bool AtTopLevel, bool MultipleArchs,
>>>> - const char *LinkingOutput,
>>>> - std::map<std::pair<const Action *,
>>>> std::string>,
>>>> - InputInfo> &CachedResults)
>>>> const;
>>>> + InputInfo
>>>> + BuildJobsForAction(Compilation &C, const Action *A, const ToolChain
>>>> *TC,
>>>> + const char *BoundArch, bool AtTopLevel, bool
>>>> MultipleArchs,
>>>> + const char *LinkingOutput,
>>>> + std::map<std::pair<const Action *, std::string>,
>>>> InputInfo>
>>>> + &CachedResults,
>>>> + bool BuildForOffloadDevice) const;
>>>>
>>>> /// Returns the default name for linked images (e.g., "a.out").
>>>> const char *getDefaultImageName() const;
>>>> @@ -415,12 +416,11 @@ public:
>>>> /// \param BoundArch - The bound architecture.
>>>> /// \param AtTopLevel - Whether this is a "top-level" action.
>>>> /// \param MultipleArchs - Whether multiple -arch options were
>>>> supplied.
>>>> - const char *GetNamedOutputPath(Compilation &C,
>>>> - const JobAction &JA,
>>>> - const char *BaseInput,
>>>> - const char *BoundArch,
>>>> - bool AtTopLevel,
>>>> - bool MultipleArchs) const;
>>>> + /// \param NormalizedTriple - The normalized triple of the relevant
>>>> target.
>>>> + const char *GetNamedOutputPath(Compilation &C, const JobAction &JA,
>>>> + const char *BaseInput, const char
>>>> *BoundArch,
>>>> + bool AtTopLevel, bool MultipleArchs,
>>>> + StringRef NormalizedTriple) const;
>>>>
>>>> /// GetTemporaryPath - Return the pathname of a temporary file to use
>>>> /// as part of compilation; the file will have the given prefix and
>>>> suffix.
>>>> @@ -467,7 +467,8 @@ private:
>>>> const char *BoundArch, bool AtTopLevel, bool MultipleArchs,
>>>> const char *LinkingOutput,
>>>> std::map<std::pair<const Action *, std::string>, InputInfo>
>>>> - &CachedResults) const;
>>>> + &CachedResults,
>>>> + bool BuildForOffloadDevice) const;
>>>>
>>>> public:
>>>> /// GetReleaseVersion - Parse (([0-9]+)(.([0-9]+)(.([0-9]+)?))?)? and
>>>>
>>>> Modified: cfe/trunk/lib/Driver/Action.cpp
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Driver/Action.cpp?rev=275645&r1=275644&r2=275645&view=diff
>>>>
>>>> ==============================================================================
>>>> --- cfe/trunk/lib/Driver/Action.cpp (original)
>>>> +++ cfe/trunk/lib/Driver/Action.cpp Fri Jul 15 18:13:27 2016
>>>> @@ -8,6 +8,7 @@
>>>>
>>>> //===----------------------------------------------------------------------===//
>>>>
>>>> #include "clang/Driver/Action.h"
>>>> +#include "clang/Driver/ToolChain.h"
>>>> #include "llvm/ADT/StringSwitch.h"
>>>> #include "llvm/Support/ErrorHandling.h"
>>>> #include "llvm/Support/Regex.h"
>>>> @@ -21,8 +22,8 @@ const char *Action::getClassName(ActionC
>>>> switch (AC) {
>>>> case InputClass: return "input";
>>>> case BindArchClass: return "bind-arch";
>>>> - case CudaDeviceClass: return "cuda-device";
>>>> - case CudaHostClass: return "cuda-host";
>>>> + case OffloadClass:
>>>> + return "offload";
>>>> case PreprocessJobClass: return "preprocessor";
>>>> case PrecompileJobClass: return "precompiler";
>>>> case AnalyzeJobClass: return "analyzer";
>>>> @@ -40,6 +41,82 @@ const char *Action::getClassName(ActionC
>>>> llvm_unreachable("invalid class");
>>>> }
>>>>
>>>> +void Action::propagateDeviceOffloadInfo(OffloadKind OKind, const char
>>>> *OArch) {
>>>> + // Offload action set its own kinds on their dependences.
>>>> + if (Kind == OffloadClass)
>>>> + return;
>>>> +
>>>> + assert((OffloadingDeviceKind == OKind || OffloadingDeviceKind ==
>>>> OFK_None) &&
>>>> + "Setting device kind to a different device??");
>>>> + assert(!ActiveOffloadKindMask && "Setting a device kind in a host
>>>> action??");
>>>> + OffloadingDeviceKind = OKind;
>>>> + OffloadingArch = OArch;
>>>> +
>>>> + for (auto *A : Inputs)
>>>> + A->propagateDeviceOffloadInfo(OffloadingDeviceKind, OArch);
>>>> +}
>>>> +
>>>> +void Action::propagateHostOffloadInfo(unsigned OKinds, const char
>>>> *OArch) {
>>>> + // Offload action set its own kinds on their dependences.
>>>> + if (Kind == OffloadClass)
>>>> + return;
>>>> +
>>>> + assert(OffloadingDeviceKind == OFK_None &&
>>>> + "Setting a host kind in a device action.");
>>>> + ActiveOffloadKindMask |= OKinds;
>>>> + OffloadingArch = OArch;
>>>> +
>>>> + for (auto *A : Inputs)
>>>> + A->propagateHostOffloadInfo(ActiveOffloadKindMask, OArch);
>>>> +}
>>>> +
>>>> +void Action::propagateOffloadInfo(const Action *A) {
>>>> + if (unsigned HK = A->getOffloadingHostActiveKinds())
>>>> + propagateHostOffloadInfo(HK, A->getOffloadingArch());
>>>> + else
>>>> + propagateDeviceOffloadInfo(A->getOffloadingDeviceKind(),
>>>> + A->getOffloadingArch());
>>>> +}
>>>> +
>>>> +std::string Action::getOffloadingKindPrefix() const {
>>>> + switch (OffloadingDeviceKind) {
>>>> + case OFK_None:
>>>> + break;
>>>> + case OFK_Host:
>>>> + llvm_unreachable("Host kind is not an offloading device kind.");
>>>> + break;
>>>> + case OFK_Cuda:
>>>> + return "device-cuda";
>>>> +
>>>> + // TODO: Add other programming models here.
>>>> + }
>>>> +
>>>> + if (!ActiveOffloadKindMask)
>>>> + return "";
>>>> +
>>>> + std::string Res("host");
>>>> + if (ActiveOffloadKindMask & OFK_Cuda)
>>>> + Res += "-cuda";
>>>> +
>>>> + // TODO: Add other programming models here.
>>>> +
>>>> + return Res;
>>>> +}
>>>> +
>>>> +std::string
>>>> +Action::getOffloadingFileNamePrefix(StringRef NormalizedTriple) const {
>>>> + // A file prefix is only generated for device actions and consists
>>>> of the
>>>> + // offload kind and triple.
>>>> + if (!OffloadingDeviceKind)
>>>> + return "";
>>>> +
>>>> + std::string Res("-");
>>>> + Res += getOffloadingKindPrefix();
>>>> + Res += "-";
>>>> + Res += NormalizedTriple;
>>>> + return Res;
>>>> +}
>>>> +
>>>> void InputAction::anchor() {}
>>>>
>>>> InputAction::InputAction(const Arg &_Input, types::ID _Type)
>>>> @@ -51,16 +128,138 @@ void BindArchAction::anchor() {}
>>>> BindArchAction::BindArchAction(Action *Input, const char *_ArchName)
>>>> : Action(BindArchClass, Input), ArchName(_ArchName) {}
>>>>
>>>> -void CudaDeviceAction::anchor() {}
>>>> +void OffloadAction::anchor() {}
>>>> +
>>>> +OffloadAction::OffloadAction(const HostDependence &HDep)
>>>> + : Action(OffloadClass, HDep.getAction()),
>>>> HostTC(HDep.getToolChain()) {
>>>> + OffloadingArch = HDep.getBoundArch();
>>>> + ActiveOffloadKindMask = HDep.getOffloadKinds();
>>>> + HDep.getAction()->propagateHostOffloadInfo(HDep.getOffloadKinds(),
>>>> + HDep.getBoundArch());
>>>> +};
>>>> +
>>>> +OffloadAction::OffloadAction(const DeviceDependences &DDeps, types::ID
>>>> Ty)
>>>> + : Action(OffloadClass, DDeps.getActions(), Ty),
>>>> + DevToolChains(DDeps.getToolChains()) {
>>>> + auto &OKinds = DDeps.getOffloadKinds();
>>>> + auto &BArchs = DDeps.getBoundArchs();
>>>> +
>>>> + // If all inputs agree on the same kind, use it also for this action.
>>>> + if (llvm::all_of(OKinds, [&](OffloadKind K) { return K ==
>>>> OKinds.front(); }))
>>>> + OffloadingDeviceKind = OKinds.front();
>>>> +
>>>> + // If we have a single dependency, inherit the architecture from it.
>>>> + if (OKinds.size() == 1)
>>>> + OffloadingArch = BArchs.front();
>>>> +
>>>> + // Propagate info to the dependencies.
>>>> + for (unsigned i = 0, e = getInputs().size(); i != e; ++i)
>>>> + getInputs()[i]->propagateDeviceOffloadInfo(OKinds[i], BArchs[i]);
>>>> +}
>>>> +
>>>> +OffloadAction::OffloadAction(const HostDependence &HDep,
>>>> + const DeviceDependences &DDeps)
>>>> + : Action(OffloadClass, HDep.getAction()),
>>>> HostTC(HDep.getToolChain()),
>>>> + DevToolChains(DDeps.getToolChains()) {
>>>> + // We use the kinds of the host dependence for this action.
>>>> + OffloadingArch = HDep.getBoundArch();
>>>> + ActiveOffloadKindMask = HDep.getOffloadKinds();
>>>> + HDep.getAction()->propagateHostOffloadInfo(HDep.getOffloadKinds(),
>>>> + HDep.getBoundArch());
>>>> +
>>>> + // Add device inputs and propagate info to the device actions. Do
>>>> work only if
>>>> + // we have dependencies.
>>>> + for (unsigned i = 0, e = DDeps.getActions().size(); i != e; ++i)
>>>> + if (auto *A = DDeps.getActions()[i]) {
>>>> + getInputs().push_back(A);
>>>> + A->propagateDeviceOffloadInfo(DDeps.getOffloadKinds()[i],
>>>> + DDeps.getBoundArchs()[i]);
>>>> + }
>>>> +}
>>>> +
>>>> +void OffloadAction::doOnHostDependence(const OffloadActionWorkTy
>>>> &Work) const {
>>>> + if (!HostTC)
>>>> + return;
>>>> + assert(!getInputs().empty() && "No dependencies for offload
>>>> action??");
>>>> + auto *A = getInputs().front();
>>>> + Work(A, HostTC, A->getOffloadingArch());
>>>> +}
>>>>
>>>> -CudaDeviceAction::CudaDeviceAction(Action *Input, clang::CudaArch Arch,
>>>> - bool AtTopLevel)
>>>> - : Action(CudaDeviceClass, Input), GpuArch(Arch),
>>>> AtTopLevel(AtTopLevel) {}
>>>> +void OffloadAction::doOnEachDeviceDependence(
>>>> + const OffloadActionWorkTy &Work) const {
>>>> + auto I = getInputs().begin();
>>>> + auto E = getInputs().end();
>>>> + if (I == E)
>>>> + return;
>>>> +
>>>> + // We expect to have the same number of input dependences and device
>>>> tool
>>>> + // chains, except if we also have a host dependence. In that case we
>>>> have one
>>>> + // more dependence than we have device tool chains.
>>>> + assert(getInputs().size() == DevToolChains.size() + (HostTC ? 1 : 0)
>>>> &&
>>>> + "Sizes of action dependences and toolchains are not
>>>> consistent!");
>>>> +
>>>> + // Skip host action
>>>> + if (HostTC)
>>>> + ++I;
>>>> +
>>>> + auto TI = DevToolChains.begin();
>>>> + for (; I != E; ++I, ++TI)
>>>> + Work(*I, *TI, (*I)->getOffloadingArch());
>>>> +}
>>>> +
>>>> +void OffloadAction::doOnEachDependence(const OffloadActionWorkTy
>>>> &Work) const {
>>>> + doOnHostDependence(Work);
>>>> + doOnEachDeviceDependence(Work);
>>>> +}
>>>> +
>>>> +void OffloadAction::doOnEachDependence(bool IsHostDependence,
>>>> + const OffloadActionWorkTy
>>>> &Work) const {
>>>> + if (IsHostDependence)
>>>> + doOnHostDependence(Work);
>>>> + else
>>>> + doOnEachDeviceDependence(Work);
>>>> +}
>>>>
>>>> -void CudaHostAction::anchor() {}
>>>> +bool OffloadAction::hasHostDependence() const { return HostTC !=
>>>> nullptr; }
>>>>
>>>> -CudaHostAction::CudaHostAction(Action *Input, const ActionList
>>>> &DeviceActions)
>>>> - : Action(CudaHostClass, Input), DeviceActions(DeviceActions) {}
>>>> +Action *OffloadAction::getHostDependence() const {
>>>> + assert(hasHostDependence() && "Host dependence does not exist!");
>>>> + assert(!getInputs().empty() && "No dependencies for offload
>>>> action??");
>>>> + return HostTC ? getInputs().front() : nullptr;
>>>> +}
>>>> +
>>>> +bool OffloadAction::hasSingleDeviceDependence(
>>>> + bool DoNotConsiderHostActions) const {
>>>> + if (DoNotConsiderHostActions)
>>>> + return getInputs().size() == (HostTC ? 2 : 1);
>>>> + return !HostTC && getInputs().size() == 1;
>>>> +}
>>>> +
>>>> +Action *
>>>> +OffloadAction::getSingleDeviceDependence(bool
>>>> DoNotConsiderHostActions) const {
>>>> + assert(hasSingleDeviceDependence(DoNotConsiderHostActions) &&
>>>> + "Single device dependence does not exist!");
>>>> + // The previous assert ensures the number of entries in getInputs()
>>>> is
>>>> + // consistent with what we are doing here.
>>>> + return HostTC ? getInputs()[1] : getInputs().front();
>>>> +}
>>>> +
>>>> +void OffloadAction::DeviceDependences::add(Action &A, const ToolChain
>>>> &TC,
>>>> + const char *BoundArch,
>>>> + OffloadKind OKind) {
>>>> + DeviceActions.push_back(&A);
>>>> + DeviceToolChains.push_back(&TC);
>>>> + DeviceBoundArchs.push_back(BoundArch);
>>>> + DeviceOffloadKinds.push_back(OKind);
>>>> +}
>>>> +
>>>> +OffloadAction::HostDependence::HostDependence(Action &A, const
>>>> ToolChain &TC,
>>>> + const char *BoundArch,
>>>> + const DeviceDependences
>>>> &DDeps)
>>>> + : HostAction(A), HostToolChain(TC), HostBoundArch(BoundArch) {
>>>> + for (auto K : DDeps.getOffloadKinds())
>>>> + HostOffloadKinds |= K;
>>>> +}
>>>>
>>>> void JobAction::anchor() {}
>>>>
>>>>
>>>> Modified: cfe/trunk/lib/Driver/Driver.cpp
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Driver/Driver.cpp?rev=275645&r1=275644&r2=275645&view=diff
>>>>
>>>> ==============================================================================
>>>> --- cfe/trunk/lib/Driver/Driver.cpp (original)
>>>> +++ cfe/trunk/lib/Driver/Driver.cpp Fri Jul 15 18:13:27 2016
>>>> @@ -435,7 +435,9 @@ void Driver::CreateOffloadingDeviceToolC
>>>> })) {
>>>> const ToolChain &TC = getToolChain(
>>>> C.getInputArgs(),
>>>> -
>>>> llvm::Triple(C.getOffloadingHostToolChain()->getTriple().isArch64Bit()
>>>> + llvm::Triple(C.getSingleOffloadToolChain<Action::OFK_Host>()
>>>> + ->getTriple()
>>>> + .isArch64Bit()
>>>> ? "nvptx64-nvidia-cuda"
>>>> : "nvptx-nvidia-cuda"));
>>>> C.addOffloadDeviceToolChain(&TC, Action::OFK_Cuda);
>>>> @@ -1022,19 +1024,33 @@ static unsigned PrintActions1(const Comp
>>>> } else if (BindArchAction *BIA = dyn_cast<BindArchAction>(A)) {
>>>> os << '"' << BIA->getArchName() << '"' << ", {"
>>>> << PrintActions1(C, *BIA->input_begin(), Ids) << "}";
>>>> - } else if (CudaDeviceAction *CDA = dyn_cast<CudaDeviceAction>(A)) {
>>>> - CudaArch Arch = CDA->getGpuArch();
>>>> - if (Arch != CudaArch::UNKNOWN)
>>>> - os << "'" << CudaArchToString(Arch) << "', ";
>>>> - os << "{" << PrintActions1(C, *CDA->input_begin(), Ids) << "}";
>>>> + } else if (OffloadAction *OA = dyn_cast<OffloadAction>(A)) {
>>>> + bool IsFirst = true;
>>>> + OA->doOnEachDependence(
>>>> + [&](Action *A, const ToolChain *TC, const char *BoundArch) {
>>>> + // E.g. for two CUDA device dependences whose bound arch is
>>>> sm_20 and
>>>> + // sm_35 this will generate:
>>>> + // "cuda-device" (nvptx64-nvidia-cuda:sm_20) {#ID},
>>>> "cuda-device"
>>>> + // (nvptx64-nvidia-cuda:sm_35) {#ID}
>>>> + if (!IsFirst)
>>>> + os << ", ";
>>>> + os << '"';
>>>> + if (TC)
>>>> + os << A->getOffloadingKindPrefix();
>>>> + else
>>>> + os << "host";
>>>> + os << " (";
>>>> + os << TC->getTriple().normalize();
>>>> +
>>>> + if (BoundArch)
>>>> + os << ":" << BoundArch;
>>>> + os << ")";
>>>> + os << '"';
>>>> + os << " {" << PrintActions1(C, A, Ids) << "}";
>>>> + IsFirst = false;
>>>> + });
>>>> } else {
>>>> - const ActionList *AL;
>>>> - if (CudaHostAction *CHA = dyn_cast<CudaHostAction>(A)) {
>>>> - os << "{" << PrintActions1(C, *CHA->input_begin(), Ids) << "}"
>>>> - << ", gpu binaries ";
>>>> - AL = &CHA->getDeviceActions();
>>>> - } else
>>>> - AL = &A->getInputs();
>>>> + const ActionList *AL = &A->getInputs();
>>>>
>>>> if (AL->size()) {
>>>> const char *Prefix = "{";
>>>> @@ -1047,10 +1063,24 @@ static unsigned PrintActions1(const Comp
>>>> os << "{}";
>>>> }
>>>>
>>>> + // Append offload info for all options other than the offloading
>>>> action
>>>> + // itself (e.g. (cuda-device, sm_20) or (cuda-host)).
>>>> + std::string offload_str;
>>>> + llvm::raw_string_ostream offload_os(offload_str);
>>>> + if (!isa<OffloadAction>(A)) {
>>>> + auto S = A->getOffloadingKindPrefix();
>>>> + if (!S.empty()) {
>>>> + offload_os << ", (" << S;
>>>> + if (A->getOffloadingArch())
>>>> + offload_os << ", " << A->getOffloadingArch();
>>>> + offload_os << ")";
>>>> + }
>>>> + }
>>>> +
>>>> unsigned Id = Ids.size();
>>>> Ids[A] = Id;
>>>> llvm::errs() << Id << ": " << os.str() << ", "
>>>> - << types::getTypeName(A->getType()) << "\n";
>>>> + << types::getTypeName(A->getType()) << offload_os.str()
>>>> << "\n";
>>>>
>>>> return Id;
>>>> }
>>>> @@ -1378,8 +1408,12 @@ static Action *buildCudaActions(Compilat
>>>> PartialCompilationArg &&
>>>>
>>>> PartialCompilationArg->getOption().matches(options::OPT_cuda_device_only);
>>>>
>>>> - if (CompileHostOnly)
>>>> - return C.MakeAction<CudaHostAction>(HostAction, ActionList());
>>>> + if (CompileHostOnly) {
>>>> + OffloadAction::HostDependence HDep(
>>>> + *HostAction, *C.getSingleOffloadToolChain<Action::OFK_Host>(),
>>>> + /*BoundArch=*/nullptr, Action::OFK_Cuda);
>>>> + return C.MakeAction<OffloadAction>(HDep);
>>>> + }
>>>>
>>>> // Collect all cuda_gpu_arch parameters, removing duplicates.
>>>> SmallVector<CudaArch, 4> GpuArchList;
>>>> @@ -1408,8 +1442,6 @@ static Action *buildCudaActions(Compilat
>>>> CudaDeviceInputs.push_back(std::make_pair(types::TY_CUDA_DEVICE,
>>>> InputArg));
>>>>
>>>> // Build actions for all device inputs.
>>>> - assert(C.getSingleOffloadToolChain<Action::OFK_Cuda>() &&
>>>> - "Missing toolchain for device-side compilation.");
>>>> ActionList CudaDeviceActions;
>>>> C.getDriver().BuildActions(C, Args, CudaDeviceInputs,
>>>> CudaDeviceActions);
>>>> assert(GpuArchList.size() == CudaDeviceActions.size() &&
>>>> @@ -1421,6 +1453,8 @@ static Action *buildCudaActions(Compilat
>>>> return a->getKind() != Action::AssembleJobClass;
>>>> });
>>>>
>>>> + const ToolChain *CudaTC =
>>>> C.getSingleOffloadToolChain<Action::OFK_Cuda>();
>>>> +
>>>> // Figure out what to do with device actions -- pass them as inputs
>>>> to the
>>>> // host action or run each of them independently.
>>>> if (PartialCompilation || CompileDeviceOnly) {
>>>> @@ -1436,10 +1470,13 @@ static Action *buildCudaActions(Compilat
>>>> return nullptr;
>>>> }
>>>>
>>>> - for (unsigned I = 0, E = GpuArchList.size(); I != E; ++I)
>>>> -
>>>> Actions.push_back(C.MakeAction<CudaDeviceAction>(CudaDeviceActions[I],
>>>> - GpuArchList[I],
>>>> - /* AtTopLevel
>>>> */ true));
>>>> + for (unsigned I = 0, E = GpuArchList.size(); I != E; ++I) {
>>>> + OffloadAction::DeviceDependences DDep;
>>>> + DDep.add(*CudaDeviceActions[I], *CudaTC,
>>>> CudaArchToString(GpuArchList[I]),
>>>> + Action::OFK_Cuda);
>>>> + Actions.push_back(
>>>> + C.MakeAction<OffloadAction>(DDep,
>>>> CudaDeviceActions[I]->getType()));
>>>> + }
>>>> // Kill host action in case of device-only compilation.
>>>> if (CompileDeviceOnly)
>>>> return nullptr;
>>>> @@ -1459,19 +1496,23 @@ static Action *buildCudaActions(Compilat
>>>> Action* BackendAction = AssembleAction->getInputs()[0];
>>>> assert(BackendAction->getType() == types::TY_PP_Asm);
>>>>
>>>> - for (const auto& A : {AssembleAction, BackendAction}) {
>>>> - DeviceActions.push_back(C.MakeAction<CudaDeviceAction>(
>>>> - A, GpuArchList[I], /* AtTopLevel */ false));
>>>> + for (auto &A : {AssembleAction, BackendAction}) {
>>>> + OffloadAction::DeviceDependences DDep;
>>>> + DDep.add(*A, *CudaTC, CudaArchToString(GpuArchList[I]),
>>>> Action::OFK_Cuda);
>>>> + DeviceActions.push_back(C.MakeAction<OffloadAction>(DDep,
>>>> A->getType()));
>>>> }
>>>> }
>>>> - auto FatbinAction = C.MakeAction<CudaDeviceAction>(
>>>> - C.MakeAction<LinkJobAction>(DeviceActions,
>>>> types::TY_CUDA_FATBIN),
>>>> - CudaArch::UNKNOWN,
>>>> - /* AtTopLevel = */ false);
>>>> + auto FatbinAction =
>>>> + C.MakeAction<LinkJobAction>(DeviceActions,
>>>> types::TY_CUDA_FATBIN);
>>>> +
>>>> // Return a new host action that incorporates original host action
>>>> and all
>>>> // device actions.
>>>> - return C.MakeAction<CudaHostAction>(std::move(HostAction),
>>>> - ActionList({FatbinAction}));
>>>> + OffloadAction::HostDependence HDep(
>>>> + *HostAction, *C.getSingleOffloadToolChain<Action::OFK_Host>(),
>>>> + /*BoundArch=*/nullptr, Action::OFK_Cuda);
>>>> + OffloadAction::DeviceDependences DDep;
>>>> + DDep.add(*FatbinAction, *CudaTC, /*BoundArch=*/nullptr,
>>>> Action::OFK_Cuda);
>>>> + return C.MakeAction<OffloadAction>(HDep, DDep);
>>>> }
>>>>
>>>> void Driver::BuildActions(Compilation &C, DerivedArgList &Args,
>>>> @@ -1580,6 +1621,9 @@ void Driver::BuildActions(Compilation &C
>>>> YcArg = YuArg = nullptr;
>>>> }
>>>>
>>>> + // Track the host offload kinds used on this compilation.
>>>> + unsigned CompilationActiveOffloadHostKinds = 0u;
>>>> +
>>>> // Construct the actions to perform.
>>>> ActionList LinkerInputs;
>>>>
>>>> @@ -1648,6 +1692,9 @@ void Driver::BuildActions(Compilation &C
>>>> ? phases::Compile
>>>> : FinalPhase;
>>>>
>>>> + // Track the host offload kinds used on this input.
>>>> + unsigned InputActiveOffloadHostKinds = 0u;
>>>> +
>>>> // Build the pipeline for this file.
>>>> Action *Current = C.MakeAction<InputAction>(*InputArg, InputType);
>>>> for (SmallVectorImpl<phases::ID>::iterator i = PL.begin(), e =
>>>> PL.end();
>>>> @@ -1679,21 +1726,36 @@ void Driver::BuildActions(Compilation &C
>>>> Current = buildCudaActions(C, Args, InputArg, Current,
>>>> Actions);
>>>> if (!Current)
>>>> break;
>>>> +
>>>> + // We produced a CUDA action for this input, so the host has
>>>> to support
>>>> + // CUDA.
>>>> + InputActiveOffloadHostKinds |= Action::OFK_Cuda;
>>>> + CompilationActiveOffloadHostKinds |= Action::OFK_Cuda;
>>>> }
>>>>
>>>> if (Current->getType() == types::TY_Nothing)
>>>> break;
>>>> }
>>>>
>>>> - // If we ended with something, add to the output list.
>>>> - if (Current)
>>>> + // If we ended with something, add to the output list. Also,
>>>> propagate the
>>>> + // offload information to the top-level host action related with
>>>> the current
>>>> + // input.
>>>> + if (Current) {
>>>> + if (InputActiveOffloadHostKinds)
>>>> + Current->propagateHostOffloadInfo(InputActiveOffloadHostKinds,
>>>> + /*BoundArch=*/nullptr);
>>>> Actions.push_back(Current);
>>>> + }
>>>> }
>>>>
>>>> - // Add a link action if necessary.
>>>> - if (!LinkerInputs.empty())
>>>> + // Add a link action if necessary and propagate the offload
>>>> information for
>>>> + // the current compilation.
>>>> + if (!LinkerInputs.empty()) {
>>>> Actions.push_back(
>>>> C.MakeAction<LinkJobAction>(LinkerInputs, types::TY_Image));
>>>> +
>>>> Actions.back()->propagateHostOffloadInfo(CompilationActiveOffloadHostKinds,
>>>> + /*BoundArch=*/nullptr);
>>>> + }
>>>>
>>>> // If we are linking, claim any options which are obviously only
>>>> used for
>>>> // compilation.
>>>> @@ -1829,7 +1891,8 @@ void Driver::BuildJobs(Compilation &C) c
>>>> /*BoundArch*/ nullptr,
>>>> /*AtTopLevel*/ true,
>>>> /*MultipleArchs*/ ArchNames.size() > 1,
>>>> - /*LinkingOutput*/ LinkingOutput, CachedResults);
>>>> + /*LinkingOutput*/ LinkingOutput, CachedResults,
>>>> + /*BuildForOffloadDevice*/ false);
>>>> }
>>>>
>>>> // If the user passed -Qunused-arguments or there were errors, don't
>>>> warn
>>>> @@ -1878,7 +1941,28 @@ void Driver::BuildJobs(Compilation &C) c
>>>> }
>>>> }
>>>> }
>>>> -
>>>> +/// Collapse an offloading action looking for a job of the given type.
>>>> The input
>>>> +/// action is changed to the input of the collapsed sequence. If we
>>>> effectively
>>>> +/// had a collapse return the corresponding offloading action,
>>>> otherwise return
>>>> +/// null.
>>>> +template <typename T>
>>>> +static OffloadAction *collapseOffloadingAction(Action *&CurAction) {
>>>> + if (!CurAction)
>>>> + return nullptr;
>>>> + if (auto *OA = dyn_cast<OffloadAction>(CurAction)) {
>>>> + if (OA->hasHostDependence())
>>>> + if (auto *HDep = dyn_cast<T>(OA->getHostDependence())) {
>>>> + CurAction = HDep;
>>>> + return OA;
>>>> + }
>>>> + if (OA->hasSingleDeviceDependence())
>>>> + if (auto *DDep = dyn_cast<T>(OA->getSingleDeviceDependence())) {
>>>> + CurAction = DDep;
>>>> + return OA;
>>>> + }
>>>> + }
>>>> + return nullptr;
>>>> +}
>>>> // Returns a Tool for a given JobAction. In case the action and its
>>>> // predecessors can be combined, updates Inputs with the inputs of the
>>>> // first combined action. If one of the collapsed actions is a
>>>> @@ -1888,34 +1972,39 @@ static const Tool *selectToolForJob(Comp
>>>> bool EmbedBitcode, const ToolChain
>>>> *TC,
>>>> const JobAction *JA,
>>>> const ActionList *&Inputs,
>>>> - const CudaHostAction
>>>> *&CollapsedCHA) {
>>>> + ActionList
>>>> &CollapsedOffloadAction) {
>>>> const Tool *ToolForJob = nullptr;
>>>> - CollapsedCHA = nullptr;
>>>> + CollapsedOffloadAction.clear();
>>>>
>>>> // See if we should look for a compiler with an integrated
>>>> assembler. We match
>>>> // bottom up, so what we are actually looking for is an assembler
>>>> job with a
>>>> // compiler input.
>>>>
>>>> + // Look through offload actions between assembler and backend
>>>> actions.
>>>> + Action *BackendJA = (isa<AssembleJobAction>(JA) && Inputs->size() ==
>>>> 1)
>>>> + ? *Inputs->begin()
>>>> + : nullptr;
>>>> + auto *BackendOA =
>>>> collapseOffloadingAction<BackendJobAction>(BackendJA);
>>>> +
>>>> if (TC->useIntegratedAs() && !SaveTemps &&
>>>> !C.getArgs().hasArg(options::OPT_via_file_asm) &&
>>>> !C.getArgs().hasArg(options::OPT__SLASH_FA) &&
>>>> - !C.getArgs().hasArg(options::OPT__SLASH_Fa) &&
>>>> - isa<AssembleJobAction>(JA) && Inputs->size() == 1 &&
>>>> - isa<BackendJobAction>(*Inputs->begin())) {
>>>> + !C.getArgs().hasArg(options::OPT__SLASH_Fa) && BackendJA &&
>>>> + isa<BackendJobAction>(BackendJA)) {
>>>> // A BackendJob is always preceded by a CompileJob, and without
>>>> -save-temps
>>>> // or -fembed-bitcode, they will always get combined together, so
>>>> instead of
>>>> // checking the backend tool, check if the tool for the CompileJob
>>>> has an
>>>> // integrated assembler. For -fembed-bitcode, CompileJob is still
>>>> used to
>>>> // look up tools for BackendJob, but they need to match before we
>>>> can split
>>>> // them.
>>>> - const ActionList *BackendInputs = &(*Inputs)[0]->getInputs();
>>>> - // Compile job may be wrapped in CudaHostAction, extract it if
>>>> - // that's the case and update CollapsedCHA if we combine phases.
>>>> - CudaHostAction *CHA =
>>>> dyn_cast<CudaHostAction>(*BackendInputs->begin());
>>>> - JobAction *CompileJA = cast<CompileJobAction>(
>>>> - CHA ? *CHA->input_begin() : *BackendInputs->begin());
>>>> - assert(CompileJA && "Backend job is not preceeded by compile
>>>> job.");
>>>> - const Tool *Compiler = TC->SelectTool(*CompileJA);
>>>> +
>>>> + // Look through offload actions between backend and compile
>>>> actions.
>>>> + Action *CompileJA = *BackendJA->getInputs().begin();
>>>> + auto *CompileOA =
>>>> collapseOffloadingAction<CompileJobAction>(CompileJA);
>>>> +
>>>> + assert(CompileJA && isa<CompileJobAction>(CompileJA) &&
>>>> + "Backend job is not preceeded by compile job.");
>>>> + const Tool *Compiler =
>>>> TC->SelectTool(*cast<CompileJobAction>(CompileJA));
>>>> if (!Compiler)
>>>> return nullptr;
>>>> // When using -fembed-bitcode, it is required to have the same
>>>> tool (clang)
>>>> @@ -1929,7 +2018,12 @@ static const Tool *selectToolForJob(Comp
>>>> if (Compiler->hasIntegratedAssembler()) {
>>>> Inputs = &CompileJA->getInputs();
>>>> ToolForJob = Compiler;
>>>> - CollapsedCHA = CHA;
>>>> + // Save the collapsed offload actions because they may still
>>>> contain
>>>> + // device actions.
>>>> + if (CompileOA)
>>>> + CollapsedOffloadAction.push_back(CompileOA);
>>>> + if (BackendOA)
>>>> + CollapsedOffloadAction.push_back(BackendOA);
>>>> }
>>>> }
>>>>
>>>> @@ -1939,20 +2033,23 @@ static const Tool *selectToolForJob(Comp
>>>> if (isa<BackendJobAction>(JA)) {
>>>> // Check if the compiler supports emitting LLVM IR.
>>>> assert(Inputs->size() == 1);
>>>> - // Compile job may be wrapped in CudaHostAction, extract it if
>>>> - // that's the case and update CollapsedCHA if we combine phases.
>>>> - CudaHostAction *CHA = dyn_cast<CudaHostAction>(*Inputs->begin());
>>>> - JobAction *CompileJA =
>>>> - cast<CompileJobAction>(CHA ? *CHA->input_begin() :
>>>> *Inputs->begin());
>>>> - assert(CompileJA && "Backend job is not preceeded by compile
>>>> job.");
>>>> - const Tool *Compiler = TC->SelectTool(*CompileJA);
>>>> +
>>>> + // Look through offload actions between backend and compile
>>>> actions.
>>>> + Action *CompileJA = *JA->getInputs().begin();
>>>> + auto *CompileOA =
>>>> collapseOffloadingAction<CompileJobAction>(CompileJA);
>>>> +
>>>> + assert(CompileJA && isa<CompileJobAction>(CompileJA) &&
>>>> + "Backend job is not preceeded by compile job.");
>>>> + const Tool *Compiler =
>>>> TC->SelectTool(*cast<CompileJobAction>(CompileJA));
>>>> if (!Compiler)
>>>> return nullptr;
>>>> if (!Compiler->canEmitIR() ||
>>>> (!SaveTemps && !EmbedBitcode)) {
>>>> Inputs = &CompileJA->getInputs();
>>>> ToolForJob = Compiler;
>>>> - CollapsedCHA = CHA;
>>>> +
>>>> + if (CompileOA)
>>>> + CollapsedOffloadAction.push_back(CompileOA);
>>>> }
>>>> }
>>>>
>>>> @@ -1963,12 +2060,21 @@ static const Tool *selectToolForJob(Comp
>>>> // See if we should use an integrated preprocessor. We do so when we
>>>> have
>>>> // exactly one input, since this is the only use case we care about
>>>> // (irrelevant since we don't support combine yet).
>>>> - if (Inputs->size() == 1 &&
>>>> isa<PreprocessJobAction>(*Inputs->begin()) &&
>>>> +
>>>> + // Look through offload actions after preprocessing.
>>>> + Action *PreprocessJA = (Inputs->size() == 1) ? *Inputs->begin() :
>>>> nullptr;
>>>> + auto *PreprocessOA =
>>>> + collapseOffloadingAction<PreprocessJobAction>(PreprocessJA);
>>>> +
>>>> + if (PreprocessJA && isa<PreprocessJobAction>(PreprocessJA) &&
>>>> !C.getArgs().hasArg(options::OPT_no_integrated_cpp) &&
>>>> !C.getArgs().hasArg(options::OPT_traditional_cpp) && !SaveTemps
>>>> &&
>>>> !C.getArgs().hasArg(options::OPT_rewrite_objc) &&
>>>> - ToolForJob->hasIntegratedCPP())
>>>> - Inputs = &(*Inputs)[0]->getInputs();
>>>> + ToolForJob->hasIntegratedCPP()) {
>>>> + Inputs = &PreprocessJA->getInputs();
>>>> + if (PreprocessOA)
>>>> + CollapsedOffloadAction.push_back(PreprocessOA);
>>>> + }
>>>>
>>>> return ToolForJob;
>>>> }
>>>> @@ -1976,8 +2082,8 @@ static const Tool *selectToolForJob(Comp
>>>> InputInfo Driver::BuildJobsForAction(
>>>> Compilation &C, const Action *A, const ToolChain *TC, const char
>>>> *BoundArch,
>>>> bool AtTopLevel, bool MultipleArchs, const char *LinkingOutput,
>>>> - std::map<std::pair<const Action *, std::string>, InputInfo>
>>>> &CachedResults)
>>>> - const {
>>>> + std::map<std::pair<const Action *, std::string>, InputInfo>
>>>> &CachedResults,
>>>> + bool BuildForOffloadDevice) const {
>>>> // The bound arch is not necessarily represented in the toolchain's
>>>> triple --
>>>> // for example, armv7 and armv7s both map to the same triple -- so
>>>> we need
>>>> // both in our map.
>>>> @@ -1991,9 +2097,9 @@ InputInfo Driver::BuildJobsForAction(
>>>> if (CachedResult != CachedResults.end()) {
>>>> return CachedResult->second;
>>>> }
>>>> - InputInfo Result =
>>>> - BuildJobsForActionNoCache(C, A, TC, BoundArch, AtTopLevel,
>>>> MultipleArchs,
>>>> - LinkingOutput, CachedResults);
>>>> + InputInfo Result = BuildJobsForActionNoCache(
>>>> + C, A, TC, BoundArch, AtTopLevel, MultipleArchs, LinkingOutput,
>>>> + CachedResults, BuildForOffloadDevice);
>>>> CachedResults[ActionTC] = Result;
>>>> return Result;
>>>> }
>>>> @@ -2001,21 +2107,65 @@ InputInfo Driver::BuildJobsForAction(
>>>> InputInfo Driver::BuildJobsForActionNoCache(
>>>> Compilation &C, const Action *A, const ToolChain *TC, const char
>>>> *BoundArch,
>>>> bool AtTopLevel, bool MultipleArchs, const char *LinkingOutput,
>>>> - std::map<std::pair<const Action *, std::string>, InputInfo>
>>>> &CachedResults)
>>>> - const {
>>>> + std::map<std::pair<const Action *, std::string>, InputInfo>
>>>> &CachedResults,
>>>> + bool BuildForOffloadDevice) const {
>>>> llvm::PrettyStackTraceString CrashInfo("Building compilation jobs");
>>>>
>>>> - InputInfoList CudaDeviceInputInfos;
>>>> - if (const CudaHostAction *CHA = dyn_cast<CudaHostAction>(A)) {
>>>> - // Append outputs of device jobs to the input list.
>>>> - for (const Action *DA : CHA->getDeviceActions()) {
>>>> - CudaDeviceInputInfos.push_back(BuildJobsForAction(
>>>> - C, DA, TC, nullptr, AtTopLevel,
>>>> - /*MultipleArchs*/ false, LinkingOutput, CachedResults));
>>>> - }
>>>> - // Override current action with a real host compile action and
>>>> continue
>>>> - // processing it.
>>>> - A = *CHA->input_begin();
>>>> + InputInfoList OffloadDependencesInputInfo;
>>>> + if (const OffloadAction *OA = dyn_cast<OffloadAction>(A)) {
>>>> + // The offload action is expected to be used in four different
>>>> situations.
>>>> + //
>>>> + // a) Set a toolchain/architecture/kind for a host action:
>>>> + // Host Action 1 -> OffloadAction -> Host Action 2
>>>> + //
>>>> + // b) Set a toolchain/architecture/kind for a device action;
>>>> + // Device Action 1 -> OffloadAction -> Device Action 2
>>>> + //
>>>> + // c) Specify a device dependences to a host action;
>>>> + // Device Action 1 _
>>>> + // \
>>>> + // Host Action 1 ---> OffloadAction -> Host Action 2
>>>> + //
>>>> + // d) Specify a host dependence to a device action.
>>>> + // Host Action 1 _
>>>> + // \
>>>> + // Device Action 1 ---> OffloadAction -> Device Action 2
>>>> + //
>>>> + // For a) and b), we just return the job generated for the
>>>> dependence. For
>>>> + // c) and d) we override the current action with the host/device
>>>> dependence
>>>> + // if the current toolchain is host/device and set the offload
>>>> dependences
>>>> + // info with the jobs obtained from the device/host dependence(s).
>>>> +
>>>> + // If there is a single device option, just generate the job for
>>>> it.
>>>> + if (OA->hasSingleDeviceDependence()) {
>>>> + InputInfo DevA;
>>>> + OA->doOnEachDeviceDependence([&](Action *DepA, const ToolChain
>>>> *DepTC,
>>>> + const char *DepBoundArch) {
>>>> + DevA =
>>>> + BuildJobsForAction(C, DepA, DepTC, DepBoundArch,
>>>> AtTopLevel,
>>>> + /*MultipleArchs*/ !!DepBoundArch,
>>>> LinkingOutput,
>>>> + CachedResults,
>>>> /*BuildForOffloadDevice=*/true);
>>>> + });
>>>> + return DevA;
>>>> + }
>>>> +
>>>> + // If 'Action 2' is host, we generate jobs for the device
>>>> dependences and
>>>> + // override the current action with the host dependence.
>>>> Otherwise, we
>>>> + // generate the host dependences and override the action with the
>>>> device
>>>> + // dependence. The dependences can't therefore be a top-level
>>>> action.
>>>> + OA->doOnEachDependence(
>>>> + /*IsHostDependence=*/BuildForOffloadDevice,
>>>> + [&](Action *DepA, const ToolChain *DepTC, const char
>>>> *DepBoundArch) {
>>>> + OffloadDependencesInputInfo.push_back(BuildJobsForAction(
>>>> + C, DepA, DepTC, DepBoundArch, /*AtTopLevel=*/false,
>>>> + /*MultipleArchs*/ !!DepBoundArch, LinkingOutput,
>>>> CachedResults,
>>>> +
>>>> /*BuildForOffloadDevice=*/DepA->getOffloadingDeviceKind() !=
>>>> + Action::OFK_None));
>>>> + });
>>>> +
>>>> + A = BuildForOffloadDevice
>>>> + ?
>>>> OA->getSingleDeviceDependence(/*DoNotConsiderHostActions=*/true)
>>>> + : OA->getHostDependence();
>>>> }
>>>>
>>>> if (const InputAction *IA = dyn_cast<InputAction>(A)) {
>>>> @@ -2042,41 +2192,34 @@ InputInfo Driver::BuildJobsForActionNoCa
>>>> TC = &C.getDefaultToolChain();
>>>>
>>>> return BuildJobsForAction(C, *BAA->input_begin(), TC, ArchName,
>>>> AtTopLevel,
>>>> - MultipleArchs, LinkingOutput,
>>>> CachedResults);
>>>> + MultipleArchs, LinkingOutput,
>>>> CachedResults,
>>>> + BuildForOffloadDevice);
>>>> }
>>>>
>>>> - if (const CudaDeviceAction *CDA = dyn_cast<CudaDeviceAction>(A)) {
>>>> - // Initial processing of CudaDeviceAction carries host params.
>>>> - // Call BuildJobsForAction() again, now with correct device
>>>> parameters.
>>>> - InputInfo II = BuildJobsForAction(
>>>> - C, *CDA->input_begin(),
>>>> C.getSingleOffloadToolChain<Action::OFK_Cuda>(),
>>>> - CudaArchToString(CDA->getGpuArch()), CDA->isAtTopLevel(),
>>>> - /*MultipleArchs=*/true, LinkingOutput, CachedResults);
>>>> - // Currently II's Action is *CDA->input_begin(). Set it to CDA
>>>> instead, so
>>>> - // that one can retrieve II's GPU arch.
>>>> - II.setAction(A);
>>>> - return II;
>>>> - }
>>>>
>>>> const ActionList *Inputs = &A->getInputs();
>>>>
>>>> const JobAction *JA = cast<JobAction>(A);
>>>> - const CudaHostAction *CollapsedCHA = nullptr;
>>>> + ActionList CollapsedOffloadActions;
>>>> +
>>>> const Tool *T =
>>>> selectToolForJob(C, isSaveTempsEnabled(), embedBitcodeEnabled(),
>>>> TC, JA,
>>>> - Inputs, CollapsedCHA);
>>>> + Inputs, CollapsedOffloadActions);
>>>> if (!T)
>>>> return InputInfo();
>>>>
>>>> - // If we've collapsed action list that contained CudaHostAction we
>>>> - // need to build jobs for device-side inputs it may have held.
>>>> - if (CollapsedCHA) {
>>>> - for (const Action *DA : CollapsedCHA->getDeviceActions()) {
>>>> - CudaDeviceInputInfos.push_back(BuildJobsForAction(
>>>> - C, DA, TC, "", AtTopLevel,
>>>> - /*MultipleArchs*/ false, LinkingOutput, CachedResults));
>>>> - }
>>>> - }
>>>> + // If we've collapsed action list that contained OffloadAction we
>>>> + // need to build jobs for host/device-side inputs it may have held.
>>>> + for (const auto *OA : CollapsedOffloadActions)
>>>> + cast<OffloadAction>(OA)->doOnEachDependence(
>>>> + /*IsHostDependence=*/BuildForOffloadDevice,
>>>> + [&](Action *DepA, const ToolChain *DepTC, const char
>>>> *DepBoundArch) {
>>>> + OffloadDependencesInputInfo.push_back(BuildJobsForAction(
>>>> + C, DepA, DepTC, DepBoundArch, AtTopLevel,
>>>> + /*MultipleArchs=*/!!DepBoundArch, LinkingOutput,
>>>> CachedResults,
>>>> +
>>>> /*BuildForOffloadDevice=*/DepA->getOffloadingDeviceKind() !=
>>>> + Action::OFK_None));
>>>> + });
>>>>
>>>> // Only use pipes when there is exactly one input.
>>>> InputInfoList InputInfos;
>>>> @@ -2086,9 +2229,9 @@ InputInfo Driver::BuildJobsForActionNoCa
>>>> // FIXME: Clean this up.
>>>> bool SubJobAtTopLevel =
>>>> AtTopLevel && (isa<DsymutilJobAction>(A) ||
>>>> isa<VerifyJobAction>(A));
>>>> - InputInfos.push_back(BuildJobsForAction(C, Input, TC, BoundArch,
>>>> - SubJobAtTopLevel,
>>>> MultipleArchs,
>>>> - LinkingOutput,
>>>> CachedResults));
>>>> + InputInfos.push_back(BuildJobsForAction(
>>>> + C, Input, TC, BoundArch, SubJobAtTopLevel, MultipleArchs,
>>>> LinkingOutput,
>>>> + CachedResults, BuildForOffloadDevice));
>>>> }
>>>>
>>>> // Always use the first input as the base input.
>>>> @@ -2099,9 +2242,10 @@ InputInfo Driver::BuildJobsForActionNoCa
>>>> if (JA->getType() == types::TY_dSYM)
>>>> BaseInput = InputInfos[0].getFilename();
>>>>
>>>> - // Append outputs of cuda device jobs to the input list
>>>> - if (CudaDeviceInputInfos.size())
>>>> - InputInfos.append(CudaDeviceInputInfos.begin(),
>>>> CudaDeviceInputInfos.end());
>>>> + // Append outputs of offload device jobs to the input list
>>>> + if (!OffloadDependencesInputInfo.empty())
>>>> + InputInfos.append(OffloadDependencesInputInfo.begin(),
>>>> + OffloadDependencesInputInfo.end());
>>>>
>>>> // Determine the place to write output to, if any.
>>>> InputInfo Result;
>>>> @@ -2109,7 +2253,8 @@ InputInfo Driver::BuildJobsForActionNoCa
>>>> Result = InputInfo(A, BaseInput);
>>>> else
>>>>      Result = InputInfo(A, GetNamedOutputPath(C, *JA, BaseInput, BoundArch,
>>>> -                                             AtTopLevel, MultipleArchs),
>>>> +                                             AtTopLevel, MultipleArchs,
>>>> +                                             TC->getTriple().normalize()),
>>>>                          BaseInput);
>>>>
>>>> if (CCCPrintBindings && !CCGenDiagnostics) {
>>>> @@ -2169,7 +2314,8 @@ static const char *MakeCLOutputFilename(
>>>>  const char *Driver::GetNamedOutputPath(Compilation &C, const JobAction &JA,
>>>>                                         const char *BaseInput,
>>>>                                         const char *BoundArch, bool AtTopLevel,
>>>> -                                       bool MultipleArchs) const {
>>>> +                                       bool MultipleArchs,
>>>> +                                       StringRef NormalizedTriple) const {
>>>> llvm::PrettyStackTraceString CrashInfo("Computing output path");
>>>> // Output to a user requested destination?
>>>> if (AtTopLevel && !isa<DsymutilJobAction>(JA) &&
>>>> !isa<VerifyJobAction>(JA)) {
>>>> @@ -2255,6 +2401,7 @@ const char *Driver::GetNamedOutputPath(C
>>>>            MakeCLOutputFilename(C.getArgs(), "", BaseName, types::TY_Image);
>>>> } else if (MultipleArchs && BoundArch) {
>>>> SmallString<128> Output(getDefaultImageName());
>>>> + Output += JA.getOffloadingFileNamePrefix(NormalizedTriple);
>>>> Output += "-";
>>>> Output.append(BoundArch);
>>>> NamedOutput = C.getArgs().MakeArgString(Output.c_str());
>>>> @@ -2271,6 +2418,7 @@ const char *Driver::GetNamedOutputPath(C
>>>> if (!types::appendSuffixForType(JA.getType()))
>>>> End = BaseName.rfind('.');
>>>> SmallString<128> Suffixed(BaseName.substr(0, End));
>>>> + Suffixed += JA.getOffloadingFileNamePrefix(NormalizedTriple);
>>>> if (MultipleArchs && BoundArch) {
>>>> Suffixed += "-";
>>>> Suffixed.append(BoundArch);
>>>>
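As a side note for anyone following the Driver.cpp hunks above: the collapsed-offload handling boils down to walking either the host or the device dependences of an OffloadAction and building a job for each one. Here is a small standalone sketch of that pattern; Dep and MockOffloadAction are illustrative stand-ins, not the real driver classes:

```
// Sketch only: models how one offload action carries a host dependence plus a
// set of device dependences and hands the relevant side to a callback, the way
// BuildJobsForAction consumes CollapsedOffloadActions above.
#include <functional>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

struct Dep {
  std::string Action;     // e.g. "assembler"
  std::string ToolChain;  // e.g. "nvptx64-nvidia-cuda"
  std::string BoundArch;  // e.g. "sm_30"; empty on the host side
};

class MockOffloadAction {
  Dep Host;
  std::vector<Dep> Devices;

public:
  MockOffloadAction(Dep H, std::vector<Dep> D)
      : Host(std::move(H)), Devices(std::move(D)) {}

  // If IsHostDependence is true the callback sees the host dependence only,
  // otherwise it sees every device dependence.
  void doOnEachDependence(bool IsHostDependence,
                          const std::function<void(const Dep &)> &Work) const {
    if (IsHostDependence) {
      Work(Host);
      return;
    }
    for (const Dep &D : Devices)
      Work(D);
  }
};

int main() {
  MockOffloadAction OA({"compiler", "powerpc64le-ibm-linux-gnu", ""},
                       {{"assembler", "nvptx64-nvidia-cuda", "sm_30"},
                        {"assembler", "nvptx64-nvidia-cuda", "sm_35"}});
  // A job built for the device needs the host-side dependences and vice-versa,
  // which is what /*IsHostDependence=*/BuildForOffloadDevice expresses above.
  OA.doOnEachDependence(/*IsHostDependence=*/false, [](const Dep &D) {
    std::cout << D.Action << " for " << D.ToolChain << ":" << D.BoundArch << "\n";
  });
}
```

The real callback additionally receives the dependence's toolchain and bound architecture so that BuildJobsForAction can be re-entered with the right parameters.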
>>>> Modified: cfe/trunk/lib/Driver/ToolChain.cpp
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Driver/ToolChain.cpp?rev=275645&r1=275644&r2=275645&view=diff
>>>>
>>>> ==============================================================================
>>>> --- cfe/trunk/lib/Driver/ToolChain.cpp (original)
>>>> +++ cfe/trunk/lib/Driver/ToolChain.cpp Fri Jul 15 18:13:27 2016
>>>> @@ -248,8 +248,7 @@ Tool *ToolChain::getTool(Action::ActionC
>>>>
>>>> case Action::InputClass:
>>>> case Action::BindArchClass:
>>>> - case Action::CudaDeviceClass:
>>>> - case Action::CudaHostClass:
>>>> + case Action::OffloadClass:
>>>> case Action::LipoJobClass:
>>>> case Action::DsymutilJobClass:
>>>> case Action::VerifyDebugInfoJobClass:
>>>>
>>>> Modified: cfe/trunk/lib/Driver/Tools.cpp
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Driver/Tools.cpp?rev=275645&r1=275644&r2=275645&view=diff
>>>>
>>>> ==============================================================================
>>>> --- cfe/trunk/lib/Driver/Tools.cpp (original)
>>>> +++ cfe/trunk/lib/Driver/Tools.cpp Fri Jul 15 18:13:27 2016
>>>> @@ -296,12 +296,45 @@ static bool forwardToGCC(const Option &O
>>>> !O.hasFlag(options::DriverOption) &&
>>>> !O.hasFlag(options::LinkerInput);
>>>> }
>>>>
>>>> +/// Add the C++ include args of other offloading toolchains. If this is a host
>>>> +/// job, the device toolchains are added. If this is a device job, the host
>>>> +/// toolchains will be added.
>>>> +static void addExtraOffloadCXXStdlibIncludeArgs(Compilation &C,
>>>> +                                                const JobAction &JA,
>>>> +                                                const ArgList &Args,
>>>> +                                                ArgStringList &CmdArgs) {
>>>> +
>>>> + if (JA.isHostOffloading(Action::OFK_Cuda))
>>>> + C.getSingleOffloadToolChain<Action::OFK_Cuda>()
>>>> + ->AddClangCXXStdlibIncludeArgs(Args, CmdArgs);
>>>> + else if (JA.isDeviceOffloading(Action::OFK_Cuda))
>>>> + C.getSingleOffloadToolChain<Action::OFK_Host>()
>>>> + ->AddClangCXXStdlibIncludeArgs(Args, CmdArgs);
>>>> +
>>>> + // TODO: Add support for other programming models here.
>>>> +}
>>>> +
>>>> +/// Add the include args that are specific of each offloading programming model.
>>>> +static void addExtraOffloadSpecificIncludeArgs(Compilation &C,
>>>> +                                               const JobAction &JA,
>>>> +                                               const ArgList &Args,
>>>> +                                               ArgStringList &CmdArgs) {
>>>> +
>>>> +  if (JA.isHostOffloading(Action::OFK_Cuda))
>>>> +    C.getSingleOffloadToolChain<Action::OFK_Host>()->AddCudaIncludeArgs(
>>>> +        Args, CmdArgs);
>>>> +  else if (JA.isDeviceOffloading(Action::OFK_Cuda))
>>>> +    C.getSingleOffloadToolChain<Action::OFK_Cuda>()->AddCudaIncludeArgs(
>>>> +        Args, CmdArgs);
>>>> +
>>>> + // TODO: Add support for other programming models here.
>>>> +}
>>>> +
>>>>  void Clang::AddPreprocessingOptions(Compilation &C, const JobAction &JA,
>>>>                                      const Driver &D, const ArgList &Args,
>>>>                                      ArgStringList &CmdArgs,
>>>>                                      const InputInfo &Output,
>>>> -                                    const InputInfoList &Inputs,
>>>> -                                    const ToolChain *AuxToolChain) const {
>>>> +                                    const InputInfoList &Inputs) const {
>>>> Arg *A;
>>>> const bool IsIAMCU = getToolChain().getTriple().isOSIAMCU();
>>>>
>>>> @@ -566,31 +599,27 @@ void Clang::AddPreprocessingOptions(Comp
>>>>    // OBJCPLUS_INCLUDE_PATH - system includes enabled when compiling ObjC++.
>>>>    addDirectoryList(Args, CmdArgs, "-objcxx-isystem", "OBJCPLUS_INCLUDE_PATH");
>>>>
>>>> - // Optional AuxToolChain indicates that we need to include headers
>>>> - // for more than one target. If that's the case, add include paths
>>>> - // from AuxToolChain right after include paths of the same kind for
>>>> - // the current target.
>>>> +  // While adding the include arguments, we also attempt to retrieve the
>>>> +  // arguments of related offloading toolchains or arguments that are specific
>>>> +  // of an offloading programming model.
>>>>
>>>> // Add C++ include arguments, if needed.
>>>> if (types::isCXX(Inputs[0].getType())) {
>>>> getToolChain().AddClangCXXStdlibIncludeArgs(Args, CmdArgs);
>>>> - if (AuxToolChain)
>>>> - AuxToolChain->AddClangCXXStdlibIncludeArgs(Args, CmdArgs);
>>>> + addExtraOffloadCXXStdlibIncludeArgs(C, JA, Args, CmdArgs);
>>>> }
>>>>
>>>> // Add system include arguments for all targets but IAMCU.
>>>> if (!IsIAMCU) {
>>>> getToolChain().AddClangSystemIncludeArgs(Args, CmdArgs);
>>>> - if (AuxToolChain)
>>>> - AuxToolChain->AddClangCXXStdlibIncludeArgs(Args, CmdArgs);
>>>> + addExtraOffloadCXXStdlibIncludeArgs(C, JA, Args, CmdArgs);
>>>>
>>>
>>> This doesn't make much sense to me: we already added the C++ stdlib
>>> includes a few lines above for C++ compiles. Should this be adding the
>>> (non-C++) system include args instead?
>>>
>>>
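For what it's worth, here is a standalone sketch of the alternative being raised here, i.e. adding the counterpart offloading toolchain's plain system includes under the !IsIAMCU branch instead of repeating the C++ stdlib ones. The types, the helper name and the paths below are mock stand-ins (not the driver classes), and whether the driver should do this at all is exactly the open question:

```
// Sketch only: a host job pulls the device toolchain's system include dirs and
// a device job pulls the host's, mirroring addExtraOffloadCXXStdlibIncludeArgs
// but for plain system includes.
#include <iostream>
#include <string>
#include <vector>

struct MockToolChain {
  std::string SystemIncludeDir;  // made-up path, for illustration only
  void AddClangSystemIncludeArgs(std::vector<std::string> &CmdArgs) const {
    CmdArgs.push_back("-internal-isystem");  // illustrative flag spelling
    CmdArgs.push_back(SystemIncludeDir);
  }
};

// IsHostJob stands in for JA.isHostOffloading(Action::OFK_Cuda).
static void addExtraOffloadSystemIncludeArgs(bool IsHostJob,
                                             const MockToolChain &Host,
                                             const MockToolChain &Device,
                                             std::vector<std::string> &CmdArgs) {
  const MockToolChain &Other = IsHostJob ? Device : Host;
  Other.AddClangSystemIncludeArgs(CmdArgs);
}

int main() {
  MockToolChain Host{"/usr/include"};
  MockToolChain Device{"/opt/cuda/include"};
  std::vector<std::string> CmdArgs;
  addExtraOffloadSystemIncludeArgs(/*IsHostJob=*/true, Host, Device, CmdArgs);
  for (const std::string &Arg : CmdArgs)
    std::cout << Arg << "\n";
}
```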
>>>> } else {
>>>> // For IAMCU add special include arguments.
>>>> getToolChain().AddIAMCUIncludeArgs(Args, CmdArgs);
>>>> }
>>>>
>>>> - // Add CUDA include arguments, if needed.
>>>> - if (types::isCuda(Inputs[0].getType()))
>>>> - getToolChain().AddCudaIncludeArgs(Args, CmdArgs);
>>>> + // Add offload include arguments, if needed.
>>>> + addExtraOffloadSpecificIncludeArgs(C, JA, Args, CmdArgs);
>>>> }
>>>>
>>>> // FIXME: Move to target hook.
>>>> @@ -3799,7 +3828,7 @@ void Clang::ConstructJob(Compilation &C,
>>>>    // CUDA compilation may have multiple inputs (source file + results of
>>>>    // device-side compilations). All other jobs are expected to have exactly one
>>>>    // input.
>>>> - bool IsCuda = types::isCuda(Input.getType());
>>>> + bool IsCuda = JA.isOffloading(Action::OFK_Cuda);
>>>>    assert((IsCuda || Inputs.size() == 1) && "Unable to handle multiple inputs.");
>>>>
>>>> // C++ is not supported for IAMCU.
>>>> @@ -3815,21 +3844,21 @@ void Clang::ConstructJob(Compilation &C,
>>>> CmdArgs.push_back("-triple");
>>>> CmdArgs.push_back(Args.MakeArgString(TripleStr));
>>>>
>>>> - const ToolChain *AuxToolChain = nullptr;
>>>> if (IsCuda) {
>>>> - // FIXME: We need a (better) way to pass information about
>>>> - // particular compilation pass we're constructing here. For now we
>>>> - // can check which toolchain we're using and pick the other one to
>>>> - // extract the triple.
>>>> -    if (&getToolChain() == C.getSingleOffloadToolChain<Action::OFK_Cuda>())
>>>> -      AuxToolChain = C.getOffloadingHostToolChain();
>>>> -    else if (&getToolChain() == C.getOffloadingHostToolChain())
>>>> -      AuxToolChain = C.getSingleOffloadToolChain<Action::OFK_Cuda>();
>>>> -    else
>>>> -      llvm_unreachable("Can't figure out CUDA compilation mode.");
>>>> -    assert(AuxToolChain != nullptr && "No aux toolchain.");
>>>> +    // We have to pass the triple of the host if compiling for a CUDA device and
>>>> +    // vice-versa.
>>>> +    StringRef NormalizedTriple;
>>>> +    if (JA.isDeviceOffloading(Action::OFK_Cuda))
>>>> +      NormalizedTriple = C.getSingleOffloadToolChain<Action::OFK_Host>()
>>>> +                             ->getTriple()
>>>> +                             .normalize();
>>>> +    else
>>>> +      NormalizedTriple = C.getSingleOffloadToolChain<Action::OFK_Cuda>()
>>>> +                             ->getTriple()
>>>> +                             .normalize();
>>>> +
>>>>      CmdArgs.push_back("-aux-triple");
>>>> -    CmdArgs.push_back(Args.MakeArgString(AuxToolChain->getTriple().str()));
>>>> +    CmdArgs.push_back(Args.MakeArgString(NormalizedTriple));
>>>> }
>>>>
>>>> if (Triple.isOSWindows() && (Triple.getArch() == llvm::Triple::arm ||
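To make the direction of the swap concrete, a tiny standalone sketch (illustrative only, not the driver code): a device-side job ends up with the host triple as -aux-triple and the host-side job with the device triple, using the host/device pair from the new test below.

```
// Sketch only: which triple goes into -aux-triple for a given job.
#include <iostream>
#include <string>

static std::string pickAuxTriple(bool IsDeviceJob, const std::string &HostTriple,
                                 const std::string &DeviceTriple) {
  // Mirrors the NormalizedTriple selection above: pass the *other* side.
  return IsDeviceJob ? HostTriple : DeviceTriple;
}

int main() {
  const std::string Host = "powerpc64le-ibm-linux-gnu";
  const std::string Device = "nvptx64-nvidia-cuda";
  std::cout << "device cc1: -aux-triple " << pickAuxTriple(true, Host, Device) << "\n";
  std::cout << "host cc1:   -aux-triple " << pickAuxTriple(false, Host, Device) << "\n";
}
```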
>>>> @@ -4718,8 +4747,7 @@ void Clang::ConstructJob(Compilation &C,
>>>> //
>>>> // FIXME: Support -fpreprocessed
>>>> if (types::getPreprocessedType(InputType) != types::TY_INVALID)
>>>> -    AddPreprocessingOptions(C, JA, D, Args, CmdArgs, Output, Inputs,
>>>> -                            AuxToolChain);
>>>> + AddPreprocessingOptions(C, JA, D, Args, CmdArgs, Output, Inputs);
>>>>
>>>>    // Don't warn about "clang -c -DPIC -fPIC test.i" because libtool.m4 assumes
>>>>    // that "The compiler can only warn and ignore the option if not recognized".
>>>> @@ -11193,15 +11221,14 @@ void NVPTX::Assembler::ConstructJob(Comp
>>>> static_cast<const toolchains::CudaToolChain &>(getToolChain());
>>>> assert(TC.getTriple().isNVPTX() && "Wrong platform");
>>>>
>>>> - std::vector<std::string> gpu_archs =
>>>> - Args.getAllArgValues(options::OPT_march_EQ);
>>>> -  assert(gpu_archs.size() == 1 && "Exactly one GPU Arch required for ptxas.");
>>>> - const std::string& gpu_arch = gpu_archs[0];
>>>> + // Obtain architecture from the action.
>>>> + CudaArch gpu_arch = StringToCudaArch(JA.getOffloadingArch());
>>>> + assert(gpu_arch != CudaArch::UNKNOWN &&
>>>> + "Device action expected to have an architecture.");
>>>>
>>>> // Check that our installation's ptxas supports gpu_arch.
>>>> if (!Args.hasArg(options::OPT_no_cuda_version_check)) {
>>>> -    TC.cudaInstallation().CheckCudaVersionSupportsArch(
>>>> -        StringToCudaArch(gpu_arch));
>>>> + TC.cudaInstallation().CheckCudaVersionSupportsArch(gpu_arch);
>>>> }
>>>>
>>>> ArgStringList CmdArgs;
>>>> @@ -11245,7 +11272,7 @@ void NVPTX::Assembler::ConstructJob(Comp
>>>> }
>>>>
>>>> CmdArgs.push_back("--gpu-name");
>>>> - CmdArgs.push_back(Args.MakeArgString(gpu_arch));
>>>> + CmdArgs.push_back(Args.MakeArgString(CudaArchToString(gpu_arch)));
>>>> CmdArgs.push_back("--output-file");
>>>> CmdArgs.push_back(Args.MakeArgString(Output.getFilename()));
>>>> for (const auto& II : Inputs)
>>>> @@ -11277,13 +11304,20 @@ void NVPTX::Linker::ConstructJob(Compila
>>>> CmdArgs.push_back(Args.MakeArgString(Output.getFilename()));
>>>>
>>>> for (const auto& II : Inputs) {
>>>> - auto* A = cast<const CudaDeviceAction>(II.getAction());
>>>> + auto *A = II.getAction();
>>>> + assert(A->getInputs().size() == 1 &&
>>>> + "Device offload action is expected to have a single input");
>>>> + const char *gpu_arch_str = A->getOffloadingArch();
>>>> +    assert(gpu_arch_str &&
>>>> +           "Device action expected to have associated a GPU architecture!");
>>>> + CudaArch gpu_arch = StringToCudaArch(gpu_arch_str);
>>>> +
>>>> // We need to pass an Arch of the form "sm_XX" for cubin files and
>>>> // "compute_XX" for ptx.
>>>> const char *Arch =
>>>> (II.getType() == types::TY_PP_Asm)
>>>> -            ? CudaVirtualArchToString(VirtualArchForCudaArch(A->getGpuArch()))
>>>> - : CudaArchToString(A->getGpuArch());
>>>> + ? CudaVirtualArchToString(VirtualArchForCudaArch(gpu_arch))
>>>> + : gpu_arch_str;
>>>>
>>>> CmdArgs.push_back(Args.MakeArgString(llvm::Twine("--image=profile=") +
>>>>                                          Arch + ",file=" + II.getFilename()));
>>>> }
>>>>
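The profile-string selection in the NVPTX::Linker hunk above can be summarized with a small standalone sketch. profileFor is an illustrative helper, not a clang function, and it assumes the usual sm_XX -> compute_XX virtual-arch pairing:

```
// Sketch only: pick the fatbinary --image=profile= string for an input.
#include <iostream>
#include <string>

static std::string profileFor(const std::string &GpuArch, bool InputIsPtxAssembly) {
  if (!InputIsPtxAssembly)
    return GpuArch;  // cubin/object input keeps the real arch, e.g. "sm_30"
  // PTX input uses the matching virtual architecture, e.g. sm_30 -> compute_30.
  return "compute_" + GpuArch.substr(GpuArch.find('_') + 1);
}

int main() {
  // Mirrors: --image=profile=<Arch>,file=<input>
  std::cout << "--image=profile=" << profileFor("sm_30", /*InputIsPtxAssembly=*/true)
            << ",file=a.s\n";      // -> profile=compute_30
  std::cout << "--image=profile=" << profileFor("sm_30", /*InputIsPtxAssembly=*/false)
            << ",file=a.cubin\n";  // -> profile=sm_30
}
```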
>>>> Modified: cfe/trunk/lib/Driver/Tools.h
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Driver/Tools.h?rev=275645&r1=275644&r2=275645&view=diff
>>>>
>>>> ==============================================================================
>>>> --- cfe/trunk/lib/Driver/Tools.h (original)
>>>> +++ cfe/trunk/lib/Driver/Tools.h Fri Jul 15 18:13:27 2016
>>>> @@ -57,8 +57,7 @@ private:
>>>>                                const Driver &D, const llvm::opt::ArgList &Args,
>>>> llvm::opt::ArgStringList &CmdArgs,
>>>> const InputInfo &Output,
>>>> - const InputInfoList &Inputs,
>>>> - const ToolChain *AuxToolChain) const;
>>>> + const InputInfoList &Inputs) const;
>>>>
>>>> void AddAArch64TargetArgs(const llvm::opt::ArgList &Args,
>>>> llvm::opt::ArgStringList &CmdArgs) const;
>>>>
>>>> Modified: cfe/trunk/lib/Frontend/CreateInvocationFromCommandLine.cpp
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Frontend/CreateInvocationFromCommandLine.cpp?rev=275645&r1=275644&r2=275645&view=diff
>>>>
>>>> ==============================================================================
>>>> --- cfe/trunk/lib/Frontend/CreateInvocationFromCommandLine.cpp (original)
>>>> +++ cfe/trunk/lib/Frontend/CreateInvocationFromCommandLine.cpp Fri Jul 15 18:13:27 2016
>>>> @@ -60,25 +60,25 @@ clang::createInvocationFromCommandLine(A
>>>> }
>>>>
>>>>    // We expect to get back exactly one command job, if we didn't something
>>>> -  // failed. CUDA compilation is an exception as it creates multiple jobs. If
>>>> -  // that's the case, we proceed with the first job. If caller needs particular
>>>> -  // CUDA job, it should be controlled via --cuda-{host|device}-only option
>>>> -  // passed to the driver.
>>>> +  // failed. Offload compilation is an exception as it creates multiple jobs. If
>>>> + // that's the case, we proceed with the first job. If caller needs a
>>>> + // particular job, it should be controlled via options (e.g.
>>>> + // --cuda-{host|device}-only for CUDA) passed to the driver.
>>>> const driver::JobList &Jobs = C->getJobs();
>>>> - bool CudaCompilation = false;
>>>> + bool OffloadCompilation = false;
>>>> if (Jobs.size() > 1) {
>>>> for (auto &A : C->getActions()){
>>>>        // On MacOSX real actions may end up being wrapped in BindArchAction
>>>> if (isa<driver::BindArchAction>(A))
>>>> A = *A->input_begin();
>>>> - if (isa<driver::CudaDeviceAction>(A)) {
>>>> - CudaCompilation = true;
>>>> + if (isa<driver::OffloadAction>(A)) {
>>>> + OffloadCompilation = true;
>>>> break;
>>>> }
>>>> }
>>>> }
>>>> if (Jobs.size() == 0 || !isa<driver::Command>(*Jobs.begin()) ||
>>>> - (Jobs.size() > 1 && !CudaCompilation)) {
>>>> + (Jobs.size() > 1 && !OffloadCompilation)) {
>>>> SmallString<256> Msg;
>>>> llvm::raw_svector_ostream OS(Msg);
>>>> Jobs.Print(OS, "; ", true);
>>>>
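The OffloadCompilation check above is easy to model in isolation; in this standalone sketch MockAction and hasOffloadAction are illustrative stand-ins, not the frontend classes:

```
// Sketch only: a multi-job compilation is tolerated exactly when one of the
// top-level actions (possibly wrapped in a BindArchAction) is an offload action.
#include <vector>

struct MockAction {
  bool IsBindArch;
  bool IsOffload;
  MockAction *Input;  // wrapped action, if any
};

static bool hasOffloadAction(const std::vector<MockAction *> &Actions) {
  for (MockAction *A : Actions) {
    // On MacOSX the real action may be wrapped in a BindArchAction, so unwrap.
    if (A->IsBindArch && A->Input)
      A = A->Input;
    if (A->IsOffload)
      return true;
  }
  return false;
}

int main() {
  MockAction Offload{false, true, nullptr};
  MockAction Bind{true, false, &Offload};
  std::vector<MockAction *> Actions = {&Bind};
  return hasOffloadAction(Actions) ? 0 : 1;  // 0: multiple jobs are acceptable
}
```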
>>>> Added: cfe/trunk/test/Driver/cuda_phases.cu
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/test/Driver/cuda_phases.cu?rev=275645&view=auto
>>>>
>>>> ==============================================================================
>>>> --- cfe/trunk/test/Driver/cuda_phases.cu (added)
>>>> +++ cfe/trunk/test/Driver/cuda_phases.cu Fri Jul 15 18:13:27 2016
>>>> @@ -0,0 +1,206 @@
>>>> +// Tests the phases generated for a CUDA offloading target for different
>>>> +// combinations of:
>>>> +// - Number of gpu architectures;
>>>> +// - Host/device-only compilation;
>>>> +// - User-requested final phase - binary or assembly.
>>>> +
>>>> +// REQUIRES: clang-driver
>>>> +// REQUIRES: powerpc-registered-target
>>>> +// REQUIRES: nvptx-registered-target
>>>> +
>>>> +//
>>>> +// Test single gpu architecture with complete compilation.
>>>> +//
>>>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 %s 2>&1 \
>>>> +// RUN: | FileCheck -check-prefix=BIN %s
>>>> +// BIN: 0: input, "{{.*}}cuda_phases.cu", cuda, (host-cuda)
>>>> +// BIN: 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
>>>> +// BIN: 2: compiler, {1}, ir, (host-cuda)
>>>> +// BIN: 3: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_30)
>>>> +// BIN: 4: preprocessor, {3}, cuda-cpp-output, (device-cuda, sm_30)
>>>> +// BIN: 5: compiler, {4}, ir, (device-cuda, sm_30)
>>>> +// BIN: 6: backend, {5}, assembler, (device-cuda, sm_30)
>>>> +// BIN: 7: assembler, {6}, object, (device-cuda, sm_30)
>>>> +// BIN: 8: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {7}, object
>>>> +// BIN: 9: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {6}, assembler
>>>> +// BIN: 10: linker, {8, 9}, cuda-fatbin, (device-cuda)
>>>> +// BIN: 11: offload, "host-cuda (powerpc64le-ibm-linux-gnu)" {2}, "device-cuda (nvptx64-nvidia-cuda)" {10}, ir
>>>> +// BIN: 12: backend, {11}, assembler, (host-cuda)
>>>> +// BIN: 13: assembler, {12}, object, (host-cuda)
>>>> +// BIN: 14: linker, {13}, image, (host-cuda)
>>>> +
>>>> +//
>>>> +// Test single gpu architecture up to the assemble phase.
>>>> +//
>>>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 %s -S 2>&1 \
>>>> +// RUN: | FileCheck -check-prefix=ASM %s
>>>> +// ASM: 0: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_30)
>>>> +// ASM: 1: preprocessor, {0}, cuda-cpp-output, (device-cuda, sm_30)
>>>> +// ASM: 2: compiler, {1}, ir, (device-cuda, sm_30)
>>>> +// ASM: 3: backend, {2}, assembler, (device-cuda, sm_30)
>>>> +// ASM: 4: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {3}, assembler
>>>> +// ASM: 5: input, "{{.*}}cuda_phases.cu", cuda, (host-cuda)
>>>> +// ASM: 6: preprocessor, {5}, cuda-cpp-output, (host-cuda)
>>>> +// ASM: 7: compiler, {6}, ir, (host-cuda)
>>>> +// ASM: 8: backend, {7}, assembler, (host-cuda)
>>>> +
>>>> +//
>>>> +// Test two gpu architectures with complete compilation.
>>>> +//
>>>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_35 %s 2>&1 \
>>>> +// RUN: | FileCheck -check-prefix=BIN2 %s
>>>> +// BIN2: 0: input, "{{.*}}cuda_phases.cu", cuda, (host-cuda)
>>>> +// BIN2: 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
>>>> +// BIN2: 2: compiler, {1}, ir, (host-cuda)
>>>> +// BIN2: 3: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_30)
>>>> +// BIN2: 4: preprocessor, {3}, cuda-cpp-output, (device-cuda, sm_30)
>>>> +// BIN2: 5: compiler, {4}, ir, (device-cuda, sm_30)
>>>> +// BIN2: 6: backend, {5}, assembler, (device-cuda, sm_30)
>>>> +// BIN2: 7: assembler, {6}, object, (device-cuda, sm_30)
>>>> +// BIN2: 8: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {7}, object
>>>> +// BIN2: 9: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {6}, assembler
>>>> +// BIN2: 10: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_35)
>>>> +// BIN2: 11: preprocessor, {10}, cuda-cpp-output, (device-cuda, sm_35)
>>>> +// BIN2: 12: compiler, {11}, ir, (device-cuda, sm_35)
>>>> +// BIN2: 13: backend, {12}, assembler, (device-cuda, sm_35)
>>>> +// BIN2: 14: assembler, {13}, object, (device-cuda, sm_35)
>>>> +// BIN2: 15: offload, "device-cuda (nvptx64-nvidia-cuda:sm_35)" {14}, object
>>>> +// BIN2: 16: offload, "device-cuda (nvptx64-nvidia-cuda:sm_35)" {13}, assembler
>>>> +// BIN2: 17: linker, {8, 9, 15, 16}, cuda-fatbin, (device-cuda)
>>>> +// BIN2: 18: offload, "host-cuda (powerpc64le-ibm-linux-gnu)" {2}, "device-cuda (nvptx64-nvidia-cuda)" {17}, ir
>>>> +// BIN2: 19: backend, {18}, assembler, (host-cuda)
>>>> +// BIN2: 20: assembler, {19}, object, (host-cuda)
>>>> +// BIN2: 21: linker, {20}, image, (host-cuda)
>>>> +
>>>> +//
>>>> +// Test two gpu architectures up to the assemble phase.
>>>> +//
>>>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_35 %s -S 2>&1 \
>>>> +// RUN: | FileCheck -check-prefix=ASM2 %s
>>>> +// ASM2: 0: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_30)
>>>> +// ASM2: 1: preprocessor, {0}, cuda-cpp-output, (device-cuda, sm_30)
>>>> +// ASM2: 2: compiler, {1}, ir, (device-cuda, sm_30)
>>>> +// ASM2: 3: backend, {2}, assembler, (device-cuda, sm_30)
>>>> +// ASM2: 4: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {3}, assembler
>>>> +// ASM2: 5: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_35)
>>>> +// ASM2: 6: preprocessor, {5}, cuda-cpp-output, (device-cuda, sm_35)
>>>> +// ASM2: 7: compiler, {6}, ir, (device-cuda, sm_35)
>>>> +// ASM2: 8: backend, {7}, assembler, (device-cuda, sm_35)
>>>> +// ASM2: 9: offload, "device-cuda (nvptx64-nvidia-cuda:sm_35)" {8}, assembler
>>>> +// ASM2: 10: input, "{{.*}}cuda_phases.cu", cuda, (host-cuda)
>>>> +// ASM2: 11: preprocessor, {10}, cuda-cpp-output, (host-cuda)
>>>> +// ASM2: 12: compiler, {11}, ir, (host-cuda)
>>>> +// ASM2: 13: backend, {12}, assembler, (host-cuda)
>>>> +
>>>> +//
>>>> +// Test single gpu architecture with complete compilation in host-only
>>>> +// compilation mode.
>>>> +//
>>>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 %s --cuda-host-only 2>&1 \
>>>> +// RUN: | FileCheck -check-prefix=HBIN %s
>>>> +// HBIN: 0: input, "{{.*}}cuda_phases.cu", cuda, (host-cuda)
>>>> +// HBIN: 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
>>>> +// HBIN: 2: compiler, {1}, ir, (host-cuda)
>>>> +// HBIN: 3: offload, "host-cuda (powerpc64le-ibm-linux-gnu)" {2}, ir
>>>> +// HBIN: 4: backend, {3}, assembler, (host-cuda)
>>>> +// HBIN: 5: assembler, {4}, object, (host-cuda)
>>>> +// HBIN: 6: linker, {5}, image, (host-cuda)
>>>> +
>>>> +//
>>>> +// Test single gpu architecture up to the assemble phase in host-only
>>>> +// compilation mode.
>>>> +//
>>>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 %s --cuda-host-only -S 2>&1 \
>>>> +// RUN: | FileCheck -check-prefix=HASM %s
>>>> +// HASM: 0: input, "{{.*}}cuda_phases.cu", cuda, (host-cuda)
>>>> +// HASM: 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
>>>> +// HASM: 2: compiler, {1}, ir, (host-cuda)
>>>> +// HASM: 3: offload, "host-cuda (powerpc64le-ibm-linux-gnu)" {2}, ir
>>>> +// HASM: 4: backend, {3}, assembler, (host-cuda)
>>>> +
>>>> +//
>>>> +// Test two gpu architectures with complete compilation in host-only
>>>> +// compilation mode.
>>>> +//
>>>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_35 %s --cuda-host-only 2>&1 \
>>>> +// RUN: | FileCheck -check-prefix=HBIN2 %s
>>>> +// HBIN2: 0: input, "{{.*}}cuda_phases.cu", cuda, (host-cuda)
>>>> +// HBIN2: 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
>>>> +// HBIN2: 2: compiler, {1}, ir, (host-cuda)
>>>> +// HBIN2: 3: offload, "host-cuda (powerpc64le-ibm-linux-gnu)" {2}, ir
>>>> +// HBIN2: 4: backend, {3}, assembler, (host-cuda)
>>>> +// HBIN2: 5: assembler, {4}, object, (host-cuda)
>>>> +// HBIN2: 6: linker, {5}, image, (host-cuda)
>>>> +
>>>> +//
>>>> +// Test two gpu architectures up to the assemble phase in host-only
>>>> +// compilation mode.
>>>> +//
>>>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_35 %s --cuda-host-only -S 2>&1 \
>>>> +// RUN: | FileCheck -check-prefix=HASM2 %s
>>>> +// HASM2: 0: input, "{{.*}}cuda_phases.cu", cuda, (host-cuda)
>>>> +// HASM2: 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
>>>> +// HASM2: 2: compiler, {1}, ir, (host-cuda)
>>>> +// HASM2: 3: offload, "host-cuda (powerpc64le-ibm-linux-gnu)" {2}, ir
>>>> +// HASM2: 4: backend, {3}, assembler, (host-cuda)
>>>> +
>>>> +//
>>>> +// Test single gpu architecture with complete compilation in device-only
>>>> +// compilation mode.
>>>> +//
>>>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 %s --cuda-device-only 2>&1 \
>>>> +// RUN: | FileCheck -check-prefix=DBIN %s
>>>> +// DBIN: 0: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_30)
>>>> +// DBIN: 1: preprocessor, {0}, cuda-cpp-output, (device-cuda, sm_30)
>>>> +// DBIN: 2: compiler, {1}, ir, (device-cuda, sm_30)
>>>> +// DBIN: 3: backend, {2}, assembler, (device-cuda, sm_30)
>>>> +// DBIN: 4: assembler, {3}, object, (device-cuda, sm_30)
>>>> +// DBIN: 5: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {4}, object
>>>> +
>>>> +//
>>>> +// Test single gpu architecture up to the assemble phase in device-only
>>>> +// compilation mode.
>>>> +//
>>>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 %s --cuda-device-only -S 2>&1 \
>>>> +// RUN: | FileCheck -check-prefix=DASM %s
>>>> +// DASM: 0: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_30)
>>>> +// DASM: 1: preprocessor, {0}, cuda-cpp-output, (device-cuda, sm_30)
>>>> +// DASM: 2: compiler, {1}, ir, (device-cuda, sm_30)
>>>> +// DASM: 3: backend, {2}, assembler, (device-cuda, sm_30)
>>>> +// DASM: 4: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {3}, assembler
>>>> +
>>>> +//
>>>> +// Test two gpu architectures with complete compilation in device-only
>>>> +// compilation mode.
>>>> +//
>>>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_35 %s --cuda-device-only 2>&1 \
>>>> +// RUN: | FileCheck -check-prefix=DBIN2 %s
>>>> +// DBIN2: 0: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_30)
>>>> +// DBIN2: 1: preprocessor, {0}, cuda-cpp-output, (device-cuda, sm_30)
>>>> +// DBIN2: 2: compiler, {1}, ir, (device-cuda, sm_30)
>>>> +// DBIN2: 3: backend, {2}, assembler, (device-cuda, sm_30)
>>>> +// DBIN2: 4: assembler, {3}, object, (device-cuda, sm_30)
>>>> +// DBIN2: 5: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {4}, object
>>>> +// DBIN2: 6: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_35)
>>>> +// DBIN2: 7: preprocessor, {6}, cuda-cpp-output, (device-cuda, sm_35)
>>>> +// DBIN2: 8: compiler, {7}, ir, (device-cuda, sm_35)
>>>> +// DBIN2: 9: backend, {8}, assembler, (device-cuda, sm_35)
>>>> +// DBIN2: 10: assembler, {9}, object, (device-cuda, sm_35)
>>>> +// DBIN2: 11: offload, "device-cuda (nvptx64-nvidia-cuda:sm_35)" {10}, object
>>>> +
>>>> +//
>>>> +// Test two gpu architectures up to the assemble phase in device-only
>>>> +// compilation mode.
>>>> +//
>>>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_35 %s --cuda-device-only -S 2>&1 \
>>>> +// RUN: | FileCheck -check-prefix=DASM2 %s
>>>> +// DASM2: 0: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_30)
>>>> +// DASM2: 1: preprocessor, {0}, cuda-cpp-output, (device-cuda, sm_30)
>>>> +// DASM2: 2: compiler, {1}, ir, (device-cuda, sm_30)
>>>> +// DASM2: 3: backend, {2}, assembler, (device-cuda, sm_30)
>>>> +// DASM2: 4: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {3}, assembler
>>>> +// DASM2: 5: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_35)
>>>> +// DASM2: 6: preprocessor, {5}, cuda-cpp-output, (device-cuda, sm_35)
>>>> +// DASM2: 7: compiler, {6}, ir, (device-cuda, sm_35)
>>>> +// DASM2: 8: backend, {7}, assembler, (device-cuda, sm_35)
>>>> +// DASM2: 9: offload, "device-cuda (nvptx64-nvidia-cuda:sm_35)" {8}, assembler
>>>>
>>>>
>>>> _______________________________________________
>>>> cfe-commits mailing list
>>>> cfe-commits at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
>>>>
>>>
>>>
>>
>
>
> --
> --Artem Belevich
>
--
--Artem Belevich