r275645 - [CUDA][OpenMP] Create generic offload action
Samuel F Antao via cfe-commits
cfe-commits at lists.llvm.org
Mon Jul 18 16:03:37 PDT 2016
Hi Richard,
I agree; I don't think the second `addExtraOffloadCXXStdlibIncludeArgs` is
required. When I made this change, my focus was on preserving the
functionality of the existing code. I can confirm that the existing tests
still pass with it removed. Could there, however, be some use case in the
existing CUDA implementation that requires C++ include paths to be added
for non-C++ input types?
Art, Justin, can you confirm whether that is the case? If not, should I go
ahead and remove the duplicated code?
Thanks!
Samuel
On Mon, Jul 18, 2016 at 5:45 PM, Richard Smith via cfe-commits <
cfe-commits at lists.llvm.org> wrote:
>
>
> On Fri, Jul 15, 2016 at 4:13 PM, Samuel Antao via cfe-commits <
> cfe-commits at lists.llvm.org> wrote:
>
>> Author: sfantao
>> Date: Fri Jul 15 18:13:27 2016
>> New Revision: 275645
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=275645&view=rev
>> Log:
>> [CUDA][OpenMP] Create generic offload action
>>
>> Summary:
>> This patch replaces the CUDA-specific actions with a generic offload action.
>> The offload action may have multiple dependences, classified as “host” or
>> “device”. The generic offload action is used in much the same way as the
>> current CUDA implementation uses its actions: it assigns a specific
>> toolchain and architecture to its dependences during job generation.
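A minimal, self-contained C++ sketch of the host/device dependence layout described above. `ToolChainSketch`, `ActionSketch`, `DeviceDepsSketch`, and `OffloadSketch` are hypothetical stand-ins, not Clang's classes. The sketch assumes, as the patch's `doOnEachDeviceDependence` does, that the host dependence (if any) is stored first and the remaining inputs line up one-to-one with the device toolchains. Unlike the patch, which reads each bound architecture back from the propagated offload info, the sketch simply stores it.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for clang::driver::ToolChain and Action.
struct ToolChainSketch { std::string Triple; };
struct ActionSketch { std::string Name; };

// Parallel lists, one entry per device dependence (as in DeviceDependences).
struct DeviceDepsSketch {
  std::vector<ActionSketch *> Actions;
  std::vector<const ToolChainSketch *> ToolChains;
  std::vector<const char *> BoundArchs;

  void add(ActionSketch &A, const ToolChainSketch &TC, const char *Arch) {
    Actions.push_back(&A);
    ToolChains.push_back(&TC);
    BoundArchs.push_back(Arch);
  }
};

using WorkFn =
    std::function<void(ActionSketch *, const ToolChainSketch *, const char *)>;

class OffloadSketch {
  const ToolChainSketch *HostTC = nullptr;  // null when there is no host dependence
  std::vector<ActionSketch *> Inputs;       // host dependence first, if present
  std::vector<const ToolChainSketch *> DevToolChains;
  std::vector<const char *> DevArchs;

public:
  OffloadSketch(ActionSketch *Host, const ToolChainSketch *HTC,
                const DeviceDepsSketch &D)
      : HostTC(HTC), DevToolChains(D.ToolChains), DevArchs(D.BoundArchs) {
    assert((Host == nullptr) == (HTC == nullptr) && "host action needs a toolchain");
    if (Host)
      Inputs.push_back(Host);
    Inputs.insert(Inputs.end(), D.Actions.begin(), D.Actions.end());
  }

  // Visit the host dependence (if any), then every device dependence.
  void doOnEachDependence(const WorkFn &Work) const {
    std::size_t I = 0;
    if (HostTC)
      Work(Inputs[I++], HostTC, nullptr);  // host bound arch not modeled here
    assert(Inputs.size() - I == DevToolChains.size() && "inconsistent sizes");
    for (std::size_t D = 0; I < Inputs.size(); ++I, ++D)
      Work(Inputs[I], DevToolChains[D], DevArchs[D]);
  }

  // Same counting rule as the patch's hasSingleDeviceDependence().
  bool hasSingleDeviceDependence(bool DoNotConsiderHostActions = false) const {
    if (DoNotConsiderHostActions)
      return Inputs.size() == (HostTC ? 2u : 1u);
    return !HostTC && Inputs.size() == 1;
  }
};
```

Keeping the host dependence at a fixed index is what lets the device toolchain list stay exactly one entry shorter than the input list whenever a host dependence is present.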
>>
>> This patch also proposes propagating the offloading information through
>> the action graph so that it can be easily retrieved at any time during
>> command generation. This allows, e.g., the "clang tool" to determine
>> whether CUDA should be supported for the device or the host, and ptxas to
>> easily retrieve the target architecture.
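The propagation idea can be sketched as a recursive walk down each action's inputs. The model below is simplified and hypothetical (`NodeSketch` is not a Clang type, and the `OffloadKind` bit values are assumed), but it mirrors the rules in the patch's `propagateDeviceOffloadInfo`/`propagateHostOffloadInfo`: a device kind is set and never mixed, host kinds are OR-ed into a mask, and the walk stops at offload actions, which configure their own dependences.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Assumed bit values: one bit per offload kind, as the patch's enum comment describes.
enum OffloadKind : unsigned { OFK_None = 0x00, OFK_Host = 0x01, OFK_Cuda = 0x02 };

// Hypothetical, simplified action node (not clang::driver::Action).
struct NodeSketch {
  bool IsOffload = false;             // stands in for Kind == OffloadClass
  std::vector<NodeSketch *> Inputs;
  unsigned HostMask = 0;              // ActiveOffloadKindMask
  OffloadKind DeviceKind = OFK_None;  // OffloadingDeviceKind
  const char *Arch = nullptr;         // OffloadingArch

  // Device info: a node may belong to at most one device kind.
  void propagateDevice(OffloadKind K, const char *A) {
    if (IsOffload)
      return;  // offload nodes configure their own dependences
    assert((DeviceKind == K || DeviceKind == OFK_None) && "conflicting device kind");
    assert(HostMask == 0 && "device kind on a host action");
    DeviceKind = K;
    Arch = A;
    for (NodeSketch *In : Inputs)
      In->propagateDevice(K, A);
  }

  // Host info: kinds accumulate in a mask, since one host may serve
  // several programming models.
  void propagateHost(unsigned Kinds, const char *A) {
    if (IsOffload)
      return;
    assert(DeviceKind == OFK_None && "host kind on a device action");
    HostMask |= Kinds;
    Arch = A;
    for (NodeSketch *In : Inputs)
      In->propagateHost(HostMask, A);
  }

  bool isHostOffloading(OffloadKind K) const { return (HostMask & K) != 0; }
  bool isDeviceOffloading(OffloadKind K) const { return DeviceKind == K; }
};
```

Once the walk has run, any later stage (a tool constructing a command line, for instance) can query a node's kind and architecture locally instead of searching back up the graph.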
>>
>> This is an example of what the action graph would look like when compiling
>> a single CUDA file for two GPU architectures:
>> ```
>> 0: input, "cudatests.cu", cuda, (host-cuda)
>> 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
>> 2: compiler, {1}, ir, (host-cuda)
>> 3: input, "cudatests.cu", cuda, (device-cuda, sm_35)
>> 4: preprocessor, {3}, cuda-cpp-output, (device-cuda, sm_35)
>> 5: compiler, {4}, ir, (device-cuda, sm_35)
>> 6: backend, {5}, assembler, (device-cuda, sm_35)
>> 7: assembler, {6}, object, (device-cuda, sm_35)
>> 8: offload, "device-cuda (nvptx64-nvidia-cuda:sm_35)" {7}, object
>> 9: offload, "device-cuda (nvptx64-nvidia-cuda:sm_35)" {6}, assembler
>> 10: input, "cudatests.cu", cuda, (device-cuda, sm_37)
>> 11: preprocessor, {10}, cuda-cpp-output, (device-cuda, sm_37)
>> 12: compiler, {11}, ir, (device-cuda, sm_37)
>> 13: backend, {12}, assembler, (device-cuda, sm_37)
>> 14: assembler, {13}, object, (device-cuda, sm_37)
>> 15: offload, "device-cuda (nvptx64-nvidia-cuda:sm_37)" {14}, object
>> 16: offload, "device-cuda (nvptx64-nvidia-cuda:sm_37)" {13}, assembler
>> 17: linker, {8, 9, 15, 16}, cuda-fatbin, (device-cuda)
>> 18: offload, "host-cuda (powerpc64le-unknown-linux-gnu)" {2}, "device-cuda (nvptx64-nvidia-cuda)" {17}, ir
>> 19: backend, {18}, assembler
>> 20: assembler, {19}, object
>> 21: input, "cuda", object
>> 22: input, "cudart", object
>> 23: linker, {20, 21, 22}, image
>> ```
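To make the notation above concrete, here is a hedged C++ sketch of how the offload annotations could be rendered. The function names and the `OffloadKind` bit values are illustrative assumptions. The string shapes follow the patch's `getOffloadingKindPrefix()` and the new `PrintActions1` code for non-offload actions (`, (device-cuda, sm_35)`) and for offload dependences (`"device-cuda (nvptx64-nvidia-cuda:sm_35)"`).

```cpp
#include <cassert>
#include <string>

// Assumed bit values, one bit per offload kind.
enum OffloadKind : unsigned { OFK_None = 0x00, OFK_Host = 0x01, OFK_Cuda = 0x02 };

// "device-cuda" for CUDA device actions, "host-cuda" for host actions that
// serve CUDA, "" for actions with no offload info.
std::string kindPrefix(OffloadKind DeviceKind, unsigned HostMask) {
  assert(DeviceKind != OFK_Host && "host is not a device kind");
  if (DeviceKind == OFK_Cuda)
    return "device-cuda";
  if (HostMask == 0)
    return "";
  std::string Res = "host";
  if (HostMask & OFK_Cuda)
    Res += "-cuda";
  return Res;
}

// Suffix printed after the type of a non-offload action,
// e.g. ", (device-cuda, sm_35)" or ", (host-cuda)".
std::string offloadSuffix(OffloadKind DeviceKind, unsigned HostMask, const char *Arch) {
  std::string S = kindPrefix(DeviceKind, HostMask);
  if (S.empty())
    return "";
  std::string Out = ", (" + S;
  if (Arch)
    Out += std::string(", ") + Arch;
  return Out + ")";
}

// Label printed for one dependence of an offload action,
// e.g. "device-cuda (nvptx64-nvidia-cuda:sm_35)" including the quotes.
std::string dependenceLabel(const std::string &Prefix, const std::string &Triple,
                            const char *Arch) {
  std::string Out = "\"" + Prefix + " (" + Triple;
  if (Arch)
    Out += std::string(":") + Arch;
  return Out + ")\"";
}
```

For example, node 4 above carries `(device-cuda, sm_35)`, while host node 2 has no bound architecture and prints only `(host-cuda)`.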
>> The changes in this patch pass the existing regression tests (preserving the
>> existing functionality), and the resulting binaries execute correctly on a
>> Power8+K40 machine.
>>
>> Reviewers: echristo, hfinkel, jlebar, ABataev, tra
>>
>> Subscribers: guansong, andreybokhanko, tcramer, mkuron, cfe-commits,
>> arpith-jacob, carlo.bertolli, caomhin
>>
>> Differential Revision: https://reviews.llvm.org/D18171
>>
>> Added:
>> cfe/trunk/test/Driver/cuda_phases.cu
>> Modified:
>> cfe/trunk/include/clang/Driver/Action.h
>> cfe/trunk/include/clang/Driver/Compilation.h
>> cfe/trunk/include/clang/Driver/Driver.h
>> cfe/trunk/lib/Driver/Action.cpp
>> cfe/trunk/lib/Driver/Driver.cpp
>> cfe/trunk/lib/Driver/ToolChain.cpp
>> cfe/trunk/lib/Driver/Tools.cpp
>> cfe/trunk/lib/Driver/Tools.h
>> cfe/trunk/lib/Frontend/CreateInvocationFromCommandLine.cpp
>>
>> Modified: cfe/trunk/include/clang/Driver/Action.h
>> URL:
>> http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Driver/Action.h?rev=275645&r1=275644&r2=275645&view=diff
>>
>> ==============================================================================
>> --- cfe/trunk/include/clang/Driver/Action.h (original)
>> +++ cfe/trunk/include/clang/Driver/Action.h Fri Jul 15 18:13:27 2016
>> @@ -13,6 +13,7 @@
>> #include "clang/Basic/Cuda.h"
>> #include "clang/Driver/Types.h"
>> #include "clang/Driver/Util.h"
>> +#include "llvm/ADT/STLExtras.h"
>> #include "llvm/ADT/SmallVector.h"
>>
>> namespace llvm {
>> @@ -27,6 +28,8 @@ namespace opt {
>> namespace clang {
>> namespace driver {
>>
>> +class ToolChain;
>> +
>> /// Action - Represent an abstract compilation step to perform.
>> ///
>> /// An action represents an edge in the compilation graph; typically
>> @@ -50,8 +53,7 @@ public:
>> enum ActionClass {
>> InputClass = 0,
>> BindArchClass,
>> - CudaDeviceClass,
>> - CudaHostClass,
>> + OffloadClass,
>> PreprocessJobClass,
>> PrecompileJobClass,
>> AnalyzeJobClass,
>> @@ -65,17 +67,13 @@ public:
>> VerifyDebugInfoJobClass,
>> VerifyPCHJobClass,
>>
>> - JobClassFirst=PreprocessJobClass,
>> - JobClassLast=VerifyPCHJobClass
>> + JobClassFirst = PreprocessJobClass,
>> + JobClassLast = VerifyPCHJobClass
>> };
>>
>> // The offloading kind determines if this action is binded to a particular
>> // programming model. Each entry reserves one bit. We also have a special kind
>> // to designate the host offloading tool chain.
>> - //
>> - // FIXME: This is currently used to indicate that tool chains are used in a
>> - // given programming, but will be used here as well once a generic offloading
>> - // action is implemented.
>> enum OffloadKind {
>> OFK_None = 0x00,
>> // The host offloading tool chain.
>> @@ -95,6 +93,19 @@ private:
>> ActionList Inputs;
>>
>> protected:
>> + ///
>> + /// Offload information.
>> + ///
>> +
>> + /// The host offloading kind - a combination of kinds encoded in a mask.
>> + /// Multiple programming models may be supported simultaneously by the same
>> + /// host.
>> + unsigned ActiveOffloadKindMask = 0u;
>> + /// Offloading kind of the device.
>> + OffloadKind OffloadingDeviceKind = OFK_None;
>> + /// The Offloading architecture associated with this action.
>> + const char *OffloadingArch = nullptr;
>> +
>> Action(ActionClass Kind, types::ID Type) : Action(Kind, ActionList(), Type) {}
>> Action(ActionClass Kind, Action *Input, types::ID Type)
>> : Action(Kind, ActionList({Input}), Type) {}
>> @@ -124,6 +135,40 @@ public:
>> input_const_range inputs() const {
>> return input_const_range(input_begin(), input_end());
>> }
>> +
>> + /// Return a string containing the offload kind of the action.
>> + std::string getOffloadingKindPrefix() const;
>> + /// Return a string that can be used as prefix in order to generate unique
>> + /// files for each offloading kind.
>> + std::string getOffloadingFileNamePrefix(StringRef NormalizedTriple) const;
>> +
>> + /// Set the device offload info of this action and propagate it to its
>> + /// dependences.
>> + void propagateDeviceOffloadInfo(OffloadKind OKind, const char *OArch);
>> + /// Append the host offload info of this action and propagate it to its
>> + /// dependences.
>> + void propagateHostOffloadInfo(unsigned OKinds, const char *OArch);
>> + /// Set the offload info of this action to be the same as the provided action,
>> + /// and propagate it to its dependences.
>> + void propagateOffloadInfo(const Action *A);
>> +
>> + unsigned getOffloadingHostActiveKinds() const {
>> + return ActiveOffloadKindMask;
>> + }
>> + OffloadKind getOffloadingDeviceKind() const { return OffloadingDeviceKind; }
>> + const char *getOffloadingArch() const { return OffloadingArch; }
>> +
>> + /// Check if this action have any offload kinds. Note that host offload kinds
>> + /// are only set if the action is a dependence to a host offload action.
>> + bool isHostOffloading(OffloadKind OKind) const {
>> + return ActiveOffloadKindMask & OKind;
>> + }
>> + bool isDeviceOffloading(OffloadKind OKind) const {
>> + return OffloadingDeviceKind == OKind;
>> + }
>> + bool isOffloading(OffloadKind OKind) const {
>> + return isHostOffloading(OKind) || isDeviceOffloading(OKind);
>> + }
>> };
>>
>> class InputAction : public Action {
>> @@ -156,39 +201,126 @@ public:
>> }
>> };
>>
>> -class CudaDeviceAction : public Action {
>> +/// An offload action combines host or/and device actions according to the
>> +/// programming model implementation needs and propagates the offloading kind to
>> +/// its dependences.
>> +class OffloadAction final : public Action {
>> virtual void anchor();
>>
>> - const CudaArch GpuArch;
>> -
>> - /// True when action results are not consumed by the host action (e.g
>> when
>> - /// -fsyntax-only or --cuda-device-only options are used).
>> - bool AtTopLevel;
>> -
>> public:
>> - CudaDeviceAction(Action *Input, CudaArch Arch, bool AtTopLevel);
>> + /// Type used to communicate device actions. It associates bound architecture,
>> + /// toolchain, and offload kind to each action.
>> + class DeviceDependences final {
>> + public:
>> + typedef SmallVector<const ToolChain *, 3> ToolChainList;
>> + typedef SmallVector<const char *, 3> BoundArchList;
>> + typedef SmallVector<OffloadKind, 3> OffloadKindList;
>> +
>> + private:
>> + // Lists that keep the information for each dependency. All the lists are
>> + // meant to be updated in sync. We are adopting separate lists instead of a
>> + // list of structs, because that simplifies forwarding the actions list to
>> + // initialize the inputs of the base Action class.
>> +
>> + /// The dependence actions.
>> + ActionList DeviceActions;
>> + /// The offloading toolchains that should be used with the action.
>> + ToolChainList DeviceToolChains;
>> + /// The architectures that should be used with this action.
>> + BoundArchList DeviceBoundArchs;
>> + /// The offload kind of each dependence.
>> + OffloadKindList DeviceOffloadKinds;
>> +
>> + public:
>> + /// Add a action along with the associated toolchain, bound arch, and
>> + /// offload kind.
>> + void add(Action &A, const ToolChain &TC, const char *BoundArch,
>> + OffloadKind OKind);
>> +
>> + /// Get each of the individual arrays.
>> + const ActionList &getActions() const { return DeviceActions; };
>> + const ToolChainList &getToolChains() const { return DeviceToolChains; };
>> + const BoundArchList &getBoundArchs() const { return DeviceBoundArchs; };
>> + const OffloadKindList &getOffloadKinds() const {
>> + return DeviceOffloadKinds;
>> + };
>> + };
>>
>> - /// Get the CUDA GPU architecture to which this Action corresponds. Returns
>> - /// UNKNOWN if this Action corresponds to multiple architectures.
>> - CudaArch getGpuArch() const { return GpuArch; }
>> + /// Type used to communicate host actions. It associates bound architecture,
>> + /// toolchain, and offload kinds to the host action.
>> + class HostDependence final {
>> + /// The dependence action.
>> + Action &HostAction;
>> + /// The offloading toolchain that should be used with the action.
>> + const ToolChain &HostToolChain;
>> + /// The architectures that should be used with this action.
>> + const char *HostBoundArch = nullptr;
>> + /// The offload kind of each dependence.
>> + unsigned HostOffloadKinds = 0u;
>> +
>> + public:
>> + HostDependence(Action &A, const ToolChain &TC, const char *BoundArch,
>> + const unsigned OffloadKinds)
>> + : HostAction(A), HostToolChain(TC), HostBoundArch(BoundArch),
>> + HostOffloadKinds(OffloadKinds){};
>> + /// Constructor version that obtains the offload kinds from the device
>> + /// dependencies.
>> + HostDependence(Action &A, const ToolChain &TC, const char *BoundArch,
>> + const DeviceDependences &DDeps);
>> + Action *getAction() const { return &HostAction; };
>> + const ToolChain *getToolChain() const { return &HostToolChain; };
>> + const char *getBoundArch() const { return HostBoundArch; };
>> + unsigned getOffloadKinds() const { return HostOffloadKinds; };
>> + };
>>
>> - bool isAtTopLevel() const { return AtTopLevel; }
>> + typedef llvm::function_ref<void(Action *, const ToolChain *, const char *)>
>> + OffloadActionWorkTy;
>>
>> - static bool classof(const Action *A) {
>> - return A->getKind() == CudaDeviceClass;
>> - }
>> -};
>> +private:
>> + /// The host offloading toolchain that should be used with the action.
>> + const ToolChain *HostTC = nullptr;
>>
>> -class CudaHostAction : public Action {
>> - virtual void anchor();
>> - ActionList DeviceActions;
>> + /// The tool chains associated with the list of actions.
>> + DeviceDependences::ToolChainList DevToolChains;
>>
>> public:
>> - CudaHostAction(Action *Input, const ActionList &DeviceActions);
>> -
>> - const ActionList &getDeviceActions() const { return DeviceActions; }
>> + OffloadAction(const HostDependence &HDep);
>> + OffloadAction(const DeviceDependences &DDeps, types::ID Ty);
>> + OffloadAction(const HostDependence &HDep, const DeviceDependences &DDeps);
>> +
>> + /// Execute the work specified in \a Work on the host dependence.
>> + void doOnHostDependence(const OffloadActionWorkTy &Work) const;
>> +
>> + /// Execute the work specified in \a Work on each device dependence.
>> + void doOnEachDeviceDependence(const OffloadActionWorkTy &Work) const;
>> +
>> + /// Execute the work specified in \a Work on each dependence.
>> + void doOnEachDependence(const OffloadActionWorkTy &Work) const;
>> +
>> + /// Execute the work specified in \a Work on each host or device dependence if
>> + /// \a IsHostDependenceto is true or false, respectively.
>> + void doOnEachDependence(bool IsHostDependence,
>> + const OffloadActionWorkTy &Work) const;
>> +
>> + /// Return true if the action has a host dependence.
>> + bool hasHostDependence() const;
>> +
>> + /// Return the host dependence of this action. This function is only expected
>> + /// to be called if the host dependence exists.
>> + Action *getHostDependence() const;
>> +
>> + /// Return true if the action has a single device dependence. If \a
>> + /// DoNotConsiderHostActions is set, ignore the host dependence, if any, while
>> + /// accounting for the number of dependences.
>> + bool hasSingleDeviceDependence(bool DoNotConsiderHostActions = false) const;
>> +
>> + /// Return the single device dependence of this action. This function is only
>> + /// expected to be called if a single device dependence exists. If \a
>> + /// DoNotConsiderHostActions is set, a host dependence is allowed.
>> + Action *
>> + getSingleDeviceDependence(bool DoNotConsiderHostActions = false) const;
>>
>> - static bool classof(const Action *A) { return A->getKind() == CudaHostClass; }
>> + static bool classof(const Action *A) { return A->getKind() == OffloadClass; }
>> };
>>
>> class JobAction : public Action {
>>
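The two ways an `OffloadAction` derives its own offload info (the `HostDependence(..., const DeviceDependences &)` constructor and the device-only constructor implemented in Action.cpp) reduce to small rules. They are sketched here with hypothetical free functions and assumed `OffloadKind` bit values: a host dependence advertises the union of its device dependences' kinds, a device-only offload action takes a device kind only when all dependences agree, and it inherits an architecture only from a single dependence. Unlike the patch, the sketch guards against an empty dependence list before reading its first element.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Assumed bit values, one bit per offload kind.
enum OffloadKind : unsigned { OFK_None = 0x00, OFK_Host = 0x01, OFK_Cuda = 0x02 };

// Host side: advertise every kind served by the device dependences.
unsigned hostKindsFrom(const std::vector<OffloadKind> &DeviceKinds) {
  unsigned Mask = 0;
  for (OffloadKind K : DeviceKinds)
    Mask |= K;
  return Mask;
}

// Device-only side: take a device kind only if all dependences agree.
OffloadKind commonDeviceKind(const std::vector<OffloadKind> &DeviceKinds) {
  if (DeviceKinds.empty())
    return OFK_None;  // guard: nothing to agree on
  bool AllSame = std::all_of(DeviceKinds.begin(), DeviceKinds.end(),
                             [&](OffloadKind K) { return K == DeviceKinds.front(); });
  return AllSame ? DeviceKinds.front() : OFK_None;
}

// The architecture is inherited only from a single dependence.
const char *inheritedArch(const std::vector<const char *> &BoundArchs) {
  return BoundArchs.size() == 1 ? BoundArchs.front() : nullptr;
}
```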
>> Modified: cfe/trunk/include/clang/Driver/Compilation.h
>> URL:
>> http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Driver/Compilation.h?rev=275645&r1=275644&r2=275645&view=diff
>>
>> ==============================================================================
>> --- cfe/trunk/include/clang/Driver/Compilation.h (original)
>> +++ cfe/trunk/include/clang/Driver/Compilation.h Fri Jul 15 18:13:27 2016
>> @@ -98,12 +98,7 @@ public:
>> const Driver &getDriver() const { return TheDriver; }
>>
>> const ToolChain &getDefaultToolChain() const { return DefaultToolChain; }
>> - const ToolChain *getOffloadingHostToolChain() const {
>> - auto It = OrderedOffloadingToolchains.find(Action::OFK_Host);
>> - if (It != OrderedOffloadingToolchains.end())
>> - return It->second;
>> - return nullptr;
>> - }
>> +
>> unsigned isOffloadingHostKind(Action::OffloadKind Kind) const {
>> return ActiveOffloadMask & Kind;
>> }
>> @@ -121,8 +116,8 @@ public:
>> return OrderedOffloadingToolchains.equal_range(Kind);
>> }
>>
>> - // Return an offload toolchain of the provided kind. Only one is expected to
>> - // exist.
>> + /// Return an offload toolchain of the provided kind. Only one is expected to
>> + /// exist.
>> template <Action::OffloadKind Kind>
>> const ToolChain *getSingleOffloadToolChain() const {
>> auto TCs = getOffloadToolChains<Kind>();
>>
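With `getOffloadingHostToolChain()` removed, the host toolchain is looked up like any other kind, through `getSingleOffloadToolChain<Action::OFK_Host>()`. Below is a simplified sketch of that lookup over a `std::multimap`. `CompilationSketch`, `ToolChainSketch`, and the bit values are illustrative; registering the host toolchain under the host kind at construction, and updating the active mask in `addOffloadDeviceToolChain`, are assumptions of the sketch.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <string>

// Assumed bit values, one bit per offload kind.
enum OffloadKind : unsigned { OFK_None = 0x00, OFK_Host = 0x01, OFK_Cuda = 0x02 };

struct ToolChainSketch { std::string Triple; };

// Hypothetical, simplified Compilation: toolchains are kept in a multimap
// keyed by offload kind, with the host toolchain registered as OFK_Host.
class CompilationSketch {
  std::multimap<OffloadKind, const ToolChainSketch *> OrderedOffloadingToolchains;
  unsigned ActiveOffloadMask = 0;

public:
  explicit CompilationSketch(const ToolChainSketch &HostTC) {
    OrderedOffloadingToolchains.insert({OFK_Host, &HostTC});
  }

  void addOffloadDeviceToolChain(const ToolChainSketch *TC, OffloadKind K) {
    assert(K != OFK_Host && K != OFK_None && "not a device kind");
    ActiveOffloadMask |= K;  // the host now also offloads this kind
    OrderedOffloadingToolchains.insert({K, TC});
  }

  unsigned isOffloadingHostKind(OffloadKind K) const { return ActiveOffloadMask & K; }

  // Exactly one toolchain of the requested kind is expected to exist.
  template <OffloadKind Kind>
  const ToolChainSketch *getSingleOffloadToolChain() const {
    auto Range = OrderedOffloadingToolchains.equal_range(Kind);
    assert(Range.first != Range.second && "no toolchain of this kind");
    assert(std::next(Range.first) == Range.second && "more than one toolchain");
    return Range.first->second;
  }
};
```

Putting the existence check inside the accessor is consistent with the later Driver.cpp hunk dropping its separate `assert(C.getSingleOffloadToolChain<Action::OFK_Cuda>() && ...)`.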
>> Modified: cfe/trunk/include/clang/Driver/Driver.h
>> URL:
>> http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Driver/Driver.h?rev=275645&r1=275644&r2=275645&view=diff
>>
>> ==============================================================================
>> --- cfe/trunk/include/clang/Driver/Driver.h (original)
>> +++ cfe/trunk/include/clang/Driver/Driver.h Fri Jul 15 18:13:27 2016
>> @@ -394,12 +394,13 @@ public:
>> /// BuildJobsForAction - Construct the jobs to perform for the action \p A and
>> /// return an InputInfo for the result of running \p A. Will only construct
>> /// jobs for a given (Action, ToolChain, BoundArch) tuple once.
>> - InputInfo BuildJobsForAction(Compilation &C, const Action *A,
>> - const ToolChain *TC, const char *BoundArch,
>> - bool AtTopLevel, bool MultipleArchs,
>> - const char *LinkingOutput,
>> - std::map<std::pair<const Action *, std::string>,
>> - InputInfo> &CachedResults) const;
>> + InputInfo
>> + BuildJobsForAction(Compilation &C, const Action *A, const ToolChain *TC,
>> + const char *BoundArch, bool AtTopLevel, bool MultipleArchs,
>> + const char *LinkingOutput,
>> + std::map<std::pair<const Action *, std::string>, InputInfo>
>> + &CachedResults,
>> + bool BuildForOffloadDevice) const;
>>
>> /// Returns the default name for linked images (e.g., "a.out").
>> const char *getDefaultImageName() const;
>> @@ -415,12 +416,11 @@ public:
>> /// \param BoundArch - The bound architecture.
>> /// \param AtTopLevel - Whether this is a "top-level" action.
>> /// \param MultipleArchs - Whether multiple -arch options were supplied.
>> - const char *GetNamedOutputPath(Compilation &C,
>> - const JobAction &JA,
>> - const char *BaseInput,
>> - const char *BoundArch,
>> - bool AtTopLevel,
>> - bool MultipleArchs) const;
>> + /// \param NormalizedTriple - The normalized triple of the relevant target.
>> + const char *GetNamedOutputPath(Compilation &C, const JobAction &JA,
>> + const char *BaseInput, const char *BoundArch,
>> + bool AtTopLevel, bool MultipleArchs,
>> + StringRef NormalizedTriple) const;
>>
>> /// GetTemporaryPath - Return the pathname of a temporary file to use
>> /// as part of compilation; the file will have the given prefix and suffix.
>> @@ -467,7 +467,8 @@ private:
>> const char *BoundArch, bool AtTopLevel, bool MultipleArchs,
>> const char *LinkingOutput,
>> std::map<std::pair<const Action *, std::string>, InputInfo>
>> - &CachedResults) const;
>> + &CachedResults,
>> + bool BuildForOffloadDevice) const;
>>
>> public:
>> /// GetReleaseVersion - Parse (([0-9]+)(.([0-9]+)(.([0-9]+)?))?)? and
>>
>> Modified: cfe/trunk/lib/Driver/Action.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Driver/Action.cpp?rev=275645&r1=275644&r2=275645&view=diff
>>
>> ==============================================================================
>> --- cfe/trunk/lib/Driver/Action.cpp (original)
>> +++ cfe/trunk/lib/Driver/Action.cpp Fri Jul 15 18:13:27 2016
>> @@ -8,6 +8,7 @@
>> //===----------------------------------------------------------------------===//
>>
>> #include "clang/Driver/Action.h"
>> +#include "clang/Driver/ToolChain.h"
>> #include "llvm/ADT/StringSwitch.h"
>> #include "llvm/Support/ErrorHandling.h"
>> #include "llvm/Support/Regex.h"
>> @@ -21,8 +22,8 @@ const char *Action::getClassName(ActionC
>> switch (AC) {
>> case InputClass: return "input";
>> case BindArchClass: return "bind-arch";
>> - case CudaDeviceClass: return "cuda-device";
>> - case CudaHostClass: return "cuda-host";
>> + case OffloadClass:
>> + return "offload";
>> case PreprocessJobClass: return "preprocessor";
>> case PrecompileJobClass: return "precompiler";
>> case AnalyzeJobClass: return "analyzer";
>> @@ -40,6 +41,82 @@ const char *Action::getClassName(ActionC
>> llvm_unreachable("invalid class");
>> }
>>
>> +void Action::propagateDeviceOffloadInfo(OffloadKind OKind, const char *OArch) {
>> + // Offload action set its own kinds on their dependences.
>> + if (Kind == OffloadClass)
>> + return;
>> +
>> + assert((OffloadingDeviceKind == OKind || OffloadingDeviceKind == OFK_None) &&
>> + "Setting device kind to a different device??");
>> + assert(!ActiveOffloadKindMask && "Setting a device kind in a host action??");
>> + OffloadingDeviceKind = OKind;
>> + OffloadingArch = OArch;
>> +
>> + for (auto *A : Inputs)
>> + A->propagateDeviceOffloadInfo(OffloadingDeviceKind, OArch);
>> +}
>> +
>> +void Action::propagateHostOffloadInfo(unsigned OKinds, const char *OArch) {
>> + // Offload action set its own kinds on their dependences.
>> + if (Kind == OffloadClass)
>> + return;
>> +
>> + assert(OffloadingDeviceKind == OFK_None &&
>> + "Setting a host kind in a device action.");
>> + ActiveOffloadKindMask |= OKinds;
>> + OffloadingArch = OArch;
>> +
>> + for (auto *A : Inputs)
>> + A->propagateHostOffloadInfo(ActiveOffloadKindMask, OArch);
>> +}
>> +
>> +void Action::propagateOffloadInfo(const Action *A) {
>> + if (unsigned HK = A->getOffloadingHostActiveKinds())
>> + propagateHostOffloadInfo(HK, A->getOffloadingArch());
>> + else
>> + propagateDeviceOffloadInfo(A->getOffloadingDeviceKind(),
>> + A->getOffloadingArch());
>> +}
>> +
>> +std::string Action::getOffloadingKindPrefix() const {
>> + switch (OffloadingDeviceKind) {
>> + case OFK_None:
>> + break;
>> + case OFK_Host:
>> + llvm_unreachable("Host kind is not an offloading device kind.");
>> + break;
>> + case OFK_Cuda:
>> + return "device-cuda";
>> +
>> + // TODO: Add other programming models here.
>> + }
>> +
>> + if (!ActiveOffloadKindMask)
>> + return "";
>> +
>> + std::string Res("host");
>> + if (ActiveOffloadKindMask & OFK_Cuda)
>> + Res += "-cuda";
>> +
>> + // TODO: Add other programming models here.
>> +
>> + return Res;
>> +}
>> +
>> +std::string
>> +Action::getOffloadingFileNamePrefix(StringRef NormalizedTriple) const {
>> + // A file prefix is only generated for device actions and consists of the
>> + // offload kind and triple.
>> + if (!OffloadingDeviceKind)
>> + return "";
>> +
>> + std::string Res("-");
>> + Res += getOffloadingKindPrefix();
>> + Res += "-";
>> + Res += NormalizedTriple;
>> + return Res;
>> +}
>> +
>> void InputAction::anchor() {}
>>
>> InputAction::InputAction(const Arg &_Input, types::ID _Type)
>> @@ -51,16 +128,138 @@ void BindArchAction::anchor() {}
>> BindArchAction::BindArchAction(Action *Input, const char *_ArchName)
>> : Action(BindArchClass, Input), ArchName(_ArchName) {}
>>
>> -void CudaDeviceAction::anchor() {}
>> +void OffloadAction::anchor() {}
>> +
>> +OffloadAction::OffloadAction(const HostDependence &HDep)
>> + : Action(OffloadClass, HDep.getAction()), HostTC(HDep.getToolChain()) {
>> + OffloadingArch = HDep.getBoundArch();
>> + ActiveOffloadKindMask = HDep.getOffloadKinds();
>> + HDep.getAction()->propagateHostOffloadInfo(HDep.getOffloadKinds(),
>> + HDep.getBoundArch());
>> +};
>> +
>> +OffloadAction::OffloadAction(const DeviceDependences &DDeps, types::ID Ty)
>> + : Action(OffloadClass, DDeps.getActions(), Ty),
>> + DevToolChains(DDeps.getToolChains()) {
>> + auto &OKinds = DDeps.getOffloadKinds();
>> + auto &BArchs = DDeps.getBoundArchs();
>> +
>> + // If all inputs agree on the same kind, use it also for this action.
>> + if (llvm::all_of(OKinds, [&](OffloadKind K) { return K == OKinds.front(); }))
>> + OffloadingDeviceKind = OKinds.front();
>> +
>> + // If we have a single dependency, inherit the architecture from it.
>> + if (OKinds.size() == 1)
>> + OffloadingArch = BArchs.front();
>> +
>> + // Propagate info to the dependencies.
>> + for (unsigned i = 0, e = getInputs().size(); i != e; ++i)
>> + getInputs()[i]->propagateDeviceOffloadInfo(OKinds[i], BArchs[i]);
>> +}
>> +
>> +OffloadAction::OffloadAction(const HostDependence &HDep,
>> + const DeviceDependences &DDeps)
>> + : Action(OffloadClass, HDep.getAction()), HostTC(HDep.getToolChain()),
>> + DevToolChains(DDeps.getToolChains()) {
>> + // We use the kinds of the host dependence for this action.
>> + OffloadingArch = HDep.getBoundArch();
>> + ActiveOffloadKindMask = HDep.getOffloadKinds();
>> + HDep.getAction()->propagateHostOffloadInfo(HDep.getOffloadKinds(),
>> + HDep.getBoundArch());
>> +
>> + // Add device inputs and propagate info to the device actions. Do work only if
>> + // we have dependencies.
>> + for (unsigned i = 0, e = DDeps.getActions().size(); i != e; ++i)
>> + if (auto *A = DDeps.getActions()[i]) {
>> + getInputs().push_back(A);
>> + A->propagateDeviceOffloadInfo(DDeps.getOffloadKinds()[i],
>> + DDeps.getBoundArchs()[i]);
>> + }
>> +}
>> +
>> +void OffloadAction::doOnHostDependence(const OffloadActionWorkTy &Work) const {
>> + if (!HostTC)
>> + return;
>> + assert(!getInputs().empty() && "No dependencies for offload action??");
>> + auto *A = getInputs().front();
>> + Work(A, HostTC, A->getOffloadingArch());
>> +}
>>
>> -CudaDeviceAction::CudaDeviceAction(Action *Input, clang::CudaArch Arch,
>> - bool AtTopLevel)
>> - : Action(CudaDeviceClass, Input), GpuArch(Arch), AtTopLevel(AtTopLevel) {}
>> +void OffloadAction::doOnEachDeviceDependence(
>> + const OffloadActionWorkTy &Work) const {
>> + auto I = getInputs().begin();
>> + auto E = getInputs().end();
>> + if (I == E)
>> + return;
>> +
>> + // We expect to have the same number of input dependences and device tool
>> + // chains, except if we also have a host dependence. In that case we have one
>> + // more dependence than we have device tool chains.
>> + assert(getInputs().size() == DevToolChains.size() + (HostTC ? 1 : 0) &&
>> + "Sizes of action dependences and toolchains are not consistent!");
>> +
>> + // Skip host action
>> + if (HostTC)
>> + ++I;
>> +
>> + auto TI = DevToolChains.begin();
>> + for (; I != E; ++I, ++TI)
>> + Work(*I, *TI, (*I)->getOffloadingArch());
>> +}
>> +
>> +void OffloadAction::doOnEachDependence(const OffloadActionWorkTy &Work) const {
>> + doOnHostDependence(Work);
>> + doOnEachDeviceDependence(Work);
>> +}
>> +
>> +void OffloadAction::doOnEachDependence(bool IsHostDependence,
>> + const OffloadActionWorkTy &Work) const {
>> + if (IsHostDependence)
>> + doOnHostDependence(Work);
>> + else
>> + doOnEachDeviceDependence(Work);
>> +}
>>
>> -void CudaHostAction::anchor() {}
>> +bool OffloadAction::hasHostDependence() const { return HostTC != nullptr; }
>>
>> -CudaHostAction::CudaHostAction(Action *Input, const ActionList &DeviceActions)
>> - : Action(CudaHostClass, Input), DeviceActions(DeviceActions) {}
>> +Action *OffloadAction::getHostDependence() const {
>> + assert(hasHostDependence() && "Host dependence does not exist!");
>> + assert(!getInputs().empty() && "No dependencies for offload action??");
>> + return HostTC ? getInputs().front() : nullptr;
>> +}
>> +
>> +bool OffloadAction::hasSingleDeviceDependence(
>> + bool DoNotConsiderHostActions) const {
>> + if (DoNotConsiderHostActions)
>> + return getInputs().size() == (HostTC ? 2 : 1);
>> + return !HostTC && getInputs().size() == 1;
>> +}
>> +
>> +Action *
>> +OffloadAction::getSingleDeviceDependence(bool DoNotConsiderHostActions) const {
>> + assert(hasSingleDeviceDependence(DoNotConsiderHostActions) &&
>> + "Single device dependence does not exist!");
>> + // The previous assert ensures the number of entries in getInputs() is
>> + // consistent with what we are doing here.
>> + return HostTC ? getInputs()[1] : getInputs().front();
>> +}
>> +
>> +void OffloadAction::DeviceDependences::add(Action &A, const ToolChain &TC,
>> + const char *BoundArch,
>> + OffloadKind OKind) {
>> + DeviceActions.push_back(&A);
>> + DeviceToolChains.push_back(&TC);
>> + DeviceBoundArchs.push_back(BoundArch);
>> + DeviceOffloadKinds.push_back(OKind);
>> +}
>> +
>> +OffloadAction::HostDependence::HostDependence(Action &A, const ToolChain &TC,
>> + const char *BoundArch,
>> + const DeviceDependences &DDeps)
>> + : HostAction(A), HostToolChain(TC), HostBoundArch(BoundArch) {
>> + for (auto K : DDeps.getOffloadKinds())
>> + HostOffloadKinds |= K;
>> +}
>>
>> void JobAction::anchor() {}
>>
>>
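The file-name prefix added in Action.cpp only applies to device actions and has the shape `-<offload kind>-<normalized triple>`. A standalone sketch of that rule, assuming CUDA is the only device kind (as in this patch) and using hypothetical names and assumed bit values:

```cpp
#include <cassert>
#include <string>

// Assumed bit values, one bit per offload kind.
enum OffloadKind : unsigned { OFK_None = 0x00, OFK_Host = 0x01, OFK_Cuda = 0x02 };

// Device-kind prefix; only CUDA is modeled, matching this patch.
std::string deviceKindPrefix(OffloadKind K) {
  assert(K == OFK_Cuda && "only CUDA device actions are modeled");
  return "device-cuda";
}

// Returns "" for host and non-offload actions, otherwise
// "-<kind prefix>-<normalized triple>".
std::string fileNamePrefix(OffloadKind DeviceKind, const std::string &NormalizedTriple) {
  if (DeviceKind == OFK_None)
    return "";
  return "-" + deviceKindPrefix(DeviceKind) + "-" + NormalizedTriple;
}
```

Host actions get an empty prefix, so their temporaries keep their existing names. In this sketch the prefix depends only on the kind and the triple, so the sm_35 and sm_37 device actions (which share `nvptx64-nvidia-cuda`) receive the same prefix.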
>> Modified: cfe/trunk/lib/Driver/Driver.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Driver/Driver.cpp?rev=275645&r1=275644&r2=275645&view=diff
>>
>> ==============================================================================
>> --- cfe/trunk/lib/Driver/Driver.cpp (original)
>> +++ cfe/trunk/lib/Driver/Driver.cpp Fri Jul 15 18:13:27 2016
>> @@ -435,7 +435,9 @@ void Driver::CreateOffloadingDeviceToolC
>> })) {
>> const ToolChain &TC = getToolChain(
>> C.getInputArgs(),
>> - llvm::Triple(C.getOffloadingHostToolChain()->getTriple().isArch64Bit()
>> + llvm::Triple(C.getSingleOffloadToolChain<Action::OFK_Host>()
>> + ->getTriple()
>> + .isArch64Bit()
>> ? "nvptx64-nvidia-cuda"
>> : "nvptx-nvidia-cuda"));
>> C.addOffloadDeviceToolChain(&TC, Action::OFK_Cuda);
>> @@ -1022,19 +1024,33 @@ static unsigned PrintActions1(const Comp
>> } else if (BindArchAction *BIA = dyn_cast<BindArchAction>(A)) {
>> os << '"' << BIA->getArchName() << '"' << ", {"
>> << PrintActions1(C, *BIA->input_begin(), Ids) << "}";
>> - } else if (CudaDeviceAction *CDA = dyn_cast<CudaDeviceAction>(A)) {
>> - CudaArch Arch = CDA->getGpuArch();
>> - if (Arch != CudaArch::UNKNOWN)
>> - os << "'" << CudaArchToString(Arch) << "', ";
>> - os << "{" << PrintActions1(C, *CDA->input_begin(), Ids) << "}";
>> + } else if (OffloadAction *OA = dyn_cast<OffloadAction>(A)) {
>> + bool IsFirst = true;
>> + OA->doOnEachDependence(
>> + [&](Action *A, const ToolChain *TC, const char *BoundArch) {
>> + // E.g. for two CUDA device dependences whose bound arch is sm_20 and
>> + // sm_35 this will generate:
>> + // "cuda-device" (nvptx64-nvidia-cuda:sm_20) {#ID}, "cuda-device"
>> + // (nvptx64-nvidia-cuda:sm_35) {#ID}
>> + if (!IsFirst)
>> + os << ", ";
>> + os << '"';
>> + if (TC)
>> + os << A->getOffloadingKindPrefix();
>> + else
>> + os << "host";
>> + os << " (";
>> + os << TC->getTriple().normalize();
>> +
>> + if (BoundArch)
>> + os << ":" << BoundArch;
>> + os << ")";
>> + os << '"';
>> + os << " {" << PrintActions1(C, A, Ids) << "}";
>> + IsFirst = false;
>> + });
>> } else {
>> - const ActionList *AL;
>> - if (CudaHostAction *CHA = dyn_cast<CudaHostAction>(A)) {
>> - os << "{" << PrintActions1(C, *CHA->input_begin(), Ids) << "}"
>> - << ", gpu binaries ";
>> - AL = &CHA->getDeviceActions();
>> - } else
>> - AL = &A->getInputs();
>> + const ActionList *AL = &A->getInputs();
>>
>> if (AL->size()) {
>> const char *Prefix = "{";
>> @@ -1047,10 +1063,24 @@ static unsigned PrintActions1(const Comp
>> os << "{}";
>> }
>>
>> + // Append offload info for all options other than the offloading action
>> + // itself (e.g. (cuda-device, sm_20) or (cuda-host)).
>> + std::string offload_str;
>> + llvm::raw_string_ostream offload_os(offload_str);
>> + if (!isa<OffloadAction>(A)) {
>> + auto S = A->getOffloadingKindPrefix();
>> + if (!S.empty()) {
>> + offload_os << ", (" << S;
>> + if (A->getOffloadingArch())
>> + offload_os << ", " << A->getOffloadingArch();
>> + offload_os << ")";
>> + }
>> + }
>> +
>> unsigned Id = Ids.size();
>> Ids[A] = Id;
>> llvm::errs() << Id << ": " << os.str() << ", "
>> - << types::getTypeName(A->getType()) << "\n";
>> + << types::getTypeName(A->getType()) << offload_os.str() << "\n";
>>
>> return Id;
>> }
>> @@ -1378,8 +1408,12 @@ static Action *buildCudaActions(Compilat
>> PartialCompilationArg &&
>> PartialCompilationArg->getOption().matches(options::OPT_cuda_device_only);
>>
>> - if (CompileHostOnly)
>> - return C.MakeAction<CudaHostAction>(HostAction, ActionList());
>> + if (CompileHostOnly) {
>> + OffloadAction::HostDependence HDep(
>> + *HostAction, *C.getSingleOffloadToolChain<Action::OFK_Host>(),
>> + /*BoundArch=*/nullptr, Action::OFK_Cuda);
>> + return C.MakeAction<OffloadAction>(HDep);
>> + }
>>
>> // Collect all cuda_gpu_arch parameters, removing duplicates.
>> SmallVector<CudaArch, 4> GpuArchList;
>> @@ -1408,8 +1442,6 @@ static Action *buildCudaActions(Compilat
>> CudaDeviceInputs.push_back(std::make_pair(types::TY_CUDA_DEVICE,
>> InputArg));
>>
>> // Build actions for all device inputs.
>> - assert(C.getSingleOffloadToolChain<Action::OFK_Cuda>() &&
>> - "Missing toolchain for device-side compilation.");
>> ActionList CudaDeviceActions;
>> C.getDriver().BuildActions(C, Args, CudaDeviceInputs,
>> CudaDeviceActions);
>> assert(GpuArchList.size() == CudaDeviceActions.size() &&
>> @@ -1421,6 +1453,8 @@ static Action *buildCudaActions(Compilat
>> return a->getKind() != Action::AssembleJobClass;
>> });
>>
>> + const ToolChain *CudaTC = C.getSingleOffloadToolChain<Action::OFK_Cuda>();
>> +
>> // Figure out what to do with device actions -- pass them as inputs to the
>> // host action or run each of them independently.
>> if (PartialCompilation || CompileDeviceOnly) {
>> @@ -1436,10 +1470,13 @@ static Action *buildCudaActions(Compilat
>> return nullptr;
>> }
>>
>> - for (unsigned I = 0, E = GpuArchList.size(); I != E; ++I)
>> - Actions.push_back(C.MakeAction<CudaDeviceAction>(CudaDeviceActions[I],
>> - GpuArchList[I],
>> - /* AtTopLevel */ true));
>> + for (unsigned I = 0, E = GpuArchList.size(); I != E; ++I) {
>> + OffloadAction::DeviceDependences DDep;
>> + DDep.add(*CudaDeviceActions[I], *CudaTC, CudaArchToString(GpuArchList[I]),
>> + Action::OFK_Cuda);
>> + Actions.push_back(
>> + C.MakeAction<OffloadAction>(DDep, CudaDeviceActions[I]->getType()));
>> + }
>> // Kill host action in case of device-only compilation.
>> if (CompileDeviceOnly)
>> return nullptr;
>> @@ -1459,19 +1496,23 @@ static Action *buildCudaActions(Compilat
>> Action* BackendAction = AssembleAction->getInputs()[0];
>> assert(BackendAction->getType() == types::TY_PP_Asm);
>>
>> - for (const auto& A : {AssembleAction, BackendAction}) {
>> - DeviceActions.push_back(C.MakeAction<CudaDeviceAction>(
>> - A, GpuArchList[I], /* AtTopLevel */ false));
>> + for (auto &A : {AssembleAction, BackendAction}) {
>> + OffloadAction::DeviceDependences DDep;
>> + DDep.add(*A, *CudaTC, CudaArchToString(GpuArchList[I]), Action::OFK_Cuda);
>> + DeviceActions.push_back(C.MakeAction<OffloadAction>(DDep, A->getType()));
>> }
>> }
>> - auto FatbinAction = C.MakeAction<CudaDeviceAction>(
>> - C.MakeAction<LinkJobAction>(DeviceActions, types::TY_CUDA_FATBIN),
>> - CudaArch::UNKNOWN,
>> - /* AtTopLevel = */ false);
>> + auto FatbinAction =
>> + C.MakeAction<LinkJobAction>(DeviceActions, types::TY_CUDA_FATBIN);
>> +
>> // Return a new host action that incorporates original host action and all
>> // device actions.
>> - return C.MakeAction<CudaHostAction>(std::move(HostAction),
>> - ActionList({FatbinAction}));
>> + OffloadAction::HostDependence HDep(
>> + *HostAction, *C.getSingleOffloadToolChain<Action::OFK_Host>(),
>> + /*BoundArch=*/nullptr, Action::OFK_Cuda);
>> + OffloadAction::DeviceDependences DDep;
>> + DDep.add(*FatbinAction, *CudaTC, /*BoundArch=*/nullptr, Action::OFK_Cuda);
>> + return C.MakeAction<OffloadAction>(HDep, DDep);
>> }
>>
>> void Driver::BuildActions(Compilation &C, DerivedArgList &Args,
>> @@ -1580,6 +1621,9 @@ void Driver::BuildActions(Compilation &C
>> YcArg = YuArg = nullptr;
>> }
>>
>> + // Track the host offload kinds used on this compilation.
>> + unsigned CompilationActiveOffloadHostKinds = 0u;
>> +
>> // Construct the actions to perform.
>> ActionList LinkerInputs;
>>
>> @@ -1648,6 +1692,9 @@ void Driver::BuildActions(Compilation &C
>> ? phases::Compile
>> : FinalPhase;
>>
>> + // Track the host offload kinds used on this input.
>> + unsigned InputActiveOffloadHostKinds = 0u;
>> +
>> // Build the pipeline for this file.
>> Action *Current = C.MakeAction<InputAction>(*InputArg, InputType);
>> for (SmallVectorImpl<phases::ID>::iterator i = PL.begin(), e =
>> PL.end();
>> @@ -1679,21 +1726,36 @@ void Driver::BuildActions(Compilation &C
>> Current = buildCudaActions(C, Args, InputArg, Current, Actions);
>> if (!Current)
>> break;
>> +
>> + // We produced a CUDA action for this input, so the host has to support
>> + // CUDA.
>> + InputActiveOffloadHostKinds |= Action::OFK_Cuda;
>> + CompilationActiveOffloadHostKinds |= Action::OFK_Cuda;
>> }
>>
>> if (Current->getType() == types::TY_Nothing)
>> break;
>> }
>>
>> - // If we ended with something, add to the output list.
>> - if (Current)
>> + // If we ended with something, add to the output list. Also, propagate the
>> + // offload information to the top-level host action related with the current
>> + // input.
>> + if (Current) {
>> + if (InputActiveOffloadHostKinds)
>> + Current->propagateHostOffloadInfo(InputActiveOffloadHostKinds,
>> + /*BoundArch=*/nullptr);
>> Actions.push_back(Current);
>> + }
>> }
>>
>> - // Add a link action if necessary.
>> - if (!LinkerInputs.empty())
>> + // Add a link action if necessary and propagate the offload information for
>> + // the current compilation.
>> + if (!LinkerInputs.empty()) {
>> Actions.push_back(
>> C.MakeAction<LinkJobAction>(LinkerInputs, types::TY_Image));
>> + Actions.back()->propagateHostOffloadInfo(CompilationActiveOffloadHostKinds,
>> + /*BoundArch=*/nullptr);
>> + }
>>
>> // If we are linking, claim any options which are obviously only used for
>> // compilation.
>> @@ -1829,7 +1891,8 @@ void Driver::BuildJobs(Compilation &C) c
>> /*BoundArch*/ nullptr,
>> /*AtTopLevel*/ true,
>> /*MultipleArchs*/ ArchNames.size() > 1,
>> - /*LinkingOutput*/ LinkingOutput, CachedResults);
>> + /*LinkingOutput*/ LinkingOutput, CachedResults,
>> + /*BuildForOffloadDevice*/ false);
>> }
>>
>> // If the user passed -Qunused-arguments or there were errors, don't warn
>> @@ -1878,7 +1941,28 @@ void Driver::BuildJobs(Compilation &C) c
>> }
>> }
>> }
>> -
>> +/// Collapse an offloading action looking for a job of the given type. The input
>> +/// action is changed to the input of the collapsed sequence. If we effectively
>> +/// had a collapse return the corresponding offloading action, otherwise return
>> +/// null.
>> +template <typename T>
>> +static OffloadAction *collapseOffloadingAction(Action *&CurAction) {
>> + if (!CurAction)
>> + return nullptr;
>> + if (auto *OA = dyn_cast<OffloadAction>(CurAction)) {
>> + if (OA->hasHostDependence())
>> + if (auto *HDep = dyn_cast<T>(OA->getHostDependence())) {
>> + CurAction = HDep;
>> + return OA;
>> + }
>> + if (OA->hasSingleDeviceDependence())
>> + if (auto *DDep = dyn_cast<T>(OA->getSingleDeviceDependence())) {
>> + CurAction = DDep;
>> + return OA;
>> + }
>> + }
>> + return nullptr;
>> +}
>> // Returns a Tool for a given JobAction. In case the action and its
>> // predecessors can be combined, updates Inputs with the inputs of the
>> // first combined action. If one of the collapsed actions is a
>> @@ -1888,34 +1972,39 @@ static const Tool *selectToolForJob(Comp
>> bool EmbedBitcode, const ToolChain *TC,
>> const JobAction *JA,
>> const ActionList *&Inputs,
>> - const CudaHostAction *&CollapsedCHA) {
>> + ActionList &CollapsedOffloadAction) {
>> const Tool *ToolForJob = nullptr;
>> - CollapsedCHA = nullptr;
>> + CollapsedOffloadAction.clear();
>>
>> // See if we should look for a compiler with an integrated assembler. We match
>> // bottom up, so what we are actually looking for is an assembler job with a
>> // compiler input.
>>
>> + // Look through offload actions between assembler and backend actions.
>> + Action *BackendJA = (isa<AssembleJobAction>(JA) && Inputs->size() == 1)
>> + ? *Inputs->begin()
>> + : nullptr;
>> + auto *BackendOA = collapseOffloadingAction<BackendJobAction>(BackendJA);
>> +
>> if (TC->useIntegratedAs() && !SaveTemps &&
>> !C.getArgs().hasArg(options::OPT_via_file_asm) &&
>> !C.getArgs().hasArg(options::OPT__SLASH_FA) &&
>> - !C.getArgs().hasArg(options::OPT__SLASH_Fa) &&
>> - isa<AssembleJobAction>(JA) && Inputs->size() == 1 &&
>> - isa<BackendJobAction>(*Inputs->begin())) {
>> + !C.getArgs().hasArg(options::OPT__SLASH_Fa) && BackendJA &&
>> + isa<BackendJobAction>(BackendJA)) {
>> // A BackendJob is always preceded by a CompileJob, and without -save-temps
>> // or -fembed-bitcode, they will always get combined together, so instead of
>> // checking the backend tool, check if the tool for the CompileJob has an
>> // integrated assembler. For -fembed-bitcode, CompileJob is still used to
>> // look up tools for BackendJob, but they need to match before we can split
>> // them.
>> - const ActionList *BackendInputs = &(*Inputs)[0]->getInputs();
>> - // Compile job may be wrapped in CudaHostAction, extract it if
>> - // that's the case and update CollapsedCHA if we combine phases.
>> - CudaHostAction *CHA = dyn_cast<CudaHostAction>(*BackendInputs->begin());
>> - JobAction *CompileJA = cast<CompileJobAction>(
>> - CHA ? *CHA->input_begin() : *BackendInputs->begin());
>> - assert(CompileJA && "Backend job is not preceeded by compile job.");
>> - const Tool *Compiler = TC->SelectTool(*CompileJA);
>> +
>> + // Look through offload actions between backend and compile actions.
>> + Action *CompileJA = *BackendJA->getInputs().begin();
>> + auto *CompileOA = collapseOffloadingAction<CompileJobAction>(CompileJA);
>> +
>> + assert(CompileJA && isa<CompileJobAction>(CompileJA) &&
>> + "Backend job is not preceeded by compile job.");
>> + const Tool *Compiler = TC->SelectTool(*cast<CompileJobAction>(CompileJA));
>> if (!Compiler)
>> return nullptr;
>> // When using -fembed-bitcode, it is required to have the same tool (clang)
>> @@ -1929,7 +2018,12 @@ static const Tool *selectToolForJob(Comp
>> if (Compiler->hasIntegratedAssembler()) {
>> Inputs = &CompileJA->getInputs();
>> ToolForJob = Compiler;
>> - CollapsedCHA = CHA;
>> + // Save the collapsed offload actions because they may still contain
>> + // device actions.
>> + if (CompileOA)
>> + CollapsedOffloadAction.push_back(CompileOA);
>> + if (BackendOA)
>> + CollapsedOffloadAction.push_back(BackendOA);
>> }
>> }
>>
>> @@ -1939,20 +2033,23 @@ static const Tool *selectToolForJob(Comp
>> if (isa<BackendJobAction>(JA)) {
>> // Check if the compiler supports emitting LLVM IR.
>> assert(Inputs->size() == 1);
>> - // Compile job may be wrapped in CudaHostAction, extract it if
>> - // that's the case and update CollapsedCHA if we combine phases.
>> - CudaHostAction *CHA = dyn_cast<CudaHostAction>(*Inputs->begin());
>> - JobAction *CompileJA =
>> - cast<CompileJobAction>(CHA ? *CHA->input_begin() : *Inputs->begin());
>> - assert(CompileJA && "Backend job is not preceeded by compile job.");
>> - const Tool *Compiler = TC->SelectTool(*CompileJA);
>> +
>> + // Look through offload actions between backend and compile actions.
>> + Action *CompileJA = *JA->getInputs().begin();
>> + auto *CompileOA = collapseOffloadingAction<CompileJobAction>(CompileJA);
>> +
>> + assert(CompileJA && isa<CompileJobAction>(CompileJA) &&
>> + "Backend job is not preceeded by compile job.");
>> + const Tool *Compiler = TC->SelectTool(*cast<CompileJobAction>(CompileJA));
>> if (!Compiler)
>> return nullptr;
>> if (!Compiler->canEmitIR() ||
>> (!SaveTemps && !EmbedBitcode)) {
>> Inputs = &CompileJA->getInputs();
>> ToolForJob = Compiler;
>> - CollapsedCHA = CHA;
>> +
>> + if (CompileOA)
>> + CollapsedOffloadAction.push_back(CompileOA);
>> }
>> }
>>
>> @@ -1963,12 +2060,21 @@ static const Tool *selectToolForJob(Comp
>> // See if we should use an integrated preprocessor. We do so when we have
>> // exactly one input, since this is the only use case we care about
>> // (irrelevant since we don't support combine yet).
>> - if (Inputs->size() == 1 && isa<PreprocessJobAction>(*Inputs->begin()) &&
>> +
>> + // Look through offload actions after preprocessing.
>> + Action *PreprocessJA = (Inputs->size() == 1) ? *Inputs->begin() : nullptr;
>> + auto *PreprocessOA =
>> + collapseOffloadingAction<PreprocessJobAction>(PreprocessJA);
>> +
>> + if (PreprocessJA && isa<PreprocessJobAction>(PreprocessJA) &&
>> !C.getArgs().hasArg(options::OPT_no_integrated_cpp) &&
>> !C.getArgs().hasArg(options::OPT_traditional_cpp) && !SaveTemps &&
>> !C.getArgs().hasArg(options::OPT_rewrite_objc) &&
>> - ToolForJob->hasIntegratedCPP())
>> - Inputs = &(*Inputs)[0]->getInputs();
>> + ToolForJob->hasIntegratedCPP()) {
>> + Inputs = &PreprocessJA->getInputs();
>> + if (PreprocessOA)
>> + CollapsedOffloadAction.push_back(PreprocessOA);
>> + }
>>
>> return ToolForJob;
>> }
>> @@ -1976,8 +2082,8 @@ static const Tool *selectToolForJob(Comp
>> InputInfo Driver::BuildJobsForAction(
>> Compilation &C, const Action *A, const ToolChain *TC, const char *BoundArch,
>> bool AtTopLevel, bool MultipleArchs, const char *LinkingOutput,
>> - std::map<std::pair<const Action *, std::string>, InputInfo> &CachedResults)
>> - const {
>> + std::map<std::pair<const Action *, std::string>, InputInfo> &CachedResults,
>> + bool BuildForOffloadDevice) const {
>> // The bound arch is not necessarily represented in the toolchain's triple --
>> // for example, armv7 and armv7s both map to the same triple -- so we need
>> // both in our map.
>> @@ -1991,9 +2097,9 @@ InputInfo Driver::BuildJobsForAction(
>> if (CachedResult != CachedResults.end()) {
>> return CachedResult->second;
>> }
>> - InputInfo Result =
>> - BuildJobsForActionNoCache(C, A, TC, BoundArch, AtTopLevel, MultipleArchs,
>> - LinkingOutput, CachedResults);
>> + InputInfo Result = BuildJobsForActionNoCache(
>> + C, A, TC, BoundArch, AtTopLevel, MultipleArchs, LinkingOutput,
>> + CachedResults, BuildForOffloadDevice);
>> CachedResults[ActionTC] = Result;
>> return Result;
>> }
>> @@ -2001,21 +2107,65 @@ InputInfo Driver::BuildJobsForAction(
>> InputInfo Driver::BuildJobsForActionNoCache(
>> Compilation &C, const Action *A, const ToolChain *TC, const char *BoundArch,
>> bool AtTopLevel, bool MultipleArchs, const char *LinkingOutput,
>> - std::map<std::pair<const Action *, std::string>, InputInfo> &CachedResults)
>> - const {
>> + std::map<std::pair<const Action *, std::string>, InputInfo> &CachedResults,
>> + bool BuildForOffloadDevice) const {
>> llvm::PrettyStackTraceString CrashInfo("Building compilation jobs");
>>
>> - InputInfoList CudaDeviceInputInfos;
>> - if (const CudaHostAction *CHA = dyn_cast<CudaHostAction>(A)) {
>> - // Append outputs of device jobs to the input list.
>> - for (const Action *DA : CHA->getDeviceActions()) {
>> - CudaDeviceInputInfos.push_back(BuildJobsForAction(
>> - C, DA, TC, nullptr, AtTopLevel,
>> - /*MultipleArchs*/ false, LinkingOutput, CachedResults));
>> - }
>> - // Override current action with a real host compile action and continue
>> - // processing it.
>> - A = *CHA->input_begin();
>> + InputInfoList OffloadDependencesInputInfo;
>> + if (const OffloadAction *OA = dyn_cast<OffloadAction>(A)) {
>> + // The offload action is expected to be used in four different situations.
>> + //
>> + // a) Set a toolchain/architecture/kind for a host action:
>> + // Host Action 1 -> OffloadAction -> Host Action 2
>> + //
>> + // b) Set a toolchain/architecture/kind for a device action;
>> + // Device Action 1 -> OffloadAction -> Device Action 2
>> + //
>> + // c) Specify a device dependences to a host action;
>> + // Device Action 1 _
>> + // \
>> + // Host Action 1 ---> OffloadAction -> Host Action 2
>> + //
>> + // d) Specify a host dependence to a device action.
>> + // Host Action 1 _
>> + // \
>> + // Device Action 1 ---> OffloadAction -> Device Action 2
>> + //
>> + // For a) and b), we just return the job generated for the dependence. For
>> + // c) and d) we override the current action with the host/device dependence
>> + // if the current toolchain is host/device and set the offload dependences
>> + // info with the jobs obtained from the device/host dependence(s).
>> +
>> + // If there is a single device option, just generate the job for it.
>> + if (OA->hasSingleDeviceDependence()) {
>> + InputInfo DevA;
>> + OA->doOnEachDeviceDependence([&](Action *DepA, const ToolChain *DepTC,
>> + const char *DepBoundArch) {
>> + DevA =
>> + BuildJobsForAction(C, DepA, DepTC, DepBoundArch, AtTopLevel,
>> + /*MultipleArchs*/ !!DepBoundArch, LinkingOutput,
>> + CachedResults, /*BuildForOffloadDevice=*/true);
>> + });
>> + return DevA;
>> + }
>> +
>> + // If 'Action 2' is host, we generate jobs for the device dependences and
>> + // override the current action with the host dependence. Otherwise, we
>> + // generate the host dependences and override the action with the device
>> + // dependence. The dependences can't therefore be a top-level action.
>> + OA->doOnEachDependence(
>> + /*IsHostDependence=*/BuildForOffloadDevice,
>> + [&](Action *DepA, const ToolChain *DepTC, const char *DepBoundArch) {
>> + OffloadDependencesInputInfo.push_back(BuildJobsForAction(
>> + C, DepA, DepTC, DepBoundArch, /*AtTopLevel=*/false,
>> + /*MultipleArchs*/ !!DepBoundArch, LinkingOutput, CachedResults,
>> + /*BuildForOffloadDevice=*/DepA->getOffloadingDeviceKind() !=
>> + Action::OFK_None));
>> + });
>> +
>> + A = BuildForOffloadDevice
>> + ? OA->getSingleDeviceDependence(/*DoNotConsiderHostActions=*/true)
>> + : OA->getHostDependence();
>> }
>>
>> if (const InputAction *IA = dyn_cast<InputAction>(A)) {
>> @@ -2042,41 +2192,34 @@ InputInfo Driver::BuildJobsForActionNoCa
>> TC = &C.getDefaultToolChain();
>>
>> return BuildJobsForAction(C, *BAA->input_begin(), TC, ArchName, AtTopLevel,
>> - MultipleArchs, LinkingOutput, CachedResults);
>> + MultipleArchs, LinkingOutput, CachedResults,
>> + BuildForOffloadDevice);
>> }
>>
>> - if (const CudaDeviceAction *CDA = dyn_cast<CudaDeviceAction>(A)) {
>> - // Initial processing of CudaDeviceAction carries host params.
>> - // Call BuildJobsForAction() again, now with correct device parameters.
>> - InputInfo II = BuildJobsForAction(
>> - C, *CDA->input_begin(), C.getSingleOffloadToolChain<Action::OFK_Cuda>(),
>> - CudaArchToString(CDA->getGpuArch()), CDA->isAtTopLevel(),
>> - /*MultipleArchs=*/true, LinkingOutput, CachedResults);
>> - // Currently II's Action is *CDA->input_begin(). Set it to CDA instead, so
>> - // that one can retrieve II's GPU arch.
>> - II.setAction(A);
>> - return II;
>> - }
>>
>> const ActionList *Inputs = &A->getInputs();
>>
>> const JobAction *JA = cast<JobAction>(A);
>> - const CudaHostAction *CollapsedCHA = nullptr;
>> + ActionList CollapsedOffloadActions;
>> +
>> const Tool *T =
>> selectToolForJob(C, isSaveTempsEnabled(), embedBitcodeEnabled(), TC, JA,
>> - Inputs, CollapsedCHA);
>> + Inputs, CollapsedOffloadActions);
>> if (!T)
>> return InputInfo();
>>
>> - // If we've collapsed action list that contained CudaHostAction we
>> - // need to build jobs for device-side inputs it may have held.
>> - if (CollapsedCHA) {
>> - for (const Action *DA : CollapsedCHA->getDeviceActions()) {
>> - CudaDeviceInputInfos.push_back(BuildJobsForAction(
>> - C, DA, TC, "", AtTopLevel,
>> - /*MultipleArchs*/ false, LinkingOutput, CachedResults));
>> - }
>> - }
>> + // If we've collapsed action list that contained OffloadAction we
>> + // need to build jobs for host/device-side inputs it may have held.
>> + for (const auto *OA : CollapsedOffloadActions)
>> + cast<OffloadAction>(OA)->doOnEachDependence(
>> + /*IsHostDependence=*/BuildForOffloadDevice,
>> + [&](Action *DepA, const ToolChain *DepTC, const char *DepBoundArch) {
>> + OffloadDependencesInputInfo.push_back(BuildJobsForAction(
>> + C, DepA, DepTC, DepBoundArch, AtTopLevel,
>> + /*MultipleArchs=*/!!DepBoundArch, LinkingOutput, CachedResults,
>> + /*BuildForOffloadDevice=*/DepA->getOffloadingDeviceKind() !=
>> + Action::OFK_None));
>> + });
>>
>> // Only use pipes when there is exactly one input.
>> InputInfoList InputInfos;
>> @@ -2086,9 +2229,9 @@ InputInfo Driver::BuildJobsForActionNoCa
>> // FIXME: Clean this up.
>> bool SubJobAtTopLevel =
>> AtTopLevel && (isa<DsymutilJobAction>(A) ||
>> isa<VerifyJobAction>(A));
>> - InputInfos.push_back(BuildJobsForAction(C, Input, TC, BoundArch,
>> - SubJobAtTopLevel, MultipleArchs,
>> - LinkingOutput, CachedResults));
>> + InputInfos.push_back(BuildJobsForAction(
>> + C, Input, TC, BoundArch, SubJobAtTopLevel, MultipleArchs, LinkingOutput,
>> + CachedResults, BuildForOffloadDevice));
>> }
>>
>> // Always use the first input as the base input.
>> @@ -2099,9 +2242,10 @@ InputInfo Driver::BuildJobsForActionNoCa
>> if (JA->getType() == types::TY_dSYM)
>> BaseInput = InputInfos[0].getFilename();
>>
>> - // Append outputs of cuda device jobs to the input list
>> - if (CudaDeviceInputInfos.size())
>> - InputInfos.append(CudaDeviceInputInfos.begin(), CudaDeviceInputInfos.end());
>> + // Append outputs of offload device jobs to the input list
>> + if (!OffloadDependencesInputInfo.empty())
>> + InputInfos.append(OffloadDependencesInputInfo.begin(),
>> + OffloadDependencesInputInfo.end());
>>
>> // Determine the place to write output to, if any.
>> InputInfo Result;
>> @@ -2109,7 +2253,8 @@ InputInfo Driver::BuildJobsForActionNoCa
>> Result = InputInfo(A, BaseInput);
>> else
>> Result = InputInfo(A, GetNamedOutputPath(C, *JA, BaseInput, BoundArch,
>> - AtTopLevel, MultipleArchs),
>> + AtTopLevel, MultipleArchs,
>> + TC->getTriple().normalize()),
>> BaseInput);
>>
>> if (CCCPrintBindings && !CCGenDiagnostics) {
>> @@ -2169,7 +2314,8 @@ static const char *MakeCLOutputFilename(
>> const char *Driver::GetNamedOutputPath(Compilation &C, const JobAction &JA,
>> const char *BaseInput,
>> const char *BoundArch, bool AtTopLevel,
>> - bool MultipleArchs) const {
>> + bool MultipleArchs,
>> + StringRef NormalizedTriple) const {
>> llvm::PrettyStackTraceString CrashInfo("Computing output path");
>> // Output to a user requested destination?
>> if (AtTopLevel && !isa<DsymutilJobAction>(JA) && !isa<VerifyJobAction>(JA)) {
>> @@ -2255,6 +2401,7 @@ const char *Driver::GetNamedOutputPath(C
>> MakeCLOutputFilename(C.getArgs(), "", BaseName,
>> types::TY_Image);
>> } else if (MultipleArchs && BoundArch) {
>> SmallString<128> Output(getDefaultImageName());
>> + Output += JA.getOffloadingFileNamePrefix(NormalizedTriple);
>> Output += "-";
>> Output.append(BoundArch);
>> NamedOutput = C.getArgs().MakeArgString(Output.c_str());
>> @@ -2271,6 +2418,7 @@ const char *Driver::GetNamedOutputPath(C
>> if (!types::appendSuffixForType(JA.getType()))
>> End = BaseName.rfind('.');
>> SmallString<128> Suffixed(BaseName.substr(0, End));
>> + Suffixed += JA.getOffloadingFileNamePrefix(NormalizedTriple);
>> if (MultipleArchs && BoundArch) {
>> Suffixed += "-";
>> Suffixed.append(BoundArch);
>>
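The look-through performed by collapseOffloadingAction<T> is easier to follow with toy types. This sketch assumes a hypothetical hierarchy (Action, CompileJob, BackendJob) plus an OffloadWrapper that holds at most one host and one device dependence; it uses dynamic_cast where the driver uses LLVM's isa/dyn_cast, and none of these names are clang's actual Action API.

```cpp
#include <cassert>

// Toy action hierarchy (hypothetical, not clang's Action classes).
struct Action {
  virtual ~Action() = default;
};
struct CompileJob : Action {};
struct BackendJob : Action {};

// Toy offload wrapper: an optional host dependence and an optional single
// device dependence.
struct OffloadWrapper : Action {
  Action *Host = nullptr;
  Action *SingleDevice = nullptr;
};

// Same shape as collapseOffloadingAction<T>: if CurAction is a wrapper whose
// host (or single device) dependence is a T, replace CurAction with that
// dependence and return the wrapper, so the caller can still build jobs for
// the wrapper's other dependences. Otherwise leave CurAction untouched and
// return nullptr.
template <typename T>
OffloadWrapper *collapse(Action *&CurAction) {
  if (!CurAction)
    return nullptr;
  if (auto *OW = dynamic_cast<OffloadWrapper *>(CurAction)) {
    if (OW->Host && dynamic_cast<T *>(OW->Host)) {
      CurAction = OW->Host;
      return OW;
    }
    if (OW->SingleDevice && dynamic_cast<T *>(OW->SingleDevice)) {
      CurAction = OW->SingleDevice;
      return OW;
    }
  }
  return nullptr;
}
```

In the patch, the returned wrapper is what selectToolForJob pushes into CollapsedOffloadAction, so the dependences it still carries get jobs built even after the phases are combined.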
>> Modified: cfe/trunk/lib/Driver/ToolChain.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Driver/ToolChain.cpp?rev=275645&r1=275644&r2=275645&view=diff
>>
>> ==============================================================================
>> --- cfe/trunk/lib/Driver/ToolChain.cpp (original)
>> +++ cfe/trunk/lib/Driver/ToolChain.cpp Fri Jul 15 18:13:27 2016
>> @@ -248,8 +248,7 @@ Tool *ToolChain::getTool(Action::ActionC
>>
>> case Action::InputClass:
>> case Action::BindArchClass:
>> - case Action::CudaDeviceClass:
>> - case Action::CudaHostClass:
>> + case Action::OffloadClass:
>> case Action::LipoJobClass:
>> case Action::DsymutilJobClass:
>> case Action::VerifyDebugInfoJobClass:
>>
>> Modified: cfe/trunk/lib/Driver/Tools.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Driver/Tools.cpp?rev=275645&r1=275644&r2=275645&view=diff
>>
>> ==============================================================================
>> --- cfe/trunk/lib/Driver/Tools.cpp (original)
>> +++ cfe/trunk/lib/Driver/Tools.cpp Fri Jul 15 18:13:27 2016
>> @@ -296,12 +296,45 @@ static bool forwardToGCC(const Option &O
>> !O.hasFlag(options::DriverOption) &&
>> !O.hasFlag(options::LinkerInput);
>> }
>>
>> +/// Add the C++ include args of other offloading toolchains. If this is a host
>> +/// job, the device toolchains are added. If this is a device job, the host
>> +/// toolchains will be added.
>> +static void addExtraOffloadCXXStdlibIncludeArgs(Compilation &C,
>> + const JobAction &JA,
>> + const ArgList &Args,
>> + ArgStringList &CmdArgs) {
>> +
>> + if (JA.isHostOffloading(Action::OFK_Cuda))
>> + C.getSingleOffloadToolChain<Action::OFK_Cuda>()
>> + ->AddClangCXXStdlibIncludeArgs(Args, CmdArgs);
>> + else if (JA.isDeviceOffloading(Action::OFK_Cuda))
>> + C.getSingleOffloadToolChain<Action::OFK_Host>()
>> + ->AddClangCXXStdlibIncludeArgs(Args, CmdArgs);
>> +
>> + // TODO: Add support for other programming models here.
>> +}
>> +
>> +/// Add the include args that are specific of each offloading programming model.
>> +static void addExtraOffloadSpecificIncludeArgs(Compilation &C,
>> + const JobAction &JA,
>> + const ArgList &Args,
>> + ArgStringList &CmdArgs) {
>> +
>> + if (JA.isHostOffloading(Action::OFK_Cuda))
>> + C.getSingleOffloadToolChain<Action::OFK_Host>()->AddCudaIncludeArgs(
>> + Args, CmdArgs);
>> + else if (JA.isDeviceOffloading(Action::OFK_Cuda))
>> + C.getSingleOffloadToolChain<Action::OFK_Cuda>()->AddCudaIncludeArgs(
>> + Args, CmdArgs);
>> +
>> + // TODO: Add support for other programming models here.
>> +}
>> +
>> void Clang::AddPreprocessingOptions(Compilation &C, const JobAction &JA,
>> const Driver &D, const ArgList &Args,
>> ArgStringList &CmdArgs,
>> const InputInfo &Output,
>> - const InputInfoList &Inputs,
>> - const ToolChain *AuxToolChain) const {
>> + const InputInfoList &Inputs) const {
>> Arg *A;
>> const bool IsIAMCU = getToolChain().getTriple().isOSIAMCU();
>>
>> @@ -566,31 +599,27 @@ void Clang::AddPreprocessingOptions(Comp
>> // OBJCPLUS_INCLUDE_PATH - system includes enabled when compiling ObjC++.
>> addDirectoryList(Args, CmdArgs, "-objcxx-isystem",
>> "OBJCPLUS_INCLUDE_PATH");
>>
>> - // Optional AuxToolChain indicates that we need to include headers
>> - // for more than one target. If that's the case, add include paths
>> - // from AuxToolChain right after include paths of the same kind for
>> - // the current target.
>> + // While adding the include arguments, we also attempt to retrieve the
>> + // arguments of related offloading toolchains or arguments that are specific
>> + // of an offloading programming model.
>>
>> // Add C++ include arguments, if needed.
>> if (types::isCXX(Inputs[0].getType())) {
>> getToolChain().AddClangCXXStdlibIncludeArgs(Args, CmdArgs);
>> - if (AuxToolChain)
>> - AuxToolChain->AddClangCXXStdlibIncludeArgs(Args, CmdArgs);
>> + addExtraOffloadCXXStdlibIncludeArgs(C, JA, Args, CmdArgs);
>> }
>>
>> // Add system include arguments for all targets but IAMCU.
>> if (!IsIAMCU) {
>> getToolChain().AddClangSystemIncludeArgs(Args, CmdArgs);
>> - if (AuxToolChain)
>> - AuxToolChain->AddClangCXXStdlibIncludeArgs(Args, CmdArgs);
>> + addExtraOffloadCXXStdlibIncludeArgs(C, JA, Args, CmdArgs);
>>
>
> This doesn't make much sense to me: we already added the C++ stdlib
> includes a few lines above for C++ compiles. Should this be adding the
> (non-C++) system include args instead?
>
>
>> } else {
>> // For IAMCU add special include arguments.
>> getToolChain().AddIAMCUIncludeArgs(Args, CmdArgs);
>> }
>>
>> - // Add CUDA include arguments, if needed.
>> - if (types::isCuda(Inputs[0].getType()))
>> - getToolChain().AddCudaIncludeArgs(Args, CmdArgs);
>> + // Add offload include arguments, if needed.
>> + addExtraOffloadSpecificIncludeArgs(C, JA, Args, CmdArgs);
>> }
>>
>> // FIXME: Move to target hook.
>> @@ -3799,7 +3828,7 @@ void Clang::ConstructJob(Compilation &C,
>> // CUDA compilation may have multiple inputs (source file + results of
>> // device-side compilations). All other jobs are expected to have exactly one
>> // input.
>> - bool IsCuda = types::isCuda(Input.getType());
>> + bool IsCuda = JA.isOffloading(Action::OFK_Cuda);
>> assert((IsCuda || Inputs.size() == 1) && "Unable to handle multiple inputs.");
>>
>> // C++ is not supported for IAMCU.
>> @@ -3815,21 +3844,21 @@ void Clang::ConstructJob(Compilation &C,
>> CmdArgs.push_back("-triple");
>> CmdArgs.push_back(Args.MakeArgString(TripleStr));
>>
>> - const ToolChain *AuxToolChain = nullptr;
>> if (IsCuda) {
>> - // FIXME: We need a (better) way to pass information about
>> - // particular compilation pass we're constructing here. For now we
>> - // can check which toolchain we're using and pick the other one to
>> - // extract the triple.
>> - if (&getToolChain() == C.getSingleOffloadToolChain<Action::OFK_Cuda>())
>> - AuxToolChain = C.getOffloadingHostToolChain();
>> - else if (&getToolChain() == C.getOffloadingHostToolChain())
>> - AuxToolChain = C.getSingleOffloadToolChain<Action::OFK_Cuda>();
>> - else
>> - llvm_unreachable("Can't figure out CUDA compilation mode.");
>> - assert(AuxToolChain != nullptr && "No aux toolchain.");
>> + // We have to pass the triple of the host if compiling for a CUDA device and
>> + // vice-versa.
>> + StringRef NormalizedTriple;
>> + if (JA.isDeviceOffloading(Action::OFK_Cuda))
>> + NormalizedTriple = C.getSingleOffloadToolChain<Action::OFK_Host>()
>> + ->getTriple()
>> + .normalize();
>> + else
>> + NormalizedTriple = C.getSingleOffloadToolChain<Action::OFK_Cuda>()
>> + ->getTriple()
>> + .normalize();
>> +
>> CmdArgs.push_back("-aux-triple");
>> - CmdArgs.push_back(Args.MakeArgString(AuxToolChain->getTriple().str()));
>> + CmdArgs.push_back(Args.MakeArgString(NormalizedTriple));
>> }
>>
>> if (Triple.isOSWindows() && (Triple.getArch() == llvm::Triple::arm ||
>> @@ -4718,8 +4747,7 @@ void Clang::ConstructJob(Compilation &C,
>> //
>> // FIXME: Support -fpreprocessed
>> if (types::getPreprocessedType(InputType) != types::TY_INVALID)
>> - AddPreprocessingOptions(C, JA, D, Args, CmdArgs, Output, Inputs,
>> - AuxToolChain);
>> + AddPreprocessingOptions(C, JA, D, Args, CmdArgs, Output, Inputs);
>>
>> // Don't warn about "clang -c -DPIC -fPIC test.i" because libtool.m4 assumes
>> // that "The compiler can only warn and ignore the option if not recognized".
>> @@ -11193,15 +11221,14 @@ void NVPTX::Assembler::ConstructJob(Comp
>> static_cast<const toolchains::CudaToolChain &>(getToolChain());
>> assert(TC.getTriple().isNVPTX() && "Wrong platform");
>>
>> - std::vector<std::string> gpu_archs =
>> - Args.getAllArgValues(options::OPT_march_EQ);
>> - assert(gpu_archs.size() == 1 && "Exactly one GPU Arch required for ptxas.");
>> - const std::string& gpu_arch = gpu_archs[0];
>> + // Obtain architecture from the action.
>> + CudaArch gpu_arch = StringToCudaArch(JA.getOffloadingArch());
>> + assert(gpu_arch != CudaArch::UNKNOWN &&
>> + "Device action expected to have an architecture.");
>>
>> // Check that our installation's ptxas supports gpu_arch.
>> if (!Args.hasArg(options::OPT_no_cuda_version_check)) {
>> - TC.cudaInstallation().CheckCudaVersionSupportsArch(
>> - StringToCudaArch(gpu_arch));
>> + TC.cudaInstallation().CheckCudaVersionSupportsArch(gpu_arch);
>> }
>>
>> ArgStringList CmdArgs;
>> @@ -11245,7 +11272,7 @@ void NVPTX::Assembler::ConstructJob(Comp
>> }
>>
>> CmdArgs.push_back("--gpu-name");
>> - CmdArgs.push_back(Args.MakeArgString(gpu_arch));
>> + CmdArgs.push_back(Args.MakeArgString(CudaArchToString(gpu_arch)));
>> CmdArgs.push_back("--output-file");
>> CmdArgs.push_back(Args.MakeArgString(Output.getFilename()));
>> for (const auto& II : Inputs)
>> @@ -11277,13 +11304,20 @@ void NVPTX::Linker::ConstructJob(Compila
>> CmdArgs.push_back(Args.MakeArgString(Output.getFilename()));
>>
>> for (const auto& II : Inputs) {
>> - auto* A = cast<const CudaDeviceAction>(II.getAction());
>> + auto *A = II.getAction();
>> + assert(A->getInputs().size() == 1 &&
>> + "Device offload action is expected to have a single input");
>> + const char *gpu_arch_str = A->getOffloadingArch();
>> + assert(gpu_arch_str &&
>> + "Device action expected to have associated a GPU architecture!");
>> + CudaArch gpu_arch = StringToCudaArch(gpu_arch_str);
>> +
>> // We need to pass an Arch of the form "sm_XX" for cubin files and
>> // "compute_XX" for ptx.
>> const char *Arch =
>> (II.getType() == types::TY_PP_Asm)
>> - ? CudaVirtualArchToString(VirtualArchForCudaArch(A->getGpuArch()))
>> - : CudaArchToString(A->getGpuArch());
>> + ? CudaVirtualArchToString(VirtualArchForCudaArch(gpu_arch))
>> + : gpu_arch_str;
>> CmdArgs.push_back(Args.MakeArgString(llvm::Twine("--image=profile=") +
>> Arch + ",file=" + II.getFilename()));
>> }
>>
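The NVPTX::Linker hunk above passes fatbinary a virtual "compute_XX" profile for PTX inputs (TY_PP_Asm) and the real "sm_XX" name, now taken from the offloading action, for cubin inputs. The following is a minimal, self-contained C++ sketch of that mapping. `virtualArchFor` and `imageArg` are hypothetical names, and the "sm_" to "compute_" prefix swap is a simplifying assumption standing in for clang's `VirtualArchForCudaArch` table, which may map some GPUs to a different virtual architecture.

```cpp
#include <string>

// Hypothetical helper: derive a virtual architecture name from a real one.
// Assumption: simply swap the "sm_" prefix for "compute_". Clang's real
// VirtualArchForCudaArch() uses an explicit table instead.
std::string virtualArchFor(const std::string &gpuArch) {
  const std::string prefix = "sm_";
  if (gpuArch.compare(0, prefix.size(), prefix) != 0)
    return gpuArch;  // Not an sm_XX name; leave it untouched.
  return "compute_" + gpuArch.substr(prefix.size());
}

// Hypothetical helper: build one "--image=profile=<arch>,file=<path>"
// argument, mirroring the loop over Inputs in NVPTX::Linker::ConstructJob.
std::string imageArg(const std::string &gpuArch, bool isPtx,
                     const std::string &file) {
  // PTX needs a virtual "compute_XX" profile; cubin uses the real "sm_XX".
  const std::string arch = isPtx ? virtualArchFor(gpuArch) : gpuArch;
  return "--image=profile=" + arch + ",file=" + file;
}
```

For example, `imageArg("sm_35", true, "k.s")` yields `--image=profile=compute_35,file=k.s`, while the same architecture with a cubin input keeps `sm_35`.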
>> Modified: cfe/trunk/lib/Driver/Tools.h
>> URL:
>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Driver/Tools.h?rev=275645&r1=275644&r2=275645&view=diff
>>
>> ==============================================================================
>> --- cfe/trunk/lib/Driver/Tools.h (original)
>> +++ cfe/trunk/lib/Driver/Tools.h Fri Jul 15 18:13:27 2016
>> @@ -57,8 +57,7 @@ private:
>> const Driver &D, const llvm::opt::ArgList &Args,
>> llvm::opt::ArgStringList &CmdArgs,
>> const InputInfo &Output,
>> - const InputInfoList &Inputs,
>> - const ToolChain *AuxToolChain) const;
>> + const InputInfoList &Inputs) const;
>>
>> void AddAArch64TargetArgs(const llvm::opt::ArgList &Args,
>> llvm::opt::ArgStringList &CmdArgs) const;
>>
>> Modified: cfe/trunk/lib/Frontend/CreateInvocationFromCommandLine.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Frontend/CreateInvocationFromCommandLine.cpp?rev=275645&r1=275644&r2=275645&view=diff
>>
>> ==============================================================================
>> --- cfe/trunk/lib/Frontend/CreateInvocationFromCommandLine.cpp (original)
>> +++ cfe/trunk/lib/Frontend/CreateInvocationFromCommandLine.cpp Fri Jul 15 18:13:27 2016
>> @@ -60,25 +60,25 @@ clang::createInvocationFromCommandLine(A
>> }
>>
>> // We expect to get back exactly one command job, if we didn't something
>> - // failed. CUDA compilation is an exception as it creates multiple jobs. If
>> - // that's the case, we proceed with the first job. If caller needs particular
>> - // CUDA job, it should be controlled via --cuda-{host|device}-only option
>> - // passed to the driver.
>> + // failed. Offload compilation is an exception as it creates multiple jobs. If
>> + // that's the case, we proceed with the first job. If caller needs a
>> + // particular job, it should be controlled via options (e.g.
>> + // --cuda-{host|device}-only for CUDA) passed to the driver.
>> const driver::JobList &Jobs = C->getJobs();
>> - bool CudaCompilation = false;
>> + bool OffloadCompilation = false;
>> if (Jobs.size() > 1) {
>> for (auto &A : C->getActions()){
>> // On MacOSX real actions may end up being wrapped in BindArchAction
>> if (isa<driver::BindArchAction>(A))
>> A = *A->input_begin();
>> - if (isa<driver::CudaDeviceAction>(A)) {
>> - CudaCompilation = true;
>> + if (isa<driver::OffloadAction>(A)) {
>> + OffloadCompilation = true;
>> break;
>> }
>> }
>> }
>> if (Jobs.size() == 0 || !isa<driver::Command>(*Jobs.begin()) ||
>> - (Jobs.size() > 1 && !CudaCompilation)) {
>> + (Jobs.size() > 1 && !OffloadCompilation)) {
>> SmallString<256> Msg;
>> llvm::raw_svector_ostream OS(Msg);
>> Jobs.Print(OS, "; ", true);
>>
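With this change, `createInvocationFromCommandLine` accepts more than one job whenever a top-level action is an `OffloadAction`, after looking through a single `BindArchAction` wrapper. A minimal sketch of that decision logic follows, assuming hypothetical toy types (`ActionKind`, `ToyAction`) in place of the real `driver::Action` hierarchy; it is an illustration, not the driver's code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins for driver::Action kinds.
enum class ActionKind { Input, Compile, BindArch, Offload };

struct ToyAction {
  ActionKind kind;
  const ToyAction *input = nullptr;  // Only used by BindArch wrappers.
};

// Returns true when the job list is acceptable: exactly one command job,
// or several jobs produced by an offloading compilation.
bool jobsAreAcceptable(std::size_t numJobs, bool firstJobIsCommand,
                       const std::vector<const ToyAction *> &topLevel) {
  bool offloadCompilation = false;
  if (numJobs > 1) {
    for (const ToyAction *a : topLevel) {
      // Look through one BindArch wrapper (the null check is an extra
      // safety guard added in this sketch).
      if (a->kind == ActionKind::BindArch && a->input)
        a = a->input;
      if (a->kind == ActionKind::Offload) {
        offloadCompilation = true;
        break;
      }
    }
  }
  return !(numJobs == 0 || !firstJobIsCommand ||
           (numJobs > 1 && !offloadCompilation));
}
```

A single compile job is accepted as before; two jobs are accepted only if an offload action (possibly wrapped in BindArch) is present.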
>> Added: cfe/trunk/test/Driver/cuda_phases.cu
>> URL:
>> http://llvm.org/viewvc/llvm-project/cfe/trunk/test/Driver/cuda_phases.cu?rev=275645&view=auto
>>
>> ==============================================================================
>> --- cfe/trunk/test/Driver/cuda_phases.cu (added)
>> +++ cfe/trunk/test/Driver/cuda_phases.cu Fri Jul 15 18:13:27 2016
>> @@ -0,0 +1,206 @@
>> +// Tests the phases generated for a CUDA offloading target for different
>> +// combinations of:
>> +// - Number of gpu architectures;
>> +// - Host/device-only compilation;
>> +// - User-requested final phase - binary or assembly.
>> +
>> +// REQUIRES: clang-driver
>> +// REQUIRES: powerpc-registered-target
>> +// REQUIRES: nvptx-registered-target
>> +
>> +//
>> +// Test single gpu architecture with complete compilation.
>> +//
>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 %s 2>&1 \
>> +// RUN: | FileCheck -check-prefix=BIN %s
>> +// BIN: 0: input, "{{.*}}cuda_phases.cu", cuda, (host-cuda)
>> +// BIN: 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
>> +// BIN: 2: compiler, {1}, ir, (host-cuda)
>> +// BIN: 3: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_30)
>> +// BIN: 4: preprocessor, {3}, cuda-cpp-output, (device-cuda, sm_30)
>> +// BIN: 5: compiler, {4}, ir, (device-cuda, sm_30)
>> +// BIN: 6: backend, {5}, assembler, (device-cuda, sm_30)
>> +// BIN: 7: assembler, {6}, object, (device-cuda, sm_30)
>> +// BIN: 8: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {7}, object
>> +// BIN: 9: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {6}, assembler
>> +// BIN: 10: linker, {8, 9}, cuda-fatbin, (device-cuda)
>> +// BIN: 11: offload, "host-cuda (powerpc64le-ibm-linux-gnu)" {2}, "device-cuda (nvptx64-nvidia-cuda)" {10}, ir
>> +// BIN: 12: backend, {11}, assembler, (host-cuda)
>> +// BIN: 13: assembler, {12}, object, (host-cuda)
>> +// BIN: 14: linker, {13}, image, (host-cuda)
>> +
>> +//
>> +// Test single gpu architecture up to the assemble phase.
>> +//
>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 %s -S 2>&1 \
>> +// RUN: | FileCheck -check-prefix=ASM %s
>> +// ASM: 0: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_30)
>> +// ASM: 1: preprocessor, {0}, cuda-cpp-output, (device-cuda, sm_30)
>> +// ASM: 2: compiler, {1}, ir, (device-cuda, sm_30)
>> +// ASM: 3: backend, {2}, assembler, (device-cuda, sm_30)
>> +// ASM: 4: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {3}, assembler
>> +// ASM: 5: input, "{{.*}}cuda_phases.cu", cuda, (host-cuda)
>> +// ASM: 6: preprocessor, {5}, cuda-cpp-output, (host-cuda)
>> +// ASM: 7: compiler, {6}, ir, (host-cuda)
>> +// ASM: 8: backend, {7}, assembler, (host-cuda)
>> +
>> +//
>> +// Test two gpu architectures with complete compilation.
>> +//
>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_35 %s 2>&1 \
>> +// RUN: | FileCheck -check-prefix=BIN2 %s
>> +// BIN2: 0: input, "{{.*}}cuda_phases.cu", cuda, (host-cuda)
>> +// BIN2: 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
>> +// BIN2: 2: compiler, {1}, ir, (host-cuda)
>> +// BIN2: 3: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_30)
>> +// BIN2: 4: preprocessor, {3}, cuda-cpp-output, (device-cuda, sm_30)
>> +// BIN2: 5: compiler, {4}, ir, (device-cuda, sm_30)
>> +// BIN2: 6: backend, {5}, assembler, (device-cuda, sm_30)
>> +// BIN2: 7: assembler, {6}, object, (device-cuda, sm_30)
>> +// BIN2: 8: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {7}, object
>> +// BIN2: 9: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {6}, assembler
>> +// BIN2: 10: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_35)
>> +// BIN2: 11: preprocessor, {10}, cuda-cpp-output, (device-cuda, sm_35)
>> +// BIN2: 12: compiler, {11}, ir, (device-cuda, sm_35)
>> +// BIN2: 13: backend, {12}, assembler, (device-cuda, sm_35)
>> +// BIN2: 14: assembler, {13}, object, (device-cuda, sm_35)
>> +// BIN2: 15: offload, "device-cuda (nvptx64-nvidia-cuda:sm_35)" {14}, object
>> +// BIN2: 16: offload, "device-cuda (nvptx64-nvidia-cuda:sm_35)" {13}, assembler
>> +// BIN2: 17: linker, {8, 9, 15, 16}, cuda-fatbin, (device-cuda)
>> +// BIN2: 18: offload, "host-cuda (powerpc64le-ibm-linux-gnu)" {2}, "device-cuda (nvptx64-nvidia-cuda)" {17}, ir
>> +// BIN2: 19: backend, {18}, assembler, (host-cuda)
>> +// BIN2: 20: assembler, {19}, object, (host-cuda)
>> +// BIN2: 21: linker, {20}, image, (host-cuda)
>> +
>> +//
>> +// Test two gpu architectures up to the assemble phase.
>> +//
>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_35 %s -S 2>&1 \
>> +// RUN: | FileCheck -check-prefix=ASM2 %s
>> +// ASM2: 0: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_30)
>> +// ASM2: 1: preprocessor, {0}, cuda-cpp-output, (device-cuda, sm_30)
>> +// ASM2: 2: compiler, {1}, ir, (device-cuda, sm_30)
>> +// ASM2: 3: backend, {2}, assembler, (device-cuda, sm_30)
>> +// ASM2: 4: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {3}, assembler
>> +// ASM2: 5: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_35)
>> +// ASM2: 6: preprocessor, {5}, cuda-cpp-output, (device-cuda, sm_35)
>> +// ASM2: 7: compiler, {6}, ir, (device-cuda, sm_35)
>> +// ASM2: 8: backend, {7}, assembler, (device-cuda, sm_35)
>> +// ASM2: 9: offload, "device-cuda (nvptx64-nvidia-cuda:sm_35)" {8}, assembler
>> +// ASM2: 10: input, "{{.*}}cuda_phases.cu", cuda, (host-cuda)
>> +// ASM2: 11: preprocessor, {10}, cuda-cpp-output, (host-cuda)
>> +// ASM2: 12: compiler, {11}, ir, (host-cuda)
>> +// ASM2: 13: backend, {12}, assembler, (host-cuda)
>> +
>> +//
>> +// Test single gpu architecture with complete compilation in host-only
>> +// compilation mode.
>> +//
>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 %s --cuda-host-only 2>&1 \
>> +// RUN: | FileCheck -check-prefix=HBIN %s
>> +// HBIN: 0: input, "{{.*}}cuda_phases.cu", cuda, (host-cuda)
>> +// HBIN: 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
>> +// HBIN: 2: compiler, {1}, ir, (host-cuda)
>> +// HBIN: 3: offload, "host-cuda (powerpc64le-ibm-linux-gnu)" {2}, ir
>> +// HBIN: 4: backend, {3}, assembler, (host-cuda)
>> +// HBIN: 5: assembler, {4}, object, (host-cuda)
>> +// HBIN: 6: linker, {5}, image, (host-cuda)
>> +
>> +//
>> +// Test single gpu architecture up to the assemble phase in host-only
>> +// compilation mode.
>> +//
>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 %s --cuda-host-only -S 2>&1 \
>> +// RUN: | FileCheck -check-prefix=HASM %s
>> +// HASM: 0: input, "{{.*}}cuda_phases.cu", cuda, (host-cuda)
>> +// HASM: 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
>> +// HASM: 2: compiler, {1}, ir, (host-cuda)
>> +// HASM: 3: offload, "host-cuda (powerpc64le-ibm-linux-gnu)" {2}, ir
>> +// HASM: 4: backend, {3}, assembler, (host-cuda)
>> +
>> +//
>> +// Test two gpu architectures with complete compilation in host-only
>> +// compilation mode.
>> +//
>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_35 %s --cuda-host-only 2>&1 \
>> +// RUN: | FileCheck -check-prefix=HBIN2 %s
>> +// HBIN2: 0: input, "{{.*}}cuda_phases.cu", cuda, (host-cuda)
>> +// HBIN2: 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
>> +// HBIN2: 2: compiler, {1}, ir, (host-cuda)
>> +// HBIN2: 3: offload, "host-cuda (powerpc64le-ibm-linux-gnu)" {2}, ir
>> +// HBIN2: 4: backend, {3}, assembler, (host-cuda)
>> +// HBIN2: 5: assembler, {4}, object, (host-cuda)
>> +// HBIN2: 6: linker, {5}, image, (host-cuda)
>> +
>> +//
>> +// Test two gpu architectures up to the assemble phase in host-only
>> +// compilation mode.
>> +//
>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_35 %s --cuda-host-only -S 2>&1 \
>> +// RUN: | FileCheck -check-prefix=HASM2 %s
>> +// HASM2: 0: input, "{{.*}}cuda_phases.cu", cuda, (host-cuda)
>> +// HASM2: 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
>> +// HASM2: 2: compiler, {1}, ir, (host-cuda)
>> +// HASM2: 3: offload, "host-cuda (powerpc64le-ibm-linux-gnu)" {2}, ir
>> +// HASM2: 4: backend, {3}, assembler, (host-cuda)
>> +
>> +//
>> +// Test single gpu architecture with complete compilation in device-only
>> +// compilation mode.
>> +//
>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 %s --cuda-device-only 2>&1 \
>> +// RUN: | FileCheck -check-prefix=DBIN %s
>> +// DBIN: 0: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_30)
>> +// DBIN: 1: preprocessor, {0}, cuda-cpp-output, (device-cuda, sm_30)
>> +// DBIN: 2: compiler, {1}, ir, (device-cuda, sm_30)
>> +// DBIN: 3: backend, {2}, assembler, (device-cuda, sm_30)
>> +// DBIN: 4: assembler, {3}, object, (device-cuda, sm_30)
>> +// DBIN: 5: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {4}, object
>> +
>> +//
>> +// Test single gpu architecture up to the assemble phase in device-only
>> +// compilation mode.
>> +//
>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 %s --cuda-device-only -S 2>&1 \
>> +// RUN: | FileCheck -check-prefix=DASM %s
>> +// DASM: 0: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_30)
>> +// DASM: 1: preprocessor, {0}, cuda-cpp-output, (device-cuda, sm_30)
>> +// DASM: 2: compiler, {1}, ir, (device-cuda, sm_30)
>> +// DASM: 3: backend, {2}, assembler, (device-cuda, sm_30)
>> +// DASM: 4: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {3}, assembler
>> +
>> +//
>> +// Test two gpu architectures with complete compilation in device-only
>> +// compilation mode.
>> +//
>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_35 %s --cuda-device-only 2>&1 \
>> +// RUN: | FileCheck -check-prefix=DBIN2 %s
>> +// DBIN2: 0: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_30)
>> +// DBIN2: 1: preprocessor, {0}, cuda-cpp-output, (device-cuda, sm_30)
>> +// DBIN2: 2: compiler, {1}, ir, (device-cuda, sm_30)
>> +// DBIN2: 3: backend, {2}, assembler, (device-cuda, sm_30)
>> +// DBIN2: 4: assembler, {3}, object, (device-cuda, sm_30)
>> +// DBIN2: 5: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {4}, object
>> +// DBIN2: 6: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_35)
>> +// DBIN2: 7: preprocessor, {6}, cuda-cpp-output, (device-cuda, sm_35)
>> +// DBIN2: 8: compiler, {7}, ir, (device-cuda, sm_35)
>> +// DBIN2: 9: backend, {8}, assembler, (device-cuda, sm_35)
>> +// DBIN2: 10: assembler, {9}, object, (device-cuda, sm_35)
>> +// DBIN2: 11: offload, "device-cuda (nvptx64-nvidia-cuda:sm_35)" {10}, object
>> +
>> +//
>> +// Test two gpu architectures up to the assemble phase in device-only
>> +// compilation mode.
>> +//
>> +// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_35 %s --cuda-device-only -S 2>&1 \
>> +// RUN: | FileCheck -check-prefix=DASM2 %s
>> +// DASM2: 0: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_30)
>> +// DASM2: 1: preprocessor, {0}, cuda-cpp-output, (device-cuda, sm_30)
>> +// DASM2: 2: compiler, {1}, ir, (device-cuda, sm_30)
>> +// DASM2: 3: backend, {2}, assembler, (device-cuda, sm_30)
>> +// DASM2: 4: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {3}, assembler
>> +// DASM2: 5: input, "{{.*}}cuda_phases.cu", cuda, (device-cuda, sm_35)
>> +// DASM2: 6: preprocessor, {5}, cuda-cpp-output, (device-cuda, sm_35)
>> +// DASM2: 7: compiler, {6}, ir, (device-cuda, sm_35)
>> +// DASM2: 8: backend, {7}, assembler, (device-cuda, sm_35)
>> +// DASM2: 9: offload, "device-cuda (nvptx64-nvidia-cuda:sm_35)" {8}, assembler
>>
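The CHECK lines in cuda_phases.cu follow a regular pattern, so the number of actions printed by `-ccc-print-phases` can be derived from the compilation mode, the final phase, and the number of `--cuda-gpu-arch` values. The helper below is hypothetical (it is not a driver API); it only encodes the counts read off the BIN, ASM, HBIN, HASM, DBIN, and DASM expectations in the test.

```cpp
#include <cstddef>

// Hypothetical enum for the three modes exercised by the test.
enum class CudaMode { HostAndDevice, HostOnly, DeviceOnly };

std::size_t expectedPhaseCount(CudaMode mode, bool stopAtAssembly,
                               std::size_t numArchs) {
  switch (mode) {
  case CudaMode::HostOnly:
    // input, preprocessor, compiler, offload, backend
    // [+ assembler, linker]; independent of the number of GPU archs.
    return stopAtAssembly ? 5 : 7;
  case CudaMode::DeviceOnly:
    // Per arch: input, preprocessor, compiler, backend,
    // [assembler,] and one offload wrapping the last result.
    return numArchs * (stopAtAssembly ? 5 : 6);
  case CudaMode::HostAndDevice:
    if (stopAtAssembly)
      // Per arch as in device-only -S, plus host input, preprocessor,
      // compiler, backend.
      return numArchs * 5 + 4;
    // Per arch: input, preprocessor, compiler, backend, assembler and two
    // offloads (object and assembler) = 7. Host: input, preprocessor,
    // compiler (3), fatbin linker (1), host offload (1), then backend,
    // assembler, linker (3).
    return numArchs * 7 + 8;
  }
  return 0;
}
```

For example, `expectedPhaseCount(CudaMode::HostAndDevice, false, 2)` is 22, matching the BIN2 actions numbered 0 to 21.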
>>
>> _______________________________________________
>> cfe-commits mailing list
>> cfe-commits at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
>>
>
>
>
>