r242085 - [cuda] Driver changes to compile and stitch together host and device-side CUDA code.
Artem Belevich
tra at google.com
Thu Jul 16 11:31:25 PDT 2015
Justin,
Thank you for your comments. Please take a look at the patch that addresses
them:
http://reviews.llvm.org/D11273
On Mon, Jul 13, 2015 at 11:39 PM, Justin Bogner <mail at justinbogner.com>
wrote:
> Artem Belevich <tra at google.com> writes:
> > Author: tra
> > Date: Mon Jul 13 18:27:56 2015
> > New Revision: 242085
> >
> > URL: http://llvm.org/viewvc/llvm-project?rev=242085&view=rev
> > Log:
> > [cuda] Driver changes to compile and stitch together host and device-side CUDA code.
> >
> > NOTE: reverts r242077 to reinstate r242058, r242065, 242067
> > and includes fix for OS X test failures.
> >
> > - Changed driver pipeline to compile host and device side of CUDA
> > files and incorporate results of device-side compilation into host
> > object file.
> >
> > - Added a test for cuda pipeline creation in clang driver.
> >
> > New clang options:
> > --cuda-host-only - Do host-side compilation only.
> > --cuda-device-only - Do device-side compilation only.
> >
> > --cuda-gpu-arch=<ARCH> - specify GPU architecture for device-side
> > compilation. E.g. sm_35, sm_30. Default is sm_20. May be used more
> > than once in which case one device-compilation will be done per
> > unique specified GPU architecture.
> >
> > Differential Revision: http://reviews.llvm.org/D9509
> > +
> > +CudaHostAction::CudaHostAction(std::unique_ptr<Action> Input,
> > + const ActionList &_DeviceActions)
>
> Identifiers starting with an underscore followed by a capital letter are reserved.
> Please just call this DeviceActions.
>
>
Done.
> > +     : Action(CudaHostClass, std::move(Input)), DeviceActions(_DeviceActions) {}
> > +
> > +CudaHostAction::~CudaHostAction() {
> > +   for (iterator it = DeviceActions.begin(), ie = DeviceActions.end(); it != ie;
>
> We usually name variables starting with a capital letter, so these
> should probably be `I` and `E` rather than `it` and `ie`. Might as well
> use auto here as well, since they're just iterators.
>
> This applies in several places later in this patch too. Please clean them up.
>
Fixed here and in other places.
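For reference, the cleaned-up constructor and destructor might read roughly
like this (a sketch following the review comments, not the exact committed
code; the destructor body assumes the loop deletes the owned device actions,
which the quoted snippet truncates):

  CudaHostAction::CudaHostAction(std::unique_ptr<Action> Input,
                                 const ActionList &DeviceActions)
      // In a mem-initializer the name outside the parens resolves to the
      // member, so no reserved-identifier prefix is needed on the parameter.
      : Action(CudaHostClass, std::move(Input)), DeviceActions(DeviceActions) {}

  CudaHostAction::~CudaHostAction() {
    // Device actions are owned by the host action, so release them here.
    for (auto I = DeviceActions.begin(), E = DeviceActions.end(); I != E; ++I)
      delete *I;
  }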
> > +   // Check whether any of device actions stopped before they could generate PTX.
> > +   bool PartialCompilation = false;
> > +   bool DeviceOnlyCompilation = Args.hasArg(options::OPT_cuda_device_only);
>
> The ordering here is confusing - it looks like DeviceOnlyCompilation is
> related to the loop that sets PartialCompilation.
>
>
Moved DeviceOnlyCompilation down to where it's used.
> > +     for (unsigned i = 0, e = GpuArchList.size(); i != e; ++i)
> > +       Actions.push_back(
> > +           new CudaDeviceAction(std::unique_ptr<Action>(CudaDeviceActions[i]),
> > +                                GpuArchList[i], /* AtTopLevel */ true));
> > + // Kill host action in case of device-only compilation.
> > + if (DeviceOnlyCompilation)
> > + Current.reset(nullptr);
> > + return Current;
> > + } else {
>
> Since the `if` returns unconditionally, an early exit is better than an
> else.
>
>
Done.
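For completeness, the early-exit shape might look something like this (a
sketch that assumes the enclosing condition is the partial/device-only check;
the actual code is in D11273):

  if (PartialCompilation || DeviceOnlyCompilation) {
    for (unsigned i = 0, e = GpuArchList.size(); i != e; ++i)
      Actions.push_back(
          new CudaDeviceAction(std::unique_ptr<Action>(CudaDeviceActions[i]),
                               GpuArchList[i], /* AtTopLevel */ true));
    // Kill host action in case of device-only compilation.
    if (DeviceOnlyCompilation)
      Current.reset(nullptr);
    return Current;
  }
  // The complete-compilation path continues here, no longer nested in an else.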
> > +     // Outputs of device actions during complete CUDA compilation get created
> > + // with AtTopLevel=false and become inputs for the host action.
> > + ActionList DeviceActions;
> > + for (unsigned i = 0, e = GpuArchList.size(); i != e; ++i)
> > + DeviceActions.push_back(
> > +         new CudaDeviceAction(std::unique_ptr<Action>(CudaDeviceActions[i]),
> > +                              GpuArchList[i], /* AtTopLevel */ false));
> > +     // Return a new host action that incorporates original host action and all
> > + // device actions.
> > + return std::unique_ptr<Action>(
> > + new CudaHostAction(std::move(Current), DeviceActions));
> > + }
> > +}
> > +
> > void Driver::BuildActions(const ToolChain &TC, DerivedArgList &Args,
> >                            const InputList &Inputs, ActionList &Actions) const {
> >    llvm::PrettyStackTraceString CrashInfo("Building compilation actions");
> > @@ -1312,6 +1412,25 @@ void Driver::BuildActions(const ToolChai
> > continue;
> > }
> >
> > +     phases::ID CudaInjectionPhase;
> > +     if (isSaveTempsEnabled()) {
> > +       // All phases are done independently, inject GPU blobs during compilation
> > +       // phase as that's where we generate glue code to init them.
> > +       CudaInjectionPhase = phases::Compile;
> > +     } else {
> > +       // Assumes that clang does everything up until linking phase, so we inject
> > +       // cuda device actions at the last step before linking. Otherwise CUDA
> > +       // host action forces preprocessor into a separate invocation.
> > +       if (FinalPhase == phases::Link) {
>
> This is `else if`. Nesting the if inside the else is confusing.
>
> > + for (auto i = PL.begin(), e = PL.end(); i != e; ++i) {
> > + auto next = i + 1;
> > + if (next != e && *next == phases::Link)
>
> What if i == e?
>
>
Then CudaInjectionPhase would be left uninitialized, which is a bug. Fixed,
which also makes the 'else if' point you made above moot.
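One way to guarantee initialization (a sketch of the shape of the fix, not
necessarily what landed in D11273) is to give the variable a default up
front, which also lets the nested if become an else-if:

  // Default to FinalPhase so no path leaves the variable uninitialized.
  phases::ID CudaInjectionPhase = FinalPhase;
  if (isSaveTempsEnabled()) {
    // With -save-temps every phase runs separately; inject GPU blobs during
    // the compilation phase, where the glue code that inits them is generated.
    CudaInjectionPhase = phases::Compile;
  } else if (FinalPhase == phases::Link) {
    // Otherwise inject at the last phase before linking.
    for (auto I = PL.begin(), E = PL.end(); I != E; ++I)
      if (I + 1 != E && *(I + 1) == phases::Link)
        CudaInjectionPhase = *I;
  }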
> > + CudaInjectionPhase = *i;
> > + }
> > + } else
> > + CudaInjectionPhase = FinalPhase;
> > + }
> > +
> > // Build the pipeline for this file.
> >      std::unique_ptr<Action> Current(new InputAction(*InputArg, InputType));
> >      for (SmallVectorImpl<phases::ID>::iterator i = PL.begin(), e = PL.end();
> > @@ -1337,6 +1456,15 @@ void Driver::BuildActions(const ToolChai
> >
> > // Otherwise construct the appropriate action.
> >        Current = ConstructPhaseAction(TC, Args, Phase, std::move(Current));
> > +
> > + if (InputType == types::TY_CUDA && Phase == CudaInjectionPhase &&
> > + !Args.hasArg(options::OPT_cuda_host_only)) {
> > +       Current = buildCudaActions(*this, TC, Args, InputArg, InputType,
> > +                                  std::move(Current), Actions);
> > + if (!Current)
> > + break;
> > + }
>
> This whole block of code seems out of place. In fact, why doesn't CUDA
> just have its own phases::ID? It seems like that would simplify all of
> this.
>
>
It does not quite fit the 'phase' concept. Phases implicitly assume that
things are done linearly. CUDA, on the other hand, effectively builds a
tree, and the weird code above is a somewhat broken way to glue the results
of the device-side builds into just the right action of the host build
pipeline, so that those results can be used to modify compiler arguments.
I've got a better way of dealing with that, which I'll send as a separate
patch for review soon.
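To illustrate the tree shape with the class names from this patch (an
informal sketch only):

  // For one foo.cu input compiled with --cuda-gpu-arch=sm_20 --cuda-gpu-arch=sm_35:
  //
  //   CudaHostAction
  //    |-- host-side pipeline action for foo.cu (e.g. its CompileJobAction)
  //    |-- CudaDeviceAction (sm_20)   <- device-side compilation
  //    `-- CudaDeviceAction (sm_35)   <- device-side compilation
  //
  // With --cuda-device-only the CudaDeviceActions become top-level actions
  // instead and the host action is dropped.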
> +
> > if (Current->getType() == types::TY_Nothing)
> > break;
> > }
> > @@ -1576,7 +1704,13 @@ static const Tool *SelectToolForJob(Comp
> > if (isa<BackendJobAction>(JA)) {
> > // Check if the compiler supports emitting LLVM IR.
> > assert(Inputs->size() == 1);
> > - JobAction *CompileJA = cast<CompileJobAction>(*Inputs->begin());
> > + JobAction *CompileJA;
> > + // Extract real host action, if it's a CudaHostAction.
> > +     if (CudaHostAction *CudaHA = dyn_cast<CudaHostAction>(*Inputs->begin()))
> > + CompileJA = cast<CompileJobAction>(*CudaHA->begin());
> > + else
> > + CompileJA = cast<CompileJobAction>(*Inputs->begin());
>
> This seems kind of awkward. Not sure I have a better suggestion though.
>
>
This will also be changed in the upcoming patch.
> > if (const InputAction *IA = dyn_cast<InputAction>(A)) {
> >      // FIXME: It would be nice to not claim this here; maybe the old scheme of
> > // just using Args was better?
> > @@ -1635,11 +1783,24 @@ void Driver::BuildJobsForAction(Compilat
> > else
> > TC = &C.getDefaultToolChain();
> >
> > -     BuildJobsForAction(C, *BAA->begin(), TC, BAA->getArchName(), AtTopLevel,
> > + BuildJobsForAction(C, *BAA->begin(), TC, ArchName, AtTopLevel,
>
> I guess this is just an unrelated cleanup? Better to commit those
> separately in the future.
>
>
Noted.
> > MultipleArchs, LinkingOutput, Result);
> > return;
> > }
> >
> > + if (const CudaDeviceAction *CDA = dyn_cast<CudaDeviceAction>(A)) {
> > +     // Figure out which NVPTX triple to use for device-side compilation based
> > +     // on whether host is 64-bit.
> > +     llvm::Triple DeviceTriple(C.getDefaultToolChain().getTriple().isArch64Bit()
> > +                                   ? "nvptx64-nvidia-cuda"
> > +                                   : "nvptx-nvidia-cuda");
>
> It seems like a pretty bad layering violation to need to know these
> particular triples right here...
>
>
Well, it's up to the driver to figure out the device-side triple. While I
agree that it's way too much target knowledge for the driver, I'm short on
practical ideas. Do you have any suggestions?
> Is getDefaultToolChain() even the right thing to do here? Isn't `TC`
> more appropriate?
>
Good point. TC indeed should do.
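i.e. something along these lines (a sketch of the tweak, assuming TC here is
the ToolChain pointer already available in BuildJobsForAction):

  // Derive the 32- vs 64-bit NVPTX triple from the input toolchain rather
  // than from the default toolchain.
  llvm::Triple DeviceTriple(TC->getTriple().isArch64Bit() ? "nvptx64-nvidia-cuda"
                                                          : "nvptx-nvidia-cuda");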
--
--Artem Belevich