[PATCH] D16514: Add -stop-on-failure driver option, and enable it by default for CUDA compiles.

Thu Jan 28 13:24:46 PST 2016

jlebar added a comment.

In http://reviews.llvm.org/D16514#338631, @echristo wrote:

> In general it feels like keeping 2 errors might make the most sense:
>
> #if _NOT_ARCH4_
>  #error "aiee!"
>  #endif
>
> clang -arch arch1 -arch arch2 -arch arch3 -arch arch4 t.c
>
> seems like it might be nice to get 3 errors here rather than a single one and fixing that single one, then getting another one, etc. or realizing what the error is here.


Yes, this patch makes that case worse.

But I suspect errors that apply to some but not all archs will be far less common than errors that apply to all arches -- regular C++ errors like missing a semicolon or whatever.  It feels pretty overwhelming to output N copies of every error in those cases, especially when you consider multipage template errors.

In addition, iirc there's no separation between errors outputted for different archs, so it really looks like we're just outputting multiple copies of the errors for fun.

> I don't feel strongly about this, but I'm still uncertain as to why we want to make things more complicated here :)


The other reason, which is less important, is that when you have one arch and ptxas fails -- which, it shouldn't, but we're not good enough to catch everything yet, and likely won't be for some time -- the error you get is

  ptxas: foo is not defined
  *FATAL ERROR*: fatbinary failed, /tmp/bar.cubin does not exist.

I'd like not to display that second line, since it hides the actual problem.  Once you get used to it, it's not a big deal, but it tripped me up for a few minutes, and I'm the one who added the call to ptxas.


================
Comment at: lib/Driver/Driver.cpp:652
@@ -640,3 +651,3 @@
   SmallVector<std::pair<int, const Command *>, 4> FailingCommands;
-  C.ExecuteJobs(C.getJobs(), FailingCommands);
+  C.ExecuteJobs(C.getJobs(), /* StopOnFailure = */ false, FailingCommands);
 
----------------
tra wrote:
> As far as I can tell, we don't do anything interesting if we've detected that *any* of the commands have failed. That suggests that doing anything beyond the first failing command does not do us any good. That would suggest that we may really want StopOnFailure=true here.
> 
> 'false' would preserve current behavior, though.
> 
> In either case I'm OK with a constant here.
Sorry, I think I'm misunderstanding something.  Would you mind rephrasing this?

> As far as I can tell, we don't do anything interesting if we've detected that *any* of the commands have failed.  That suggests that doing anything beyond the first failing command does not do us any good.

The scenario I thought this change applied to was:

External tool crashes during a call to ExecuteJobs() (not this one).  We now want to output preprocessed inputs, so we run this code, which again calls ExecuteJobs(), but these jobs only run the preprocessor on the inputs.

Now suppose one of those preprocessor jobs fails.  Maybe it has a bad preprocessor directive, or maybe #error would be enough.  It seems to me in this case that we should continue running the other preprocessor jobs, so we dump as much debug info as we can.

Note that if the StopOnFailure flag is false, afaict it's entirely possible for us to have two inputs, one of which has a pp error and the other of which causes a compiler crash -- if we stopped on failure here, we wouldn't output anything for the second input, which is the one we're interested in.

Sorry again, I'm sure I'm missing something.


http://reviews.llvm.org/D16514