[PATCH] D68587: [hip] Assume host-only compilation if the final phase is ahead of `backend`.

Artem Belevich via Phabricator via cfe-commits <cfe-commits at lists.llvm.org>
Tue Oct 8 11:53:58 PDT 2019


tra added a comment.

TL;DR:
+1 to formalizing how we want -M*/-E/-S/-emit-llvm/-fsyntax-only to behave.
OK with -M/-E/-S defaulting to host, and erroring out if applied to multiple sub-compilations.
I'm still convinced that the tooling issue with multiple sub-compilations is orthogonal to this change, should be handled in libclang, and that -fsyntax-only should not default to one sub-compilation.

See the details below.

In D68587#1698777 <https://reviews.llvm.org/D68587#1698777>, @hliao wrote:

> In D68587#1698247 <https://reviews.llvm.org/D68587#1698247>, @tra wrote:
>
> > In D68587#1698102 <https://reviews.llvm.org/D68587#1698102>, @hliao wrote:
> >
> > > For most compilation tools, a single input and a single output are expected. Without assuming `-fsyntax-only` alone means host-only compilation, we run syntax checking at least twice.
> >
> >
> > I believe the driver will not run subsequent jobs if one of the device compilations fails. You may see duplicate warnings from multiple stages, but overall the error handling works in a fairly predictable way now.
>
>
> It still runs and gives the same error (if that error applies to both sides) at least twice if you just specify `-fsyntax-only` or `-E`. That won't happen for a regular compilation (`-c`) due to the additional device dependencies added.


Are you talking about the behavior with or without this patch?

Without the patch, I do see the driver stop as soon as one of the sub-compilations errors out, and it needs *all* sub-compilations to succeed in order for the whole invocation to succeed.

E.g.: err.cu:

  common error;
  
  #if __CUDA_ARCH__
  __device__ void foo() {
    device error;
  }
  #else
  __host__ void bar() {
    host error;
  }
  #endif



  $ bin/clang --cuda-path=$HOME/local/cuda-10.0 --cuda-gpu-arch=sm_30 -fsyntax-only err.cu
  err.cu:1:1: error: unknown type name 'common'
  common error;
  ^
  err.cu:9:3: error: unknown type name 'host'
    host error;
    ^
  2 errors generated when compiling for host.

If I comment out the host and common errors, then clang finishes host compilation, moves on to device-side compilation and reports the error there:

  err.cu:5:3: error: unknown type name 'device'; did you mean 'CUdevice'?
    device error;
    ^~~~~~
    CUdevice
  /usr/local/google/home/tra/local/cuda-10.0/include/cuda.h:252:13: note: 'CUdevice' declared here
  typedef int CUdevice;                                     /**< CUDA device */
              ^
  1 error generated when compiling for sm_30.

Again, it reports that there's an error on the device side. My understanding is that with this patch clang would've succeeded, which is, IMO, incorrect.

> The error itself should, in fact, be clear enough. The most confusing part is the diagnostic messages and suggestions from clang, as host- and device-side compilations are quite different; in particular, the error messages from one side may be mixed with the other side's normal output.

Mixing errors and output will be confusing no matter what you do. That's why diagnostics go to stderr by default, so you can separate them from the regular output.
As far as errors go, clang clearly labels which side of the compilation produced them:

`1 error generated when compiling for sm_30.`
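
And since the regular output and the diagnostics go to separate streams, redirecting them is usually enough to keep them apart (a trivial sketch; the file names are made up):

  $ clang --cuda-gpu-arch=sm_30 -E foo.cu > foo.ii 2> foo.log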

>> To think of it, I'm starting to doubt that this patch is an improvement for `-M` either. You will get the dependencies for the host, but they are not necessarily the same as the dependencies for the device-side compilation. Producing a partial list of dependencies will be potentially incorrect. IMO we do need dependencies info from all sub-compilations.
> 
> Even without this patch, `-M`, or more specifically `-MD`, is already broken, as we run the dependency-generation action twice, once for each side. The later run overwrites the former's *.d file. We need special handling of `-M` to match nvcc.

This sounds like a bug to me. We should've refused to write multiple files -- similar to how we refuse to preprocess if we have more than one sub-compilation.
OK. In this sense, defaulting to host-only here would be a marginal improvement. Until we can produce complete dependency info for all sub-compilations, letting the user explicitly select one (with host being the default) is probably the best choice we can make at the moment.
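
For what it's worth, one can already approximate that today by selecting each side explicitly and naming the dependency file, so the two runs don't clobber each other's output. A rough sketch (`foo.cu` is just a stand-in here):

  $ clang --cuda-gpu-arch=sm_30 --cuda-host-only   -MD -MF foo.host.d  -c foo.cu
  $ clang --cuda-gpu-arch=sm_30 --cuda-device-only -MD -MF foo.sm_30.d -c foo.cu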

> 
> 
>> Perhaps we should limit the scope of this patch to -E only for now?
> 
> I just found that nvcc's `-E` returns the output of the device-side compilation for the first GPU arch specified. Anyway, whether to match that behavior is another question.

NVCC is not a good guideline here. Considering that it does source-code transformation, it's not quite clear what exactly -E means for NVCC at all.
Defaulting to host-side -E would probably be fine, as long as the user can override it with --cuda-device-only.
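
Something along these lines, using the flags we already have (a sketch of the proposed behavior, not what clang does today; `foo.cu` is a placeholder):

  $ clang -E foo.cu                                           # host-side preprocessed output (proposed default)
  $ clang -E --cuda-device-only --cuda-gpu-arch=sm_30 foo.cu  # device-side output for sm_30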

> but for some tools, like clang-tidy, it may be difficult to insert that option properly. Say, `clang-tidy -p` is supposed to read the compilation database generated by cmake or meta-build systems and perform additional checks on the sources. Adding that option would inevitably make them CUDA/HIP aware.

I'm not sure I understand your argument here. The clang driver does create multiple sub-compilations for CUDA/HIP. That's the reality. The tools that use the library commonly assume that 1 compilation == 1 clang action, which is not valid for CUDA. Our options are:

- Let the user specify which sub-compilation they want. This requires passing extra flags and complicates the tools' use, especially for tools that rely on using the exact flags used during a normal build (see the clang-tidy sketch after this list).
- Teach the tools to deal with multiple actions. This will need changes in all the tools. Probably the best way long-term, but it does not help much now.
- Let libclang pick a sensible default, but allow the tool to override it if it needs to. This would be the preferred way, IMO. It keeps current tools working and allows future tools to dig into all sub-compilations if they wish.
- Let the clang driver guess that the user would only need one sub-compilation (e.g. assuming that tools would commonly use -fsyntax-only). This is, IMO, not the right place to deal with tooling issues, and it does result in what I think is incorrect/unexpected output for normal use of -fsyntax-only in clang.
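
To illustrate the first option, every tool invocation would need something like this on top of whatever is already in the compilation database (a hypothetical sketch; `foo.hip` and the build directory are made up, while `-p` and `--extra-arg` are existing clang-tidy options):

  $ clang-tidy -p build/ --extra-arg=--cuda-host-only foo.hip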



>> If that's still a problem, then we should change the tooling infrastructure to use host-only compilation for HIP and CUDA by default.
> 
> That's an option I was looking into as well. But, generally speaking, we need a clear definition of the expected output for options like `-M`, `-MD`, `-E`, `-fsyntax-only`, `-S -emit-llvm`, even `-S` (for HIP only).

I think we have a decent understanding of what we want (even if clang does not quite do the right thing ATM, as you've pointed out above regarding `-M`).

- `-M`, `-MD`, `-E`, `-S -emit-llvm`, even `-S`

  Require an unambiguous single sub-compilation. It could either be explicitly specified by the user (`--cuda-host-only`/`--cuda-device-only` with a single GPU arch) or have a reasonable default (host?); see the sketch after this list.

- `-fsyntax-only`

  Should succeed only if all sub-compilations succeed. Error reporting stops after one of the sub-compilations fails, with a clear indication of which side produced the errors.
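
Roughly, for the first group, using flags that exist today (`foo.cu` is a stand-in; the host default and the error for the ambiguous case are the proposal, not current behavior):

  $ clang -S -emit-llvm --cuda-host-only foo.cu                          # host IR only
  $ clang -S -emit-llvm --cuda-device-only --cuda-gpu-arch=sm_30 foo.cu  # sm_30 IR only
  $ clang -S -emit-llvm --cuda-device-only --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_70 foo.cu  # ambiguous -> should be an error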

Does this make sense?

>> Also, this patch makes it impossible to run -fsyntax-only on *all* sub-compilations at once. I will have to run `clang -fsyntax-only` multiple times -- once for the host and once for each device.
> 
> We do have another option, `--cuda-compile-host-device`, to explicitly ask for both host- and device-side compilations.

Does it work with this patch? IIRC, `--cuda-compile-host-device` would only override the `--cuda-host-only`/`--cuda-device-only` flags, and with this patch we'd still end up with the host compilation only.
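
For reference, the fallback I had in mind is simply one invocation per side (a sketch; `foo.cu` is a stand-in):

  $ clang -fsyntax-only --cuda-host-only foo.cu
  $ clang -fsyntax-only --cuda-device-only --cuda-gpu-arch=sm_30 foo.cu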


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68587/new/

https://reviews.llvm.org/D68587




