[PATCH] D140158: [CUDA] Allow targeting NVPTX directly without a host toolchain
Joseph Huber via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Wed Jan 18 12:27:12 PST 2023
jhuber6 added inline comments.
================
Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:448-450
+ // If we are invoking `nvlink` internally we need to output a `.cubin` file.
+ // Checking if the output is a temporary is the cleanest way to determine
+ // this. Putting this logic in `getInputFIlename` isn't an option because it
----------------
tra wrote:
> Can't say that I like this approach. It heavily relies on "happens to work".
>
> Perhaps a better way to deal with this is to create the temporary GPU object file name with ".cubin" extension to start with.
>
>
>
As far as I know, the files are created independently of the tool-chains so I'm not sure if we'd be able to check there. The current way is to use `getInputFilename` but that doesn't have access to the compilation. As far as I can come up with there's the following solutions
- Do this and check the temp files
- Create a symbolic link if the file is empty (Doesn't work on Windows)
- Make a random global that's true if the Linker tool was built at some point
================
Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:469
- bool Relocatable = false;
+ bool Relocatable = true;
if (JA.isOffloading(Action::OFK_OpenMP))
----------------
tra wrote:
> Nit: I'd add an `else { Relocatable = false; // comment on what use cases use relocatable compilation by default. }` and leave it uninitialized here. At the very least it may be worth a comment. Silently defaulting to `true` makes me ask "what other compilation modes we may have?" and stand-alone compilation targeting NVPTX is hard to infer here from the surrounding details.
>
>
Works for me.
================
Comment at: clang/test/Driver/cuda-cross-compiling.c:37-38
+// ARGS: -cc1" "-triple" "nvptx64-nvidia-cuda" "-S" {{.*}} "-target-cpu" "sm_61" "-target-feature" "+ptx{{[0-9]+}}" {{.*}} "-o" "[[PTX:.+]].s"
+// ARGS-NEXT: ptxas" "-m64" "-O0" "--gpu-name" "sm_61" "--output-file" "[[CUBIN:.+]].cubin" "[[PTX]].s" "-c"
+// ARGS-NEXT: nvlink" "-o" "a.out" "-arch" "sm_61" {{.*}} "[[CUBIN]].cubin"
+
----------------
tra wrote:
> This may fail on windows where ptxas/nvlink will be `ptxas.exe` `nvlink.exe`. I think we typically use something like `fatbinary{{.*}}"` in other tests.
Good point.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D140158/new/
https://reviews.llvm.org/D140158
More information about the cfe-commits
mailing list