[llvm] [docs][CUDA] Document --offload-arch in CompileCudaWithLLVM (PR #190558)
via llvm-commits
llvm-commits at lists.llvm.org
Sun Apr 5 16:26:39 PDT 2026
https://github.com/nataliakokoromyti created https://github.com/llvm/llvm-project/pull/190558
The docs currently tell users to pass --cuda-gpu-arch when compiling CUDA with Clang, but that flag is only a legacy alias for --offload-arch. I’m updating the docs to use --offload-arch, which is the preferred spelling going forward, and to mention the old flag as an alias.
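For illustration, here is a minimal sketch of the usage the updated docs describe: passing --offload-arch once per target architecture. The architecture list (sm_35, sm_70), the variable names, and the output file name axpy.o are assumptions chosen for the example, not part of the patch. The script only prints the command instead of running it, so it does not need Clang or a CUDA install.

```shell
# Illustrative architecture list (an assumption); replace with the
# compute capabilities of the GPUs you actually target.
archs="sm_35 sm_70"

# Build one --offload-arch flag per architecture.
arch_flags=""
for arch in $archs; do
  arch_flags="$arch_flags --offload-arch=$arch"
done

# Compile-only (-c), so the -L/-l link flags are not needed here.
# The command is printed rather than executed.
cmd="clang++ -c axpy.cu -o axpy.o$arch_flags"
echo "$cmd"
```

Depending on where CUDA is installed, the real command may also need --cuda-path, as the existing docs note.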
From 345fd022982adde4423d5dc3349bb5ed60b47057 Mon Sep 17 00:00:00 2001
From: nataliakokoromyti <nataliakokoromyti at gmail.com>
Date: Sun, 5 Apr 2026 16:13:27 -0700
Subject: [PATCH] [docs][CUDA] Prefer --offload-arch in CompileCudaWithLLVM
---
llvm/docs/CompileCudaWithLLVM.rst | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/llvm/docs/CompileCudaWithLLVM.rst b/llvm/docs/CompileCudaWithLLVM.rst
index a557112c9e7f3..c9a374ea7746e 100644
--- a/llvm/docs/CompileCudaWithLLVM.rst
+++ b/llvm/docs/CompileCudaWithLLVM.rst
@@ -55,7 +55,7 @@ brackets as described below:
.. code-block:: console
- $ clang++ axpy.cu -o axpy --cuda-gpu-arch=<GPU arch> \
+ $ clang++ axpy.cu -o axpy --offload-arch=<GPU arch> \
-L<CUDA install path>/<lib64 or lib> \
-lcudart_static -ldl -lrt -pthread
$ ./axpy
@@ -81,14 +81,15 @@ run your program.
* ``<GPU arch>`` -- the `compute capability
<https://developer.nvidia.com/cuda-gpus>`_ of your GPU. For example, if you
want to run your program on a GPU with compute capability of 3.5, specify
- ``--cuda-gpu-arch=sm_35``.
+ ``--offload-arch=sm_35``.
- Note: You cannot pass ``compute_XX`` as an argument to ``--cuda-gpu-arch``;
+ Note: You cannot pass ``compute_XX`` as an argument to ``--offload-arch``;
only ``sm_XX`` is currently supported. However, clang always includes PTX in
- its binaries, so e.g. a binary compiled with ``--cuda-gpu-arch=sm_30`` would be
+ its binaries, so e.g. a binary compiled with ``--offload-arch=sm_30`` would be
forwards-compatible with e.g. ``sm_35`` GPUs.
- You can pass ``--cuda-gpu-arch`` multiple times to compile for multiple archs.
+ You can pass ``--offload-arch`` multiple times to compile for multiple archs.
+ ``--cuda-gpu-arch`` is a legacy alias for ``--offload-arch``.
The `-L` and `-l` flags only need to be passed when linking. When compiling,
you may also need to pass ``--cuda-path=/path/to/cuda`` if you didn't install