[llvm] [docs][CUDA] Document --offload-arch in CompileCudaWithLLVM (PR #190558)

Sun Apr 5 16:26:39 PDT 2026

https://github.com/nataliakokoromyti created https://github.com/llvm/llvm-project/pull/190558

The docs currently tell users to pass --cuda-gpu-arch when compiling CUDA with Clang, but that flag is a legacy alias for --offload-arch. This patch updates the docs to use --offload-arch throughout and notes that --cuda-gpu-arch remains available as an alias.
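
For anyone with existing build scripts, the same substitution can be sketched in a POSIX shell with sed. The function name `migrate_arch_flag` is hypothetical and not part of this patch, and the sketch only handles the `--cuda-gpu-arch=<arch>` spelling shown in the docs.

```shell
# Hypothetical helper (not part of this patch): rewrite the legacy alias
# --cuda-gpu-arch=<arch> to --offload-arch=<arch> in text read from stdin.
migrate_arch_flag() {
  sed 's/--cuda-gpu-arch=/--offload-arch=/g'
}

# Example: print a command line from the docs with the preferred flag.
echo 'clang++ axpy.cu -o axpy --cuda-gpu-arch=sm_35' | migrate_arch_flag
```

Since the old flag is an alias, scripts that are left unchanged should keep working; the rewrite only brings them in line with the updated docs.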

From 345fd022982adde4423d5dc3349bb5ed60b47057 Mon Sep 17 00:00:00 2001
From: nataliakokoromyti <nataliakokoromyti at gmail.com>
Date: Sun, 5 Apr 2026 16:13:27 -0700
Subject: [PATCH] [docs][CUDA] Prefer --offload-arch in CompileCudaWithLLVM

---
 llvm/docs/CompileCudaWithLLVM.rst | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/llvm/docs/CompileCudaWithLLVM.rst b/llvm/docs/CompileCudaWithLLVM.rst
index a557112c9e7f3..c9a374ea7746e 100644
--- a/llvm/docs/CompileCudaWithLLVM.rst
+++ b/llvm/docs/CompileCudaWithLLVM.rst
@@ -55,7 +55,7 @@ brackets as described below:
 
 .. code-block:: console
 
-  $ clang++ axpy.cu -o axpy --cuda-gpu-arch=<GPU arch> \
+  $ clang++ axpy.cu -o axpy --offload-arch=<GPU arch> \
       -L<CUDA install path>/<lib64 or lib>             \
       -lcudart_static -ldl -lrt -pthread
   $ ./axpy
@@ -81,14 +81,15 @@ run your program.
 * ``<GPU arch>`` -- the `compute capability
   <https://developer.nvidia.com/cuda-gpus>`_ of your GPU. For example, if you
   want to run your program on a GPU with compute capability of 3.5, specify
-  ``--cuda-gpu-arch=sm_35``.
+  ``--offload-arch=sm_35``.
 
-  Note: You cannot pass ``compute_XX`` as an argument to ``--cuda-gpu-arch``;
+  Note: You cannot pass ``compute_XX`` as an argument to ``--offload-arch``;
   only ``sm_XX`` is currently supported.  However, clang always includes PTX in
-  its binaries, so e.g. a binary compiled with ``--cuda-gpu-arch=sm_30`` would be
+  its binaries, so e.g. a binary compiled with ``--offload-arch=sm_30`` would be
   forwards-compatible with e.g. ``sm_35`` GPUs.
 
-  You can pass ``--cuda-gpu-arch`` multiple times to compile for multiple archs.
+  You can pass ``--offload-arch`` multiple times to compile for multiple archs.
+  ``--cuda-gpu-arch`` is a legacy alias for ``--offload-arch``.
 
 The `-L` and `-l` flags only need to be passed when linking.  When compiling,
 you may also need to pass ``--cuda-path=/path/to/cuda`` if you didn't install