[llvm] [doc] Update clang docs for PTX generation (PR #172588)

Tue Dec 16 19:01:50 PST 2025

https://github.com/philsc created https://github.com/llvm/llvm-project/pull/172588

As of clang 19, PTX is no longer included in CUDA binaries by default.
This caused confusion when I upgraded from clang 17 to clang 21. The
change in defaults is documented in the clang 19 release notes, but not
in the CUDA compilation docs.
https://releases.llvm.org/19.1.0/tools/clang/docs/ReleaseNotes.html#cuda-hip-language-changes

Fix the documentation to mention the flag, --cuda-include-ptx=all, that
is needed to include PTX in the binaries.
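
For illustration, here is a minimal sketch of a build line that uses the flag. The source file name (axpy.cu), the GPU arch (sm_50), and the CUDA library path are placeholders I picked for the example, not values taken from the patch; substitute whatever matches your setup. The script only assembles and prints the command, so it is safe to run on a machine without clang or CUDA installed.

```shell
#!/bin/sh
# Sketch only. ARCH, SRC, and CUDA_LIB are placeholder assumptions.
ARCH="sm_50"                    # GPU arch to generate SASS for
SRC="axpy.cu"                   # hypothetical CUDA source file
CUDA_LIB="/usr/local/cuda/lib64"  # assumed CUDA install location

# --cuda-include-ptx=all asks clang to embed PTX next to the SASS,
# which is what clang did by default before clang 19.
FLAGS="--cuda-gpu-arch=${ARCH} --cuda-include-ptx=all"

CMD="clang++ ${SRC} -o axpy ${FLAGS} -L${CUDA_LIB} -lcudart_static -ldl -lrt -pthread"

# Print the command instead of running it.
echo "${CMD}"
```

If the CUDA toolkit is installed, running `cuobjdump --dump-ptx axpy` on the resulting binary should be one way to check whether PTX was actually embedded.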

From b7853e25779c84ed7884b86e3c852cd3e6e24edf Mon Sep 17 00:00:00 2001
From: Philipp Schrader <philipp.schrader at bluerivertech.com>
Date: Tue, 16 Dec 2025 18:44:28 -0800
Subject: [PATCH] [doc] Update clang docs for PTX generation

As of clang 19, PTX is no longer included in CUDA binaries by default.
This caused confusion when I upgraded from clang 17 to clang 21. The
change in defaults is documented in the clang 19 release notes, but not
in the CUDA compilation docs.
https://releases.llvm.org/19.1.0/tools/clang/docs/ReleaseNotes.html#cuda-hip-language-changes

Fix the documentation to mention the flag, --cuda-include-ptx=all, that
is needed to include PTX in the binaries.
---
 llvm/docs/CompileCudaWithLLVM.rst | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/llvm/docs/CompileCudaWithLLVM.rst b/llvm/docs/CompileCudaWithLLVM.rst
index 0bd121a895028..3992798990426 100644
--- a/llvm/docs/CompileCudaWithLLVM.rst
+++ b/llvm/docs/CompileCudaWithLLVM.rst
@@ -84,9 +84,10 @@ run your program.
   ``--cuda-gpu-arch=sm_35``.
 
   Note: You cannot pass ``compute_XX`` as an argument to ``--cuda-gpu-arch``;
-  only ``sm_XX`` is currently supported.  However, clang always includes PTX in
-  its binaries, so e.g. a binary compiled with ``--cuda-gpu-arch=sm_30`` would be
-  forwards-compatible with e.g. ``sm_35`` GPUs.
+  only ``sm_XX`` is currently supported.  As of clang 19, PTX is not included
+  in binaries by default; pass ``--cuda-include-ptx=all`` to include it. With
+  this flag, a binary compiled with ``--cuda-gpu-arch=sm_30`` would be
+  forwards-compatible with e.g. ``sm_35`` GPUs.
 
   You can pass ``--cuda-gpu-arch`` multiple times to compile for multiple archs.