[llvm] 30fd35f - AMDGPU: Add some notes about amdgpu-flat-work-group-size

Fri Jul 7 16:02:51 PDT 2023

Author: Matt Arsenault
Date: 2023-07-07T19:02:46-04:00
New Revision: 30fd35f59ceb4c00a550b82af767a5b9cf9e252d

URL: https://github.com/llvm/llvm-project/commit/30fd35f59ceb4c00a550b82af767a5b9cf9e252d
DIFF: https://github.com/llvm/llvm-project/commit/30fd35f59ceb4c00a550b82af767a5b9cf9e252d.diff

LOG: AMDGPU: Add some notes about amdgpu-flat-work-group-size

Added: 
    

Modified: 
    llvm/docs/AMDGPUUsage.rst

Removed: 
    


################################################################################
diff  --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 2fae09a1bced59..dbe6e69a3b3975 100644

--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -999,7 +999,12 @@ The AMDGPU backend supports the following LLVM IR attributes.
      "amdgpu-flat-work-group-size"="min,max" Specify the minimum and maximum flat work group sizes that
                                              will be specified when the kernel is dispatched. Generated
                                              by the ``amdgpu_flat_work_group_size`` CLANG attribute [CLANG-ATTR]_.
-                                             The implied default value is 1,1024.
+                                             The IR implied default value is 1,1024. Clang may emit this attribute
+                                             with more restrictive bounds depending on language defaults.
+                                             If the actual block or workgroup size exceeds the limit at any point during
+                                             the execution, the behavior is undefined. For example, even if there is
+                                             only one active thread but the thread local id exceeds the limit, the
+                                             behavior is undefined.
 
      "amdgpu-implicitarg-num-bytes"="n"      Number of kernel argument bytes to add to the kernel
                                              argument block size for the implicit arguments. This
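
A note on reading the bounds described in this change: the sketch below is a minimal illustration, not LLVM code. It assumes the flat work-group size is the product of the three work-group dimensions, and that a missing attribute means the IR implied default of 1,1024. The names FlatWorkGroupSizeBounds, parseBounds, flatWorkGroupSize and dispatchIsWithinBounds are hypothetical helpers invented for this example. Rejecting min > max is a choice made for the sketch only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical helper type; not part of LLVM.
struct FlatWorkGroupSizeBounds {
    uint32_t min;
    uint32_t max;
};

// Parse an attribute value of the form "min,max".
// Assumption: an empty string stands for "attribute absent", which maps to
// the IR implied default of 1,1024 mentioned in the documentation change.
std::optional<FlatWorkGroupSizeBounds> parseBounds(const std::string &value) {
    if (value.empty())
        return FlatWorkGroupSizeBounds{1, 1024};

    const std::size_t comma = value.find(',');
    if (comma == std::string::npos)
        return std::nullopt;

    const std::string lowText = value.substr(0, comma);
    const std::string highText = value.substr(comma + 1);
    try {
        std::size_t usedLow = 0, usedHigh = 0;
        const unsigned long low = std::stoul(lowText, &usedLow);
        const unsigned long high = std::stoul(highText, &usedHigh);
        // Reject trailing garbage and (sketch-only choice) inverted ranges.
        if (usedLow != lowText.size() || usedHigh != highText.size() || low > high)
            return std::nullopt;
        return FlatWorkGroupSizeBounds{static_cast<uint32_t>(low),
                                       static_cast<uint32_t>(high)};
    } catch (const std::exception &) {
        // std::stoul throws on non-numeric or out-of-range input.
        return std::nullopt;
    }
}

// Flat work-group size: product of the x, y and z work-group dimensions.
// Assumes dimensions small enough that the product fits in 64 bits.
uint64_t flatWorkGroupSize(uint32_t x, uint32_t y, uint32_t z) {
    return static_cast<uint64_t>(x) * y * z;
}

// True when a dispatch with the given dimensions stays inside the bounds.
// Per the documentation change, dispatching outside them is undefined
// behavior, so a host-side check like this one can only warn in advance.
bool dispatchIsWithinBounds(const FlatWorkGroupSizeBounds &bounds,
                            uint32_t x, uint32_t y, uint32_t z) {
    const uint64_t flat = flatWorkGroupSize(x, y, z);
    return flat >= bounds.min && flat <= bounds.max;
}
```

With bounds parsed from "1,256", a 16x16x1 work group (flat size 256) passes the check, while 32x16x1 (flat size 512) does not. A check of this kind would have to run on the host before dispatch, since the attribute itself only states what the compiler may assume.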