[clang] [OpenCL] Allow -fno-offload-uniform-block for 1.2 (PR #79026)

Yaxun Liu via cfe-commits cfe-commits at lists.llvm.org
Mon Jan 22 10:18:31 PST 2024


https://github.com/yxsamliu created https://github.com/llvm/llvm-project/pull/79026

OpenCL 1.2 kernel assumes uniform workgroup size. By default a function attr is added to allow backend to do optimizations. These optimizations may cause UB if such kernels are launched with non-uniform workgroup sizes.

Although OpenCL 1.2 itself does not support non-uniform workgroup launching, OpenCL 1.2 kernels may be launched by OpenCL 2.0 or other offloading language runtime that supports non-uniform workgroup size. Therefore it is useful to allow -fno-offload-uniform-block to override the default uniform-block assumption for OpenCL 1.2.

>From 20d2a32597280a5f29071c3721b81506b24b8bf8 Mon Sep 17 00:00:00 2001
From: "Yaxun (Sam) Liu" <yaxun.liu at amd.com>
Date: Mon, 22 Jan 2024 12:51:36 -0500
Subject: [PATCH] [OpenCL] Allow -fno-offload-uniform-block for 1.2

OpenCL 1.2 kernel assumes uniform workgroup size. By default
a function attr is added to allow backend to do optimizations.
These optimizations may cause UB if such kernels are launched
with non-uniform workgroup sizes.

Although OpenCL 1.2 itself does not support non-uniform workgroup
launching, OpenCL 1.2 kernels may be launched by OpenCL 2.0
or other offloading language runtime that supports non-uniform
workgroup size. Therefore it is useful to allow -fno-offload-uniform-block
to override the default uniform-block assumption for OpenCL 1.2.
---
 clang/include/clang/Driver/Options.td         |  3 ++-
 clang/lib/CodeGen/CGCall.cpp                  | 27 +++++++++----------
 .../test/CodeGenOpenCL/cl-uniform-wg-size.cl  |  1 +
 3 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index f9e883e3e22de86..ea257e5dc45907a 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -1007,7 +1007,8 @@ def b : JoinedOrSeparate<["-"], "b">, Flags<[LinkerInput]>,
   Group<Link_Group>;
 
 defm offload_uniform_block : BoolFOption<"offload-uniform-block",
-  LangOpts<"OffloadUniformBlock">, Default<"LangOpts->CUDA">,
+  LangOpts<"OffloadUniformBlock">,
+  Default<"LangOpts->CUDA || (LangOpts->OpenCL && LangOpts->OpenCLVersion <= 120)">,
   PosFlag<SetTrue, [], [ClangOption, CC1Option], "Assume">,
   NegFlag<SetFalse, [], [ClangOption, CC1Option], "Don't assume">,
   BothFlags<[], [ClangOption], " that kernels are launched with uniform block sizes (default true for CUDA/HIP and false otherwise)">>;
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index acf6cbad1c74809..c81f907fd6cfc4d 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -2431,21 +2431,18 @@ void CodeGenModule::ConstructAttributeList(StringRef Name,
                                  NumElemsParam);
     }
 
-    if (TargetDecl->hasAttr<OpenCLKernelAttr>()) {
-      if (getLangOpts().OpenCLVersion <= 120) {
-        // OpenCL v1.2 Work groups are always uniform
-        FuncAttrs.addAttribute("uniform-work-group-size", "true");
-      } else {
-        // OpenCL v2.0 Work groups may be whether uniform or not.
-        // '-cl-uniform-work-group-size' compile option gets a hint
-        // to the compiler that the global work-size be a multiple of
-        // the work-group size specified to clEnqueueNDRangeKernel
-        // (i.e. work groups are uniform).
-        FuncAttrs.addAttribute(
-            "uniform-work-group-size",
-            llvm::toStringRef(getLangOpts().OffloadUniformBlock));
-      }
-    }
+    // OpenCL v1.2 Work groups are always uniform
+    // OpenCL v2.0 Work groups may be whether uniform or not.
+    // '-cl-uniform-work-group-size' compile option gets a hint
+    // to the compiler that the global work-size be a multiple of
+    // the work-group size specified to clEnqueueNDRangeKernel
+    // (i.e. work groups are uniform).
+    // OffloadUniformBlock defaults to true for OpenCL v1.2 and false
+    // for OpenCL 2.0, and its value is overriden by a compilation option.
+    if (TargetDecl->hasAttr<OpenCLKernelAttr>())
+      FuncAttrs.addAttribute(
+          "uniform-work-group-size",
+          llvm::toStringRef(getLangOpts().OffloadUniformBlock));
 
     if (TargetDecl->hasAttr<CUDAGlobalAttr>() &&
         getLangOpts().OffloadUniformBlock)
diff --git a/clang/test/CodeGenOpenCL/cl-uniform-wg-size.cl b/clang/test/CodeGenOpenCL/cl-uniform-wg-size.cl
index d139621ede4e7e4..1dc06944eb858d0 100644
--- a/clang/test/CodeGenOpenCL/cl-uniform-wg-size.cl
+++ b/clang/test/CodeGenOpenCL/cl-uniform-wg-size.cl
@@ -1,4 +1,5 @@
 // RUN: %clang_cc1 -emit-llvm -O0 -cl-std=CL1.2 -o - %s 2>&1 | FileCheck %s -check-prefixes CHECK,CHECK-UNIFORM
+// RUN: %clang_cc1 -emit-llvm -O0 -cl-std=CL1.2 -fno-offload-uniform-block -o - %s 2>&1 | FileCheck %s -check-prefixes CHECK,CHECK-NONUNIFORM
 // RUN: %clang_cc1 -emit-llvm -O0 -cl-std=CL2.0 -o - %s 2>&1 | FileCheck %s -check-prefixes CHECK,CHECK-NONUNIFORM
 // RUN: %clang_cc1 -emit-llvm -O0 -cl-std=CL2.0 -cl-uniform-work-group-size -o - %s 2>&1 | FileCheck %s -check-prefixes CHECK,CHECK-UNIFORM
 // RUN: %clang_cc1 -emit-llvm -O0 -cl-std=CL2.0 -foffload-uniform-block -o - %s 2>&1 | FileCheck %s -check-prefixes CHECK,CHECK-UNIFORM



More information about the cfe-commits mailing list