[clang] [OpenCL] Allow -fno-offload-uniform-block for 1.2 (PR #79026)
Yaxun Liu via cfe-commits
cfe-commits at lists.llvm.org
Mon Jan 22 10:18:31 PST 2024
https://github.com/yxsamliu created https://github.com/llvm/llvm-project/pull/79026
OpenCL 1.2 kernel assumes uniform workgroup size. By default a function attr is added to allow backend to do optimizations. These optimizations may cause UB if such kernels are launched with non-uniform workgroup sizes.
Although OpenCL 1.2 itself does not support non-uniform workgroup launching, OpenCL 1.2 kernels may be launched by OpenCL 2.0 or other offloading language runtime that supports non-uniform workgroup size. Therefore it is useful to allow -fno-offload-uniform-block to override the default uniform-block assumption for OpenCL 1.2.
>From 20d2a32597280a5f29071c3721b81506b24b8bf8 Mon Sep 17 00:00:00 2001
From: "Yaxun (Sam) Liu" <yaxun.liu at amd.com>
Date: Mon, 22 Jan 2024 12:51:36 -0500
Subject: [PATCH] [OpenCL] Allow -fno-offload-uniform-block for 1.2
OpenCL 1.2 kernel assumes uniform workgroup size. By default
a function attr is added to allow backend to do optimizations.
These optimizations may cause UB if such kernels are launched
with non-uniform workgroup sizes.
Although OpenCL 1.2 itself does not support non-uniform workgroup
launching, OpenCL 1.2 kernels may be launched by OpenCL 2.0
or other offloading language runtime that supports non-uniform
workgroup size. Therefore it is useful to allow -fno-offload-uniform-block
to override the default uniform-block assumption for OpenCL 1.2.
---
clang/include/clang/Driver/Options.td | 3 ++-
clang/lib/CodeGen/CGCall.cpp | 27 +++++++++----------
.../test/CodeGenOpenCL/cl-uniform-wg-size.cl | 1 +
3 files changed, 15 insertions(+), 16 deletions(-)
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index f9e883e3e22de86..ea257e5dc45907a 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -1007,7 +1007,8 @@ def b : JoinedOrSeparate<["-"], "b">, Flags<[LinkerInput]>,
Group<Link_Group>;
defm offload_uniform_block : BoolFOption<"offload-uniform-block",
- LangOpts<"OffloadUniformBlock">, Default<"LangOpts->CUDA">,
+ LangOpts<"OffloadUniformBlock">,
+ Default<"LangOpts->CUDA || (LangOpts->OpenCL && LangOpts->OpenCLVersion <= 120)">,
PosFlag<SetTrue, [], [ClangOption, CC1Option], "Assume">,
NegFlag<SetFalse, [], [ClangOption, CC1Option], "Don't assume">,
BothFlags<[], [ClangOption], " that kernels are launched with uniform block sizes (default true for CUDA/HIP and false otherwise)">>;
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index acf6cbad1c74809..c81f907fd6cfc4d 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -2431,21 +2431,18 @@ void CodeGenModule::ConstructAttributeList(StringRef Name,
NumElemsParam);
}
- if (TargetDecl->hasAttr<OpenCLKernelAttr>()) {
- if (getLangOpts().OpenCLVersion <= 120) {
- // OpenCL v1.2 Work groups are always uniform
- FuncAttrs.addAttribute("uniform-work-group-size", "true");
- } else {
- // OpenCL v2.0 Work groups may be whether uniform or not.
- // '-cl-uniform-work-group-size' compile option gets a hint
- // to the compiler that the global work-size be a multiple of
- // the work-group size specified to clEnqueueNDRangeKernel
- // (i.e. work groups are uniform).
- FuncAttrs.addAttribute(
- "uniform-work-group-size",
- llvm::toStringRef(getLangOpts().OffloadUniformBlock));
- }
- }
+ // OpenCL v1.2 Work groups are always uniform
+ // OpenCL v2.0 Work groups may be whether uniform or not.
+ // '-cl-uniform-work-group-size' compile option gets a hint
+ // to the compiler that the global work-size be a multiple of
+ // the work-group size specified to clEnqueueNDRangeKernel
+ // (i.e. work groups are uniform).
+ // OffloadUniformBlock defaults to true for OpenCL v1.2 and false
+ // for OpenCL 2.0, and its value is overriden by a compilation option.
+ if (TargetDecl->hasAttr<OpenCLKernelAttr>())
+ FuncAttrs.addAttribute(
+ "uniform-work-group-size",
+ llvm::toStringRef(getLangOpts().OffloadUniformBlock));
if (TargetDecl->hasAttr<CUDAGlobalAttr>() &&
getLangOpts().OffloadUniformBlock)
diff --git a/clang/test/CodeGenOpenCL/cl-uniform-wg-size.cl b/clang/test/CodeGenOpenCL/cl-uniform-wg-size.cl
index d139621ede4e7e4..1dc06944eb858d0 100644
--- a/clang/test/CodeGenOpenCL/cl-uniform-wg-size.cl
+++ b/clang/test/CodeGenOpenCL/cl-uniform-wg-size.cl
@@ -1,4 +1,5 @@
// RUN: %clang_cc1 -emit-llvm -O0 -cl-std=CL1.2 -o - %s 2>&1 | FileCheck %s -check-prefixes CHECK,CHECK-UNIFORM
+// RUN: %clang_cc1 -emit-llvm -O0 -cl-std=CL1.2 -fno-offload-uniform-block -o - %s 2>&1 | FileCheck %s -check-prefixes CHECK,CHECK-NONUNIFORM
// RUN: %clang_cc1 -emit-llvm -O0 -cl-std=CL2.0 -o - %s 2>&1 | FileCheck %s -check-prefixes CHECK,CHECK-NONUNIFORM
// RUN: %clang_cc1 -emit-llvm -O0 -cl-std=CL2.0 -cl-uniform-work-group-size -o - %s 2>&1 | FileCheck %s -check-prefixes CHECK,CHECK-UNIFORM
// RUN: %clang_cc1 -emit-llvm -O0 -cl-std=CL2.0 -foffload-uniform-block -o - %s 2>&1 | FileCheck %s -check-prefixes CHECK,CHECK-UNIFORM
More information about the cfe-commits
mailing list