[clang] 4aa87a1 - [OpenMP][AMDGPU] Add 'uniform-work-group' attribute to OpenMP kernels
Joseph Huber via cfe-commits
cfe-commits at lists.llvm.org
Thu Oct 6 16:22:19 PDT 2022
Author: Joseph Huber
Date: 2022-10-06T18:22:09-05:00
New Revision: 4aa87a131f93de73b8e245021af0294545d063c5
URL: https://github.com/llvm/llvm-project/commit/4aa87a131f93de73b8e245021af0294545d063c5
DIFF: https://github.com/llvm/llvm-project/commit/4aa87a131f93de73b8e245021af0294545d063c5.diff
LOG: [OpenMP][AMDGPU] Add 'uniform-work-group' attribute to OpenMP kernels
The `cl-uniform-work-group` attribute asserts that the global work-size
be a multiple of the work-group specified work group size. This should
allow optimizations. It is already present by default in the AMD
compiler and for HIP kernels so it should be safe to allow this for
OpenMP kernels by default.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D135374
Added:
Modified:
clang/lib/CodeGen/TargetInfo.cpp
clang/test/OpenMP/amdgcn-attributes.cpp
Removed:
################################################################################
diff --git a/clang/lib/CodeGen/TargetInfo.cpp b/clang/lib/CodeGen/TargetInfo.cpp
index 02bcf8a19bbfb..74997f0f9e9dc 100644
--- a/clang/lib/CodeGen/TargetInfo.cpp
+++ b/clang/lib/CodeGen/TargetInfo.cpp
@@ -9423,8 +9423,12 @@ void AMDGPUTargetCodeGenInfo::setTargetAttributes(
const bool IsHIPKernel =
M.getLangOpts().HIP && FD && FD->hasAttr<CUDAGlobalAttr>();
+ const bool IsOpenMPkernel =
+ M.getLangOpts().OpenMPIsDevice &&
+ (F->getCallingConv() == llvm::CallingConv::AMDGPU_KERNEL);
- if (IsHIPKernel)
+ // TODO: This should be moved to language specific attributes instead.
+ if (IsHIPKernel || IsOpenMPkernel)
F->addFnAttr("uniform-work-group-size", "true");
if (M.getContext().getTargetInfo().allowAMDGPUUnsafeFPAtomics())
diff --git a/clang/test/OpenMP/amdgcn-attributes.cpp b/clang/test/OpenMP/amdgcn-attributes.cpp
index ab11bf765b9bc..506989a8de138 100644
--- a/clang/test/OpenMP/amdgcn-attributes.cpp
+++ b/clang/test/OpenMP/amdgcn-attributes.cpp
@@ -32,10 +32,10 @@ int callable(int x) {
return x + 1;
}
- // DEFAULT: attributes #0 = { convergent noinline norecurse nounwind optnone "frame-pointer"="none" "kernel" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
- // CPU: attributes #0 = { convergent noinline norecurse nounwind optnone "frame-pointer"="none" "kernel" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="gfx900" "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst" }
- // NOIEEE: attributes #0 = { convergent noinline norecurse nounwind optnone "amdgpu-ieee"="false" "frame-pointer"="none" "kernel" "min-legal-vector-width"="0" "no-nans-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
- // UNSAFEATOMIC: attributes #0 = { convergent noinline norecurse nounwind optnone "amdgpu-unsafe-fp-atomics"="true" "frame-pointer"="none" "kernel" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
+// DEFAULT: attributes #0 = { convergent noinline norecurse nounwind optnone "frame-pointer"="none" "kernel" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" }
+// CPU: attributes #0 = { convergent noinline norecurse nounwind optnone "frame-pointer"="none" "kernel" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="gfx900" "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst" "uniform-work-group-size"="true" }
+// NOIEEE: attributes #0 = { convergent noinline norecurse nounwind optnone "amdgpu-ieee"="false" "frame-pointer"="none" "kernel" "min-legal-vector-width"="0" "no-nans-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" }
+// UNSAFEATOMIC: attributes #0 = { convergent noinline norecurse nounwind optnone "amdgpu-unsafe-fp-atomics"="true" "frame-pointer"="none" "kernel" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" }
// DEFAULT: attributes #1 = { convergent mustprogress noinline nounwind optnone "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
// CPU: attributes #1 = { convergent mustprogress noinline nounwind optnone "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="gfx900" "target-features"="+16-bit-insts,+ci-insts,+dpp,+flat-address-space,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst" }
More information about the cfe-commits
mailing list