[clang] 6a9cf21 - [CUDA, MemCpyOpt] Add a flag to force-enable memcpyopt and use it for CUDA.
Artem Belevich via cfe-commits
cfe-commits at lists.llvm.org
Fri Aug 6 11:22:24 PDT 2021
Author: Artem Belevich
Date: 2021-08-06T11:13:52-07:00
New Revision: 6a9cf21f5a2dcd02f90075d6d3576a87f1abd8a9
URL: https://github.com/llvm/llvm-project/commit/6a9cf21f5a2dcd02f90075d6d3576a87f1abd8a9
DIFF: https://github.com/llvm/llvm-project/commit/6a9cf21f5a2dcd02f90075d6d3576a87f1abd8a9.diff
LOG: [CUDA, MemCpyOpt] Add a flag to force-enable memcpyopt and use it for CUDA.
An attempt to enable MemCpyOpt unconditionally in D104801 revealed that some
users do not expect LLVM to materialize the `memset` intrinsic.
Other passes can do that too, but MemCpyOpt triggers it more frequently, which
breaks sanitizers and some downstream users.
For now, introduce a flag to force-enable the pass and opt in only CUDA
compilation with the NVPTX back-end.
Differential Revision: https://reviews.llvm.org/D106401
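
The gating logic this change introduces can be sketched in isolation as below.
This is a minimal model, not the pass itself: `LibcallAvailability`,
`mayFormMemcpyOrMemmove`, and `mayFormMemset` are hypothetical names standing in
for the `TLI->has(...)` queries and the two conditions touched in
`processStore`, and `ForceEnable` stands in for the value of
`-enable-memcpyopt-without-libcalls`.

```cpp
// Hypothetical stand-in for the TargetLibraryInfo queries (TLI->has(...)).
struct LibcallAvailability {
  bool HasMemcpy;
  bool HasMemmove;
  bool HasMemset;
};

// Models the aggregate load/store condition after this change: the
// transform is allowed if the flag is set, or if both libcalls exist.
constexpr bool mayFormMemcpyOrMemmove(bool IsAggregate, bool ForceEnable,
                                      LibcallAvailability TLI) {
  return IsAggregate && (ForceEnable || (TLI.HasMemcpy && TLI.HasMemmove));
}

// Models the memset condition after this change: the pass bails out only
// when memset is unavailable and the flag is not set.
constexpr bool mayFormMemset(bool ForceEnable, LibcallAvailability TLI) {
  return TLI.HasMemset || ForceEnable;
}

// A target without libcalls (as in the amdgcn RUN line of the test).
constexpr LibcallAvailability NoLibcalls{false, false, false};

static_assert(!mayFormMemset(/*ForceEnable=*/false, NoLibcalls),
              "without the flag, no memset is formed");
static_assert(mayFormMemset(/*ForceEnable=*/true, NoLibcalls),
              "with the flag, memset formation is allowed");
```

Under this model, a target that reports no libcalls behaves like the default
only when the flag is set, which matches the new RUN line reusing the LIBCALLS
check prefix.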
Added:
Modified:
clang/lib/Driver/ToolChains/Cuda.cpp
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
llvm/test/Transforms/MemCpyOpt/no-libcalls.ll
Removed:
################################################################################
diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp
index c4d1ebdf6913..37a4da80c03c 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -685,7 +685,8 @@ void CudaToolChain::addClangTargetOptions(
"Only OpenMP or CUDA offloading kinds are supported for NVIDIA GPUs.");
if (DeviceOffloadingKind == Action::OFK_Cuda) {
- CC1Args.push_back("-fcuda-is-device");
+ CC1Args.append(
+ {"-fcuda-is-device", "-mllvm", "-enable-memcpyopt-without-libcalls"});
if (DriverArgs.hasFlag(options::OPT_fcuda_approx_transcendentals,
options::OPT_fno_cuda_approx_transcendentals, false))
diff --git a/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp b/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
index 0dd0b45cf054..42650f3b6f2e 100644
--- a/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
+++ b/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
@@ -67,6 +67,10 @@ using namespace llvm;
#define DEBUG_TYPE "memcpyopt"
+static cl::opt<bool> EnableMemCpyOptWithoutLibcalls(
+ "enable-memcpyopt-without-libcalls", cl::init(false), cl::Hidden,
+ cl::desc("Enable memcpyopt even when libcalls are disabled"));
+
static cl::opt<bool>
EnableMemorySSA("enable-memcpyopt-memoryssa", cl::init(true), cl::Hidden,
cl::desc("Use MemorySSA-backed MemCpyOpt."));
@@ -677,8 +681,9 @@ bool MemCpyOptPass::processStore(StoreInst *SI, BasicBlock::iterator &BBI) {
// the corresponding libcalls are not available.
// TODO: We should really distinguish between libcall availability and
// our ability to introduce intrinsics.
- if (T->isAggregateType() && TLI->has(LibFunc_memcpy) &&
- TLI->has(LibFunc_memmove)) {
+ if (T->isAggregateType() &&
+ (EnableMemCpyOptWithoutLibcalls ||
+ (TLI->has(LibFunc_memcpy) && TLI->has(LibFunc_memmove)))) {
MemoryLocation LoadLoc = MemoryLocation::get(LI);
// We use alias analysis to check if an instruction may store to
@@ -806,7 +811,7 @@ bool MemCpyOptPass::processStore(StoreInst *SI, BasicBlock::iterator &BBI) {
// this if the corresponding libfunc is not available.
// TODO: We should really distinguish between libcall availability and
// our ability to introduce intrinsics.
- if (!TLI->has(LibFunc_memset))
+ if (!(TLI->has(LibFunc_memset) || EnableMemCpyOptWithoutLibcalls))
return false;
// There are two cases that are interesting for this code to handle: memcpy
diff --git a/llvm/test/Transforms/MemCpyOpt/no-libcalls.ll b/llvm/test/Transforms/MemCpyOpt/no-libcalls.ll
index c4d935158435..ac7cfc55ce50 100644
--- a/llvm/test/Transforms/MemCpyOpt/no-libcalls.ll
+++ b/llvm/test/Transforms/MemCpyOpt/no-libcalls.ll
@@ -1,6 +1,8 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt -S -memcpyopt < %s | FileCheck %s --check-prefixes=CHECK,LIBCALLS
; RUN: opt -S -memcpyopt -mtriple=amdgcn-- < %s | FileCheck %s --check-prefixes=CHECK,NO-LIBCALLS
+; RUN: opt -S -memcpyopt -mtriple=amdgcn-- -enable-memcpyopt-without-libcalls < %s \
+; RUN: | FileCheck %s --check-prefixes=CHECK,LIBCALLS
; REQUIRES: amdgpu-registered-target