r236765 - [cuda] Include GPU binary into host object file and generate init/deinit code.
Artem Belevich
tra at google.com
Mon May 11 10:43:54 PDT 2015
Could you send me output of the CC1 executed by the test before it's piped
into FileCheck?
/scratch/hstong/workdir/Release+Asserts/bin/clang -cc1 -internal-isystem
/scratch/hstong/workdir/Release+Asserts/bin/../lib/clang/3.7.0/include
-nostdsysteminc -emit-llvm /gsa/tlbgsa-h1/08/hstong/pub/
cfe_trunk/clang/test/CodeGenCUDA/device-stub.cu -fcuda-include-gpubinary
/gsa/tlbgsa-h1/08/hstong/pub/cfe_trunk/clang/test/CodeGenCUDA/device-stub.cu -o
-
Oh, and I see a typo in the script -- "CHEKC: call{{.*}}kernelfunc", though
it's probably not what breaks the test in your case.
--Artem
On Sat, May 9, 2015 at 11:10 AM, Hubert Tong <
hubert.reinterpretcast at gmail.com> wrote:
> Hi Artem,
>
> I am encountering a failure with device-stub.cu on s390x-suse-linux. Can
> you take a look?
>
> *Output:*
> FAIL: Clang :: CodeGenCUDA/device-stub.cu (1986 of 21893)
> ******************** TEST 'Clang :: CodeGenCUDA/device-stub.cu' FAILED
> ********************
> Script:
> --
> /scratch/hstong/workdir/Release+Asserts/bin/clang -cc1 -internal-isystem
> /scratch/hstong/workdir/Release+Asserts/bin/../lib/clang/3.7.0/include
> -nostdsysteminc -emit-llvm
> /gsa/tlbgsa-h1/08/hstong/pub/cfe_trunk/clang/test/CodeGenCUDA/
> device-stub.cu -fcuda-include-gpubinary
> /gsa/tlbgsa-h1/08/hstong/pub/cfe_trunk/clang/test/CodeGenCUDA/
> device-stub.cu -o - |
> /scratch/hstong/workdir/Release+Asserts/bin/FileCheck
> /gsa/tlbgsa-h1/08/hstong/pub/cfe_trunk/clang/test/CodeGenCUDA/
> device-stub.cu
> --
> Exit Code: 1
>
> Command Output (stderr):
> --
>
> /gsa/tlbgsa-h1/08/hstong/pub/cfe_trunk/clang/test/CodeGenCUDA/device-stub.cu:7:11:
> error: expected string not found in input
> // CHECK: private unnamed_addr constant{{.*}}kernelfunc{{.*}}\00", align 1
> ^
> <stdin>:1:1: note: scanning from here
> ; ModuleID =
> '/gsa/tlbgsa-h1/08/hstong/pub/cfe_trunk/clang/test/CodeGenCUDA/
> device-stub.cu'
> ^
> <stdin>:13:298: note: possible intended match here
> @1 = private unnamed_addr constant [2259 x i8] c"// RUN: %clang_cc1
> -emit-llvm %s -fcuda-include-gpubinary %s -o - | FileCheck %s\0A\0A#include
> \22Inputs/cuda.h\22\0A\0A// Make sure that all parts of GPU code
> init/cleanup are there:\0A// * constant unnamed string with the kernel
> name\0A// CHECK: private unnamed_addr
> constant{{.*}}kernelfunc{{.*}}\5C00\22, align 1\0A// * constant unnamed
> string with GPU binary\0A// CHECK: private unnamed_addr
> constant{{.*}}\5C00\22\0A// * constant struct that wraps GPU binary\0A//
> CHECK: @__cuda_fatbin_wrapper = internal constant { i32, i32, i8*, i8* }
> \0A// CHECK: { i32 1180844977, i32 1, {{.*}}, i8* null }\0A// * variable to
> save GPU binary handle after initialization\0A// CHECK:
> @__cuda_gpubin_handle = internal global i8** null\0A// * Make sure our
> constructor/destructor was added to global ctor/dtor list.\0A// CHECK:
> @llvm.global_ctors = appending global {{.*}}@__cuda_module_ctor\0A// CHECK:
> @llvm.global_dtors = appending global {{.*}}@__cuda_module_dtor\0A\0A//
> Test that we build the correct number of calls to cudaSetupArgument
> followed\0A// by a call to cudaLaunch.\0A\0A// CHECK:
> define{{.*}}kernelfunc\0A// CHECK: call{{.*}}cudaSetupArgument\0A// CHECK:
> call{{.*}}cudaSetupArgument\0A// CHECK: call{{.*}}cudaSetupArgument\0A//
> CHECK: call{{.*}}cudaLaunch\0A__global__ void kernelfunc(int i, int j, int
> k) {}\0A\0A// Test that we've built correct kernel launch sequence.\0A//
> CHECK: define{{.*}}hostfunc\0A// CHECK: call{{.*}}cudaConfigureCall\0A//
> CHEKC: call{{.*}}kernelfunc\0Avoid hostfunc(void) { kernelfunc<<<1, 1>>>(1,
> 1, 1); }\0A\0A// Test that we've built a function to register kernels\0A//
> CHECK: define internal void @__cuda_register_kernels\0A// CHECK:
> call{{.*}}cudaRegisterFunction(i8** %0, {{.*}}kernelfunc\0A\0A// Test that
> we've built contructor..\0A// CHECK: define internal void
> @__cuda_module_ctor\0A// .. that calls
> __cudaRegisterFatBinary(&__cuda_fatbin_wrapper)\0A// CHECK:
> call{{.*}}cudaRegisterFatBinary{{.*}}__cuda_fatbin_wrapper\0A// .. stores
> return value in __cuda_gpubin_handle\0A// CHECK-NEXT:
> store{{.*}}__cuda_gpubin_handle\0A// .. and then calls
> __cuda_register_kernels\0A// CHECK-NEXT: call void
> @__cuda_register_kernels\0A\0A// Test that we've created destructor.\0A//
> CHECK: define internal void @__cuda_module_dtor\0A// CHECK:
> load{{.*}}__cuda_gpubin_handle\0A// CHECK-NEXT: call void
> @__cudaUnregisterFatBinary\0A\0A\00", align 2
>
> ^
>
> --
>
> ********************
>
> *Build environment info:*
> > g++ -v
> Using built-in specs.
> COLLECT_GCC=g++
> COLLECT_LTO_WRAPPER=/usr/lib64/gcc/s390x-suse-linux/4.8/lto-wrapper
> Target: s390x-suse-linux
> Configured with: ../configure --prefix=/usr --infodir=/usr/share/info
> --mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64
> --enable-languages=c,c++,objc,fortran,obj-c++,java
> --enable-checking=release --with-gxx-include-dir=/usr/include/c++/4.8
> --enable-ssp --disable-libssp --disable-plugin --with-bugurl=
> http://bugs.opensuse.org/ --with-pkgversion='SUSE Linux' --disable-libgcj
> --disable-libmudflap --with-slibdir=/lib64 --with-system-zlib
> --enable-__cxa_atexit --enable-libstdcxx-allocator=new
> --disable-libstdcxx-pch --enable-version-specific-runtime-libs
> --enable-linker-build-id --enable-linux-futex --program-suffix=-4.8
> --without-system-libunwind --with-tune=zEC12 --with-arch=z196
> --with-long-double-128 --enable-decimal-float --build=s390x-suse-linux
> --host=s390x-suse-linux
> Thread model: posix
> gcc version 4.8.3 20140627 [gcc-4_8-branch revision 212064] (SUSE Linux)
>
> Thanks,
>
>
> Hubert Tong
>
> On Thu, May 7, 2015 at 2:34 PM, Artem Belevich <tra at google.com> wrote:
>
>> Author: tra
>> Date: Thu May 7 14:34:16 2015
>> New Revision: 236765
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=236765&view=rev
>> Log:
>> [cuda] Include GPU binary into host object file and generate init/deinit
>> code.
>>
>> - added -fcuda-include-gpubinary option to incorporate results of
>> device-side compilation into host-side one.
>> - generate code to register GPU binaries and associated kernels
>> with CUDA runtime and clean-up on exit.
>> - added test case for init/deinit code generation.
>>
>> Differential Revision: http://reviews.llvm.org/D9507
>>
>> Modified:
>> cfe/trunk/include/clang/Driver/CC1Options.td
>> cfe/trunk/include/clang/Frontend/CodeGenOptions.h
>> cfe/trunk/lib/CodeGen/CGCUDANV.cpp
>> cfe/trunk/lib/CodeGen/CGCUDARuntime.h
>> cfe/trunk/lib/CodeGen/CodeGenFunction.cpp
>> cfe/trunk/lib/CodeGen/CodeGenModule.cpp
>> cfe/trunk/lib/Frontend/CompilerInvocation.cpp
>> cfe/trunk/test/CodeGenCUDA/device-stub.cu
>>
>> Modified: cfe/trunk/include/clang/Driver/CC1Options.td
>> URL:
>> http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Driver/CC1Options.td?rev=236765&r1=236764&r2=236765&view=diff
>>
>> ==============================================================================
>> --- cfe/trunk/include/clang/Driver/CC1Options.td (original)
>> +++ cfe/trunk/include/clang/Driver/CC1Options.td Thu May 7 14:34:16 2015
>> @@ -631,6 +631,8 @@ def fcuda_allow_host_calls_from_host_dev
>> def fcuda_disable_target_call_checks : Flag<["-"],
>> "fcuda-disable-target-call-checks">,
>> HelpText<"Disable all cross-target (host, device, etc.) call checks in
>> CUDA">;
>> +def fcuda_include_gpubinary : Separate<["-"], "fcuda-include-gpubinary">,
>> + HelpText<"Incorporate CUDA device-side binary into host object file.">;
>>
>> } // let Flags = [CC1Option]
>>
>>
>> Modified: cfe/trunk/include/clang/Frontend/CodeGenOptions.h
>> URL:
>> http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Frontend/CodeGenOptions.h?rev=236765&r1=236764&r2=236765&view=diff
>>
>> ==============================================================================
>> --- cfe/trunk/include/clang/Frontend/CodeGenOptions.h (original)
>> +++ cfe/trunk/include/clang/Frontend/CodeGenOptions.h Thu May 7 14:34:16
>> 2015
>> @@ -163,6 +163,11 @@ public:
>> /// Name of the profile file to use as input for -fprofile-instr-use
>> std::string InstrProfileInput;
>>
>> + /// A list of file names passed with -fcuda-include-gpubinary options
>> to
>> + /// forward to CUDA runtime back-end for incorporating them into
>> host-side
>> + /// object file.
>> + std::vector<std::string> CudaGpuBinaryFileNames;
>> +
>> /// Regular expression to select optimizations for which we should
>> enable
>> /// optimization remarks. Transformation passes whose name matches this
>> /// expression (and support this feature), will emit a diagnostic
>>
>> Modified: cfe/trunk/lib/CodeGen/CGCUDANV.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CGCUDANV.cpp?rev=236765&r1=236764&r2=236765&view=diff
>>
>> ==============================================================================
>> --- cfe/trunk/lib/CodeGen/CGCUDANV.cpp (original)
>> +++ cfe/trunk/lib/CodeGen/CGCUDANV.cpp Thu May 7 14:34:16 2015
>> @@ -20,7 +20,6 @@
>> #include "llvm/IR/CallSite.h"
>> #include "llvm/IR/Constants.h"
>> #include "llvm/IR/DerivedTypes.h"
>> -#include <vector>
>>
>> using namespace clang;
>> using namespace CodeGen;
>> @@ -30,29 +29,66 @@ namespace {
>> class CGNVCUDARuntime : public CGCUDARuntime {
>>
>> private:
>> - llvm::Type *IntTy, *SizeTy;
>> - llvm::PointerType *CharPtrTy, *VoidPtrTy;
>> + llvm::Type *IntTy, *SizeTy, *VoidTy;
>> + llvm::PointerType *CharPtrTy, *VoidPtrTy, *VoidPtrPtrTy;
>> +
>> + /// Convenience reference to LLVM Context
>> + llvm::LLVMContext &Context;
>> + /// Convenience reference to the current module
>> + llvm::Module &TheModule;
>> + /// Keeps track of kernel launch stubs emitted in this module
>> + llvm::SmallVector<llvm::Function *, 16> EmittedKernels;
>> + /// Keeps track of variables containing handles of GPU binaries.
>> Populated by
>> + /// ModuleCtorFunction() and used to create corresponding cleanup
>> calls in
>> + /// ModuleDtorFunction()
>> + llvm::SmallVector<llvm::GlobalVariable *, 16> GpuBinaryHandles;
>>
>> llvm::Constant *getSetupArgumentFn() const;
>> llvm::Constant *getLaunchFn() const;
>>
>> + /// Creates a function to register all kernel stubs generated in this
>> module.
>> + llvm::Function *makeRegisterKernelsFn();
>> +
>> + /// Helper function that generates a constant string and returns a
>> pointer to
>> + /// the start of the string. The result of this function can be used
>> anywhere
>> + /// where the C code specifies const char*.
>> + llvm::Constant *makeConstantString(const std::string &Str,
>> + const std::string &Name = "",
>> + unsigned Alignment = 0) {
>> + llvm::Constant *Zeros[] = {llvm::ConstantInt::get(SizeTy, 0),
>> + llvm::ConstantInt::get(SizeTy, 0)};
>> + auto *ConstStr = CGM.GetAddrOfConstantCString(Str, Name.c_str());
>> + return llvm::ConstantExpr::getGetElementPtr(ConstStr->getValueType(),
>> + ConstStr, Zeros);
>> + }
>> +
>> + void emitDeviceStubBody(CodeGenFunction &CGF, FunctionArgList &Args);
>> +
>> public:
>> CGNVCUDARuntime(CodeGenModule &CGM);
>>
>> - void EmitDeviceStubBody(CodeGenFunction &CGF, FunctionArgList &Args)
>> override;
>> + void emitDeviceStub(CodeGenFunction &CGF, FunctionArgList &Args)
>> override;
>> + /// Creates module constructor function
>> + llvm::Function *makeModuleCtorFunction() override;
>> + /// Creates module destructor function
>> + llvm::Function *makeModuleDtorFunction() override;
>> };
>>
>> }
>>
>> -CGNVCUDARuntime::CGNVCUDARuntime(CodeGenModule &CGM) :
>> CGCUDARuntime(CGM) {
>> +CGNVCUDARuntime::CGNVCUDARuntime(CodeGenModule &CGM)
>> + : CGCUDARuntime(CGM), Context(CGM.getLLVMContext()),
>> + TheModule(CGM.getModule()) {
>> CodeGen::CodeGenTypes &Types = CGM.getTypes();
>> ASTContext &Ctx = CGM.getContext();
>>
>> IntTy = Types.ConvertType(Ctx.IntTy);
>> SizeTy = Types.ConvertType(Ctx.getSizeType());
>> + VoidTy = llvm::Type::getVoidTy(Context);
>>
>> CharPtrTy =
>> llvm::PointerType::getUnqual(Types.ConvertType(Ctx.CharTy));
>> VoidPtrTy = cast<llvm::PointerType>(Types.ConvertType(Ctx.VoidPtrTy));
>> + VoidPtrPtrTy = VoidPtrTy->getPointerTo();
>> }
>>
>> llvm::Constant *CGNVCUDARuntime::getSetupArgumentFn() const {
>> @@ -68,14 +104,17 @@ llvm::Constant *CGNVCUDARuntime::getSetu
>>
>> llvm::Constant *CGNVCUDARuntime::getLaunchFn() const {
>> // cudaError_t cudaLaunch(char *)
>> - std::vector<llvm::Type*> Params;
>> - Params.push_back(CharPtrTy);
>> - return CGM.CreateRuntimeFunction(llvm::FunctionType::get(IntTy,
>> - Params,
>> false),
>> - "cudaLaunch");
>> + return CGM.CreateRuntimeFunction(
>> + llvm::FunctionType::get(IntTy, CharPtrTy, false), "cudaLaunch");
>> +}
>> +
>> +void CGNVCUDARuntime::emitDeviceStub(CodeGenFunction &CGF,
>> + FunctionArgList &Args) {
>> + EmittedKernels.push_back(CGF.CurFn);
>> + emitDeviceStubBody(CGF, Args);
>> }
>>
>> -void CGNVCUDARuntime::EmitDeviceStubBody(CodeGenFunction &CGF,
>> +void CGNVCUDARuntime::emitDeviceStubBody(CodeGenFunction &CGF,
>> FunctionArgList &Args) {
>> // Build the argument value list and the argument stack struct type.
>> SmallVector<llvm::Value *, 16> ArgValues;
>> @@ -87,8 +126,7 @@ void CGNVCUDARuntime::EmitDeviceStubBody
>> assert(isa<llvm::PointerType>(V->getType()) && "Arg type not
>> PointerType");
>>
>> ArgTypes.push_back(cast<llvm::PointerType>(V->getType())->getElementType());
>> }
>> - llvm::StructType *ArgStackTy = llvm::StructType::get(
>> - CGF.getLLVMContext(), ArgTypes);
>> + llvm::StructType *ArgStackTy = llvm::StructType::get(Context,
>> ArgTypes);
>>
>> llvm::BasicBlock *EndBlock = CGF.createBasicBlock("setup.end");
>>
>> @@ -120,6 +158,160 @@ void CGNVCUDARuntime::EmitDeviceStubBody
>> CGF.EmitBlock(EndBlock);
>> }
>>
>> +/// Creates internal function to register all kernel stubs generated in
>> this
>> +/// module with the CUDA runtime.
>> +/// \code
>> +/// void __cuda_register_kernels(void** GpuBinaryHandle) {
>> +/// __cudaRegisterFunction(GpuBinaryHandle,Kernel0,...);
>> +/// ...
>> +/// __cudaRegisterFunction(GpuBinaryHandle,KernelM,...);
>> +/// }
>> +/// \endcode
>> +llvm::Function *CGNVCUDARuntime::makeRegisterKernelsFn() {
>> + llvm::Function *RegisterKernelsFunc = llvm::Function::Create(
>> + llvm::FunctionType::get(VoidTy, VoidPtrPtrTy, false),
>> + llvm::GlobalValue::InternalLinkage, "__cuda_register_kernels",
>> &TheModule);
>> + llvm::BasicBlock *EntryBB =
>> + llvm::BasicBlock::Create(Context, "entry", RegisterKernelsFunc);
>> + CGBuilderTy Builder(Context);
>> + Builder.SetInsertPoint(EntryBB);
>> +
>> + // void __cudaRegisterFunction(void **, const char *, char *, const
>> char *,
>> + // int, uint3*, uint3*, dim3*, dim3*, int*)
>> + std::vector<llvm::Type *> RegisterFuncParams = {
>> + VoidPtrPtrTy, CharPtrTy, CharPtrTy, CharPtrTy, IntTy,
>> + VoidPtrTy, VoidPtrTy, VoidPtrTy, VoidPtrTy,
>> IntTy->getPointerTo()};
>> + llvm::Constant *RegisterFunc = CGM.CreateRuntimeFunction(
>> + llvm::FunctionType::get(IntTy, RegisterFuncParams, false),
>> + "__cudaRegisterFunction");
>> +
>> + // Extract GpuBinaryHandle passed as the first argument passed to
>> + // __cuda_register_kernels() and generate __cudaRegisterFunction()
>> call for
>> + // each emitted kernel.
>> + llvm::Argument &GpuBinaryHandlePtr = *RegisterKernelsFunc->arg_begin();
>> + for (llvm::Function *Kernel : EmittedKernels) {
>> + llvm::Constant *KernelName = makeConstantString(Kernel->getName());
>> + llvm::Constant *NullPtr = llvm::ConstantPointerNull::get(VoidPtrTy);
>> + llvm::Value *args[] = {
>> + &GpuBinaryHandlePtr, Builder.CreateBitCast(Kernel, VoidPtrTy),
>> + KernelName, KernelName, llvm::ConstantInt::get(IntTy, -1),
>> NullPtr,
>> + NullPtr, NullPtr, NullPtr,
>> + llvm::ConstantPointerNull::get(IntTy->getPointerTo())};
>> + Builder.CreateCall(RegisterFunc, args);
>> + }
>> +
>> + Builder.CreateRetVoid();
>> + return RegisterKernelsFunc;
>> +}
>> +
>> +/// Creates a global constructor function for the module:
>> +/// \code
>> +/// void __cuda_module_ctor(void*) {
>> +/// Handle0 = __cudaRegisterFatBinary(GpuBinaryBlob0);
>> +/// __cuda_register_kernels(Handle0);
>> +/// ...
>> +/// HandleN = __cudaRegisterFatBinary(GpuBinaryBlobN);
>> +/// __cuda_register_kernels(HandleN);
>> +/// }
>> +/// \endcode
>> +llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() {
>> + // void __cuda_register_kernels(void* handle);
>> + llvm::Function *RegisterKernelsFunc = makeRegisterKernelsFn();
>> + // void ** __cudaRegisterFatBinary(void *);
>> + llvm::Constant *RegisterFatbinFunc = CGM.CreateRuntimeFunction(
>> + llvm::FunctionType::get(VoidPtrPtrTy, VoidPtrTy, false),
>> + "__cudaRegisterFatBinary");
>> + // struct { int magic, int version, void * gpu_binary, void *
>> dont_care };
>> + llvm::StructType *FatbinWrapperTy =
>> + llvm::StructType::get(IntTy, IntTy, VoidPtrTy, VoidPtrTy, nullptr);
>> +
>> + llvm::Function *ModuleCtorFunc = llvm::Function::Create(
>> + llvm::FunctionType::get(VoidTy, VoidPtrTy, false),
>> + llvm::GlobalValue::InternalLinkage, "__cuda_module_ctor",
>> &TheModule);
>> + llvm::BasicBlock *CtorEntryBB =
>> + llvm::BasicBlock::Create(Context, "entry", ModuleCtorFunc);
>> + CGBuilderTy CtorBuilder(Context);
>> +
>> + CtorBuilder.SetInsertPoint(CtorEntryBB);
>> +
>> + // For each GPU binary, register it with the CUDA runtime and store
>> returned
>> + // handle in a global variable and save the handle in GpuBinaryHandles
>> vector
>> + // to be cleaned up in destructor on exit. Then associate all known
>> kernels
>> + // with the GPU binary handle so CUDA runtime can figure out what to
>> call on
>> + // the GPU side.
>> + for (const std::string &GpuBinaryFileName :
>> + CGM.getCodeGenOpts().CudaGpuBinaryFileNames) {
>> + llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> GpuBinaryOrErr =
>> + llvm::MemoryBuffer::getFileOrSTDIN(GpuBinaryFileName);
>> + if (std::error_code EC = GpuBinaryOrErr.getError()) {
>> + CGM.getDiags().Report(diag::err_cannot_open_file) <<
>> GpuBinaryFileName
>> + << EC.message();
>> + continue;
>> + }
>> +
>> + // Create initialized wrapper structure that points to the loaded
>> GPU binary
>> + llvm::Constant *Values[] = {
>> + llvm::ConstantInt::get(IntTy, 0x466243b1), // Fatbin wrapper
>> magic.
>> + llvm::ConstantInt::get(IntTy, 1), // Fatbin version.
>> + makeConstantString(GpuBinaryOrErr.get()->getBuffer(), "", 16),
>> // Data.
>> + llvm::ConstantPointerNull::get(VoidPtrTy)}; // Unused in fatbin
>> v1.
>> + llvm::GlobalVariable *FatbinWrapper = new llvm::GlobalVariable(
>> + TheModule, FatbinWrapperTy, true,
>> llvm::GlobalValue::InternalLinkage,
>> + llvm::ConstantStruct::get(FatbinWrapperTy, Values),
>> + "__cuda_fatbin_wrapper");
>> +
>> + // GpuBinaryHandle = __cudaRegisterFatBinary(&FatbinWrapper);
>> + llvm::CallInst *RegisterFatbinCall = CtorBuilder.CreateCall(
>> + RegisterFatbinFunc,
>> + CtorBuilder.CreateBitCast(FatbinWrapper, VoidPtrTy));
>> + llvm::GlobalVariable *GpuBinaryHandle = new llvm::GlobalVariable(
>> + TheModule, VoidPtrPtrTy, false,
>> llvm::GlobalValue::InternalLinkage,
>> + llvm::ConstantPointerNull::get(VoidPtrPtrTy),
>> "__cuda_gpubin_handle");
>> + CtorBuilder.CreateStore(RegisterFatbinCall, GpuBinaryHandle, false);
>> +
>> + // Call __cuda_register_kernels(GpuBinaryHandle);
>> + CtorBuilder.CreateCall(RegisterKernelsFunc, RegisterFatbinCall);
>> +
>> + // Save GpuBinaryHandle so we can unregister it in destructor.
>> + GpuBinaryHandles.push_back(GpuBinaryHandle);
>> + }
>> +
>> + CtorBuilder.CreateRetVoid();
>> + return ModuleCtorFunc;
>> +}
>> +
>> +/// Creates a global destructor function that unregisters all GPU code
>> blobs
>> +/// registered by constructor.
>> +/// \code
>> +/// void __cuda_module_dtor(void*) {
>> +/// __cudaUnregisterFatBinary(Handle0);
>> +/// ...
>> +/// __cudaUnregisterFatBinary(HandleN);
>> +/// }
>> +/// \endcode
>> +llvm::Function *CGNVCUDARuntime::makeModuleDtorFunction() {
>> + // void __cudaUnregisterFatBinary(void ** handle);
>> + llvm::Constant *UnregisterFatbinFunc = CGM.CreateRuntimeFunction(
>> + llvm::FunctionType::get(VoidTy, VoidPtrPtrTy, false),
>> + "__cudaUnregisterFatBinary");
>> +
>> + llvm::Function *ModuleDtorFunc = llvm::Function::Create(
>> + llvm::FunctionType::get(VoidTy, VoidPtrTy, false),
>> + llvm::GlobalValue::InternalLinkage, "__cuda_module_dtor",
>> &TheModule);
>> + llvm::BasicBlock *DtorEntryBB =
>> + llvm::BasicBlock::Create(Context, "entry", ModuleDtorFunc);
>> + CGBuilderTy DtorBuilder(Context);
>> + DtorBuilder.SetInsertPoint(DtorEntryBB);
>> +
>> + for (llvm::GlobalVariable *GpuBinaryHandle : GpuBinaryHandles) {
>> + DtorBuilder.CreateCall(UnregisterFatbinFunc,
>> + DtorBuilder.CreateLoad(GpuBinaryHandle,
>> false));
>> + }
>> +
>> + DtorBuilder.CreateRetVoid();
>> + return ModuleDtorFunc;
>> +}
>> +
>> CGCUDARuntime *CodeGen::CreateNVCUDARuntime(CodeGenModule &CGM) {
>> return new CGNVCUDARuntime(CGM);
>> }
>>
>> Modified: cfe/trunk/lib/CodeGen/CGCUDARuntime.h
>> URL:
>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CGCUDARuntime.h?rev=236765&r1=236764&r2=236765&view=diff
>>
>> ==============================================================================
>> --- cfe/trunk/lib/CodeGen/CGCUDARuntime.h (original)
>> +++ cfe/trunk/lib/CodeGen/CGCUDARuntime.h Thu May 7 14:34:16 2015
>> @@ -16,6 +16,10 @@
>> #ifndef LLVM_CLANG_LIB_CODEGEN_CGCUDARUNTIME_H
>> #define LLVM_CLANG_LIB_CODEGEN_CGCUDARUNTIME_H
>>
>> +namespace llvm {
>> +class Function;
>> +}
>> +
>> namespace clang {
>>
>> class CUDAKernelCallExpr;
>> @@ -39,10 +43,17 @@ public:
>> virtual RValue EmitCUDAKernelCallExpr(CodeGenFunction &CGF,
>> const CUDAKernelCallExpr *E,
>> ReturnValueSlot ReturnValue);
>> -
>> - virtual void EmitDeviceStubBody(CodeGenFunction &CGF,
>> - FunctionArgList &Args) = 0;
>>
>> + /// Emits a kernel launch stub.
>> + virtual void emitDeviceStub(CodeGenFunction &CGF, FunctionArgList
>> &Args) = 0;
>> +
>> + /// Constructs and returns a module initialization function or nullptr
>> if it's
>> + /// not needed. Must be called after all kernels have been emitted.
>> + virtual llvm::Function *makeModuleCtorFunction() = 0;
>> +
>> + /// Returns a module cleanup function or nullptr if it's not needed.
>> + /// Must be called after ModuleCtorFunction
>> + virtual llvm::Function *makeModuleDtorFunction() = 0;
>> };
>>
>> /// Creates an instance of a CUDA runtime class.
>>
>> Modified: cfe/trunk/lib/CodeGen/CodeGenFunction.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CodeGenFunction.cpp?rev=236765&r1=236764&r2=236765&view=diff
>>
>> ==============================================================================
>> --- cfe/trunk/lib/CodeGen/CodeGenFunction.cpp (original)
>> +++ cfe/trunk/lib/CodeGen/CodeGenFunction.cpp Thu May 7 14:34:16 2015
>> @@ -878,7 +878,7 @@ void CodeGenFunction::GenerateCode(Globa
>> else if (getLangOpts().CUDA &&
>> !getLangOpts().CUDAIsDevice &&
>> FD->hasAttr<CUDAGlobalAttr>())
>> - CGM.getCUDARuntime().EmitDeviceStubBody(*this, Args);
>> + CGM.getCUDARuntime().emitDeviceStub(*this, Args);
>> else if (isa<CXXConversionDecl>(FD) &&
>>
>> cast<CXXConversionDecl>(FD)->isLambdaToBlockPointerConversion()) {
>> // The lambda conversion to block pointer is special; the semantics
>> can't be
>>
>> Modified: cfe/trunk/lib/CodeGen/CodeGenModule.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CodeGenModule.cpp?rev=236765&r1=236764&r2=236765&view=diff
>>
>> ==============================================================================
>> --- cfe/trunk/lib/CodeGen/CodeGenModule.cpp (original)
>> +++ cfe/trunk/lib/CodeGen/CodeGenModule.cpp Thu May 7 14:34:16 2015
>> @@ -350,6 +350,13 @@ void CodeGenModule::Release() {
>> if (ObjCRuntime)
>> if (llvm::Function *ObjCInitFunction =
>> ObjCRuntime->ModuleInitFunction())
>> AddGlobalCtor(ObjCInitFunction);
>> + if (Context.getLangOpts().CUDA && !Context.getLangOpts().CUDAIsDevice
>> &&
>> + CUDARuntime) {
>> + if (llvm::Function *CudaCtorFunction =
>> CUDARuntime->makeModuleCtorFunction())
>> + AddGlobalCtor(CudaCtorFunction);
>> + if (llvm::Function *CudaDtorFunction =
>> CUDARuntime->makeModuleDtorFunction())
>> + AddGlobalDtor(CudaDtorFunction);
>> + }
>> if (PGOReader && PGOStats.hasDiagnostics())
>> PGOStats.reportDiagnostics(getDiags(),
>> getCodeGenOpts().MainFileName);
>> EmitCtorList(GlobalCtors, "llvm.global_ctors");
>> @@ -3678,4 +3685,3 @@ void CodeGenModule::EmitOMPThreadPrivate
>> CXXGlobalInits.push_back(InitFunction);
>> }
>> }
>> -
>>
>> Modified: cfe/trunk/lib/Frontend/CompilerInvocation.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Frontend/CompilerInvocation.cpp?rev=236765&r1=236764&r2=236765&view=diff
>>
>> ==============================================================================
>> --- cfe/trunk/lib/Frontend/CompilerInvocation.cpp (original)
>> +++ cfe/trunk/lib/Frontend/CompilerInvocation.cpp Thu May 7 14:34:16 2015
>> @@ -651,6 +651,9 @@ static bool ParseCodeGenArgs(CodeGenOpti
>> Args.getAllArgValues(OPT_fsanitize_recover_EQ),
>> Diags,
>> Opts.SanitizeRecover);
>>
>> + Opts.CudaGpuBinaryFileNames =
>> + Args.getAllArgValues(OPT_fcuda_include_gpubinary);
>> +
>> return Success;
>> }
>>
>>
>> Modified: cfe/trunk/test/CodeGenCUDA/device-stub.cu
>> URL:
>> http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGenCUDA/device-stub.cu?rev=236765&r1=236764&r2=236765&view=diff
>>
>> ==============================================================================
>> --- cfe/trunk/test/CodeGenCUDA/device-stub.cu (original)
>> +++ cfe/trunk/test/CodeGenCUDA/device-stub.cu Thu May 7 14:34:16 2015
>> @@ -1,7 +1,21 @@
>> -// RUN: %clang_cc1 -emit-llvm %s -o - | FileCheck %s
>> +// RUN: %clang_cc1 -emit-llvm %s -fcuda-include-gpubinary %s -o - |
>> FileCheck %s
>>
>> #include "Inputs/cuda.h"
>>
>> +// Make sure that all parts of GPU code init/cleanup are there:
>> +// * constant unnamed string with the kernel name
>> +// CHECK: private unnamed_addr constant{{.*}}kernelfunc{{.*}}\00", align
>> 1
>> +// * constant unnamed string with GPU binary
>> +// CHECK: private unnamed_addr constant{{.*}}\00"
>> +// * constant struct that wraps GPU binary
>> +// CHECK: @__cuda_fatbin_wrapper = internal constant { i32, i32, i8*,
>> i8* }
>> +// CHECK: { i32 1180844977, i32 1, {{.*}}, i64 0, i64 0), i8* null
>> }
>> +// * variable to save GPU binary handle after initialization
>> +// CHECK: @__cuda_gpubin_handle = internal global i8** null
>> +// * Make sure our constructor/destructor was added to global ctor/dtor
>> list.
>> +// CHECK: @llvm.global_ctors = appending global {{.*}}@__cuda_module_ctor
>> +// CHECK: @llvm.global_dtors = appending global {{.*}}@__cuda_module_dtor
>> +
>> // Test that we build the correct number of calls to cudaSetupArgument
>> followed
>> // by a call to cudaLaunch.
>>
>> @@ -11,3 +25,28 @@
>> // CHECK: call{{.*}}cudaSetupArgument
>> // CHECK: call{{.*}}cudaLaunch
>> __global__ void kernelfunc(int i, int j, int k) {}
>> +
>> +// Test that we've built correct kernel launch sequence.
>> +// CHECK: define{{.*}}hostfunc
>> +// CHECK: call{{.*}}cudaConfigureCall
>> +// CHEKC: call{{.*}}kernelfunc
>> +void hostfunc(void) { kernelfunc<<<1, 1>>>(1, 1, 1); }
>> +
>> +// Test that we've built a function to register kernels
>> +// CHECK: define internal void @__cuda_register_kernels
>> +// CHECK: call{{.*}}cudaRegisterFunction(i8** %0, {{.*}}kernelfunc
>> +
>> +// Test that we've built contructor..
>> +// CHECK: define internal void @__cuda_module_ctor
>> +// .. that calls __cudaRegisterFatBinary(&__cuda_fatbin_wrapper)
>> +// CHECK: call{{.*}}cudaRegisterFatBinary{{.*}}__cuda_fatbin_wrapper
>> +// .. stores return value in __cuda_gpubin_handle
>> +// CHECK-NEXT: store{{.*}}__cuda_gpubin_handle
>> +// .. and then calls __cuda_register_kernels
>> +// CHECK-NEXT: call void @__cuda_register_kernels
>> +
>> +// Test that we've created destructor.
>> +// CHECK: define internal void @__cuda_module_dtor
>> +// CHECK: load{{.*}}__cuda_gpubin_handle
>> +// CHECK-NEXT: call void @__cudaUnregisterFatBinary
>> +
>>
>>
>> _______________________________________________
>> cfe-commits mailing list
>> cfe-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>>
>
>
--
--Artem Belevich
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20150511/f14b1ad8/attachment.html>
More information about the cfe-commits
mailing list