r236765 - [cuda] Include GPU binary into host object file and generate init/deinit code.
Hubert Tong
hubert.reinterpretcast at gmail.com
Mon May 11 15:34:13 PDT 2015
Hi Artem,
The issue seems to be resolved now.
Thanks,
Hubert Tong
On Mon, May 11, 2015 at 1:40 PM, Artem Belevich <tra at google.com> wrote:
> Please check whether r237007 fixes the failure for you.
>
> --Artem
>
> On Mon, May 11, 2015 at 11:34 AM, Artem Belevich <tra at google.com> wrote:
>
>> It looks like on s390x, string constants come with 'align 2'. I'll remove the
>> align from the CHECK constraint, which should fix the test failure on s390x.
>> I'll commit the fix shortly.
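
[Editorial note: the directive in question and its relaxed form would look roughly like the sketch below; the exact replacement is whatever landed in r237007.]

```
// Before: fails on targets where string constants get a different alignment
// CHECK: private unnamed_addr constant{{.*}}kernelfunc{{.*}}\00", align 1
// After: match only the string itself, leaving the alignment unconstrained
// CHECK: private unnamed_addr constant{{.*}}kernelfunc{{.*}}\00"
```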
>>
>> --Artem
>>
>> On Mon, May 11, 2015 at 10:43 AM, Artem Belevich <tra at google.com> wrote:
>>
>>> Could you send me the output of the CC1 command executed by the test,
>>> before it's piped into FileCheck?
>>>
>>> /scratch/hstong/workdir/Release+Asserts/bin/clang -cc1
>>>   -internal-isystem /scratch/hstong/workdir/Release+Asserts/bin/../lib/clang/3.7.0/include
>>>   -nostdsysteminc -emit-llvm
>>>   /gsa/tlbgsa-h1/08/hstong/pub/cfe_trunk/clang/test/CodeGenCUDA/device-stub.cu
>>>   -fcuda-include-gpubinary
>>>   /gsa/tlbgsa-h1/08/hstong/pub/cfe_trunk/clang/test/CodeGenCUDA/device-stub.cu -o -
>>>
>>> Oh, and I see a typo in the script -- "CHEKC: call{{.*}}kernelfunc",
>>> though it's probably not what breaks the test in your case.
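
[Editorial note: FileCheck treats anything that is not a recognized check directive as an ordinary comment, so a misspelled prefix is silently skipped rather than diagnosed — that line simply never checks anything:]

```
// CHEKC: call{{.*}}kernelfunc   <- not a recognized directive; silently ignored
// CHECK: call{{.*}}kernelfunc   <- what the test was meant to verify
```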
>>>
>>> --Artem
>>>
>>> On Sat, May 9, 2015 at 11:10 AM, Hubert Tong <
>>> hubert.reinterpretcast at gmail.com> wrote:
>>>
>>>> Hi Artem,
>>>>
>>>> I am encountering a failure with device-stub.cu on s390x-suse-linux.
>>>> Can you take a look?
>>>>
>>>> *Output:*
>>>> FAIL: Clang :: CodeGenCUDA/device-stub.cu (1986 of 21893)
>>>> ******************** TEST 'Clang :: CodeGenCUDA/device-stub.cu' FAILED
>>>> ********************
>>>> Script:
>>>> --
>>>> /scratch/hstong/workdir/Release+Asserts/bin/clang -cc1
>>>>   -internal-isystem /scratch/hstong/workdir/Release+Asserts/bin/../lib/clang/3.7.0/include
>>>>   -nostdsysteminc -emit-llvm
>>>>   /gsa/tlbgsa-h1/08/hstong/pub/cfe_trunk/clang/test/CodeGenCUDA/device-stub.cu
>>>>   -fcuda-include-gpubinary
>>>>   /gsa/tlbgsa-h1/08/hstong/pub/cfe_trunk/clang/test/CodeGenCUDA/device-stub.cu -o - |
>>>>   /scratch/hstong/workdir/Release+Asserts/bin/FileCheck
>>>>   /gsa/tlbgsa-h1/08/hstong/pub/cfe_trunk/clang/test/CodeGenCUDA/device-stub.cu
>>>> --
>>>> Exit Code: 1
>>>>
>>>> Command Output (stderr):
>>>> --
>>>>
>>>> /gsa/tlbgsa-h1/08/hstong/pub/cfe_trunk/clang/test/CodeGenCUDA/device-stub.cu:7:11:
>>>> error: expected string not found in input
>>>> // CHECK: private unnamed_addr constant{{.*}}kernelfunc{{.*}}\00", align 1
>>>> ^
>>>> <stdin>:1:1: note: scanning from here
>>>> ; ModuleID = '/gsa/tlbgsa-h1/08/hstong/pub/cfe_trunk/clang/test/CodeGenCUDA/device-stub.cu'
>>>> ^
>>>> <stdin>:13:298: note: possible intended match here
>>>> @1 = private unnamed_addr constant [2259 x i8] c"// RUN: %clang_cc1
>>>> -emit-llvm %s -fcuda-include-gpubinary %s -o - | FileCheck %s\0A\0A#include
>>>> \22Inputs/cuda.h\22\0A\0A// Make sure that all parts of GPU code
>>>> init/cleanup are there:\0A// * constant unnamed string with the kernel
>>>> name\0A// CHECK: private unnamed_addr
>>>> constant{{.*}}kernelfunc{{.*}}\5C00\22, align 1\0A// * constant unnamed
>>>> string with GPU binary\0A// CHECK: private unnamed_addr
>>>> constant{{.*}}\5C00\22\0A// * constant struct that wraps GPU binary\0A//
>>>> CHECK: @__cuda_fatbin_wrapper = internal constant { i32, i32, i8*, i8* }
>>>> \0A// CHECK: { i32 1180844977, i32 1, {{.*}}, i8* null }\0A// * variable to
>>>> save GPU binary handle after initialization\0A// CHECK:
>>>> @__cuda_gpubin_handle = internal global i8** null\0A// * Make sure our
>>>> constructor/destructor was added to global ctor/dtor list.\0A// CHECK:
>>>> @llvm.global_ctors = appending global {{.*}}@__cuda_module_ctor\0A// CHECK:
>>>> @llvm.global_dtors = appending global {{.*}}@__cuda_module_dtor\0A\0A//
>>>> Test that we build the correct number of calls to cudaSetupArgument
>>>> followed\0A// by a call to cudaLaunch.\0A\0A// CHECK:
>>>> define{{.*}}kernelfunc\0A// CHECK: call{{.*}}cudaSetupArgument\0A// CHECK:
>>>> call{{.*}}cudaSetupArgument\0A// CHECK: call{{.*}}cudaSetupArgument\0A//
>>>> CHECK: call{{.*}}cudaLaunch\0A__global__ void kernelfunc(int i, int j, int
>>>> k) {}\0A\0A// Test that we've built correct kernel launch sequence.\0A//
>>>> CHECK: define{{.*}}hostfunc\0A// CHECK: call{{.*}}cudaConfigureCall\0A//
>>>> CHEKC: call{{.*}}kernelfunc\0Avoid hostfunc(void) { kernelfunc<<<1, 1>>>(1,
>>>> 1, 1); }\0A\0A// Test that we've built a function to register kernels\0A//
>>>> CHECK: define internal void @__cuda_register_kernels\0A// CHECK:
>>>> call{{.*}}cudaRegisterFunction(i8** %0, {{.*}}kernelfunc\0A\0A// Test that
>>>> we've built contructor..\0A// CHECK: define internal void
>>>> @__cuda_module_ctor\0A// .. that calls
>>>> __cudaRegisterFatBinary(&__cuda_fatbin_wrapper)\0A// CHECK:
>>>> call{{.*}}cudaRegisterFatBinary{{.*}}__cuda_fatbin_wrapper\0A// .. stores
>>>> return value in __cuda_gpubin_handle\0A// CHECK-NEXT:
>>>> store{{.*}}__cuda_gpubin_handle\0A// .. and then calls
>>>> __cuda_register_kernels\0A// CHECK-NEXT: call void
>>>> @__cuda_register_kernels\0A\0A// Test that we've created destructor.\0A//
>>>> CHECK: define internal void @__cuda_module_dtor\0A// CHECK:
>>>> load{{.*}}__cuda_gpubin_handle\0A// CHECK-NEXT: call void
>>>> @__cudaUnregisterFatBinary\0A\0A\00", align 2
>>>>
>>>> ^
>>>>
>>>> --
>>>>
>>>> ********************
>>>>
>>>> *Build environment info:*
>>>> > g++ -v
>>>> Using built-in specs.
>>>> COLLECT_GCC=g++
>>>> COLLECT_LTO_WRAPPER=/usr/lib64/gcc/s390x-suse-linux/4.8/lto-wrapper
>>>> Target: s390x-suse-linux
>>>> Configured with: ../configure --prefix=/usr --infodir=/usr/share/info
>>>> --mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64
>>>> --enable-languages=c,c++,objc,fortran,obj-c++,java
>>>> --enable-checking=release --with-gxx-include-dir=/usr/include/c++/4.8
>>>> --enable-ssp --disable-libssp --disable-plugin --with-bugurl=
>>>> http://bugs.opensuse.org/ --with-pkgversion='SUSE Linux'
>>>> --disable-libgcj --disable-libmudflap --with-slibdir=/lib64
>>>> --with-system-zlib --enable-__cxa_atexit --enable-libstdcxx-allocator=new
>>>> --disable-libstdcxx-pch --enable-version-specific-runtime-libs
>>>> --enable-linker-build-id --enable-linux-futex --program-suffix=-4.8
>>>> --without-system-libunwind --with-tune=zEC12 --with-arch=z196
>>>> --with-long-double-128 --enable-decimal-float --build=s390x-suse-linux
>>>> --host=s390x-suse-linux
>>>> Thread model: posix
>>>> gcc version 4.8.3 20140627 [gcc-4_8-branch revision 212064] (SUSE
>>>> Linux)
>>>>
>>>> Thanks,
>>>>
>>>>
>>>> Hubert Tong
>>>>
>>>> On Thu, May 7, 2015 at 2:34 PM, Artem Belevich <tra at google.com> wrote:
>>>>
>>>>> Author: tra
>>>>> Date: Thu May 7 14:34:16 2015
>>>>> New Revision: 236765
>>>>>
>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=236765&view=rev
>>>>> Log:
>>>>> [cuda] Include GPU binary into host object file and generate
>>>>> init/deinit code.
>>>>>
>>>>> - added -fcuda-include-gpubinary option to incorporate results of
>>>>> device-side compilation into host-side one.
>>>>> - generate code to register GPU binaries and associated kernels
>>>>> with CUDA runtime and clean-up on exit.
>>>>> - added test case for init/deinit code generation.
>>>>>
>>>>> Differential Revision: http://reviews.llvm.org/D9507
>>>>>
>>>>> Modified:
>>>>> cfe/trunk/include/clang/Driver/CC1Options.td
>>>>> cfe/trunk/include/clang/Frontend/CodeGenOptions.h
>>>>> cfe/trunk/lib/CodeGen/CGCUDANV.cpp
>>>>> cfe/trunk/lib/CodeGen/CGCUDARuntime.h
>>>>> cfe/trunk/lib/CodeGen/CodeGenFunction.cpp
>>>>> cfe/trunk/lib/CodeGen/CodeGenModule.cpp
>>>>> cfe/trunk/lib/Frontend/CompilerInvocation.cpp
>>>>> cfe/trunk/test/CodeGenCUDA/device-stub.cu
>>>>>
>>>>> Modified: cfe/trunk/include/clang/Driver/CC1Options.td
>>>>> URL:
>>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Driver/CC1Options.td?rev=236765&r1=236764&r2=236765&view=diff
>>>>>
>>>>> ==============================================================================
>>>>> --- cfe/trunk/include/clang/Driver/CC1Options.td (original)
>>>>> +++ cfe/trunk/include/clang/Driver/CC1Options.td Thu May 7 14:34:16
>>>>> 2015
>>>>> @@ -631,6 +631,8 @@ def fcuda_allow_host_calls_from_host_dev
>>>>> def fcuda_disable_target_call_checks : Flag<["-"],
>>>>> "fcuda-disable-target-call-checks">,
>>>>> HelpText<"Disable all cross-target (host, device, etc.) call checks
>>>>> in CUDA">;
>>>>> +def fcuda_include_gpubinary : Separate<["-"],
>>>>> "fcuda-include-gpubinary">,
>>>>> + HelpText<"Incorporate CUDA device-side binary into host object
>>>>> file.">;
>>>>>
>>>>> } // let Flags = [CC1Option]
>>>>>
>>>>>
>>>>> Modified: cfe/trunk/include/clang/Frontend/CodeGenOptions.h
>>>>> URL:
>>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Frontend/CodeGenOptions.h?rev=236765&r1=236764&r2=236765&view=diff
>>>>>
>>>>> ==============================================================================
>>>>> --- cfe/trunk/include/clang/Frontend/CodeGenOptions.h (original)
>>>>> +++ cfe/trunk/include/clang/Frontend/CodeGenOptions.h Thu May 7
>>>>> 14:34:16 2015
>>>>> @@ -163,6 +163,11 @@ public:
>>>>> /// Name of the profile file to use as input for -fprofile-instr-use
>>>>> std::string InstrProfileInput;
>>>>>
>>>>> + /// A list of file names passed with -fcuda-include-gpubinary
>>>>> options to
>>>>> + /// forward to CUDA runtime back-end for incorporating them into
>>>>> host-side
>>>>> + /// object file.
>>>>> + std::vector<std::string> CudaGpuBinaryFileNames;
>>>>> +
>>>>> /// Regular expression to select optimizations for which we should
>>>>> enable
>>>>> /// optimization remarks. Transformation passes whose name matches
>>>>> this
>>>>> /// expression (and support this feature), will emit a diagnostic
>>>>>
>>>>> Modified: cfe/trunk/lib/CodeGen/CGCUDANV.cpp
>>>>> URL:
>>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CGCUDANV.cpp?rev=236765&r1=236764&r2=236765&view=diff
>>>>>
>>>>> ==============================================================================
>>>>> --- cfe/trunk/lib/CodeGen/CGCUDANV.cpp (original)
>>>>> +++ cfe/trunk/lib/CodeGen/CGCUDANV.cpp Thu May 7 14:34:16 2015
>>>>> @@ -20,7 +20,6 @@
>>>>> #include "llvm/IR/CallSite.h"
>>>>> #include "llvm/IR/Constants.h"
>>>>> #include "llvm/IR/DerivedTypes.h"
>>>>> -#include <vector>
>>>>>
>>>>> using namespace clang;
>>>>> using namespace CodeGen;
>>>>> @@ -30,29 +29,66 @@ namespace {
>>>>> class CGNVCUDARuntime : public CGCUDARuntime {
>>>>>
>>>>> private:
>>>>> - llvm::Type *IntTy, *SizeTy;
>>>>> - llvm::PointerType *CharPtrTy, *VoidPtrTy;
>>>>> + llvm::Type *IntTy, *SizeTy, *VoidTy;
>>>>> + llvm::PointerType *CharPtrTy, *VoidPtrTy, *VoidPtrPtrTy;
>>>>> +
>>>>> + /// Convenience reference to LLVM Context
>>>>> + llvm::LLVMContext &Context;
>>>>> + /// Convenience reference to the current module
>>>>> + llvm::Module &TheModule;
>>>>> + /// Keeps track of kernel launch stubs emitted in this module
>>>>> + llvm::SmallVector<llvm::Function *, 16> EmittedKernels;
>>>>> + /// Keeps track of variables containing handles of GPU binaries.
>>>>> Populated by
>>>>> + /// ModuleCtorFunction() and used to create corresponding cleanup
>>>>> calls in
>>>>> + /// ModuleDtorFunction()
>>>>> + llvm::SmallVector<llvm::GlobalVariable *, 16> GpuBinaryHandles;
>>>>>
>>>>> llvm::Constant *getSetupArgumentFn() const;
>>>>> llvm::Constant *getLaunchFn() const;
>>>>>
>>>>> + /// Creates a function to register all kernel stubs generated in
>>>>> this module.
>>>>> + llvm::Function *makeRegisterKernelsFn();
>>>>> +
>>>>> + /// Helper function that generates a constant string and returns a
>>>>> pointer to
>>>>> + /// the start of the string. The result of this function can be
>>>>> used anywhere
>>>>> + /// where the C code specifies const char*.
>>>>> + llvm::Constant *makeConstantString(const std::string &Str,
>>>>> + const std::string &Name = "",
>>>>> + unsigned Alignment = 0) {
>>>>> + llvm::Constant *Zeros[] = {llvm::ConstantInt::get(SizeTy, 0),
>>>>> + llvm::ConstantInt::get(SizeTy, 0)};
>>>>> + auto *ConstStr = CGM.GetAddrOfConstantCString(Str, Name.c_str());
>>>>> + return
>>>>> llvm::ConstantExpr::getGetElementPtr(ConstStr->getValueType(),
>>>>> + ConstStr, Zeros);
>>>>> + }
>>>>> +
>>>>> + void emitDeviceStubBody(CodeGenFunction &CGF, FunctionArgList
>>>>> &Args);
>>>>> +
>>>>> public:
>>>>> CGNVCUDARuntime(CodeGenModule &CGM);
>>>>>
>>>>> - void EmitDeviceStubBody(CodeGenFunction &CGF, FunctionArgList
>>>>> &Args) override;
>>>>> + void emitDeviceStub(CodeGenFunction &CGF, FunctionArgList &Args)
>>>>> override;
>>>>> + /// Creates module constructor function
>>>>> + llvm::Function *makeModuleCtorFunction() override;
>>>>> + /// Creates module destructor function
>>>>> + llvm::Function *makeModuleDtorFunction() override;
>>>>> };
>>>>>
>>>>> }
>>>>>
>>>>> -CGNVCUDARuntime::CGNVCUDARuntime(CodeGenModule &CGM) :
>>>>> CGCUDARuntime(CGM) {
>>>>> +CGNVCUDARuntime::CGNVCUDARuntime(CodeGenModule &CGM)
>>>>> + : CGCUDARuntime(CGM), Context(CGM.getLLVMContext()),
>>>>> + TheModule(CGM.getModule()) {
>>>>> CodeGen::CodeGenTypes &Types = CGM.getTypes();
>>>>> ASTContext &Ctx = CGM.getContext();
>>>>>
>>>>> IntTy = Types.ConvertType(Ctx.IntTy);
>>>>> SizeTy = Types.ConvertType(Ctx.getSizeType());
>>>>> + VoidTy = llvm::Type::getVoidTy(Context);
>>>>>
>>>>> CharPtrTy =
>>>>> llvm::PointerType::getUnqual(Types.ConvertType(Ctx.CharTy));
>>>>> VoidPtrTy =
>>>>> cast<llvm::PointerType>(Types.ConvertType(Ctx.VoidPtrTy));
>>>>> + VoidPtrPtrTy = VoidPtrTy->getPointerTo();
>>>>> }
>>>>>
>>>>> llvm::Constant *CGNVCUDARuntime::getSetupArgumentFn() const {
>>>>> @@ -68,14 +104,17 @@ llvm::Constant *CGNVCUDARuntime::getSetu
>>>>>
>>>>> llvm::Constant *CGNVCUDARuntime::getLaunchFn() const {
>>>>> // cudaError_t cudaLaunch(char *)
>>>>> - std::vector<llvm::Type*> Params;
>>>>> - Params.push_back(CharPtrTy);
>>>>> - return CGM.CreateRuntimeFunction(llvm::FunctionType::get(IntTy,
>>>>> - Params,
>>>>> false),
>>>>> - "cudaLaunch");
>>>>> + return CGM.CreateRuntimeFunction(
>>>>> + llvm::FunctionType::get(IntTy, CharPtrTy, false), "cudaLaunch");
>>>>> +}
>>>>> +
>>>>> +void CGNVCUDARuntime::emitDeviceStub(CodeGenFunction &CGF,
>>>>> + FunctionArgList &Args) {
>>>>> + EmittedKernels.push_back(CGF.CurFn);
>>>>> + emitDeviceStubBody(CGF, Args);
>>>>> }
>>>>>
>>>>> -void CGNVCUDARuntime::EmitDeviceStubBody(CodeGenFunction &CGF,
>>>>> +void CGNVCUDARuntime::emitDeviceStubBody(CodeGenFunction &CGF,
>>>>> FunctionArgList &Args) {
>>>>> // Build the argument value list and the argument stack struct type.
>>>>> SmallVector<llvm::Value *, 16> ArgValues;
>>>>> @@ -87,8 +126,7 @@ void CGNVCUDARuntime::EmitDeviceStubBody
>>>>> assert(isa<llvm::PointerType>(V->getType()) && "Arg type not
>>>>> PointerType");
>>>>>
>>>>> ArgTypes.push_back(cast<llvm::PointerType>(V->getType())->getElementType());
>>>>> }
>>>>> - llvm::StructType *ArgStackTy = llvm::StructType::get(
>>>>> - CGF.getLLVMContext(), ArgTypes);
>>>>> + llvm::StructType *ArgStackTy = llvm::StructType::get(Context,
>>>>> ArgTypes);
>>>>>
>>>>> llvm::BasicBlock *EndBlock = CGF.createBasicBlock("setup.end");
>>>>>
>>>>> @@ -120,6 +158,160 @@ void CGNVCUDARuntime::EmitDeviceStubBody
>>>>> CGF.EmitBlock(EndBlock);
>>>>> }
>>>>>
>>>>> +/// Creates internal function to register all kernel stubs generated
>>>>> in this
>>>>> +/// module with the CUDA runtime.
>>>>> +/// \code
>>>>> +/// void __cuda_register_kernels(void** GpuBinaryHandle) {
>>>>> +/// __cudaRegisterFunction(GpuBinaryHandle,Kernel0,...);
>>>>> +/// ...
>>>>> +/// __cudaRegisterFunction(GpuBinaryHandle,KernelM,...);
>>>>> +/// }
>>>>> +/// \endcode
>>>>> +llvm::Function *CGNVCUDARuntime::makeRegisterKernelsFn() {
>>>>> + llvm::Function *RegisterKernelsFunc = llvm::Function::Create(
>>>>> + llvm::FunctionType::get(VoidTy, VoidPtrPtrTy, false),
>>>>> + llvm::GlobalValue::InternalLinkage, "__cuda_register_kernels",
>>>>> &TheModule);
>>>>> + llvm::BasicBlock *EntryBB =
>>>>> + llvm::BasicBlock::Create(Context, "entry", RegisterKernelsFunc);
>>>>> + CGBuilderTy Builder(Context);
>>>>> + Builder.SetInsertPoint(EntryBB);
>>>>> +
>>>>> + // void __cudaRegisterFunction(void **, const char *, char *, const
>>>>> char *,
>>>>> + // int, uint3*, uint3*, dim3*, dim3*,
>>>>> int*)
>>>>> + std::vector<llvm::Type *> RegisterFuncParams = {
>>>>> + VoidPtrPtrTy, CharPtrTy, CharPtrTy, CharPtrTy, IntTy,
>>>>> + VoidPtrTy, VoidPtrTy, VoidPtrTy, VoidPtrTy,
>>>>> IntTy->getPointerTo()};
>>>>> + llvm::Constant *RegisterFunc = CGM.CreateRuntimeFunction(
>>>>> + llvm::FunctionType::get(IntTy, RegisterFuncParams, false),
>>>>> + "__cudaRegisterFunction");
>>>>> +
>>>>> + // Extract GpuBinaryHandle passed as the first argument passed to
>>>>> + // __cuda_register_kernels() and generate __cudaRegisterFunction()
>>>>> call for
>>>>> + // each emitted kernel.
>>>>> + llvm::Argument &GpuBinaryHandlePtr =
>>>>> *RegisterKernelsFunc->arg_begin();
>>>>> + for (llvm::Function *Kernel : EmittedKernels) {
>>>>> + llvm::Constant *KernelName =
>>>>> makeConstantString(Kernel->getName());
>>>>> + llvm::Constant *NullPtr =
>>>>> llvm::ConstantPointerNull::get(VoidPtrTy);
>>>>> + llvm::Value *args[] = {
>>>>> + &GpuBinaryHandlePtr, Builder.CreateBitCast(Kernel, VoidPtrTy),
>>>>> + KernelName, KernelName, llvm::ConstantInt::get(IntTy, -1),
>>>>> NullPtr,
>>>>> + NullPtr, NullPtr, NullPtr,
>>>>> + llvm::ConstantPointerNull::get(IntTy->getPointerTo())};
>>>>> + Builder.CreateCall(RegisterFunc, args);
>>>>> + }
>>>>> +
>>>>> + Builder.CreateRetVoid();
>>>>> + return RegisterKernelsFunc;
>>>>> +}
>>>>> +
>>>>> +/// Creates a global constructor function for the module:
>>>>> +/// \code
>>>>> +/// void __cuda_module_ctor(void*) {
>>>>> +/// Handle0 = __cudaRegisterFatBinary(GpuBinaryBlob0);
>>>>> +/// __cuda_register_kernels(Handle0);
>>>>> +/// ...
>>>>> +/// HandleN = __cudaRegisterFatBinary(GpuBinaryBlobN);
>>>>> +/// __cuda_register_kernels(HandleN);
>>>>> +/// }
>>>>> +/// \endcode
>>>>> +llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() {
>>>>> + // void __cuda_register_kernels(void* handle);
>>>>> + llvm::Function *RegisterKernelsFunc = makeRegisterKernelsFn();
>>>>> + // void ** __cudaRegisterFatBinary(void *);
>>>>> + llvm::Constant *RegisterFatbinFunc = CGM.CreateRuntimeFunction(
>>>>> + llvm::FunctionType::get(VoidPtrPtrTy, VoidPtrTy, false),
>>>>> + "__cudaRegisterFatBinary");
>>>>> + // struct { int magic, int version, void * gpu_binary, void *
>>>>> dont_care };
>>>>> + llvm::StructType *FatbinWrapperTy =
>>>>> + llvm::StructType::get(IntTy, IntTy, VoidPtrTy, VoidPtrTy,
>>>>> nullptr);
>>>>> +
>>>>> + llvm::Function *ModuleCtorFunc = llvm::Function::Create(
>>>>> + llvm::FunctionType::get(VoidTy, VoidPtrTy, false),
>>>>> + llvm::GlobalValue::InternalLinkage, "__cuda_module_ctor",
>>>>> &TheModule);
>>>>> + llvm::BasicBlock *CtorEntryBB =
>>>>> + llvm::BasicBlock::Create(Context, "entry", ModuleCtorFunc);
>>>>> + CGBuilderTy CtorBuilder(Context);
>>>>> +
>>>>> + CtorBuilder.SetInsertPoint(CtorEntryBB);
>>>>> +
>>>>> + // For each GPU binary, register it with the CUDA runtime and store
>>>>> returned
>>>>> + // handle in a global variable and save the handle in
>>>>> GpuBinaryHandles vector
>>>>> + // to be cleaned up in destructor on exit. Then associate all known
>>>>> kernels
>>>>> + // with the GPU binary handle so CUDA runtime can figure out what
>>>>> to call on
>>>>> + // the GPU side.
>>>>> + for (const std::string &GpuBinaryFileName :
>>>>> + CGM.getCodeGenOpts().CudaGpuBinaryFileNames) {
>>>>> + llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> GpuBinaryOrErr
>>>>> =
>>>>> + llvm::MemoryBuffer::getFileOrSTDIN(GpuBinaryFileName);
>>>>> + if (std::error_code EC = GpuBinaryOrErr.getError()) {
>>>>> + CGM.getDiags().Report(diag::err_cannot_open_file) <<
>>>>> GpuBinaryFileName
>>>>> + <<
>>>>> EC.message();
>>>>> + continue;
>>>>> + }
>>>>> +
>>>>> + // Create initialized wrapper structure that points to the loaded
>>>>> GPU binary
>>>>> + llvm::Constant *Values[] = {
>>>>> + llvm::ConstantInt::get(IntTy, 0x466243b1), // Fatbin wrapper
>>>>> magic.
>>>>> + llvm::ConstantInt::get(IntTy, 1), // Fatbin version.
>>>>> + makeConstantString(GpuBinaryOrErr.get()->getBuffer(), "",
>>>>> 16), // Data.
>>>>> + llvm::ConstantPointerNull::get(VoidPtrTy)}; // Unused in
>>>>> fatbin v1.
>>>>> + llvm::GlobalVariable *FatbinWrapper = new llvm::GlobalVariable(
>>>>> + TheModule, FatbinWrapperTy, true,
>>>>> llvm::GlobalValue::InternalLinkage,
>>>>> + llvm::ConstantStruct::get(FatbinWrapperTy, Values),
>>>>> + "__cuda_fatbin_wrapper");
>>>>> +
>>>>> + // GpuBinaryHandle = __cudaRegisterFatBinary(&FatbinWrapper);
>>>>> + llvm::CallInst *RegisterFatbinCall = CtorBuilder.CreateCall(
>>>>> + RegisterFatbinFunc,
>>>>> + CtorBuilder.CreateBitCast(FatbinWrapper, VoidPtrTy));
>>>>> + llvm::GlobalVariable *GpuBinaryHandle = new llvm::GlobalVariable(
>>>>> + TheModule, VoidPtrPtrTy, false,
>>>>> llvm::GlobalValue::InternalLinkage,
>>>>> + llvm::ConstantPointerNull::get(VoidPtrPtrTy),
>>>>> "__cuda_gpubin_handle");
>>>>> + CtorBuilder.CreateStore(RegisterFatbinCall, GpuBinaryHandle,
>>>>> false);
>>>>> +
>>>>> + // Call __cuda_register_kernels(GpuBinaryHandle);
>>>>> + CtorBuilder.CreateCall(RegisterKernelsFunc, RegisterFatbinCall);
>>>>> +
>>>>> + // Save GpuBinaryHandle so we can unregister it in destructor.
>>>>> + GpuBinaryHandles.push_back(GpuBinaryHandle);
>>>>> + }
>>>>> +
>>>>> + CtorBuilder.CreateRetVoid();
>>>>> + return ModuleCtorFunc;
>>>>> +}
>>>>> +
>>>>> +/// Creates a global destructor function that unregisters all GPU
>>>>> code blobs
>>>>> +/// registered by constructor.
>>>>> +/// \code
>>>>> +/// void __cuda_module_dtor(void*) {
>>>>> +/// __cudaUnregisterFatBinary(Handle0);
>>>>> +/// ...
>>>>> +/// __cudaUnregisterFatBinary(HandleN);
>>>>> +/// }
>>>>> +/// \endcode
>>>>> +llvm::Function *CGNVCUDARuntime::makeModuleDtorFunction() {
>>>>> + // void __cudaUnregisterFatBinary(void ** handle);
>>>>> + llvm::Constant *UnregisterFatbinFunc = CGM.CreateRuntimeFunction(
>>>>> + llvm::FunctionType::get(VoidTy, VoidPtrPtrTy, false),
>>>>> + "__cudaUnregisterFatBinary");
>>>>> +
>>>>> + llvm::Function *ModuleDtorFunc = llvm::Function::Create(
>>>>> + llvm::FunctionType::get(VoidTy, VoidPtrTy, false),
>>>>> + llvm::GlobalValue::InternalLinkage, "__cuda_module_dtor",
>>>>> &TheModule);
>>>>> + llvm::BasicBlock *DtorEntryBB =
>>>>> + llvm::BasicBlock::Create(Context, "entry", ModuleDtorFunc);
>>>>> + CGBuilderTy DtorBuilder(Context);
>>>>> + DtorBuilder.SetInsertPoint(DtorEntryBB);
>>>>> +
>>>>> + for (llvm::GlobalVariable *GpuBinaryHandle : GpuBinaryHandles) {
>>>>> + DtorBuilder.CreateCall(UnregisterFatbinFunc,
>>>>> + DtorBuilder.CreateLoad(GpuBinaryHandle,
>>>>> false));
>>>>> + }
>>>>> +
>>>>> + DtorBuilder.CreateRetVoid();
>>>>> + return ModuleDtorFunc;
>>>>> +}
>>>>> +
>>>>> CGCUDARuntime *CodeGen::CreateNVCUDARuntime(CodeGenModule &CGM) {
>>>>> return new CGNVCUDARuntime(CGM);
>>>>> }
>>>>>
>>>>> Modified: cfe/trunk/lib/CodeGen/CGCUDARuntime.h
>>>>> URL:
>>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CGCUDARuntime.h?rev=236765&r1=236764&r2=236765&view=diff
>>>>>
>>>>> ==============================================================================
>>>>> --- cfe/trunk/lib/CodeGen/CGCUDARuntime.h (original)
>>>>> +++ cfe/trunk/lib/CodeGen/CGCUDARuntime.h Thu May 7 14:34:16 2015
>>>>> @@ -16,6 +16,10 @@
>>>>> #ifndef LLVM_CLANG_LIB_CODEGEN_CGCUDARUNTIME_H
>>>>> #define LLVM_CLANG_LIB_CODEGEN_CGCUDARUNTIME_H
>>>>>
>>>>> +namespace llvm {
>>>>> +class Function;
>>>>> +}
>>>>> +
>>>>> namespace clang {
>>>>>
>>>>> class CUDAKernelCallExpr;
>>>>> @@ -39,10 +43,17 @@ public:
>>>>> virtual RValue EmitCUDAKernelCallExpr(CodeGenFunction &CGF,
>>>>> const CUDAKernelCallExpr *E,
>>>>> ReturnValueSlot ReturnValue);
>>>>> -
>>>>> - virtual void EmitDeviceStubBody(CodeGenFunction &CGF,
>>>>> - FunctionArgList &Args) = 0;
>>>>>
>>>>> + /// Emits a kernel launch stub.
>>>>> + virtual void emitDeviceStub(CodeGenFunction &CGF, FunctionArgList
>>>>> &Args) = 0;
>>>>> +
>>>>> + /// Constructs and returns a module initialization function or
>>>>> nullptr if it's
>>>>> + /// not needed. Must be called after all kernels have been emitted.
>>>>> + virtual llvm::Function *makeModuleCtorFunction() = 0;
>>>>> +
>>>>> + /// Returns a module cleanup function or nullptr if it's not needed.
>>>>> + /// Must be called after ModuleCtorFunction
>>>>> + virtual llvm::Function *makeModuleDtorFunction() = 0;
>>>>> };
>>>>>
>>>>> /// Creates an instance of a CUDA runtime class.
>>>>>
>>>>> Modified: cfe/trunk/lib/CodeGen/CodeGenFunction.cpp
>>>>> URL:
>>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CodeGenFunction.cpp?rev=236765&r1=236764&r2=236765&view=diff
>>>>>
>>>>> ==============================================================================
>>>>> --- cfe/trunk/lib/CodeGen/CodeGenFunction.cpp (original)
>>>>> +++ cfe/trunk/lib/CodeGen/CodeGenFunction.cpp Thu May 7 14:34:16 2015
>>>>> @@ -878,7 +878,7 @@ void CodeGenFunction::GenerateCode(Globa
>>>>> else if (getLangOpts().CUDA &&
>>>>> !getLangOpts().CUDAIsDevice &&
>>>>> FD->hasAttr<CUDAGlobalAttr>())
>>>>> - CGM.getCUDARuntime().EmitDeviceStubBody(*this, Args);
>>>>> + CGM.getCUDARuntime().emitDeviceStub(*this, Args);
>>>>> else if (isa<CXXConversionDecl>(FD) &&
>>>>>
>>>>> cast<CXXConversionDecl>(FD)->isLambdaToBlockPointerConversion()) {
>>>>> // The lambda conversion to block pointer is special; the
>>>>> semantics can't be
>>>>>
>>>>> Modified: cfe/trunk/lib/CodeGen/CodeGenModule.cpp
>>>>> URL:
>>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CodeGenModule.cpp?rev=236765&r1=236764&r2=236765&view=diff
>>>>>
>>>>> ==============================================================================
>>>>> --- cfe/trunk/lib/CodeGen/CodeGenModule.cpp (original)
>>>>> +++ cfe/trunk/lib/CodeGen/CodeGenModule.cpp Thu May 7 14:34:16 2015
>>>>> @@ -350,6 +350,13 @@ void CodeGenModule::Release() {
>>>>> if (ObjCRuntime)
>>>>> if (llvm::Function *ObjCInitFunction =
>>>>> ObjCRuntime->ModuleInitFunction())
>>>>> AddGlobalCtor(ObjCInitFunction);
>>>>> + if (Context.getLangOpts().CUDA &&
>>>>> !Context.getLangOpts().CUDAIsDevice &&
>>>>> + CUDARuntime) {
>>>>> + if (llvm::Function *CudaCtorFunction =
>>>>> CUDARuntime->makeModuleCtorFunction())
>>>>> + AddGlobalCtor(CudaCtorFunction);
>>>>> + if (llvm::Function *CudaDtorFunction =
>>>>> CUDARuntime->makeModuleDtorFunction())
>>>>> + AddGlobalDtor(CudaDtorFunction);
>>>>> + }
>>>>> if (PGOReader && PGOStats.hasDiagnostics())
>>>>> PGOStats.reportDiagnostics(getDiags(),
>>>>> getCodeGenOpts().MainFileName);
>>>>> EmitCtorList(GlobalCtors, "llvm.global_ctors");
>>>>> @@ -3678,4 +3685,3 @@ void CodeGenModule::EmitOMPThreadPrivate
>>>>> CXXGlobalInits.push_back(InitFunction);
>>>>> }
>>>>> }
>>>>> -
>>>>>
>>>>> Modified: cfe/trunk/lib/Frontend/CompilerInvocation.cpp
>>>>> URL:
>>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Frontend/CompilerInvocation.cpp?rev=236765&r1=236764&r2=236765&view=diff
>>>>>
>>>>> ==============================================================================
>>>>> --- cfe/trunk/lib/Frontend/CompilerInvocation.cpp (original)
>>>>> +++ cfe/trunk/lib/Frontend/CompilerInvocation.cpp Thu May 7 14:34:16
>>>>> 2015
>>>>> @@ -651,6 +651,9 @@ static bool ParseCodeGenArgs(CodeGenOpti
>>>>> Args.getAllArgValues(OPT_fsanitize_recover_EQ),
>>>>> Diags,
>>>>> Opts.SanitizeRecover);
>>>>>
>>>>> + Opts.CudaGpuBinaryFileNames =
>>>>> + Args.getAllArgValues(OPT_fcuda_include_gpubinary);
>>>>> +
>>>>> return Success;
>>>>> }
>>>>>
>>>>>
>>>>> Modified: cfe/trunk/test/CodeGenCUDA/device-stub.cu
>>>>> URL:
>>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGenCUDA/device-stub.cu?rev=236765&r1=236764&r2=236765&view=diff
>>>>>
>>>>> ==============================================================================
>>>>> --- cfe/trunk/test/CodeGenCUDA/device-stub.cu (original)
>>>>> +++ cfe/trunk/test/CodeGenCUDA/device-stub.cu Thu May 7 14:34:16 2015
>>>>> @@ -1,7 +1,21 @@
>>>>> -// RUN: %clang_cc1 -emit-llvm %s -o - | FileCheck %s
>>>>> +// RUN: %clang_cc1 -emit-llvm %s -fcuda-include-gpubinary %s -o - |
>>>>> FileCheck %s
>>>>>
>>>>> #include "Inputs/cuda.h"
>>>>>
>>>>> +// Make sure that all parts of GPU code init/cleanup are there:
>>>>> +// * constant unnamed string with the kernel name
>>>>> +// CHECK: private unnamed_addr constant{{.*}}kernelfunc{{.*}}\00",
>>>>> align 1
>>>>> +// * constant unnamed string with GPU binary
>>>>> +// CHECK: private unnamed_addr constant{{.*}}\00"
>>>>> +// * constant struct that wraps GPU binary
>>>>> +// CHECK: @__cuda_fatbin_wrapper = internal constant { i32, i32, i8*,
>>>>> i8* }
>>>>> +// CHECK: { i32 1180844977, i32 1, {{.*}}, i64 0, i64 0), i8*
>>>>> null }
>>>>> +// * variable to save GPU binary handle after initialization
>>>>> +// CHECK: @__cuda_gpubin_handle = internal global i8** null
>>>>> +// * Make sure our constructor/destructor was added to global
>>>>> ctor/dtor list.
>>>>> +// CHECK: @llvm.global_ctors = appending global
>>>>> {{.*}}@__cuda_module_ctor
>>>>> +// CHECK: @llvm.global_dtors = appending global
>>>>> {{.*}}@__cuda_module_dtor
>>>>> +
>>>>> // Test that we build the correct number of calls to
>>>>> cudaSetupArgument followed
>>>>> // by a call to cudaLaunch.
>>>>>
>>>>> @@ -11,3 +25,28 @@
>>>>> // CHECK: call{{.*}}cudaSetupArgument
>>>>> // CHECK: call{{.*}}cudaLaunch
>>>>> __global__ void kernelfunc(int i, int j, int k) {}
>>>>> +
>>>>> +// Test that we've built correct kernel launch sequence.
>>>>> +// CHECK: define{{.*}}hostfunc
>>>>> +// CHECK: call{{.*}}cudaConfigureCall
>>>>> +// CHEKC: call{{.*}}kernelfunc
>>>>> +void hostfunc(void) { kernelfunc<<<1, 1>>>(1, 1, 1); }
>>>>> +
>>>>> +// Test that we've built a function to register kernels
>>>>> +// CHECK: define internal void @__cuda_register_kernels
>>>>> +// CHECK: call{{.*}}cudaRegisterFunction(i8** %0, {{.*}}kernelfunc
>>>>> +
>>>>> +// Test that we've built contructor..
>>>>> +// CHECK: define internal void @__cuda_module_ctor
>>>>> +// .. that calls __cudaRegisterFatBinary(&__cuda_fatbin_wrapper)
>>>>> +// CHECK: call{{.*}}cudaRegisterFatBinary{{.*}}__cuda_fatbin_wrapper
>>>>> +// .. stores return value in __cuda_gpubin_handle
>>>>> +// CHECK-NEXT: store{{.*}}__cuda_gpubin_handle
>>>>> +// .. and then calls __cuda_register_kernels
>>>>> +// CHECK-NEXT: call void @__cuda_register_kernels
>>>>> +
>>>>> +// Test that we've created destructor.
>>>>> +// CHECK: define internal void @__cuda_module_dtor
>>>>> +// CHECK: load{{.*}}__cuda_gpubin_handle
>>>>> +// CHECK-NEXT: call void @__cudaUnregisterFatBinary
>>>>> +
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> cfe-commits mailing list
>>>>> cfe-commits at cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> --Artem Belevich
>>>
>>
>>
>>
>> --
>> --Artem Belevich
>>
>
>
>
> --
> --Artem Belevich
>