[polly] r309802 - [GPUJIT] Teach GPUJIT to use a pre-existing CUDA context if available.
Siddharth Bhat via llvm-commits
llvm-commits at lists.llvm.org
Wed Aug 2 02:19:42 PDT 2017
Author: bollu
Date: Wed Aug 2 02:19:42 2017
New Revision: 309802
URL: http://llvm.org/viewvc/llvm-project?rev=309802&view=rev
Log:
[GPUJIT] Teach GPUJIT to use a pre-existing CUDA context if available.
When the CUDA driver and runtime APIs are mixed, a context may already
exist because of earlier runtime API calls. In that case, Polly should
reuse that context instead of creating a new one.
This patch teaches GPUJIT to detect an existing context and to pick it
up.
Without this, calling `cudaMallocManaged`, for example, before a
Polly-generated kernel launch causes code running on a P100 to *hang*.
This is a part of https://reviews.llvm.org/D35991 that was extracted
into its own patch.
Differential Revision: https://reviews.llvm.org/D36162
Modified:
polly/trunk/tools/GPURuntime/GPUJIT.c
Modified: polly/trunk/tools/GPURuntime/GPUJIT.c
URL: http://llvm.org/viewvc/llvm-project/polly/trunk/tools/GPURuntime/GPUJIT.c?rev=309802&r1=309801&r2=309802&view=diff
==============================================================================
--- polly/trunk/tools/GPURuntime/GPUJIT.c (original)
+++ polly/trunk/tools/GPURuntime/GPUJIT.c Wed Aug 2 02:19:42 2017
@@ -973,6 +973,9 @@ static CuDeviceGetCountFcnTy *CuDeviceGe
typedef CUresult CUDAAPI CuCtxCreateFcnTy(CUcontext *, unsigned int, CUdevice);
static CuCtxCreateFcnTy *CuCtxCreateFcnPtr;
+typedef CUresult CUDAAPI CuCtxGetCurrentFcnTy(CUcontext *);
+static CuCtxGetCurrentFcnTy *CuCtxGetCurrentFcnPtr;
+
typedef CUresult CUDAAPI CuDeviceGetFcnTy(CUdevice *, int);
static CuDeviceGetFcnTy *CuDeviceGetFcnPtr;
@@ -1105,6 +1108,9 @@ static int initialDeviceAPIsCUDA() {
CuCtxCreateFcnPtr =
(CuCtxCreateFcnTy *)getAPIHandleCUDA(HandleCuda, "cuCtxCreate_v2");
+ CuCtxGetCurrentFcnPtr =
+ (CuCtxGetCurrentFcnTy *)getAPIHandleCUDA(HandleCuda, "cuCtxGetCurrent");
+
CuModuleLoadDataExFcnPtr = (CuModuleLoadDataExFcnTy *)getAPIHandleCUDA(
HandleCuda, "cuModuleLoadDataEx");
@@ -1194,7 +1200,33 @@ static PollyGPUContext *initContextCUDA(
fprintf(stderr, "Allocate memory for Polly CUDA context failed.\n");
exit(-1);
}
- CuCtxCreateFcnPtr(&(((CUDAContext *)Context->Context)->Cuda), 0, Device);
+
+ // In cases where managed memory is used, it is quite likely that
+ // `cudaMallocManaged` / `polly_mallocManaged` was called before
+ // `polly_initContext` was called.
+ //
+ // If `polly_initContext` calls `cuCtxCreate` while a context created by
+ // the runtime API already exists, code running on P100 hangs. So we first
+ // query for a pre-existing context and reuse it if there is one.
+ // If there is no pre-existing context, we create a new one.
+
+ // The possible pre-existing context from previous runtime API calls.
+ CUcontext MaybeRuntimeAPIContext;
+ if (CuCtxGetCurrentFcnPtr(&MaybeRuntimeAPIContext) != CUDA_SUCCESS) {
+ fprintf(stderr, "cuCtxGetCurrent failed.\n");
+ exit(-1);
+ }
+
+ // There was no previous context, initialise it.
+ if (MaybeRuntimeAPIContext == NULL) {
+ if (CuCtxCreateFcnPtr(&(((CUDAContext *)Context->Context)->Cuda), 0,
+ Device) != CUDA_SUCCESS) {
+ fprintf(stderr, "cuCtxCreate failed.\n");
+ exit(-1);
+ }
+ } else {
+ ((CUDAContext *)Context->Context)->Cuda = MaybeRuntimeAPIContext;
+ }
if (CacheMode)
CurrentContext = Context;
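
The reuse-or-create logic in the last hunk can be sketched in isolation as follows. This is only an illustration, not the code in GPUJIT.c: so that it builds without a CUDA installation, the driver calls are replaced by hypothetical stand-ins (`mockCtxGetCurrent`, `mockCtxCreate`, `MockContext`) that mimic the behaviour the patch relies on, namely that querying the current context succeeds and yields NULL when no context exists. The name `acquireContext` is likewise an assumption made for this sketch.

```c
#include <stddef.h>

/* Hypothetical stand-ins for CUresult / CUcontext; NOT the real CUDA API. */
typedef int MockResult;
#define MOCK_SUCCESS 0
typedef struct MockCtx {
  int Id;
} *MockContext;

/* Simulated driver state: the context a runtime API call may have left. */
static struct MockCtx RuntimeCtx = {1}; /* "created by the runtime API" */
static struct MockCtx FreshCtx = {2};   /* "created by the create call" */
static MockContext CurrentCtx = NULL;
static int CreateCalls = 0;

/* Mimics a current-context query: succeeds and yields NULL if none exists. */
static MockResult mockCtxGetCurrent(MockContext *Out) {
  *Out = CurrentCtx;
  return MOCK_SUCCESS;
}

/* Mimics context creation: makes a new context and marks it current. */
static MockResult mockCtxCreate(MockContext *Out) {
  CreateCalls++;
  CurrentCtx = &FreshCtx;
  *Out = CurrentCtx;
  return MOCK_SUCCESS;
}

/* Reuse an existing context if there is one; otherwise create a new one.
 * Returns 0 on success and -1 if either simulated driver call fails. */
static int acquireContext(MockContext *Out) {
  MockContext Existing = NULL;
  if (mockCtxGetCurrent(&Existing) != MOCK_SUCCESS)
    return -1;
  if (Existing != NULL) {
    *Out = Existing; /* pick up the pre-existing context */
    return 0;
  }
  return mockCtxCreate(Out) == MOCK_SUCCESS ? 0 : -1;
}
```

With `CurrentCtx` left at NULL, `acquireContext` falls through to the create path; after setting `CurrentCtx = &RuntimeCtx`, it hands back that context and never calls `mockCtxCreate`.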