[polly] r309802 - [GPUJIT] Teach GPUJIT to use a pre-existing CUDA context if available.

Wed Aug 2 02:19:42 PDT 2017

Author: bollu
Date: Wed Aug  2 02:19:42 2017
New Revision: 309802

URL: http://llvm.org/viewvc/llvm-project?rev=309802&view=rev
Log:
[GPUJIT] Teach GPUJIT to use a pre-existing CUDA context if available.

On mixing the driver and runtime APIs, it is quite possible that a
context already exists due to runtime API usage. In this case, Polly should
try to use the same context.

This patch teaches GPUJIT to detect that a context exists and how to
pick up this context.

Without this, calling `cudaMallocManaged`, for example, before a
polly-generated kernel launch causes P100 to *hang*.

This is a part of (https://reviews.llvm.org/D35991) that was extracted
out.

Differential Revision: https://reviews.llvm.org/D36162

Modified:
    polly/trunk/tools/GPURuntime/GPUJIT.c

Modified: polly/trunk/tools/GPURuntime/GPUJIT.c
URL: http://llvm.org/viewvc/llvm-project/polly/trunk/tools/GPURuntime/GPUJIT.c?rev=309802&r1=309801&r2=309802&view=diff
==============================================================================

--- polly/trunk/tools/GPURuntime/GPUJIT.c (original)
+++ polly/trunk/tools/GPURuntime/GPUJIT.c Wed Aug  2 02:19:42 2017
@@ -973,6 +973,9 @@ static CuDeviceGetCountFcnTy *CuDeviceGe
 typedef CUresult CUDAAPI CuCtxCreateFcnTy(CUcontext *, unsigned int, CUdevice);
 static CuCtxCreateFcnTy *CuCtxCreateFcnPtr;
 
+typedef CUresult CUDAAPI CuCtxGetCurrentFcnTy(CUcontext *);
+static CuCtxGetCurrentFcnTy *CuCtxGetCurrentFcnPtr;
+
 typedef CUresult CUDAAPI CuDeviceGetFcnTy(CUdevice *, int);
 static CuDeviceGetFcnTy *CuDeviceGetFcnPtr;
 
@@ -1105,6 +1108,9 @@ static int initialDeviceAPIsCUDA() {
   CuCtxCreateFcnPtr =
       (CuCtxCreateFcnTy *)getAPIHandleCUDA(HandleCuda, "cuCtxCreate_v2");
 
+  CuCtxGetCurrentFcnPtr =
+      (CuCtxGetCurrentFcnTy *)getAPIHandleCUDA(HandleCuda, "cuCtxGetCurrent");
+
   CuModuleLoadDataExFcnPtr = (CuModuleLoadDataExFcnTy *)getAPIHandleCUDA(
       HandleCuda, "cuModuleLoadDataEx");
 
@@ -1194,7 +1200,33 @@ static PollyGPUContext *initContextCUDA(
     fprintf(stderr, "Allocate memory for Polly CUDA context failed.\n");
     exit(-1);
   }
-  CuCtxCreateFcnPtr(&(((CUDAContext *)Context->Context)->Cuda), 0, Device);
+
+  // In cases where managed memory is used, it is quite likely that
+  // `cudaMallocManaged` / `polly_mallocManaged` was called before
+  // `polly_initContext` was called.
+  //
+  // If `polly_initContext` calls `CuCtxCreate` when there already was a
+  // pre-existing context created by the runtime API, this causes code running
+  // on P100 to hang. So, we query for a pre-existing context to try and use.
+  // If there is no pre-existing context, we create a new context
+
+  // The possible pre-existing context from previous runtime API calls.
+  CUcontext MaybeRuntimeAPIContext;
+  if (CuCtxGetCurrentFcnPtr(&MaybeRuntimeAPIContext) != CUDA_SUCCESS) {
+    fprintf(stderr, "cuCtxGetCurrent failed.\n");
+    exit(-1);
+  }
+
+  // There was no previous context, initialise it.
+  if (MaybeRuntimeAPIContext == NULL) {
+    if (CuCtxCreateFcnPtr(&(((CUDAContext *)Context->Context)->Cuda), 0,
+                          Device) != CUDA_SUCCESS) {
+      fprintf(stderr, "cuCtxCreateFcnPtr failed.\n");
+      exit(-1);
+    }
+  } else {
+    ((CUDAContext *)Context->Context)->Cuda = MaybeRuntimeAPIContext;
+  }
 
   if (CacheMode)
     CurrentContext = Context;