[Mlir-commits] [mlir] mlir/lib/Dialect/GPU/Transforms: improve context management in SerializeToCubin (PR #65779)

Fri Sep 8 10:03:41 PDT 2023

https://github.com/rohany created https://github.com/llvm/llvm-project/pull/65779:

This commit adjusts CUDA context management in the SerializeToCubin pass. Instead of creating and destroying a new CUDA context on each invocation of SerializeToCubin, the pass now retains the primary context of device 0, pushes it for the duration of the call, and then pops and releases it. This avoids repeated context creation and greatly reduces compile time, especially when an application (such as a JIT compiler) calls SerializeToCubin repeatedly.

Differential Revision: https://reviews.llvm.org/D159487

From 91bcd17f10fd3f88e5efea494b0a4278a2d6a4d5 Mon Sep 17 00:00:00 2001
From: Rohan Yadav <rohany at cs.stanford.edu>
Date: Fri, 8 Sep 2023 09:55:24 -0700
Subject: [PATCH] mlir/lib/Dialect/GPU/Transforms: improve context management
 in SerializeToCubin

This commit adjusts CUDA context management in the SerializeToCubin pass.
Instead of creating and destroying a new CUDA context on each invocation of
SerializeToCubin, the pass now retains the primary context of device 0, pushes
it for the duration of the call, and then pops and releases it. This avoids
repeated context creation and greatly reduces compile time, especially when an
application (such as a JIT compiler) calls SerializeToCubin repeatedly.

Differential Revision: https://reviews.llvm.org/D159487
---
 mlir/lib/Dialect/GPU/Transforms/SerializeToCubin.cpp | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/mlir/lib/Dialect/GPU/Transforms/SerializeToCubin.cpp b/mlir/lib/Dialect/GPU/Transforms/SerializeToCubin.cpp
index 44a14024e9fefbf..0f340cbe98c60ea 100644
--- a/mlir/lib/Dialect/GPU/Transforms/SerializeToCubin.cpp
+++ b/mlir/lib/Dialect/GPU/Transforms/SerializeToCubin.cpp
@@ -95,7 +95,11 @@ SerializeToCubinPass::serializeISA(const std::string &isa) {
   CUdevice device;
   RETURN_ON_CUDA_ERROR(cuDeviceGet(&device, 0));
   CUcontext context;
-  RETURN_ON_CUDA_ERROR(cuCtxCreate(&context, 0, device));
+  // Use the primary context.
+  RETURN_ON_CUDA_ERROR(cuDevicePrimaryCtxRetain(&context, device));
+  // Push the primary context so that the next CUDA operations
+  // actually use it.
+  RETURN_ON_CUDA_ERROR(cuCtxPushCurrent(context));
   CUlinkState linkState;
 
   CUjit_option jitOptions[] = {CU_JIT_ERROR_LOG_BUFFER,
@@ -127,7 +131,10 @@ SerializeToCubinPass::serializeISA(const std::string &isa) {
 
   // This will also destroy the cubin data.
   RETURN_ON_CUDA_ERROR(cuLinkDestroy(linkState));
-  RETURN_ON_CUDA_ERROR(cuCtxDestroy(context));
+  // Pop and release the primary context.
+  CUcontext poppedContext;
+  RETURN_ON_CUDA_ERROR(cuCtxPopCurrent(&poppedContext));
+  RETURN_ON_CUDA_ERROR(cuDevicePrimaryCtxRelease(device));
 
   return result;
 }
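
The retain/push and pop/release calls in the patch must stay balanced on every exit path. Assuming RETURN_ON_CUDA_ERROR returns early on failure (its definition is not shown in this diff), a failure between the push and the pop would leave the primary context pushed and retained. One possible way to keep the calls paired is a small RAII guard. The sketch below is not part of the patch: PrimaryContextScope, serializeSketch, and the fake_driver functions are hypothetical names, and the fake_driver functions are counting stand-ins for the CUDA driver calls so that the example compiles without the CUDA toolkit.

```cpp
#include <cassert>

// Hypothetical stand-ins for the CUDA driver calls used in the patch.
// They only count calls so the pairing can be checked without a GPU.
namespace fake_driver {
int retained = 0; // outstanding primary-context retains
int pushed = 0;   // depth of the current-context stack

bool retain() { ++retained; return true; } // models cuDevicePrimaryCtxRetain
bool push() { ++pushed; return true; }     // models cuCtxPushCurrent
void pop() { --pushed; }                   // models cuCtxPopCurrent
void release() { --retained; }             // models cuDevicePrimaryCtxRelease
} // namespace fake_driver

// Hypothetical guard: retains and pushes in the constructor, then pops and
// releases in the destructor, undoing only the steps that succeeded.
class PrimaryContextScope {
public:
  PrimaryContextScope() {
    if (!fake_driver::retain())
      return;
    retained_ = true;
    if (!fake_driver::push())
      return;
    pushed_ = true;
  }

  ~PrimaryContextScope() {
    if (pushed_)
      fake_driver::pop();
    if (retained_)
      fake_driver::release();
  }

  // Non-copyable: a copy would pop and release twice.
  PrimaryContextScope(const PrimaryContextScope &) = delete;
  PrimaryContextScope &operator=(const PrimaryContextScope &) = delete;

  // True when the context was both retained and pushed.
  bool ok() const { return pushed_; }

private:
  bool retained_ = false;
  bool pushed_ = false;
};

// Models the shape of serializeISA: a later step may fail and return early.
bool serializeSketch(bool failMidway) {
  PrimaryContextScope scope;
  if (!scope.ok())
    return false;
  if (failMidway)
    return false; // early return; the destructor still pops and releases
  return true;
}
```

Calling serializeSketch(true) exercises the early-return path; afterwards both counters in fake_driver are back at zero, which is the property the guard is meant to provide. In real code the stand-ins would be replaced by the driver calls from the patch, with their CUresult values checked.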