[Mlir-commits] [mlir] [mlir][gpu][NVPTX] Enable NVIDIA GPU JIT compilation path (PR #66220)
llvmlistbot at llvm.org
Wed Sep 13 08:46:42 PDT 2023
llvmbot wrote:
@llvm/pr-subscribers-mlir
Changes
This patch adds an NVPTX compilation path that enables JIT compilation on NVIDIA targets. The following modifications were made:
1. Adding a format field to the GPU object attribute, allowing the translation attribute to pick the correct runtime function for loading the module. Likewise, a dictionary attribute was added for passing any extra options (see the sketch after this list).
2. Adding the `createObject` method to `GPUTargetAttrInterface`; this method returns a GPU object from a binary string.
3. Adding the function `mgpuModuleLoadJIT`, which is only available for NVIDIA GPUs, as there is no equivalent for AMD.
4. Adding the CMake flag `MLIR_GPU_COMPILATION_TEST_FORMAT` to specify the format to use during testing.
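For reference, the sketch below shows the attribute syntax implied by the new assembly format in this patch: the format keyword defaults to `fatbin` when omitted, and the dictionary of extra options is introduced by the `properties` keyword. The property name `O` is only a hypothetical placeholder for whatever options a target chooses to store.

```mlir
// Format omitted: defaults to `fatbin`, matching the previous syntax.
#gpu.object<#nvvm.target, "...">

// Explicit format keyword: one of `offload`, `assembly`, `bin`, `fatbin`.
#gpu.object<#nvvm.target, bin = "...">

// Optional dictionary of extra options (the key `O` is hypothetical).
#gpu.object<#nvvm.target, properties = {O = 2 : i32}, assembly = "...">
```

Since both the attribute and `TargetOptions` default to `CompilationTarget::Fatbin`, the pre-existing `#gpu.object<#nvvm.target, "...">` form should keep round-tripping unchanged.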
NOTE:
1. Not all tests use `MLIR_GPU_COMPILATION_TEST_FORMAT`.
2. An option needs to be added to the `SparseCompiler` to support the format option; however, I don't know whether there's any preference.
3. I'm basing the implementation of `mgpuModuleLoadJIT` on the assumption that there's a [JIT cache](https://developer.nvidia.com/blog/cuda-pro-tip-understand-fat-binaries-jit-caching/). Another option is to implement the cache in MLIR itself.
--
Patch is 50.36 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/66220.diff
33 Files Affected:
- (modified) mlir/include/mlir/Dialect/GPU/IR/CompilationAttrInterfaces.td (+12-3)
- (modified) mlir/include/mlir/Dialect/GPU/IR/CompilationAttrs.td (+23-2)
- (modified) mlir/include/mlir/Dialect/GPU/IR/CompilationInterfaces.h (+16-22)
- (modified) mlir/include/mlir/Dialect/GPU/Transforms/Passes.td (+1-2)
- (modified) mlir/lib/Dialect/GPU/IR/GPUDialect.cpp (+44-5)
- (modified) mlir/lib/Dialect/GPU/Transforms/ModuleToBinary.cpp (+18-18)
- (modified) mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp (+21)
- (modified) mlir/lib/ExecutionEngine/RocmRuntimeWrappers.cpp (+5)
- (modified) mlir/lib/Target/LLVM/NVVM/Target.cpp (+31-7)
- (modified) mlir/lib/Target/LLVM/ROCDL/Target.cpp (+19-2)
- (modified) mlir/lib/Target/LLVMIR/Dialect/GPU/SelectObjectAttr.cpp (+69-21)
- (modified) mlir/test/CMakeLists.txt (+2)
- (modified) mlir/test/Dialect/GPU/module-to-binary-nvvm.mlir (+3-3)
- (modified) mlir/test/Dialect/GPU/module-to-binary-rocdl.mlir (+3-3)
- (modified) mlir/test/Dialect/GPU/ops.mlir (+10)
- (modified) mlir/test/Integration/GPU/CUDA/all-reduce-and.mlir (+1-1)
- (modified) mlir/test/Integration/GPU/CUDA/all-reduce-max.mlir (+1-1)
- (modified) mlir/test/Integration/GPU/CUDA/all-reduce-min.mlir (+1-1)
- (modified) mlir/test/Integration/GPU/CUDA/all-reduce-op.mlir (+1-1)
- (modified) mlir/test/Integration/GPU/CUDA/all-reduce-or.mlir (+1-1)
- (modified) mlir/test/Integration/GPU/CUDA/all-reduce-region.mlir (+1-1)
- (modified) mlir/test/Integration/GPU/CUDA/all-reduce-xor.mlir (+1-1)
- (modified) mlir/test/Integration/GPU/CUDA/async.mlir (+1-1)
- (modified) mlir/test/Integration/GPU/CUDA/gpu-to-cubin.mlir (+1-1)
- (modified) mlir/test/Integration/GPU/CUDA/lit.local.cfg (+2)
- (modified) mlir/test/Integration/GPU/CUDA/multiple-all-reduce.mlir (+1-1)
- (modified) mlir/test/Integration/GPU/CUDA/printf.mlir (+1-1)
- (modified) mlir/test/Integration/GPU/CUDA/shuffle.mlir (+1-1)
- (modified) mlir/test/Integration/GPU/CUDA/two-modules.mlir (+1-1)
- (modified) mlir/test/lib/Dialect/GPU/TestLowerToNVVM.cpp (+7-1)
- (modified) mlir/test/lit.site.cfg.py.in (+1)
- (modified) mlir/unittests/Target/LLVM/SerializeNVVMTarget.cpp (+3-3)
- (modified) mlir/unittests/Target/LLVM/SerializeROCDLTarget.cpp (+3-3)
diff --git a/mlir/include/mlir/Dialect/GPU/IR/CompilationAttrInterfaces.td b/mlir/include/mlir/Dialect/GPU/IR/CompilationAttrInterfaces.td
index 5255286619e3bf2..160730480394272 100644
--- a/mlir/include/mlir/Dialect/GPU/IR/CompilationAttrInterfaces.td
+++ b/mlir/include/mlir/Dialect/GPU/IR/CompilationAttrInterfaces.td
@@ -33,12 +33,21 @@ def GPUTargetAttrInterface : AttrInterface<"TargetAttrInterface"> {
If serialization fails then the method should return `std::nullopt`.
- The `module` argument must be a GPU Module Op. The `options` argument is
- meant to be used for passing additional options that are not in the
+ The `module` parameter must be a GPU Module Op. The `options` parameter
+ is meant to be used for passing additional options that are not in the
attribute.
}],
"std::optional<SmallVector<char, 0>>", "serializeToObject",
- (ins "Operation*":$module, "const gpu::TargetOptions&":$options)>
+ (ins "Operation*":$module, "const gpu::TargetOptions&":$options)>,
+ InterfaceMethod<[{
+ Creates a GPU object attribute from a binary string.
+
+ The `object` parameter is a binary string. The `options` parameter is
+ meant to be used for passing additional options that are not in the
+ attribute.
+ }], "Attribute", "createObject",
+ (ins "const SmallVector<char, 0>&":$object,
+ "const gpu::TargetOptions&":$options)>
];
}
diff --git a/mlir/include/mlir/Dialect/GPU/IR/CompilationAttrs.td b/mlir/include/mlir/Dialect/GPU/IR/CompilationAttrs.td
index 9c1110d8e9a9463..3d2e9848a2b25a0 100644
--- a/mlir/include/mlir/Dialect/GPU/IR/CompilationAttrs.td
+++ b/mlir/include/mlir/Dialect/GPU/IR/CompilationAttrs.td
@@ -20,6 +20,18 @@ include "mlir/Dialect/GPU/IR/CompilationAttrInterfaces.td"
// GPU object attribute.
//===----------------------------------------------------------------------===//
+def GPU_ObjectOffload : I32EnumAttrCase<"Offload", 1, "offload">;
+def GPU_ObjectISA : I32EnumAttrCase<"Assembly", 2, "assembly">;
+def GPU_ObjectBinary : I32EnumAttrCase<"Binary", 3, "bin">;
+def GPU_ObjectFatbin : I32EnumAttrCase<"Fatbin", 4, "fatbin">;
+def GPU_CompilationTargetEnum : GPU_I32Enum<
+ "CompilationTarget", "GPU object format", [
+ GPU_ObjectOffload,
+ GPU_ObjectISA,
+ GPU_ObjectBinary,
+ GPU_ObjectFatbin
+ ]>;
+
def GPU_ObjectAttr : GPU_Attr<"Object", "object"> {
let description = [{
A GPU object attribute pairs a GPU target with a binary string,
@@ -32,8 +44,17 @@ def GPU_ObjectAttr : GPU_Attr<"Object", "object"> {
#gpu.object<#nvvm.target, "...">
```
}];
- let parameters = (ins "Attribute":$target, "StringAttr":$object);
- let assemblyFormat = [{`<` $target `,` $object `>`}];
+ let parameters = (ins
+ "Attribute":$target,
+ DefaultValuedParameter<"CompilationTarget", "CompilationTarget::Fatbin">:$format,
+ "StringAttr":$object,
+ OptionalParameter<"DictionaryAttr">:$properties
+ );
+ let assemblyFormat = [{ `<`
+ $target `,` (`properties` `=` $properties ^ `,`)?
+ custom<Object>($format, $object)
+ `>`
+ }];
let genVerifyDecl = 1;
}
diff --git a/mlir/include/mlir/Dialect/GPU/IR/CompilationInterfaces.h b/mlir/include/mlir/Dialect/GPU/IR/CompilationInterfaces.h
index a1f64be57fa699d..ee7daed58f98314 100644
--- a/mlir/include/mlir/Dialect/GPU/IR/CompilationInterfaces.h
+++ b/mlir/include/mlir/Dialect/GPU/IR/CompilationInterfaces.h
@@ -25,6 +25,8 @@ namespace LLVM {
class ModuleTranslation;
}
namespace gpu {
+enum class CompilationTarget : uint32_t;
+
/// This class indicates that the attribute associated with this trait is a GPU
/// offloading translation attribute. These kinds of attributes must implement
/// an interface for handling the translation of GPU offloading operations like
@@ -42,27 +44,15 @@ class OffloadingTranslationAttrTrait
/// ensure type safeness. Targets are free to ignore these options.
class TargetOptions {
public:
- /// The target representation of the compilation process.
- typedef enum {
- offload = 1, /// The process should produce an offloading representation.
- /// For the NVVM & ROCDL targets this option produces LLVM IR.
- assembly = 2, /// The process should produce assembly code.
- binary = 4, /// The process should produce a binary.
- fatbinary = 8, /// The process should produce a fat binary.
- binOrFatbin =
- binary |
- fatbinary, /// The process should produce a binary or fatbinary. It's up
- /// to the target to decide which.
- } CompilationTarget;
-
/// Constructor initializing the toolkit path, the list of files to link to,
/// extra command line options, the compilation target and a callback for
/// obtaining the parent symbol table. The default compilation target is
/// `binOrFatbin`.
- TargetOptions(StringRef toolkitPath = {},
- ArrayRef<std::string> linkFiles = {}, StringRef cmdOptions = {},
- CompilationTarget compilationTarget = binOrFatbin,
- function_ref<SymbolTable *()> getSymbolTableCallback = {});
+ TargetOptions(
+ StringRef toolkitPath = {}, ArrayRef<std::string> linkFiles = {},
+ StringRef cmdOptions = {},
+ CompilationTarget compilationTarget = getDefaultCompilationTarget(),
+ function_ref<SymbolTable *()> getSymbolTableCallback = {});
/// Returns the typeID.
TypeID getTypeID() const;
@@ -90,13 +80,17 @@ class TargetOptions {
/// table.
SymbolTable *getSymbolTable() const;
+ /// Returns the default compilation target: `CompilationTarget::Fatbin`.
+ static CompilationTarget getDefaultCompilationTarget();
+
protected:
/// Derived classes must use this constructor to initialize `typeID` to the
/// appropiate value: ie. `TargetOptions(TypeID::get<DerivedClass>())`.
- TargetOptions(TypeID typeID, StringRef toolkitPath = {},
- ArrayRef<std::string> linkFiles = {}, StringRef cmdOptions = {},
- CompilationTarget compilationTarget = binOrFatbin,
- function_ref<SymbolTable *()> getSymbolTableCallback = {});
+ TargetOptions(
+ TypeID typeID, StringRef toolkitPath = {},
+ ArrayRef<std::string> linkFiles = {}, StringRef cmdOptions = {},
+ CompilationTarget compilationTarget = getDefaultCompilationTarget(),
+ function_ref<SymbolTable *()> getSymbolTableCallback = {});
/// Path to the target toolkit.
std::string toolkitPath;
@@ -108,7 +102,7 @@ class TargetOptions {
/// process.
std::string cmdOptions;
- /// Compilation process target representation.
+ /// Compilation process target format.
CompilationTarget compilationTarget;
/// Callback for obtaining the parent symbol table of all the GPU modules
diff --git a/mlir/include/mlir/Dialect/GPU/Transforms/Passes.td b/mlir/include/mlir/Dialect/GPU/Transforms/Passes.td
index 0bfb2750992058f..3de8e18851369df 100644
--- a/mlir/include/mlir/Dialect/GPU/Transforms/Passes.td
+++ b/mlir/include/mlir/Dialect/GPU/Transforms/Passes.td
@@ -68,7 +68,6 @@ def GpuModuleToBinaryPass
2. `assembly`, `isa`: produces assembly code.
3. `binary`, `bin`: produces binaries.
4. `fatbinary`, `fatbin`: produces fatbinaries.
- 5. `binOrFatbin`: produces bins or fatbins, the target decides which.
}];
let options = [
Option<"offloadingHandler", "handler", "Attribute", "nullptr",
@@ -79,7 +78,7 @@ def GpuModuleToBinaryPass
"Extra files to link to.">,
Option<"cmdOptions", "opts", "std::string", [{""}],
"Command line options to pass to the tools.">,
- Option<"compilationTarget", "format", "std::string", [{"binOrFatbin"}],
+ Option<"compilationTarget", "format", "std::string", [{"fatbin"}],
"The target representation of the compilation process.">
];
}
diff --git a/mlir/lib/Dialect/GPU/IR/GPUDialect.cpp b/mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
index fde379cd0afe13f..5eb2cadc884e151 100644
--- a/mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
+++ b/mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
@@ -1959,7 +1959,8 @@ void AllocOp::getCanonicalizationPatterns(RewritePatternSet &results,
//===----------------------------------------------------------------------===//
LogicalResult ObjectAttr::verify(function_ref<InFlightDiagnostic()> emitError,
- Attribute target, StringAttr object) {
+ Attribute target, CompilationTarget format,
+ StringAttr object, DictionaryAttr properties) {
if (!target)
return emitError() << "the target attribute cannot be null";
if (target.hasPromiseOrImplementsInterface<TargetAttrInterface>())
@@ -1968,6 +1969,40 @@ LogicalResult ObjectAttr::verify(function_ref<InFlightDiagnostic()> emitError,
"`gpu::TargetAttrInterface`";
}
+namespace {
+LogicalResult parseObject(AsmParser &odsParser, CompilationTarget &format,
+ StringAttr &object) {
+ std::optional<CompilationTarget> formatResult;
+ StringRef enumKeyword;
+ auto loc = odsParser.getCurrentLocation();
+ if (failed(odsParser.parseOptionalKeyword(&enumKeyword)))
+ formatResult = CompilationTarget::Fatbin;
+ if (!formatResult &&
+ (formatResult =
+ gpu::symbolizeEnum<gpu::CompilationTarget>(enumKeyword)) &&
+ odsParser.parseEqual())
+ return odsParser.emitError(loc, "expected an equal sign");
+ if (!formatResult)
+ return odsParser.emitError(loc, "expected keyword for GPU object format");
+ FailureOr<StringAttr> objectResult =
+ FieldParser<StringAttr>::parse(odsParser);
+ if (failed(objectResult))
+ return odsParser.emitError(odsParser.getCurrentLocation(),
+ "failed to parse GPU_ObjectAttr parameter "
+ "'object' which is to be a `StringAttr`");
+ format = *formatResult;
+ object = *objectResult;
+ return success();
+}
+
+void printObject(AsmPrinter &odsParser, CompilationTarget format,
+ StringAttr object) {
+ if (format != CompilationTarget::Fatbin)
+ odsParser << stringifyEnum(format) << " = ";
+ odsParser << object;
+}
+} // namespace
+
//===----------------------------------------------------------------------===//
// GPU select object attribute
//===----------------------------------------------------------------------===//
@@ -2020,6 +2055,14 @@ SymbolTable *TargetOptions::getSymbolTable() const {
return getSymbolTableCallback ? getSymbolTableCallback() : nullptr;
}
+CompilationTarget TargetOptions::getCompilationTarget() const {
+ return compilationTarget;
+}
+
+CompilationTarget TargetOptions::getDefaultCompilationTarget() {
+ return CompilationTarget::Fatbin;
+}
+
std::pair<llvm::BumpPtrAllocator, SmallVector<const char *>>
TargetOptions::tokenizeCmdOptions() const {
std::pair<llvm::BumpPtrAllocator, SmallVector<const char *>> options;
@@ -2043,10 +2086,6 @@ TargetOptions::tokenizeCmdOptions() const {
return options;
}
-TargetOptions::CompilationTarget TargetOptions::getCompilationTarget() const {
- return compilationTarget;
-}
-
MLIR_DEFINE_EXPLICIT_TYPE_ID(::mlir::gpu::TargetOptions)
#include "mlir/Dialect/GPU/IR/GPUOpInterfaces.cpp.inc"
diff --git a/mlir/lib/Dialect/GPU/Transforms/ModuleToBinary.cpp b/mlir/lib/Dialect/GPU/Transforms/ModuleToBinary.cpp
index e29a1f0c3248d04..2bf89f8c57903e5 100644
--- a/mlir/lib/Dialect/GPU/Transforms/ModuleToBinary.cpp
+++ b/mlir/lib/Dialect/GPU/Transforms/ModuleToBinary.cpp
@@ -57,14 +57,14 @@ void GpuModuleToBinaryPass::getDependentDialects(
void GpuModuleToBinaryPass::runOnOperation() {
RewritePatternSet patterns(&getContext());
- int targetFormat = llvm::StringSwitch<int>(compilationTarget)
- .Cases("offloading", "llvm", TargetOptions::offload)
- .Cases("assembly", "isa", TargetOptions::assembly)
- .Cases("binary", "bin", TargetOptions::binary)
- .Cases("fatbinary", "fatbin", TargetOptions::fatbinary)
- .Case("binOrFatbin", TargetOptions::binOrFatbin)
- .Default(-1);
- if (targetFormat == -1)
+ auto targetFormat =
+ llvm::StringSwitch<std::optional<CompilationTarget>>(compilationTarget)
+ .Cases("offloading", "llvm", CompilationTarget::Offload)
+ .Cases("assembly", "isa", CompilationTarget::Assembly)
+ .Cases("binary", "bin", CompilationTarget::Binary)
+ .Cases("fatbinary", "fatbin", CompilationTarget::Fatbin)
+ .Default(std::nullopt);
+ if (!targetFormat)
getOperation()->emitError() << "Invalid format specified.";
// Lazy symbol table builder callback.
@@ -82,10 +82,8 @@ void GpuModuleToBinaryPass::runOnOperation() {
return &parentTable.value();
};
- TargetOptions targetOptions(
- toolkitPath, linkFiles, cmdOptions,
- static_cast<TargetOptions::CompilationTarget>(targetFormat),
- lazyTableBuilder);
+ TargetOptions targetOptions(toolkitPath, linkFiles, cmdOptions, *targetFormat,
+ lazyTableBuilder);
if (failed(transformGpuModulesToBinaries(
getOperation(),
offloadingHandler ? dyn_cast<OffloadingLLVMTranslationAttrInterface>(
@@ -107,17 +105,19 @@ LogicalResult moduleSerializer(GPUModuleOp op,
auto target = dyn_cast<gpu::TargetAttrInterface>(targetAttr);
assert(target &&
"Target attribute doesn't implements `TargetAttrInterface`.");
- std::optional<SmallVector<char, 0>> object =
+ std::optional<SmallVector<char, 0>> serializedModule =
target.serializeToObject(op, targetOptions);
-
- if (!object) {
+ if (!serializedModule) {
op.emitError("An error happened while serializing the module.");
return failure();
}
- objects.push_back(builder.getAttr<gpu::ObjectAttr>(
- target,
- builder.getStringAttr(StringRef(object->data(), object->size()))));
+ Attribute object = target.createObject(*serializedModule, targetOptions);
+ if (!object) {
+ op.emitError("An error happened while creating the object.");
+ return failure();
+ }
+ objects.push_back(object);
}
builder.setInsertionPointAfter(op);
builder.create<gpu::BinaryOp>(op.getLoc(), op.getName(), handler,
diff --git a/mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp b/mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp
index 7bf6804902479a8..d19d473a5327627 100644
--- a/mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp
+++ b/mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp
@@ -126,6 +126,27 @@ extern "C" MLIR_CUDA_WRAPPERS_EXPORT CUmodule mgpuModuleLoad(void *data) {
return module;
}
+extern "C" MLIR_CUDA_WRAPPERS_EXPORT CUmodule mgpuModuleLoadJIT(void *data,
+ int optLevel) {
+ ScopedContext scopedContext;
+ CUmodule module = nullptr;
+ char jitErrorBuffer[4096] = {0};
+ CUjit_option jitOptions[] = {CU_JIT_ERROR_LOG_BUFFER,
+ CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES,
+ CU_JIT_OPTIMIZATION_LEVEL};
+ void *jitOptionsVals[] = {jitErrorBuffer,
+ reinterpret_cast<void *>(sizeof(jitErrorBuffer)),
+ reinterpret_cast<void *>(optLevel)};
+
+ CUresult result =
+ cuModuleLoadDataEx(&module, data, 3, jitOptions, jitOptionsVals);
+ if (result) {
+ fprintf(stderr, "JIT compilation failed with: '%s'\n", jitErrorBuffer);
+ CUDA_REPORT_IF_ERROR(result);
+ }
+ return module;
+}
+
extern "C" MLIR_CUDA_WRAPPERS_EXPORT void mgpuModuleUnload(CUmodule module) {
CUDA_REPORT_IF_ERROR(cuModuleUnload(module));
}
diff --git a/mlir/lib/ExecutionEngine/RocmRuntimeWrappers.cpp b/mlir/lib/ExecutionEngine/RocmRuntimeWrappers.cpp
index bd3868a8e196f6f..da2ae87fef6715f 100644
--- a/mlir/lib/ExecutionEngine/RocmRuntimeWrappers.cpp
+++ b/mlir/lib/ExecutionEngine/RocmRuntimeWrappers.cpp
@@ -38,6 +38,11 @@ extern "C" hipModule_t mgpuModuleLoad(void *data) {
return module;
}
+extern "C" hipModule_t mgpuModuleLoadJIT(void *data, int optLevel) {
+ assert(false && "This function is not available in HIP.");
+ return nullptr;
+}
+
extern "C" void mgpuModuleUnload(hipModule_t module) {
HIP_REPORT_IF_ERROR(hipModuleUnload(module));
}
diff --git a/mlir/lib/Target/LLVM/NVVM/Target.cpp b/mlir/lib/Target/LLVM/NVVM/Target.cpp
index 13188b1107d928b..7f263627db54fbe 100644
--- a/mlir/lib/Target/LLVM/NVVM/Target.cpp
+++ b/mlir/lib/Target/LLVM/NVVM/Target.cpp
@@ -47,6 +47,10 @@ class NVVMTargetAttrImpl
std::optional<SmallVector<char, 0>>
serializeT...
<truncated>
https://github.com/llvm/llvm-project/pull/66220