[llvm] [Offload] Use flat array for cuLaunchKernel (PR #95116)

Wed Jun 12 04:39:46 PDT 2024

================
@@ -1285,11 +1289,16 @@ Error CUDAKernelTy::launchImpl(GenericDeviceTy &GenericDevice,
   uint32_t MaxDynCGroupMem =
       std::max(KernelArgs.DynCGroupMem, GenericDevice.getDynamicMemorySize());
 
-  CUresult Res =
-      cuLaunchKernel(Func, NumBlocks, /*gridDimY=*/1,
-                     /*gridDimZ=*/1, NumThreads,
-                     /*blockDimY=*/1, /*blockDimZ=*/1, MaxDynCGroupMem, Stream,
-                     (void **)Args, nullptr);
+  void *Config[] = {/* CU_LAUNCH_PARAM_BUFFER_POINTER */ (void *)0x01,
+                    LaunchParams.Data,
+                    /* CU_LAUNCH_PARAM_BUFFER_SIZE */ (void *)0x02,
+                    reinterpret_cast<void *>(&LaunchParams.Size),
+                    /* CU_LAUNCH_PARAM_END */ (void *)0x00};
----------------
jhuber6 wrote:

```suggestion
  void *Config[] = {/* CU_LAUNCH_PARAM_BUFFER_POINTER */ (void *)0x01,
                    LaunchParams.Data,
                    /* CU_LAUNCH_PARAM_BUFFER_SIZE */ (void *)0x02,
                    reinterpret_cast<void *>(&LaunchParams.Size),
                    /* CU_LAUNCH_PARAM_END */ (void *)0x00};
```
What's going on here? Shouldn't these constants just be added to `dynamic_cuda/cuda.h` instead of being hardcoded as magic numbers?
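
For illustration, a minimal sketch of what that could look like. The three macro names and values are the ones given in the comments of the diff above; defining them as macros in the header, the `makeLaunchConfig` helper, and its signature are assumptions made for this example and are not part of the patch.

```cpp
#include <array>
#include <cstddef>

// Assumed additions to dynamic_cuda/cuda.h. The values mirror the
// literals used in the diff (0x01, 0x02, 0x00).
#define CU_LAUNCH_PARAM_END ((void *)0x00)
#define CU_LAUNCH_PARAM_BUFFER_POINTER ((void *)0x01)
#define CU_LAUNCH_PARAM_BUFFER_SIZE ((void *)0x02)

// Hypothetical helper (not in the patch): builds the key/value list that
// would be passed as the final `extra` argument of cuLaunchKernel.
// `Data` points at the flat argument buffer, `Size` at its size in bytes.
inline std::array<void *, 5> makeLaunchConfig(void *Data, size_t *Size) {
  return {{CU_LAUNCH_PARAM_BUFFER_POINTER, Data,
           CU_LAUNCH_PARAM_BUFFER_SIZE, static_cast<void *>(Size),
           CU_LAUNCH_PARAM_END}};
}
```

With named constants, the `Config` array in the patch would no longer need the explanatory comments next to each raw cast.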

https://github.com/llvm/llvm-project/pull/95116