[llvm] [Offload] Support loading CUDA fat binaries (PR #156955)

Thu Sep 4 13:49:24 PDT 2025

================
@@ -556,7 +657,19 @@ struct CUDADeviceTy : public GenericDeviceTy {
 
     // Allocate and initialize the image object.
     CUDADeviceImageTy *CUDAImage = Plugin.allocate<CUDADeviceImageTy>();
-    new (CUDAImage) CUDADeviceImageTy(ImageId, *this, TgtImage);
+
+    uint32_t Magic = *reinterpret_cast<const uint32_t *>(TgtImage->ImageStart);
+    if (Magic == 0x466243b1 || Magic == 0xba55ed50) {
+      // It's a fatbin or a wrapped fatbin
----------------
jhuber6 wrote:

I don't understand that. This magic is for what you get out when you call `fatbinary`; we then embed that output into a special section in the CUDA wrapper code. If there's some mysterious second set of magic bits they use for what `fatbinary` spits out, then we should add it to `Magic.h`. I think you're confusing it with https://github.com/llvm/llvm-project/blob/0e73ebc8997bb7f4c9c04e13792273a636c67bee/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp#L28, which serves a different purpose. That is, unless it's absolutely necessary that we pass a pointer to that section instead of just opening that struct in the hypothetical CUDA runtime we're building.
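
For reference, the distinction under discussion can be sketched as a standalone check. This is only an illustration, not the code in the PR: the two constants are the ones from the hunk above, while `ImageKind`, `classifyImage`, and the assignment of each value to "raw fatbin" versus "wrapper struct" are assumptions made for the example. It copies the first four bytes with `memcpy` rather than dereferencing a casted pointer, and it reads them in host byte order, as the hunk does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Constants copied from the diff hunk. Which one marks the raw `fatbinary`
// output and which one marks the wrapper struct is an assumption here.
constexpr uint32_t RawFatbinMagic = 0xba55ed50;     // assumed: `fatbinary` output
constexpr uint32_t WrappedFatbinMagic = 0x466243b1; // assumed: wrapper struct

// Hypothetical classification used only for this sketch.
enum class ImageKind { RawFatbin, WrappedFatbin, Other };

// Classify a buffer by its first four bytes, read in host byte order.
// Buffers that are null or shorter than four bytes are reported as Other.
ImageKind classifyImage(const void *Start, size_t Size) {
  if (!Start || Size < sizeof(uint32_t))
    return ImageKind::Other;

  // memcpy avoids the alignment and aliasing assumptions of a pointer cast.
  uint32_t Magic = 0;
  std::memcpy(&Magic, Start, sizeof(Magic));

  if (Magic == RawFatbinMagic)
    return ImageKind::RawFatbin;
  if (Magic == WrappedFatbinMagic)
    return ImageKind::WrappedFatbin;
  return ImageKind::Other;
}
```

Keeping the two cases separate, rather than folding them into one `||` condition, makes it explicit which format the caller is actually handed.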

https://github.com/llvm/llvm-project/pull/156955