[Parallel_libs-commits] [PATCH] D24619: [SE] Cache CUDA modules

Jason Henline via Parallel_libs-commits parallel_libs-commits at lists.llvm.org
Thu Sep 15 14:10:40 PDT 2016


jhen added inline comments.

================
Comment at: streamexecutor/lib/platforms/cuda/CUDAPlatformDevice.cpp:130
@@ +129,3 @@
+        return CUresultToError(Result, "cuModuleGetFunction");
+      LoadedModules.emplace(Code, std::make_pair(Module, Function));
+    } else
----------------
jlebar wrote:
> Hm.  This makes a copy of "Code" in the map.  And also, every time we do a lookup, we're going to have to compare the whole PTX strings, which are potentially very long.
> 
> Is there no other identifier we could use as the map key?
Unfortunately, I don't think there is currently any other identifier that is guaranteed never to produce false matches. Using the whole string as the key is what the original developers did because they couldn't find a better solution, so I'm mostly just following their lead here.

There are some things we could do with randomly generated UUIDs that would work for all practical purposes, but I don't want to worry about correctly generating UUIDs.

I also have the following idea, which seems a little complex. Does it seem too complex to you?

Use a static integer with atomic increments (or mutex or whatever) to give a unique ID to each MultiKernelLoaderSpec instance in the process and then have each instance assign a unique ID to each piece of code that is registered with it. The pair of `MultiKernelLoaderSpec` ID and code ID will uniquely identify a piece of code and can be used as a key in the module cache.
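
To make that more concrete, here is a rough sketch of what I have in mind (the class shape and member names below are placeholders for illustration, not the actual MultiKernelLoaderSpec interface):

  #include <atomic>
  #include <cstdint>
  #include <utility>

  class MultiKernelLoaderSpec {
  public:
    // Composite key: (spec ID, code ID) uniquely identifies a piece of code
    // registered with some MultiKernelLoaderSpec in this process.
    using CodeKey = std::pair<uint64_t, uint64_t>;

    MultiKernelLoaderSpec()
        : SpecID(NextSpecID.fetch_add(1, std::memory_order_relaxed)) {}

    // Called whenever a piece of code is registered with this instance. The
    // returned key could be used by the platform device as the module cache
    // key instead of the full code string. (NextCodeID would need its own
    // synchronization if registration on one instance can race.)
    CodeKey assignCodeKey() { return {SpecID, NextCodeID++}; }

  private:
    static std::atomic<uint64_t> NextSpecID; // process-wide instance counter
    const uint64_t SpecID;                   // unique to this instance
    uint64_t NextCodeID = 0;                 // unique per registered code
  };

  std::atomic<uint64_t> MultiKernelLoaderSpec::NextSpecID{0};

The CUDAPlatformDevice cache could then be keyed on the small CodeKey pair rather than on the whole PTX string, so lookups compare two integers instead of potentially very long strings.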


https://reviews.llvm.org/D24619