[Parallel_libs-commits] [PATCH] D24596: [SE] Support CUDA dynamic shared memory

Justin Lebar via Parallel_libs-commits parallel_libs-commits at lists.llvm.org
Thu Sep 15 10:57:01 PDT 2016


jlebar added inline comments.

================
Comment at: streamexecutor/lib/platforms/cuda/CUDAPlatformDevice.cpp:189
@@ +188,3 @@
+    size_t ArgumentCount = ArgumentArray.getArgumentCount();
+    llvm::SmallVector<void *, 16> NonSharedArgumentAddresses(
+        ArgumentCount - SharedArgumentCount);
----------------
> our approach was that the dynamic shared memory case was not common, and we would accept being inefficient in that case, as long as we were efficient in the case of no dynamic shared memory.

I feel like these things usually are unimportant until they're not.  Which is to say, I'm totally onboard with doing a slow thing for a case we think is uncommon, so long as we're not painting ourselves into a corner.

I like your plan of (if the case arises) telling people who want their code to be fast to put a single shared memory parameter at the front of their param pack and then optimizing for that case.


https://reviews.llvm.org/D24596

