[Parallel_libs-commits] [PATCH] D23038: [StreamExecutor] Add KernelLoaderSpec

Tue Aug 2 14:30:55 PDT 2016

jhen added inline comments.

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:28
@@ +27,3 @@
+/// MultiKernelLoaderSpec::addCUDAPTXInMemory, to construct the KernelLoaderSpec
+/// and add it to the MultiKernelLoaderSpec at the same time.
+///
----------------
jlebar wrote:
> I think this would be much easier to read if it were in second person.  As-is the subject of the first phrase is MultiKernelLoaderSpec, which isn't right.
I didn't switch to second person, but I do think I cleaned up the language to make it much more understandable.

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:42
@@ +41,3 @@
+///    namespace compiler_cuda_namespace {
+///      MultiKernelLoaderSpec UserKernelLoaderSpec();
+///    } // namespace compiler_cuda_namespace
----------------
jlebar wrote:
> Would it be helpful to see how this function is implemented?
Oops, I shouldn't have put the parens here. This is supposed to be instantiating a MultiKernelLoaderSpec called UserKernelLoaderSpec.

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:50
@@ +49,3 @@
+///    namespace compiler_cuda_namespace {
+///      UserKernelLoaderSpec.addCUDACubinInMemory(
+///        __UserKernelCubinAddress, "UserKernel");
----------------
jlebar wrote:
> UserKernelLoaderSpec()?
I think the confusion here is the same as the point above. UserKernelLoaderSpec is a class instance, not a function to create a class instance.

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:51
@@ +50,3 @@
+///      UserKernelLoaderSpec.addCUDACubinInMemory(
+///        __UserKernelCubinAddress, "UserKernel");
+///    } // namespace compiler_cuda_namespace
----------------
jlebar wrote:
> This won't compile because MultiKernelLoaderSpec doesn't have this function.
I'm not sure what's causing the confusion here. It looks to me like I did declare MultiKernelLoaderSpec::addCUDACubinInMemory. Is that the function you are talking about?

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:121
@@ +120,3 @@
+  /// generally usable - in other words, PTX specified in this manner is VERY
+  /// likely to be used as the default.
+  ///
----------------
jlebar wrote:
> I don't really understand this -- we're only giving one PTX string, so of course it's going to be the default?
Yes, this whole method doesn't really make sense. I removed it. I also removed the method for returning the "default" code. We can add that in later if we have a use case, but it's not needed for now.

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:125
@@ +124,3 @@
+  /// and does not take ownership.
+  CUDAPTXInMemory(const char *PTXCode, const std::string &KernelName);
+
----------------
jlebar wrote:
> Since PTX source code usually (always?) has a directive at the beginning specifying the compute capability, it doesn't seem unreasonable for someone to assume that if they call this constructor, we will infer the CC from the given string.
> 
> It also seems weird that this constructor is the only way to specify a minimum-CC PTX string.
> 
> I wonder if we we need this constructor at all?  If we had a public way to get the minimum CC, then we could just have the one constructor?
We could add code to make this interface safer at the expense of more complexity, but I think the original design is the right trade-off because this is only meant to be called by the compiler.

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:149
@@ +148,3 @@
+  const char *getCode(int ComputeCapabilityMajor,
+                      int ComputeCapabilityMinor) const;
+
----------------
jlebar wrote:
> Since PTX is backwards-compatible, we want to return the PTX with the largest version <= the given compute-capability, right?  Not clear from the comment if that's what we're doing.
I added to the comment to explain that there is no fallback logic in this function. If the exact compute capability is not found, it returns an error value. If we need that other functionality, we can add in another method later, but this is the only functionality that is present right now.

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:157
@@ +156,3 @@
+  /// order, map::begin() will give the lowest-numbered version available, i.e.
+  /// the default.
+  std::map<ComputeCapability, const char *> PTXByComputeCapability;
----------------
jlebar wrote:
> Not sure this comment is necessary here.  It's kind of obvious, but if necessary maybe we can place it next to the code that uses this assumption?
I removed the comment because the code that used it is gone now.

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:177
@@ +176,3 @@
+  /// does not take ownership.
+  CUDACubinInMemory(const char *Bytes, const std::string &KernelName);
+
----------------
jlebar wrote:
> Do you think it would make sense to take the size of the cubin data?  I guess we don't ever need it so it doesn't matter.
> 
> Maybe I was mislead by the comments saying that we don't make a copy -- clearly we can't if we don't even know the length of the thing!
I don't think it would be good to take the size because we would never use it. I'm not sure what's best to put in the comment to reduce confusion. If the user thought it was a null-terminated string (which it isn't, but they might think it was), then they might be imagining that a copy is made even though no length is provided.

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:187
@@ +186,3 @@
+  void operator=(const CUDACubinInMemory &) = delete;
+};
+
----------------
jlebar wrote:
> It seems kind of inconsistent that the PTX spec encompasses multiple PTX strings compiled for different archs, while the cubin one doesn't.
> 
> Unless you mean s/cubin/fatbin/?  AIUI a cubin encompasses SASS code for a particular sm, while a fatbin is a combination of one or more cubins and ptx strings.
Good catch! What this code has been calling "cubin" is really "fatbin". That's how it is treated when it is loaded. So I changed the names to fatbin. Now they are accurate.

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:204
@@ +203,3 @@
+  /// OpenCL translation unit text contents in memory.
+  std::string Text;
+
----------------
jlebar wrote:
> It looks like we do make a copy?  That seems inconsistent though.
Oops, thanks, I missed that. I removed the copy to make it consistent with the others.

https://reviews.llvm.org/D23038