[Parallel_libs-commits] [PATCH] D23038: [StreamExecutor] Add KernelLoaderSpec

Mon Aug 1 17:00:20 PDT 2016

jlebar added inline comments.

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:11
@@ +10,3 @@
+/// \file
+/// A KernelLoaderSpec is a class that knows where to find the code for a
+/// data-parallel kernel in a single format on a single platform. So, for
----------------
Nit: "A Foo object" or "Foo is a class".

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:12
@@ +11,3 @@
+/// A KernelLoaderSpec is a class that knows where to find the code for a
+/// data-parallel kernel in a single format on a single platform. So, for
+/// example, there will be one subclass that deals with CUDA PTX code, another
----------------
suggest s/single/particular/

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:15
@@ +14,3 @@
+/// subclass that deals with CUDA cubin code, and yet another subclass that
+/// deals with OpenCL text code.
+///
----------------
Oh, hm, I see why you said "A Foo class".  I think maybe you want to say "FileLoaderSpec is the base class for types that know where to find the code."

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:28
@@ +27,3 @@
+/// MultiKernelLoaderSpec::addCUDAPTXInMemory, to construct the KernelLoaderSpec
+/// and add it to the MultiKernelLoaderSpec at the same time.
+///
----------------
I think this would be much easier to read if it were in second person.  As-is the subject of the first phrase is MultiKernelLoaderSpec, which isn't right.

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:42
@@ +41,3 @@
+///    namespace compiler_cuda_namespace {
+///      MultiKernelLoaderSpec UserKernelLoaderSpec();
+///    } // namespace compiler_cuda_namespace
----------------
Would it be helpful to see how this function is implemented?

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:50
@@ +49,3 @@
+///    namespace compiler_cuda_namespace {
+///      UserKernelLoaderSpec.addCUDACubinInMemory(
+///        __UserKernelCubinAddress, "UserKernel");
----------------
UserKernelLoaderSpec()?

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:51
@@ +50,3 @@
+///      UserKernelLoaderSpec.addCUDACubinInMemory(
+///        __UserKernelCubinAddress, "UserKernel");
+///    } // namespace compiler_cuda_namespace
----------------
This won't compile because MultiKernelLoaderSpec doesn't have this function.

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:91
@@ +90,3 @@
+protected:
+  /// Allows subclasses to set the kernel name during construction.
+  explicit KernelLoaderSpec(const std::string &KernelName);
----------------
I'm not sure we need this comment; it's pretty clear.

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:98
@@ +97,3 @@
+  KernelLoaderSpec(const KernelLoaderSpec &) = delete;
+  void operator=(const KernelLoaderSpec &) = delete;
+};
----------------
operator= usually returns the KernelLoaderSpec&.  I think it will work as you've written it here [1], but maybe changing it will let others avoid having to look it up.

[1] http://en.cppreference.com/w/cpp/language/copy_assignment

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:101
@@ +100,3 @@
+
+/// A KernelLoaderSpec for CUDA PTX code that resides in memory.
+class CUDAPTXInMemory : public KernelLoaderSpec {
----------------
Maybe "resides in memory as a null-terminated string."

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:102
@@ +101,3 @@
+/// A KernelLoaderSpec for CUDA PTX code that resides in memory.
+class CUDAPTXInMemory : public KernelLoaderSpec {
+public:
----------------
Should this have "Spec" in the name?

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:106
@@ +105,3 @@
+  ///
+  /// A tuple is used because this will be the key of a map.
+  using ComputeCapability = std::tuple<int, int>;
----------------
A tuple as opposed to std::pair?  pair also defines operator<, and seems simpler.

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:121
@@ +120,3 @@
+  /// generally usable - in other words, PTX specified in this manner is VERY
+  /// likely to be used as the default.
+  ///
----------------
I don't really understand this -- we're only giving one PTX string, so of course it's going to be the default?

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:125
@@ +124,3 @@
+  /// and does not take ownership.
+  CUDAPTXInMemory(const char *PTXCode, const std::string &KernelName);
+
----------------
Since PTX source code usually (always?) has a directive at the beginning specifying the compute capability, it doesn't seem unreasonable for someone to assume that if they call this constructor, we will infer the CC from the given string.

It also seems weird that this constructor is the only way to specify a minimum-CC PTX string.

I wonder if we we need this constructor at all?  If we had a public way to get the minimum CC, then we could just have the one constructor?

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:134
@@ +133,3 @@
+  CUDAPTXInMemory(const std::initializer_list<PTXSpec> &SpecList,
+                  const std::string &KernelName);
+
----------------
I wonder if it would make more sense to put the name first.

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:134
@@ +133,3 @@
+  CUDAPTXInMemory(const std::initializer_list<PTXSpec> &SpecList,
+                  const std::string &KernelName);
+
----------------
jlebar wrote:
> I wonder if it would make more sense to put the name first.
Using an initializer_list as opposed to, say, a vector, seems kind of limiting, since you can't dynamically construct an initializer_list.

(llvm::ArrayRef is really what we want, but I don't know if you want to force the compiler to pull that in.)

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:149
@@ +148,3 @@
+  const char *getCode(int ComputeCapabilityMajor,
+                      int ComputeCapabilityMinor) const;
+
----------------
Since PTX is backwards-compatible, we want to return the PTX with the largest version <= the given compute-capability, right?  Not clear from the comment if that's what we're doing.

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:154
@@ +153,3 @@
+  ///
+  /// The key is a tuple "<cc_major>,<cc_minor>", i.e., "2,0", "3,0", "3,5".
+  /// Because compute capabilities represented in this way have a clear sorting
----------------
Can we write this as (cc_major, cc_minor) and (2, 0), etc?

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:157
@@ +156,3 @@
+  /// order, map::begin() will give the lowest-numbered version available, i.e.
+  /// the default.
+  std::map<ComputeCapability, const char *> PTXByComputeCapability;
----------------
Not sure this comment is necessary here.  It's kind of obvious, but if necessary maybe we can place it next to the code that uses this assumption?

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:177
@@ +176,3 @@
+  /// does not take ownership.
+  CUDACubinInMemory(const char *Bytes, const std::string &KernelName);
+
----------------
Do you think it would make sense to take the size of the cubin data?  I guess we don't ever need it so it doesn't matter.

Maybe I was mislead by the comments saying that we don't make a copy -- clearly we can't if we don't even know the length of the thing!

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:187
@@ +186,3 @@
+  void operator=(const CUDACubinInMemory &) = delete;
+};
+
----------------
It seems kind of inconsistent that the PTX spec encompasses multiple PTX strings compiled for different archs, while the cubin one doesn't.

Unless you mean s/cubin/fatbin/?  AIUI a cubin encompasses SASS code for a particular sm, while a fatbin is a combination of one or more cubins and ptx strings.

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:189
@@ +188,3 @@
+
+/// A KernelLoaderSpec for OpenCL text that resides in memory.
+class OpenCLTextInMemory : public KernelLoaderSpec {
----------------
If this is null-terminated, maybe we should say that?

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:204
@@ +203,3 @@
+  /// OpenCL translation unit text contents in memory.
+  std::string Text;
+
----------------
It looks like we do make a copy?  That seems inconsistent though.

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:222
@@ +221,3 @@
+/// kernel, they are all assumed to take the same number and type of parameters,
+/// and to have the same name, but no checking is done to enforce this. Since
+/// this interface is prone to errors, it is better to leave
----------------
It seems like we could at least assert() that they have the same name?

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:263
@@ +262,3 @@
+  // name may be mangled by the compiler if it is not declared in an extern "C"
+  // scope.
+
----------------
extern "C" doesn't need a scope:

  extern "C" void foo()

so maybe this is better said as 'if it is not declared extern "C".'

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:283
@@ +282,3 @@
+  MultiKernelLoaderSpec *addOpenCLTextInMemory(const char *OpenCLText,
+                                               const std::string &KernelName);
+
----------------
Same comments as on the constructors for those types.

================
Comment at: streamexecutor/include/streamexecutor/KernelSpec.h:293
@@ +292,3 @@
+  /// OpenCL text that resides in memory.
+  std::unique_ptr<OpenCLTextInMemory> TheOpenCLTextInMemory;
+};
----------------
Not sure these comments are necessary; seems obvious.

================
Comment at: streamexecutor/lib/KernelSpec.cpp:27
@@ +26,3 @@
+
+// Minimum compute capability is 1.0.
+const CUDAPTXInMemory::ComputeCapability
----------------
Seems obvious from the code?

================
Comment at: streamexecutor/lib/KernelSpec.cpp:71
@@ +70,3 @@
+
+MultiKernelLoaderSpec *
+MultiKernelLoaderSpec::addCUDAPTXInMemory(const char *PTXCode,
----------------
For chaining purposes it's more common to return *this (see e.g. operator=).

https://reviews.llvm.org/D23038