[Parallel_libs-commits] [PATCH] D25701: Initial check-in of Acxxel (StreamExecutor renamed)

Jason Henline via Parallel_libs-commits parallel_libs-commits at lists.llvm.org
Tue Oct 18 12:45:07 PDT 2016


jhen marked 2 inline comments as done.
jhen added a comment.

In https://reviews.llvm.org/D25701#573365, @jlebar wrote:

> > You were right that it did require checking the error at every line. To address this, I've added in a thread_local variable to keep track of the first error status. With this, users can make as many calls as they want without checking the error, then they can do a final check that nothing went wrong.
>
> Should it be thread_local, or local to the stream itself?  (Are streams even thread-safe?  I would have assumed not, but if they are, we should comment that.)


Streams are thread-safe. This is something I carried over from the original StreamExecutor. I added a comment to the Stream, Platform, Event, Program, and Kernel classes indicating that they are all thread-safe.

I could make the status local to a Stream, but none of the synchronous operations are launched via Streams; they are launched via Platforms instead. So I could make one status that is local to a Platform and another that is local to a Stream, but I feel like that could get a bit confusing for users.

Or, perhaps it would be less confusing to have one Status for the whole library. That actually seems pretty nice. What do you think?
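For reference, here is a rough sketch of the thread_local first-error pattern I have in mind. Everything below (Status, FirstError, setError, takeStatus) is illustrative naming only, not the actual Acxxel API:

#include <string>
#include <utility>

namespace sketch {

// Illustrative status type; not the actual Acxxel Status class.
class Status {
public:
  Status() = default;
  explicit Status(std::string Message)
      : IsError(true), Message(std::move(Message)) {}
  bool isError() const { return IsError; }
  const std::string &message() const { return Message; }

private:
  bool IsError = false;
  std::string Message;
};

// One slot per thread that remembers only the first failure.
thread_local Status FirstError;

// Each operation that can fail calls this internally instead of forcing the
// caller to check a return value on every line.
inline void setError(const std::string &Message) {
  if (!FirstError.isError())
    FirstError = Status(Message);
}

// The user calls this once, after a batch of operations, to find out whether
// anything went wrong; it also clears the per-thread slot.
inline Status takeStatus() {
  Status Result = FirstError;
  FirstError = Status();
  return Result;
}

} // namespace sketch

Synchronous operations would call something like setError internally when they fail, and user code would issue a whole batch of calls and then check takeStatus once at the end.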



================
Comment at: acxxel/acxxel.h:114
+  bool DeviceMap = false;
+  bool IOMemory = false;
+};
----------------
jlebar wrote:
> jhen wrote:
> > jlebar wrote:
> > > It's not obvious to me what these three mean; maybe they should have comments?
> > I removed these and the config structs below because they were too CUDA-specific, and I don't think they were very helpful anyway.
> So how do we access those configuration knobs now?  Or, is that functionality just not available to users?
The functionality is no longer available to users. We can add support back in later if needed.


================
Comment at: acxxel/acxxel.h:643
+                             Span<void *> Arguments, Span<size_t> ArgumentSizes,
+                             size_t SharedMemoryBytes = 0) {
+    return rawEnqueueKernelLaunch(TheStream.TheHandle.get(),
----------------
jlebar wrote:
> jhen wrote:
> > jlebar wrote:
> > > I am unclear how this kernel-launching stuff relates to the comment at the top of the file saying we don't do this...  Is this for OpenCL only?
> > I can't seem to find the comment at the top of the file that you mention here. Can you be a bit more specific about which comment you mean?
> Sorry, I think I was thinking of the commit message.
> 
> Question stands, though, wrt exactly what we're saying our library does.  We can launch kernels on both CUDA and OpenCL, but only through this limited interface?
We can launch kernels on both CUDA and OpenCL through this type-unsafe interface.

Additionally, for CUDA, we can pass our Stream and DeviceMemory types to triple-chevron kernel launches compiled by clang or nvcc.
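To make the caller's side of that type-unsafe interface concrete, here is a rough, self-contained sketch. The enqueueKernelLaunchUnsafe function below is a hypothetical stand-in, not the actual Acxxel entry point; only the void-pointer-plus-size argument-passing style comes from the patch:

#include <cstddef>
#include <cstdio>

// Hypothetical stand-in that just reports what it was asked to launch.
void enqueueKernelLaunchUnsafe(const char *KernelName, void *const *Arguments,
                               const std::size_t *ArgumentSizes,
                               std::size_t ArgumentCount,
                               std::size_t SharedMemoryBytes = 0) {
  std::printf("launch %s with %zu arguments, %zu bytes shared memory\n",
              KernelName, ArgumentCount, SharedMemoryBytes);
  for (std::size_t I = 0; I < ArgumentCount; ++I)
    std::printf("  arg %zu: %zu bytes at %p\n", I, ArgumentSizes[I],
                Arguments[I]);
}

int main() {
  // Kernel parameters for a hypothetical saxpy(float A, float *X, float *Y).
  float A = 2.0f;
  float *X = nullptr; // would point to device memory in a real launch
  float *Y = nullptr;

  // Each parameter is passed as an untyped pointer plus its size. Nothing
  // checks that these match the kernel's real signature, which is what makes
  // the interface type-unsafe.
  void *Arguments[] = {&A, &X, &Y};
  std::size_t ArgumentSizes[] = {sizeof(A), sizeof(X), sizeof(Y)};
  enqueueKernelLaunchUnsafe("saxpy", Arguments, ArgumentSizes, 3);
}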


================
Comment at: acxxel/acxxel.h:1160
+        static_cast<ElementType *>(ThePointer.get())[I].~ElementType();
+      }
+    }
----------------
jlebar wrote:
> jhen wrote:
> > jlebar wrote:
> > >      T *Memory = new (MaybeMemory.getValue()) T[ElementCount];
> > > 
> > > Oh, hm.  Are you sure this is the right way to undo your allocation above?  Like, we don't have to somehow delete the placement new'ed array itself?  (I don't see a way to do it on cppreference.)
> > My understanding is that the destructor of the `ThePointer` member will be called when this function ends. The `ThePointer` member is a `std::unique_ptr` with a custom deleter, and the custom deleter will call out to the platform to "free" the memory owned by `ThePointer`. So I don't think there is any memory leak here.
> Definitely no memory leak.  The question is more whether C++ requires us to do an "array placement delete" to match the "array placement new", or whether it's OK to in-place destroy the elements piecewise like we do here.
> 
> Judging from the random people on the Internet here, it seems like using array placement new for this is dicey and may actually be a buffer overflow.  Maybe we should just construct the elements one at a time?  Then this destruction code is clearly correct.
> 
> http://stackoverflow.com/questions/8720425/array-placement-new-requires-unspecified-overhead-in-the-buffer
Oh I see! Yes, that is scary. OK, I switched to calling placement new for each element individually rather than dealing with placement new for an array.
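For reference, here is a minimal, self-contained sketch of that per-element approach (illustrative only, not the code in the patch):

#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>

// Construct elements one at a time in a raw buffer, instead of using array
// placement new (whose unspecified bookkeeping overhead is what the Stack
// Overflow link warns about).
template <typename ElementType>
ElementType *constructAll(void *RawMemory, std::size_t ElementCount) {
  ElementType *Elements = static_cast<ElementType *>(RawMemory);
  for (std::size_t I = 0; I < ElementCount; ++I)
    new (Elements + I) ElementType(); // placement new, one element at a time
  return Elements;
}

// Matching per-element destruction; the raw buffer itself is freed separately
// by whoever allocated it (e.g. a custom deleter).
template <typename ElementType>
void destroyAll(ElementType *Elements, std::size_t ElementCount) {
  for (std::size_t I = 0; I < ElementCount; ++I)
    Elements[I].~ElementType();
}

int main() {
  const std::size_t Count = 4;
  void *Raw = std::malloc(Count * sizeof(std::string));
  std::string *Strings = constructAll<std::string>(Raw, Count);
  Strings[0] = "constructed in place";
  destroyAll(Strings, Count);
  std::free(Raw);
}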


================
Comment at: acxxel/acxxel.h:295
+/// A program can be created by calling Platform::createProgramFromSource, and a
+/// Kernel can be created from a program by running Platform::createKernel.
+class Program {
----------------
jlebar wrote:
> What is the purpose of having separate program and kernel classes?  It seems like a kernel is just a program plus a name, so could we just merge the two?  If not, maybe we can briefly explain why they are separate concepts.
The important difference is that a program can contain multiple kernels because the input source for the program can define multiple kernels. I added a comment to the `Program` class to explain this.

I don't think we want to merge `Program` with `Kernel` because, while it's true that a `Kernel` is basically just a program plus a name, it's not true that a `Program` is just a `Kernel` minus a name.
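As a toy illustration of that relationship, here is a self-contained sketch. Program, Kernel, and Platform below are minimal stand-ins rather than the Acxxel classes; only the createProgramFromSource/createKernel factory names and the one-Program-to-many-Kernels shape come from the patch:

#include <cstdio>
#include <string>
#include <utility>

class Program {
public:
  explicit Program(std::string Source) : Source(std::move(Source)) {}
  const std::string &source() const { return Source; }

private:
  std::string Source; // may define any number of kernels
};

class Kernel {
public:
  Kernel(const Program &TheProgram, std::string Name)
      : TheProgram(&TheProgram), Name(std::move(Name)) {}
  const std::string &name() const { return Name; }

private:
  const Program *TheProgram; // the program this kernel was extracted from
  std::string Name;          // selects one kernel defined in that program
};

class Platform {
public:
  Program createProgramFromSource(std::string Source) {
    return Program(std::move(Source));
  }
  Kernel createKernel(const Program &TheProgram, std::string Name) {
    return Kernel(TheProgram, std::move(Name));
  }
};

int main() {
  Platform ThePlatform;
  // One program whose source happens to define two kernels...
  Program TheProgram = ThePlatform.createProgramFromSource(
      "__kernel void add(...) {...}  __kernel void scale(...) {...}");
  // ...yields two distinct Kernel objects, each one "a program plus a name".
  Kernel Add = ThePlatform.createKernel(TheProgram, "add");
  Kernel Scale = ThePlatform.createKernel(TheProgram, "scale");
  std::printf("%s and %s come from the same program\n", Add.name().c_str(),
              Scale.name().c_str());
}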


https://reviews.llvm.org/D25701
