[Parallel_libs-commits] [PATCH] D25701: Initial check-in of Acxxel (StreamExecutor renamed)

Mon Oct 17 15:10:52 PDT 2016

jhen added a comment.

In https://reviews.llvm.org/D25701#572176, @hfinkel wrote:

> > Due to the shift in emphasis away from supporting type-safe kernel launches and the movement of streams from being the central programming entities,
>
> Can you please discuss the motivation for this?

When hammering out the finer details of kernel loader objects from the original StreamExecutor implementation, it was brought to my attention that templated CUDA kernels would be a big problem for that model. Delving into the internal usage at Google, we also realized that basically nobody was using the StreamExecutor kernel launch style for CUDA, but folks were instead using the CUDA triple-chevron notation. Since triple-chevron supports templated kernels, is supported by clang for CUDA, and (I think) can be extended to support OpenCL, I think it is better to move forward with triple-chevron as the kernel launch model, and to make it work as seamlessly with StreamExecutor as possible.

I don't want this current choice to preclude adding in more general type-safe kernel launch support to this library in the future, but it doesn't seem like StreamExecutor has the right model for that at this time. With Acxxel, I think we will have a good place to experiment with what the right model will be, and maybe that means extending triple-chevron launches to more general accelerator platforms. Meanwhile, Acxxel is still extremely useful as a modern C++ wrapper for host-side accelerator management.

For those who view platform independence as the main feature of StreamExecutor, I think that not much has changed with this shift of focus. Acxxel supports CUDA kernel launches through the same type-unsafe interface that it provides for OpenCL, so host code that uses this interface only has to be written once to support both platforms.
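
To make "type-unsafe" concrete: launch interfaces such as the CUDA driver API's cuLaunchKernel and OpenCL's clSetKernelArg take kernel arguments as untyped pointers, so the compiler cannot check them against the kernel's parameter list. The sketch below mimics that call shape in plain host C++ with no accelerator involved. The names `KernelFn`, `saxpyKernel`, `launchUntyped`, and `runSaxpy` are hypothetical and chosen only for illustration; this is not the Acxxel API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a compiled device kernel: it receives its
// arguments as an array of untyped pointers, similar in shape to the
// kernelParams argument of cuLaunchKernel.
using KernelFn = void (*)(void **Args);

// "Kernel" that computes Y[i] = A * X[i] + Y[i] on the host. It must cast
// each argument back to the type it expects, with no compiler checking.
void saxpyKernel(void **Args) {
  float A = *static_cast<float *>(Args[0]);
  const float *X = *static_cast<float **>(Args[1]);
  float *Y = *static_cast<float **>(Args[2]);
  std::size_t N = *static_cast<std::size_t *>(Args[3]);
  for (std::size_t I = 0; I < N; ++I)
    Y[I] = A * X[I] + Y[I];
}

// Hypothetical type-unsafe launch. The same call shape works for any kernel,
// but passing the wrong number or types of arguments is undefined behavior
// at run time rather than a compile error.
void launchUntyped(KernelFn Kernel, void **Args) { Kernel(Args); }

// Host-side wrapper: packs the arguments as void pointers and "launches".
std::vector<float> runSaxpy(float A, std::vector<float> X,
                            std::vector<float> Y) {
  float *XPtr = X.data();
  float *YPtr = Y.data();
  std::size_t N = X.size();
  void *Args[] = {&A, &XPtr, &YPtr, &N};
  launchUntyped(saxpyKernel, Args);
  return Y;
}
```

The trade-off is that one launch path can serve every platform, while argument mismatches are only caught at run time, if at all.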

One major benefit of the Acxxel model over StreamExecutor is that Acxxel code can be compiled by nvcc, whereas StreamExecutor required support from the compiler. This provides an easy route for users of nvcc to try out Acxxel and get platform independence and nvcc compatibility at the same time.

As for the change in how streams are handled, I mentioned in my comments that without fluent kernel launches, the fluent stream interface loses most of its benefit. It is also awkward to make OpenCL and CUDA runtime error handling work well with that fluent stream interface.

https://reviews.llvm.org/D25701