[PATCH] D99190: WIP: [SYCL] Add design document for SYCL mode

Victor Lomuller via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Mar 29 06:14:37 PDT 2021


Naghasan requested changes to this revision.
Naghasan added a comment.
This revision now requires changes to proceed.

Added comment for the front end support part.

I think the document present to much at once, front-end, driver, tool chains, SPIR-V backend, compiler-runtime bindings etc.
It is good to have everything eventually, but it would be beneficial to split this patch into more focus ones.
Having a link to the main document in your github would still allow one to get the big picture.

I see the address space bit has already been split, that's better to keep things focused I guess.

In D99190#2650326 <https://reviews.llvm.org/D99190#2650326>, @bader wrote:

> @keryell, thanks for the feedback.
>
> I applied your suggestions with two exceptions:
>
> 1. I'm not sure if other back-ends need special processing for kernel function parameters we apply for SPIR kernels.

I'm not sure TBH, but passing just the object as a whole is not helping DAE passes. So either way, it is not a bad thing to have that processing.



================
Comment at: clang/docs/SYCLSupport.md:51
+virtual calls), generates LLVM IR for the device code only and an "integration
+header" which provides information like kernel name, parameters order and data
+type for the runtime library.
----------------
Do you plan on upstreaming your integration header approach ? Even if it is useful in some situations and speeds up the creation of a prototype, it comes with its complications.

An integration header creates a by-product then used as input for another compilation phase. I haven't at the upstream driver capabilities for awhile, but I don't think you can model 2 outputs yet. Are  you planing on adding this capability ?
If not, wouldn't that forces you to trigger a compilation step just for the generation of that files ? If so that puts a strong burden on the compilation speed as you now have to process 3 times your input C++ file.


================
Comment at: clang/docs/SYCLSupport.md:73
+the integration header is used (included) by the SYCL runtime implementation, so
+the header must be available before the host compilation starts.*
+
----------------
> First, it must be possible to use any host compiler

I don't understand the link with the integration header. SYCL being implementable as a library is a design principle of the specs but it doesn't means the clang host compiler has to remain a vanilla C++ compiler.

> information provided in the integration header is used (included) by the SYCL runtime implementation, so the header must be available before the host compilation starts

Another approach to the integration header would be for clang as the host compiler to generate the used type traits.


================
Comment at: clang/docs/SYCLSupport.md:123
+traverse all symbols accessible from kernel functions and add them to the
+"device part" of the code marking them with the new SYCL device attribute.
+
----------------
OpenMP offload uses a similar approach isn't it? Might be worth to describe how the 2 relates to each other and where they diverge. 


================
Comment at: clang/docs/SYCLSupport.md:128
+SYCL memory objects shared between host and device can be accessed either
+through raw pointers (to "shared" memory allocated using USM methods) or special
+`accessor` classes. Raw pointers map to kernel parameters one-to-one without
----------------
I find "shared" ambiguous, to CUDA developer shared memory is the OpenCL local memory. "Unified" is less prone misinterpretation (and matches the U of USM :).


================
Comment at: clang/docs/SYCLSupport.md:130
+`accessor` classes. Raw pointers map to kernel parameters one-to-one without
+additional transformations. `accessor` classes require additional processing as
+The "device" implementation of this class contains pointers to the device memory
----------------
> Raw pointers map to kernel parameters one-to-one without additional transformations

Is this true ? I thought they should be passed as "pointer to global memory" (under the OpenCL or CUDA model).


================
Comment at: clang/docs/SYCLSupport.md:131
+additional transformations. `accessor` classes require additional processing as
+The "device" implementation of this class contains pointers to the device memory
+as a class member. OpenCL doesn't allow passing structures with pointer type
----------------



================
Comment at: clang/docs/SYCLSupport.md:132
+The "device" implementation of this class contains pointers to the device memory
+as a class member. OpenCL doesn't allow passing structures with pointer type
+members as kernel parameters. All memory objects shared between host and device
----------------
Won't be better to refer to SPIR or SPIR-V kernels rather than OpenCL ?

Or even SPIR-like or SPIR-defacto to be even less normative.


================
Comment at: clang/docs/SYCLSupport.md:134
+members as kernel parameters. All memory objects shared between host and device
+must be passed to the kernel as raw pointers.
+
----------------
This is a bit ambiguous and leaves the impression you can't objects pass by value.


================
Comment at: clang/docs/SYCLSupport.md:144
+
+To facilitate the mapping of SYCL kernel data members to SPIR kernel arguments
+and use OpenCL as a back-end we added the generation of an SPIR kernel
----------------
A PTX / cuda backend would use the exact same lowering logic. The generated LLVM IR will look a bit different (as the identification of PTX kernels is different), but this section also applies.


================
Comment at: clang/docs/SYCLSupport.md:161
+// Generated kernel function (expressed in OpenCL-like pseudo-code)
+__kernel KernelName(global int* a) {
+  KernelType KernelFuncObj; // Actually kernel function object declaration
----------------
This is missing  the template instantiation  that will eventually lead to that lowering.

I would also suggest to split code block in 2 as to mark what is in header and source file (SYCL code) and what is compiler generated (that pseudo OpenCL).

Might be good to also mention the glue generated in the integration header as this is what allows arguments to be set by the runtime (bridge between the structure in C++ and the SPIR-like kernel arguments).


================
Comment at: clang/docs/SYCLSupport.md:139
+
+To facilitate the mapping of SYCL kernel data members to SPIR kernel arguments
+and overcome OpenCL limitations we added the generation of an SPIR kernel
----------------
keryell wrote:
> To generalize to non SPIR & non OpenCL
SYCL's mechanism isn't special, just its way to.

Making an analogy with another programming model is I think a good way to describe it, but I would suggest to use a single source programming model. To that end, I think CUDA would be a better programming model to use here.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99190/new/

https://reviews.llvm.org/D99190



More information about the cfe-commits mailing list