[Openmp-dev] Experimental SPIR-V back-end using OpenCL 2.1

Fri Oct 20 08:24:00 PDT 2017

Thanks for all the help!

These benchmarks, especially LULESH, showed me that my approach of 
setting the address spaces manually according to their scope doesn't 
work (*) and that I have to use the generic address space,
as the NVPTX backend does.

Now with this much more robust version, I decided to make my project public:

My clang fork is available at
https://github.com/daniel-schuermann/clang

and the openmp runtime fork at
https://github.com/daniel-schuermann/openmp

and the necessary llvm fork (for generating SPIR-V) is from
https://github.com/thewilsonator/llvm/tree/compute

All 3 of them might need a pull from upstream as they are not always synced.

libomptarget-spir needs an OpenCL runtime which supports SPIR-V kernels 
(with OpenCL 2.1 headers).
Unfortunately, the Intel OpenCL runtime started to segfault with the 
change to generic address space.
Therefore, the only working OpenCL runtime I know of is AMDGPU-Pro.

The following pragmas should work for now:
#target (enter/exit data)
#teams
#distribute / parallel for
#master
#barrier

as well as the clauses:
map, shared, private, firstprivate, lastprivate, schedule
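
As a rough illustration of how the directives and clauses listed above
fit together, here is a minimal sketch in C. The function name
saxpy_offload and its parameters are hypothetical and not taken from the
fork; whether this exact combination works with the SPIR-V back-end has
not been verified here.

```c
#include <assert.h>

/* Hypothetical helper: computes y[i] = alpha * x[i] + y[i].
 * With an offloading-capable compiler the loop runs on the device;
 * without OpenMP the pragmas are ignored and it runs serially. */
void saxpy_offload(int n, float alpha, const float *x, float *y)
{
    assert(n >= 0);

    /* Create device copies of x and y and keep them resident. */
    #pragma omp target enter data map(to: x[0:n], y[0:n])

    /* Spread the iterations over teams and the threads in each team.
     * The data is already present, so these map clauses copy nothing. */
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n]) \
        firstprivate(alpha) schedule(static, 1)
    for (int i = 0; i < n; ++i) {
        y[i] = alpha * x[i] + y[i];
    }

    /* Copy the result back and drop the device copies. */
    #pragma omp target exit data map(from: y[0:n]) map(release: x[0:n])
}
```

Built without any OpenMP flag, the function simply runs on the host,
which makes it easy to compare host and device results.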

A device runtime is not planned for the moment (I also don't know how 
to compile and link OpenCL sources into libomptarget), but the generated 
code works independently of runtime functions.
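
To illustrate what code without device runtime calls can look like,
here is a minimal sketch in C that uses only directives (master,
barrier, and a worksharing loop) and no omp_* functions. The function
name fill_then_double is hypothetical, and it is an assumption that a
plain parallel region inside target is handled by the fork.

```c
#include <assert.h>

/* Hypothetical helper: one thread fills data[i] = i, then all threads
 * double the entries. No OpenMP runtime functions are called. */
void fill_then_double(int n, int *data)
{
    assert(n >= 0);

    #pragma omp target map(tofrom: data[0:n])
    #pragma omp parallel
    {
        /* Only the master thread initialises the array. */
        #pragma omp master
        {
            for (int i = 0; i < n; ++i)
                data[i] = i;
        }

        /* master has no implied barrier, so wait explicitly. */
        #pragma omp barrier

        /* All threads share the doubling loop. */
        #pragma omp for schedule(static)
        for (int i = 0; i < n; ++i)
            data[i] *= 2;
    }
}
```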

For those more adventurous, you can (try to) build your sources with 
-fopenmp -fopenmp-targets=spir64-unknown-unknown
For C++, I recommend adding -fno-exceptions and -O0, as some of the 
optimizer passes that are otherwise enabled don't seem to work for SPIR.

I would be very thankful for some feedback (I hope it won't get too 
depressing).
Although I am not able to accept pull requests at the moment for legal 
reasons,
I would welcome any hint to make the implementation more robust and complete
as well as reports about working (or not working) programs.
(no comments on code style please, this can be fixed later °°)

Kind regards,
Daniel

(*) While this is a valid program snippet,

#pragma omp target map(to:a[0:n])
{ int * b = a; }

it doesn't work if |a| is a pointer to addrSpace(1) (cl_global) and |b| 
gets allocated as pointer to cl_private.
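
For reference, the snippet can be expanded into a small complete
function, sketched below in C. The name sum_via_alias and the summation
are hypothetical additions for illustration only; the point is that |b|
is a local pointer inside the target region that receives the value of
the mapped pointer |a|.

```c
#include <assert.h>

/* Hypothetical helper: sums a[0..n-1] inside a target region,
 * reading the data through a local alias b of the mapped pointer a. */
int sum_via_alias(const int *a, int n)
{
    assert(n >= 0);
    int total = 0;

    #pragma omp target map(to: a[0:n]) map(tofrom: total)
    {
        /* On the device, a points into mapped (global) memory while b
         * is declared locally. If the two pointer types carried
         * different fixed address spaces, this assignment would break;
         * a generic address space avoids that. */
        const int *b = a;
        for (int i = 0; i < n; ++i)
            total += b[i];
    }
    return total;
}
```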

On 10/11/2017 06:41 PM, Sunita Chandrasekaran wrote:
> Yes, these SNU OMP 3.1 C codes are quite useful (and some of the 
> authors of this software also attend SC and have a booth on the 
> show floor. I met one or two of them last year at SC).
>
> On a related note, some of us (w/ NASA) in my group have just begun to 
> create OpenMP 4.5 NPB codes (that did not make it to the SPEC HPG). We 
> will keep you informed.
>
> Thanks
> Sunita
>
>
> **********************************
> Sunita Chandrasekaran
> Asst. Prof. Computer and Information Sciences
> Affiliated, Center for Bioinformatics and Computational Biology
> 430 Smith Hall, University of Delaware
> p: 302-831-2714 e: schandra at udel.edu
> ----------------------------------------
> Adjunct Prof. Dept. of Computer Science
> University of Houston, TX
> ----------------------------------------
> t: https://twitter.com/chandrasunita
> w: https://www.eecis.udel.edu/~schandra/
> CRPL
> Research Group:
> http://crpl.cis.udel.edu/
>
>
> On Wed, Oct 11, 2017 at 12:24 PM, Jeff Hammond via Openmp-dev 
> <openmp-dev at lists.llvm.org> wrote:
>
>     I just found http://aces.snu.ac.kr/software/snu-npb/ today, but
>     have not tried it.
>
>     Jeff
>
>     On Tue, Oct 10, 2017 at 7:37 AM, Jeff Hammond
>     <jeff.science at gmail.com> wrote:
>
>         https://github.com/ParRes/Kernels/tree/master/Cxx11 implements
>         at least two kernels in OpenCL, GPU-oriented OpenMP4 (#pragma
>         omp teams distribute parallel for simd collapse(2)
>         schedule(static,1) as recommended by NVIDIA), RAJA, Kokkos,
>         and numerous CPU implementations.
>
>         You’ll need to (1) tune the OpenCL code for your GPU as it is
>         currently not optimized for any architecture and (2) tweak the
>         RAJA and KOKKOS implementations to target GPU models in
>         addition to CPU ones. Neither should be very difficult and the
>         PRK team is happy to provide assistance.
>
>         That project contains one C++ CUDA and two Fortran OpenACC
>         implementations but these are unfinished and need work.
>         However, I’m very happy to see others contribute better
>         versions.  It may not be the easiest route to fame and
>         fortune, but it’s not the worst either 😂
>
>         Write me privately or create GitHub issues if you have
>         questions. I am currently on leave from my day job but will be
>         responsive via GMail and GitHub.
>
>         Sorry if this is a repost. I intended to reply early but
>         didn’t and can’t find it if I did, but mistakes have been
>         known to occur.
>
>         Jeff
>
>         Sent from my iPhone
>
>         On Oct 10, 2017, at 6:56 AM, Jonas Hahnfeld via Openmp-dev
>         <openmp-dev at lists.llvm.org> wrote:
>
>>         Hi,
>>
>>         some (public) benchmarks that come to my mind:
>>         * LULESH: https://codesign.llnl.gov/lulesh.php
>>         * CloverLeaf: http://uk-mac.github.io/CloverLeaf/
>>         * my own, a Conjugate Gradient solver:
>>         https://github.com/hahnjo/CGxx
>>
>>         As James wrote, SPEC ACCEL also has an OpenMP suite, but you
>>         need a license.
>>
>>         Regards
>>         Jonas
>>
>>         Am 2017-10-10 09:43, schrieb Daniel Schürmann via Openmp-dev:
>>>         Hello together,
>>>         as the title states, I am looking for benchmarks which are
>>>         particularly suitable for GPU accelerators (or at least make
>>>         use of the #teams pragma).
>>>         I already tried the Rodinia benchmark suite, but its
>>>         benchmarks seem to be written for CPU acceleration only.
>>>         I would be very pleased if someone could provide me with one
>>>         or more kernels which can be used with the NVPTX backend.
>>>         The purpose is a master's thesis about using OpenCL and
>>>         SPIR-V as an OpenMP backend.
>>>         Thank you in advance and
>>>         kind regards,
>>>         Daniel
>>>         _______________________________________________
>>>         Openmp-dev mailing list
>>>         Openmp-dev at lists.llvm.org
>>>         http://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev
>
>
>
>
>     -- 
>     Jeff Hammond
>     jeff.science at gmail.com
>     http://jeffhammond.github.io/
>
>
>
