[Openmp-dev] Experimental SPIR-V back-end using OpenCL 2.1

Fri Oct 20 09:40:38 PDT 2017

Hi Jonas,

Yes, other OpenCL runtimes are blocked out at the moment.
I would like to test just for "OpenCL 2.1", but that would block the AMD
runtime. (I hope this will change in the future.)
It is the proprietary AMD OpenCL SDK, part of the AMDGPU-Pro driver.
ROCm states that it only supports the OpenCL 1.2 runtime, but it might be
worth a test.
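
To illustrate the version check discussed here, a minimal sketch in C follows. It is not the plugin's actual code: the helper name and the table of accepted prefixes are hypothetical, and it assumes the string being tested is the one returned by clGetPlatformInfo for CL_PLATFORM_VERSION.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if the platform version string starts
 * with one of the accepted prefixes. The prefix list is an assumption based
 * on this thread: plain "OpenCL 2.1", plus the "OpenCL 2.0 AMD" exception. */
static bool version_is_accepted(const char *version)
{
    static const char *const accepted[] = { "OpenCL 2.1", "OpenCL 2.0 AMD" };

    assert(version != NULL);
    for (size_t i = 0; i < sizeof accepted / sizeof accepted[0]; ++i) {
        /* Compare only the prefix; vendor-specific text may follow it. */
        if (strncmp(version, accepted[i], strlen(accepted[i])) == 0)
            return true;
    }
    return false;
}
```

Keeping the accepted prefixes in one table would make it easy to drop the AMD exception once that runtime reports 2.1.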

SPIRV-Tools includes a validator, but running it may be too much overhead.
Do you mean checking the magic number? Good idea!
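
A magic-number check could look roughly like the sketch below. The function name is hypothetical, and how it would be wired into __tgt_rtl_is_valid_binary (for example, which image start pointer and size are passed in) is an assumption. The magic value 0x07230203 is the first word of every SPIR-V module; the byte-swapped form is accepted as well because a module may be stored in the opposite endianness.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SPIRV_MAGIC         0x07230203u
#define SPIRV_MAGIC_SWAPPED 0x03022307u

/* Hypothetical helper: cheap plausibility test for a SPIR-V image.
 * It only inspects the first 32-bit word and does not validate the module. */
static bool looks_like_spirv(const void *image, size_t size)
{
    uint32_t word;

    assert(image != NULL);
    if (size < sizeof word)
        return false;

    /* memcpy avoids alignment and strict-aliasing problems. */
    memcpy(&word, image, sizeof word);
    return word == SPIRV_MAGIC || word == SPIRV_MAGIC_SWAPPED;
}
```

This is much cheaper than running the full validator, at the cost of only rejecting images that are obviously not SPIR-V.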

Thanks,
Daniel

On 10/20/2017 06:23 PM, Jonas Hahnfeld wrote:
> Hi Daniel,
>
> interesting work!
>
> Two questions:
>  - With the latest commit, you perform a strncmp with "OpenCL 2.0 
> AMD". Does this mean that all other OpenCL implementations are 
> effectively blocked out?
>  - Is this the proprietary AMD OpenCL SDK or the "new" ROCm stack? 
> https://rocm.github.io/
>
> One remark: For __tgt_rtl_is_valid_binary: Does SPIR-V have its own 
> machine id? That's how the CUDA plugin detects compatible binaries...
>
> Cheers,
> Jonas
>
> Am 2017-10-20 11:24, schrieb Daniel Schürmann via Openmp-dev:
>> Thanks for all the help!
>>
>> These benchmarks, especially lulesh, showed me that my approach of
>> setting the address spaces manually according to their scope doesn't
>> work(*) and that I have to use the generic address space
>> like the nvptx backend does.
>>
>> Now with this much more robust version, I decided to make my project
>> public:
>>
>> My clang fork is available at
>> https://github.com/daniel-schuermann/clang
>>
>> and the openmp runtime fork at
>> https://github.com/daniel-schuermann/openmp
>>
>> and the necessary llvm fork (for generating SPIR-V) is from
>> https://github.com/thewilsonator/llvm/tree/compute
>>
>> All 3 of them might need a pull from upstream as they are not always
>> synced.
>>
>> libomptarget-spir needs an OpenCL runtime which supports SPIRV kernels
>> (with OpenCL 2.1 headers).
>> Unfortunately, the Intel OpenCL runtime started to segfault with the
>> change to generic address space.
>> Therefore, the only working OpenCL runtime I know of is AMDGPU-Pro.
>>
>> The following pragmas should work for now:
>> #target (enter/exit data)
>> #teams
>> #distribute / parallel for
>> #master
>> #barrier
>>
>> as well as the clauses:
>> map, shared, private, firstprivate, lastprivate, schedule
>>
>> A device runtime is not planned for the moment (I also don't know how
>> to compile and link OpenCL sources into libomptarget), but the
>> generated code works independently from runtime functions.
>>
>> For those more adventurous, you can (try to) build your sources with
>> -fopenmp -fopenmp-targets=spir64-unknown-unknown
>> For C++, I recommend adding -fno-exceptions and -O0, as there seem to
>> be optimizer passes enabled
>> which don't work for spir.
>>
>> I would be very thankful for some feedback (I hope it won't get too
>> depressing).
>> Although I am not able to accept pull requests at the moment for legal
>> reasons,
>> I would welcome any hint to make the implementation more robust and
>> complete
>> as well as statements about (not) working programs.
>> (no comments on code style please, this can be fixed later °°)
>>
>> Kind regards,
>> Daniel
>>
>> (*) While this is a valid program snippet,
>>
>> #pragma omp target map(to:a[0:n])
>> { int * b = a; }
>>
>> it doesn't work if |a| is a pointer to addrSpace(1) (cl_global) and
>> |b| gets allocated as pointer to cl_private.
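
For reference, the footnote's snippet can be expanded into a small self-contained function, sketched below. The function name and the summation are illustrative additions, not taken from the original post. The local pointer b is initialised from the mapped pointer a inside the target region, which is the pattern that fails when b is typed as a pointer into the private address space while a points to global memory. With a generic address space, b can hold either kind of pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: sums n ints inside a target region through a local
 * alias of the mapped pointer. Without OpenMP support the pragma is ignored
 * and the block simply runs on the host. */
static int sum_on_device(const int *a, int n)
{
    int sum = 0;

    assert(a != NULL && n >= 0);
    #pragma omp target map(to: a[0:n]) map(tofrom: sum)
    {
        const int *b = a;   /* local pointer aliasing the mapped data */
        for (int i = 0; i < n; ++i)
            sum += b[i];
    }
    return sum;
}
```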