[Openmp-dev] Declare variant + Nvidia Device Offload

Mon May 18 10:31:49 PDT 2020

Hi Johannes,

thanks for your quick reply!

> The math stuff works because all declare variant functions are static.
>
> I think we need to replace the `.` with a symbol that the user cannot
> use but the ptxas assembler is not upset about. We should also move
> `getOpenMPVariantManglingSeparatorStr` from `Decl.h` into
> `llvm/lib/Frontends/OpenMP/OMPContext.h`, I forgot why I didn't.
The `.` also seems to be part of the mangled context. Where does that
mangling take place?

According to the PTX documentation [0], identifiers cannot contain dots,
but `$` and `%` are allowed in user-defined names (apart from a few
predefined identifiers).
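
To make the identifier rule concrete, here is a minimal sketch of a
validity check. It assumes the grammar in [0] reads as
`[a-zA-Z]{followsym}*` or `[_$%]{followsym}+` with `followsym` being
`[a-zA-Z0-9_$]`, which would mean `%` is only accepted as the first
character and `$` is the more likely candidate for a separator. The
function names `isValidPtxIdentifier` and `replaceDots` are hypothetical
and not part of Clang or LLVM.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// followsym: [a-zA-Z0-9_$] (assumed from the PTX documentation [0]).
static bool isFollowSym(char c) {
  return std::isalnum(static_cast<unsigned char>(c)) || c == '_' || c == '$';
}

// identifier: [a-zA-Z]{followsym}* | [_$%]{followsym}+
// Hypothetical helper, only meant to illustrate the rule.
bool isValidPtxIdentifier(const std::string &name) {
  if (name.empty())
    return false;
  const char first = name[0];
  const bool alphaStart = std::isalpha(static_cast<unsigned char>(first)) != 0;
  const bool symStart = first == '_' || first == '$' || first == '%';
  if (!alphaStart && !symStart)
    return false;
  // A leading '_', '$' or '%' must be followed by at least one followsym.
  if (symStart && name.size() < 2)
    return false;
  for (std::size_t i = 1; i < name.size(); ++i)
    if (!isFollowSym(name[i]))
      return false;
  return true;
}

// Hypothetical helper: swap every '.' in a mangled name for another separator.
std::string replaceDots(std::string name, char separator) {
  for (char &c : name)
    if (c == '.')
      c = separator;
  return name;
}
```

Under these assumptions, the name from the ptxas error
(`_Z33atom_add.ompvariant.S2.s6.PnohostPdd`) is rejected as-is, accepted
once the dots are replaced with `$`, and still rejected with `%` as the
separator.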

Should we replace the dot only for Nvidia devices or in general? Do any
other parts of the code rely on the mangling of the variants with dots?

> You should also be able to use the clang builtin atomics
You were referring to
https://clang.llvm.org/docs/LanguageExtensions.html#c11-atomic-builtins,
weren't you? As far as I can see, those only work on atomic types.
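
In case the GCC-compatible `__atomic_*` builtins were meant instead:
those take plain pointers, so they could be used on a non-atomic
`double`. The following is only a sketch of a compare-and-swap loop built
on them. The function name `atomicAddDouble` is hypothetical, and I have
not checked what this lowers to for nvptx64 or whether it is any faster
there.

```cpp
// Sketch: atomic add on a plain double using the GCC-compatible
// __atomic builtins, which Clang also provides.
// Assumption: relaxed ordering is sufficient for a simple accumulation.
double atomicAddDouble(double *address, double val) {
  double expected;
  __atomic_load(address, &expected, __ATOMIC_RELAXED);
  double desired;
  do {
    desired = expected + val;
    // On failure, the builtin stores the current value into 'expected',
    // so the next iteration recomputes 'desired' from fresh data.
  } while (!__atomic_compare_exchange(address, &expected, &desired,
                                      /*weak=*/true, __ATOMIC_RELAXED,
                                      __ATOMIC_RELAXED));
  return expected; // value before the addition, like CUDA's atomicAdd
}
```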

> `omp atomic` should eventually resolve to the same thing (I hope).
From what I can see in the generated LLVM IR, this does not seem to be
the case. Maybe that's because I'm using update or structs
(for more context, see [1]):

> #pragma omp atomic update
> target_cells_[voxelIndex].mean[0] += (double) target_[id].data[0];
> #pragma omp atomic update
> target_cells_[voxelIndex].mean[1] += (double) target_[id].data[1];
> #pragma omp atomic update
> target_cells_[voxelIndex].mean[2] += (double) target_[id].data[2];
> #pragma omp atomic update
> target_cells_[voxelIndex].numberPoints += 1;
In the generated LLVM IR, there are a number of atomic loads and an
atomicrmw in the end, but no calls to CUDA builtins.
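
For reference, a reduced stand-alone version of this pattern might look
like the sketch below. The types `Point` and `Cell`, their field types,
and the function `accumulate` are assumptions made up for illustration
(the real definitions are in [1]), and it uses a host `parallel for`
rather than a target region so that it builds without offloading
support.

```cpp
// Hypothetical stand-ins for the benchmark's types; see [1] for the
// real definitions.
struct Point {
  float data[4];
};

struct Cell {
  double mean[3];
  int numberPoints;
};

// Accumulate every point into the cell selected by voxelIndex[id].
// Each update is a separate 'omp atomic update' on a struct member,
// mirroring the snippet above.
void accumulate(Cell *cells, const Point *points, const int *voxelIndex,
                int n) {
#pragma omp parallel for
  for (int id = 0; id < n; ++id) {
    Cell &cell = cells[voxelIndex[id]];
#pragma omp atomic update
    cell.mean[0] += (double)points[id].data[0];
#pragma omp atomic update
    cell.mean[1] += (double)points[id].data[1];
#pragma omp atomic update
    cell.mean[2] += (double)points[id].data[2];
#pragma omp atomic update
    cell.numberPoints += 1;
  }
}
```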

The CUDA equivalent of this target region uses calls to atomicAdd and
according to nvprof, this is ~10x faster than the code generated for the
target region by Clang (although I'm not entirely sure the atomics are
the only problem here).

Best,

Lukas

[0]
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#identifiers

[1]
https://github.com/esa-tu-darmstadt/daphne-benchmark/blob/054bbd723dfdf65926ef3678138c6423d581b6e1/src/OpenMP-offload/ndt_mapping/kernel.cpp#L1361

Lukas Sommer, M.Sc.
TU Darmstadt
Embedded Systems and Applications Group (ESA)
Hochschulstr. 10, 64289 Darmstadt, Germany
Phone: +49 6151 1622429
www.esa.informatik.tu-darmstadt.de

On 18.05.20 18:18, Johannes Doerfert wrote:
> Oh, I forgot about this one.
>
> The math stuff works because all declare variant functions are static.
>
> I think we need to replace the `.` with a symbol that the user cannot
> use but the ptxas assembler is not upset about. We should also move
> `getOpenMPVariantManglingSeparatorStr` from `Decl.h` into
> `llvm/lib/Frontends/OpenMP/OMPContext.h`, I forgot why I didn't.
>
> You should also be able to use the clang builtin atomics and even the
> `omp atomic` should eventually resolve to the same thing (I hope).
>
> Let me know if that helps,
>
>   Johannes
>
> On 5/18/20 10:33 AM, Lukas Sommer via Openmp-dev wrote:
>> Hi all,
>>
>> what's the current status of declare variant when compiling for Nvidia
>> GPUs?
>>
>> In my code, I have declared a variant of a function, that uses CUDA's
>> built-in atomicAdd (using the syntax from OpenMP TR8):
>>
>>> #pragma omp begin declare variant match(device={kind(nohost)})
>>>
>>> void atom_add(double* address, double val){
>>>         atomicAdd(address, val);
>>> }
>>>
>>> #pragma omp end declare variant
>> When compiling with Clang from master, ptxas fails:
>>
>>> clang++ -fopenmp   -O3 -std=c++11 -fopenmp
>>> -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target -march=sm_72 -v
>>> [...]
>>> ptxas kernel-openmp-nvptx64-nvidia-cuda.s, line 322; fatal   : Parsing
>>> error near '.ompvariant': syntax error
>>> ptxas fatal   : Ptx assembly aborted due to errors
>>> [...]
>>> clang-11: error: ptxas command failed with exit code 255 (use -v to
>>> see invocation)
>> The line mentioned in the ptxas error looks like this:
>>
>>>         // .globl       _Z33atom_add.ompvariant.S2.s6.PnohostPdd
>>> .visible .func _Z33atom_add.ompvariant.S2.s6.PnohostPdd(
>>>         .param .b64 _Z33atom_add.ompvariant.S2.s6.PnohostPdd_param_0,
>>>         .param .b64 _Z33atom_add.ompvariant.S2.s6.PnohostPdd_param_1
>>> )
>>> {
>> My guess was that ptxas stumbles across the ".ompvariant"-part of the
>> mangled function name.
>>
>> Is declare variant currently supported when compiling for Nvidia GPUs?
>> If not, is there a workaround (macro defined only for device
>> compilation, access to the atomic CUDA functions, ...)?
>>
>> Thanks in advance,
>>
>> Best
>>
>> Lukas
>>