<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <p>Hi Johannes,</p>

    <p>thanks for your quick reply!</p>

    <p>

      <blockquote type="cite">

        <p><font face="Hack Nerd Font Mono">The math stuff works because

            all declare variant functions are static.</font></p>

        <p><font face="Hack Nerd Font Mono">I think if we need to

            replace the `.` with a symbol that the user cannot</font></p>

        <p><font face="Hack Nerd Font Mono">use but the ptxas assembler
            is not upset about. We should also move</font></p>

        <p><font face="Hack Nerd Font Mono">`getOpenMPVariantManglingSeparatorStr`

            from `Decl.h` into</font></p>

        <p><font face="Hack Nerd Font Mono">`llvm/lib/Frontends/OpenMP/OMPContext.h`,

            I forgot why I didn't.</font></p>

      </blockquote>

      The `.` also seems to be part of the mangled context. Where does

      that mangling take place? <br>

    </p>

    <p>According to the PTX documentation [0], identifiers cannot
      contain dots, but `$` is allowed anywhere in a user-defined name
      and `%` is allowed as its first character (apart from a few
      predefined identifiers). <br>

    </p>
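    <p>To make the constraint from [0] concrete, here is a minimal
      sketch of the identifier grammar given there: a name is either a
      letter followed by any number of follow symbols, or one of `_`,
      `$`, `%` followed by at least one follow symbol, where a follow
      symbol is a letter, a digit, `_` or `$`. The helper name
      `isValidPtxIdentifier` is made up for illustration and is not part
      of Clang or the CUDA toolkit.</p>

    <pre>
```cpp
// Sketch of the PTX identifier grammar:
//   followsym:  letter, digit, '_' or '$'
//   identifier: letter followsym*   |   ('_' | '$' | '%') followsym+
static bool isAsciiLetter(char c) {
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

static bool isFollowSym(char c) {
  return isAsciiLetter(c) || (c >= '0' && c <= '9') || c == '_' || c == '$';
}

// Hypothetical helper, not an existing Clang or CUDA API.
bool isValidPtxIdentifier(const char *name) {
  if (name == nullptr || name[0] == '\0')
    return false;
  const bool startsWithLetter = isAsciiLetter(name[0]);
  const bool startsWithSpecial =
      name[0] == '_' || name[0] == '$' || name[0] == '%';
  if (!startsWithLetter && !startsWithSpecial)
    return false;
  // The second alternative needs at least one follow symbol.
  if (startsWithSpecial && name[1] == '\0')
    return false;
  // Every remaining character must be a follow symbol; '.' is not one.
  for (const char *p = name + 1; *p != '\0'; ++p)
    if (!isFollowSym(*p))
      return false;
  return true;
}
```
    </pre>

    <p>With this check, `_Z33atom_add.ompvariant.S2.s6.PnohostPdd` from
      the ptxas error is rejected because of the dots, while the same
      name with every `.` replaced by `$` is accepted. The `$`-separated
      form is only an assumption about what a replacement could look
      like, not something Clang emits today.</p>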

    <p>Should we replace the dot only for Nvidia devices or in general?

      Do any other parts of the code rely on the mangling of the

      variants with dots? <br>

    </p>

    <p>

      <blockquote type="cite">You should also be able to use the clang

        builtin atomics</blockquote>

      You were referring to

      <a class="moz-txt-link-freetext" href="https://clang.llvm.org/docs/LanguageExtensions.html#c11-atomic-builtins">https://clang.llvm.org/docs/LanguageExtensions.html#c11-atomic-builtins</a>,

      weren't you? As far as I can see, those only work on atomic types.<br>

    </p>
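    <p>A note on the builtins: the `__c11_atomic_*` family does take
      `_Atomic` operands, but Clang additionally provides the
      GCC-compatible `__atomic_*` builtins, which operate on plain
      pointers. Below is a minimal sketch of an atomic add on a plain
      `double`, built from a compare-exchange loop. `atomicAddDouble` is
      a hypothetical name, and whether this lowers to a native atomic
      add on NVPTX is an assumption that is not verified here.</p>

    <pre>
```cpp
// Hypothetical helper: atomically performs *address += val on a plain
// double (no _Atomic or std::atomic type) and returns the previous value.
double atomicAddDouble(double *address, double val) {
  double expected;
  __atomic_load(address, &expected, __ATOMIC_RELAXED);
  double desired;
  do {
    desired = expected + val;
    // On failure, the builtin stores the current value in 'expected',
    // so the sum is recomputed on the next iteration.
  } while (!__atomic_compare_exchange(address, &expected, &desired,
                                      /*weak=*/true, __ATOMIC_RELAXED,
                                      __ATOMIC_RELAXED));
  return expected;
}
```
    </pre>

    <p>A call such as `atomicAddDouble(&amp;cell.mean[0], value)` would
      then take the place of an `omp atomic update` on that member. The
      relaxed memory order is a choice made for this sketch.</p>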

    <p>

      <blockquote type="cite">`omp atomic` should eventually resolve to

        the same thing (I hope).</blockquote>

      From what I can see in the generated LLVM IR, this does not seem

      to be the case. Maybe that is because I'm using `update` or
      struct members (for more context, see [1]):</p>

    <p>

      <blockquote type="cite"> #pragma omp atomic update<br>

         target_cells_[voxelIndex].mean[0] += (double)

        target_[id].data[0];<br>

         #pragma omp atomic update<br>

         target_cells_[voxelIndex].mean[1] += (double)

        target_[id].data[1];<br>

        #pragma omp atomic update<br>

        target_cells_[voxelIndex].mean[2] += (double)

        target_[id].data[2];<br>

        #pragma omp atomic update<br>

        target_cells_[voxelIndex].numberPoints += 1;<br>

      </blockquote>

      In the generated LLVM IR, there are a number of atomic loads and
      an `atomicrmw` at the end, but no calls to CUDA builtins. <br>

    </p>

    <p>The CUDA equivalent of this target region calls `atomicAdd`
      and, according to nvprof, is ~10x faster than the code
      generated for the target region by Clang (although I'm not
      entirely sure the atomics are the only problem here).</p>

    <p>Best,</p>

    <p>Lukas</p>

    <p>[0]

<a class="moz-txt-link-freetext" href="https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#identifiers">https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#identifiers</a><br>

    </p>

    <p>[1]

<a class="moz-txt-link-freetext" href="https://github.com/esa-tu-darmstadt/daphne-benchmark/blob/054bbd723dfdf65926ef3678138c6423d581b6e1/src/OpenMP-offload/ndt_mapping/kernel.cpp#L1361">https://github.com/esa-tu-darmstadt/daphne-benchmark/blob/054bbd723dfdf65926ef3678138c6423d581b6e1/src/OpenMP-offload/ndt_mapping/kernel.cpp#L1361</a><br>

    </p>

    <pre class="moz-signature" cols="72">Lukas Sommer, M.Sc.

TU Darmstadt

Embedded Systems and Applications Group (ESA)

Hochschulstr. 10, 64289 Darmstadt, Germany

Phone: +49 6151 1622429

<a class="moz-txt-link-abbreviated" href="http://www.esa.informatik.tu-darmstadt.de">www.esa.informatik.tu-darmstadt.de</a></pre>

    <div class="moz-cite-prefix">On 18.05.20 18:18, Johannes Doerfert

      wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:7213b921-868a-8229-ba0a-e740715053de@gmail.com">

      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

      <p><font face="Hack Nerd Font Mono">Oh, I forgot about this one.</font></p>

      <p><font face="Hack Nerd Font Mono"><br>

        </font></p>

      <p><font face="Hack Nerd Font Mono">The math stuff works because

          all declare variant functions are static.</font></p>

      <p><font face="Hack Nerd Font Mono">I think if we need to replace

          the `.` with a symbol that the user cannot</font></p>

      <p><font face="Hack Nerd Font Mono">use but the ptxas assembler is
          not upset about. We should also move</font></p>

      <p><font face="Hack Nerd Font Mono">`getOpenMPVariantManglingSeparatorStr`

          from `Decl.h` into</font></p>

      <p><font face="Hack Nerd Font Mono">`llvm/lib/Frontends/OpenMP/OMPContext.h`,

          I forgot why I didn't.<br>

        </font></p>

      <p> </p>

      <p>You should also be able to use the clang builtin atomics and

        even the</p>

      <p>`omp atomic` should eventually resolve to the same thing (I

        hope).<br>

      </p>

      <p><br>

      </p>

      <p>Let me know if that helps,</p>

      <p>  Johannes</p>

      <p><br>

      </p>

      <div class="moz-cite-prefix"><br>

      </div>

      <div class="moz-cite-prefix">On 5/18/20 10:33 AM, Lukas Sommer via

        Openmp-dev wrote:<br>

      </div>

      <blockquote type="cite"

        cite="mid:d2c93b55-fe98-2f40-3b2c-fdfd46188563@esa.tu-darmstadt.de">

        <pre class="moz-quote-pre" wrap="">Hi all,

what's the current status of declare variant when compiling for Nvidia

GPUs?

In my code, I have declared a variant of a function, that uses CUDA's

built-in atomicAdd (using the syntax from OpenMP TR8):

</pre>

        <blockquote type="cite">

          <pre class="moz-quote-pre" wrap="">#pragma omp begin declare variant match(device={kind(nohost)})

void atom_add(double* address, double val){

        atomicAdd(address, val);

}

#pragma omp end declare variant

</pre>

        </blockquote>

        <pre class="moz-quote-pre" wrap="">When compiling with Clang from master, ptxas fails:

</pre>

        <blockquote type="cite">

          <pre class="moz-quote-pre" wrap="">clang++ -fopenmp   -O3 -std=c++11 -fopenmp

-fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target -march=sm_72 -v

[...]

ptxas kernel-openmp-nvptx64-nvidia-cuda.s, line 322; fatal   : Parsing

error near '.ompvariant': syntax error

ptxas fatal   : Ptx assembly aborted due to errors

[...]

clang-11: error: ptxas command failed with exit code 255 (use -v to

see invocation)

</pre>

        </blockquote>

        <pre class="moz-quote-pre" wrap="">The line mentioned in the ptxas error looks like this:

</pre>

        <blockquote type="cite">

          <pre class="moz-quote-pre" wrap="">        // .globl       _Z33atom_add.ompvariant.S2.s6.PnohostPdd

.visible .func _Z33atom_add.ompvariant.S2.s6.PnohostPdd(

        .param .b64 _Z33atom_add.ompvariant.S2.s6.PnohostPdd_param_0,

        .param .b64 _Z33atom_add.ompvariant.S2.s6.PnohostPdd_param_1

)

{

</pre>

        </blockquote>

        <pre class="moz-quote-pre" wrap="">My guess was that ptxas stumbles across the ".ompvariant"-part of the

mangled function name.

Is declare variant currently supported when compiling for Nvidia GPUs?

If not, is there a workaround (macro defined only for device

compilation, access to the atomic CUDA functions, ...)?

Thanks in advance,

Best

Lukas

</pre>

      </blockquote>

    </blockquote>

  </body>

</html>