<html><head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body>

    <p>I think 3 is the best option in general.<br>

    </p>

    <pre class="moz-signature" cols="72">-------------

Best regards,

Alexey Bataev</pre>

    <div class="moz-cite-prefix">12/8/2020 7:23 PM, Johannes Doerfert

      via cfe-dev пишет:<br>

    </div>

    <blockquote type="cite" cite="mid:%3C4b65d30e-66a0-98c0-82e2-7b82d6bb7493@gmail.com%3E">Hi

      Itay,

      <br>

      <br>

      the problem is the "escape" of `local` later in the function (into

      the

      <br>

      OpenMP reduction runtime call).

      <br>

      <br>

      There are various solutions to this, on the compiler side, which

      <br>

      I'll try to describe briefly below. If you need a shortstop

      solution

      <br>

      for this issue, you can introduce a secondary privatization

      variable

      <br>

      yourself <a class="moz-txt-link-freetext" href="https://godbolt.org/z/3fxxTT">https://godbolt.org/z/3fxxTT</a> (untested). Each thread will

      use

      <br>

      the user provided variable for accumulation and the OpenMP

      reduction

      <br>

      facility is used for the final reduction across threads.

      Obviously, this

      <br>

      is what the compiler should generate, so here we go:

      <br>

      <br>

      1) Use `PointerMayBeCapturedBefore` instead of

      `PointerMayBeCaptured` in

      <br>

         BasicAliasAnalysis.cpp (the call is done via

      `isNonEscapingLocalObject`).

      <br>

         The reason we don't do this, I assume, is compile time. Maybe

      someone

      <br>

         will try it out and measure the compile and runtime

      implications but for

      <br>

         not I assume this to be a solution that might not be enacted

      (though simple

      <br>

         and generic).

      <br>

      <br>

      2) Modify the OpenMP reduction output to introduce the second

      privatization

      <br>

         location. This should be relatively easy but it will only

      address the problem

      <br>

         at hand. Similar problems pop up more often and I would prefer

      1) or 3). That

      <br>

         said, we can for now enact 2) if someone writes the code ;)

      <br>

      <br>

      3) Introduce an attribute/annotation that indicates this is

      actually not an escaping

      <br>

         use. Such uses happen all the time (at least in the OpenMP

      runtime) and it would

      <br>

         be ideal if we could indicate the memory store is not causing

      the pointer to escape.

      <br>

         I'm thinking about this a bit more now, any input is welcome :)

      <br>

      <br>

      I hope this helps, feel free to reach out if you have more

      questions or comments.

      <br>

      <br>

      ~ Johannes

      <br>

      <br>

      <br>

      On 12/8/20 2:28 PM, Itay Bookstein via cfe-dev wrote:

      <br>

      <blockquote type="cite">Hey,

        <br>

        <br>

        I've encountered a peculiar code generation issue in a

        parallel-for-reduction.

        <br>

        Inside the per-thread reduction loop, I see a store to a stack

        slot that happens

        <br>

        *every iteration*, clobbering the value written by the previous

        one. The address

        <br>

        of that stack slot is later taken to pass to the inter-thread

        <br>

        reduction, but the store

        <br>

        is *not* hoisted outside the loop, while I'd expect it to happen

        just

        <br>

        once outside it.

        <br>

        <br>

        Attached below are the code example that triggers this, and an

        annotated x86-64

        <br>

        assembly snippet from the outlined OMP function. The code was

        compiled using

        <br>

        clang++-12 Ubuntu focal unstable branch, using the command-line:

        <br>

        clang++-12 -fopenmp -O3 main.cpp -o main

        <br>

        <br>

        I'm wondering whether this is some sort of bug, and in which

        component.

        <br>

        <br>

        Regards,

        <br>

        ~Itay

        <br>

        <br>

        // main.cpp

        <br>

        #include <cstdio>

        <br>

        #include <memory>

        <br>

        <br>

        double compute_dot_product(size_t n, double *xv, double *yv)

        <br>

        {

        <br>

           double local = 0.0;

        <br>

           #pragma omp parallel for reduction (+:local)

        <br>

           for (size_t i = 0; i < n; i++) local += xv[i] * yv[i];

        <br>

           return local;

        <br>

        }

        <br>

        <br>

        int main(int argc, char **argv)

        <br>

        {

        <br>

           constexpr size_t n = 0x1000;

        <br>

           auto xv = std::make_unique<double[]>(n);

        <br>

           auto yv = std::make_unique<double[]>(n);

        <br>

        <br>

           double result = compute_dot_product(n, xv.get(), yv.get());

        <br>

           printf("result = %e\n", result);

        <br>

           return 0;

        <br>

        }

        <br>

        <br>

        // Disassembly excerpt from objdump -d --no-show-raw-insn main,

        <br>

        function <...>omp_outlined<...>

        <br>

        4012f0: movsd (%rdx,%rcx,8),%xmm1 ; <--- Non-unrolled loop

        head

        <br>

        4012f5: mulsd (%rsi,%rcx,8),%xmm1

        <br>

        4012fa: addsd %xmm1,%xmm0

        <br>

        4012fe: movsd %xmm0,(%rsp) ; <--- Un-hoisted store every loop

        iteration

        <br>

        401303: add $0x1,%rcx

        <br>

        401307: add $0xffffffffffffffff,%rax

        <br>

        40130b: jne 4012f0 <.omp_outlined.+0xc0>  ; <---

        Non-unrolled loop latch

        <br>

        40130d: cmp $0x3,%rbp

        <br>

        401311: jb 40138b <.omp_outlined.+0x15b> ; <---

        x4-unrolled loop guard

        <br>

        401313: sub %rcx,%rdi ; <--- x4-unrolled loop preheader

        <br>

        401316: lea (%rsi,%rcx,8),%rsi

        <br>

        40131a: add $0x18,%rsi

        <br>

        40131e: lea (%rdx,%rcx,8),%rcx

        <br>

        401322: add $0x18,%rcx

        <br>

        401326: mov $0xffffffffffffffff,%rdx

        <br>

        40132d: nopl (%rax)

        <br>

        401330: movsd -0x10(%rcx,%rdx,8),%xmm1 ; <--- x4-unrolled

        loop head

        <br>

        401336: mulsd -0x10(%rsi,%rdx,8),%xmm1

        <br>

        40133c: addsd %xmm0,%xmm1

        <br>

        401340: movsd %xmm1,(%rsp) ; <--- Weird store #1

        <br>

        401345: movsd -0x8(%rcx,%rdx,8),%xmm0

        <br>

        40134b: mulsd -0x8(%rsi,%rdx,8),%xmm0

        <br>

        401351: addsd %xmm1,%xmm0

        <br>

        401355: movsd %xmm0,(%rsp)  ; <--- Weird store #2

        <br>

        40135a: movsd (%rcx,%rdx,8),%xmm1

        <br>

        40135f: mulsd (%rsi,%rdx,8),%xmm1

        <br>

        401364: addsd %xmm0,%xmm1

        <br>

        401368: movsd %xmm1,(%rsp)  ; <--- Weird store #3

        <br>

        40136d: movsd 0x8(%rcx,%rdx,8),%xmm0

        <br>

        401373: mulsd 0x8(%rsi,%rdx,8),%xmm0

        <br>

        401379: addsd %xmm1,%xmm0

        <br>

        40137d: movsd %xmm0,(%rsp)  ; <--- Weird store #4

        <br>

        401382: add $0x4,%rdx

        <br>

        401386: cmp %rdx,%rdi

        <br>

        401389: jne 401330 <.omp_outlined.+0x100> ; <---

        x4-unrolled loop latch

        <br>

        40138b: mov $0x402028,%edi

        <br>

        401390: mov %r14d,%esi

        <br>

        401393: callq 401040 <__kmpc_for_static_fini@plt>

        <br>

        401398: mov %rsp,%rax ; <--- Load address of the stack slot

        to pass to

        <br>

        reduction logic

        <br>

        40139b: mov %rax,0x20(%rsp)

        <br>

        _______________________________________________

        <br>

        cfe-dev mailing list

        <br>

        <a class="moz-txt-link-abbreviated" href="mailto:cfe-dev@lists.llvm.org">cfe-dev@lists.llvm.org</a>

        <br>

        <a class="moz-txt-link-freetext" href="https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev">https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a>

        <br>

      </blockquote>

      <br>

    </blockquote>

  </body>

</html>