<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <p><br>

    </p>

    <div class="moz-cite-prefix">On 9/29/20 9:11 PM, Artur Pilipenko

      wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:38F308BC-0D3E-4A6D-AA5F-05B3FECFAE62@azul.com">

      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

      Thanks for the feedback.

      <div class=""><br class="">

      </div>

      <div class="">I think both of the suggestions are very reasonable.

        I’ll incorporate them.<br class="">

        <div><br class="">

        </div>

        <div>Given there were no objections for two weeks, I’m going to

          go ahead with posting individual patches for review. </div>

        <div><br class="">

        </div>

        <div>One small question inline:</div>

        <div><br class="">

          <blockquote type="cite" class="">

            <div class="">On Sep 28, 2020, at 10:56 AM, Philip Reames

              <<a href="mailto:listmail@philipreames.com" class=""

                moz-do-not-send="true">listmail@philipreames.com</a>>

              wrote:</div>

            <br class="Apple-interchange-newline">

            <div class="">

              <div class="">

                <p class="">In general, I am supportive of this

                  direction.  It seems like an entirely reasonable

                  solution.  I do have some comments below, but they're

                  mostly of the "how do we generalize this?" variety.</p>

                <p class=""><br class="">

                </p>

                <p class="">First, let's touch on the attribute.  <br

                    class="">

                </p>

                <p class="">My first concern is naming; I think the use

                  of "statepoint" here is problematic as this doesn't

                  relate to lowering strategy needed (e.g. statepoints),

                  but the conceptual support (e.g. a safepoint).  This

                  could be resolved by simply tweaking to

                  require-safepoint.</p>

                <p class="">But that brings us to a broader point. 

                  We've chosen to build in the fact intrinsics don't

                  require safepoints.  If all we want is for some

                  intrinsics *to* require safepoints, why isn't this

                  simply a tweak to the existing code? 

                  callsGCLeafFunction already has a small list of

                  intrinsics which can have safepoints.  <br class="">

                </p>

                <p class="">I think you can completely remove the need

                  for this attribute by a) adding the atomic memcpy

                  variants to the exclude list in callsGCLeafFunction,

                  and b) using the existing "gc-leaf-function" on most

                  calls the frontend generates. 

                  <br class="">

                </p>

                <p class=""><br class="">

                </p>

                <p class="">Second, let's discuss the signature for the

                  runtime function.</p>

                <p class="">I think you should use a signature for the

                  runtime call which takes base pointers and offsets,

                  not base pointers and derived pointers.  Why?  Because

                  passing derived pointers in registers for arguments

                  presumes that the runtime knows how to map a record in

                  the stackmap to where a callee might have shuffled the

                  argument to.  Some runtimes may support this, others

                  may not.  Given using the offset scheme is just as

                  simple to implement, being considerate and minimizing

                  the runtime support required seems worthwhile.  <br

                    class="">

                </p>

                <p class="">On x86, the cost of a subtract (to produce

                  the offset in the worst case), and an LEA (to produce

                  the derived pointer again inside the runtime routine)

                  is pretty minimal.  Particular since the former is

                  likely to be optimized away and the later folded into

                  the addressing mode.  <br class="">

                </p>

                <p class=""><br class="">

                </p>

                <p class="">Finally, it's also worth noting that some

                  (but not all) GCs can convert from an interior derived

                  pointer to the base of the containing object.  With

                  the memcpy family we know that either the pointers are

                  all interior derived, or the length must be zero. 

                  This is not true for all GCs and thus we don't want to

                  rely on it.<br class="">

                </p>

              </div>

            </div>

          </blockquote>

          <div>Do you think it makes sense to control this aspect of

            lowering (derived pointers vs base+offset in memcpy args)

            using GCStrategy?</div>

        </div>

      </div>

    </blockquote>

    I would not bother.  The performance difference is tiny, and no one

    is to my knowledge using LLVM for such a use case.  If we have a

    reported regression, we can address then.<br>

    <blockquote type="cite"

      cite="mid:38F308BC-0D3E-4A6D-AA5F-05B3FECFAE62@azul.com">

      <div class="">

        <div>

          <div><br class="">

          </div>

          Artur<br class="">

          <blockquote type="cite" class="">

            <div class="">

              <div class="">

                <p class="">Philip<br class="">

                </p>

                <p class=""><br class="">

                </p>

                <div class="moz-cite-prefix">On 9/18/20 4:51 PM, Artur

                  Pilipenko via llvm-dev wrote:<br class="">

                </div>

                <blockquote type="cite"

                  cite="mid:12CEE083-CA0C-4974-AC6E-F073D4B2CEB2@azul.com"

                  class="">

                  <div class="">TLDR: a proposal to add GC-parseable

                    lowering to element atomic </div>

                  <div class="">memcpy/memmove instrinsics controlled by

                    a new "requires-statepoint” </div>

                  <div class="">call attribute. </div>

                  <div class=""><br class="">

                  </div>

                  <div class="">Currently

                    llvm.{memcpy|memmove}.element.unordered.atomic calls

                    are </div>

                  <div class="">considered as GC leaf functions (like

                    most other intrinsics). As a </div>

                  <div class="">result GC cannot occur while copy

                    operation is in progress. This might</div>

                  <div class="">have negative effect on GC latencies

                    when large amounts of data are </div>

                  <div class="">copied. To avoid this problem copying

                    large amounts of data can be </div>

                  <div class="">done in chunks with GC safepoints in

                    between. We'd like to be able to </div>

                  <div class="">represent such copy using existing

                    instrinsics [1].</div>

                  <div class=""><br class="">

                  </div>

                  <div class="">For that I'd like to propose a new

                    attribute for </div>

                  <div class="">llvm.{memcpy|memmove}.element.unordered.atomic

                    calls </div>

                  <div class="">"requires-statepoint". This attribute on

                    a call will result in a </div>

                  <div class="">different lowering, which makes it

                    possible to have a GC safepoint </div>

                  <div class="">during the copy operation.</div>

                  <div class=""><br class="">

                  </div>

                  <div class="">There are three parts to the new

                    lowering:</div>

                  <div class=""><br class="">

                  </div>

                  <div class="">1) The calls with the new attribute will

                    be wrapped into a statepoint </div>

                  <div class="">by RewriteStatepointsForGC (RS4GC). This

                    way the stack at the calls </div>

                  <div class="">will be GC parceable. </div>

                  <div class=""><br class="">

                  </div>

                  <div class="">2) Currently these intrinsics are

                    lowered to GC leaf calls to the symbols</div>

                  <div class="">__llvm_{memcpy|memmove}_element_unordered_atomic_<element_size>. </div>

                  <div class="">The calls with the new attribute will be

                    lowered to calls to different </div>

                  <div class="">symbols, let's say</div>

                  <div class="">__llvm_{memcpy|memmove}_element_unordered_atomic_safepoint_<element_size>.</div>

                  <div class="">This way the runtime can provide copy

                    implementations with safepoints.</div>

                  <div class=""><br class="">

                  </div>

                  <div class="">3) Currently memcpy/memmove calls take

                    derived pointers as arguments. </div>

                  <div class="">If we copy with safepoints we might need

                    to relocate the underlying </div>

                  <div class="">source/destination objects on a

                    safepoint. In order to do this we need </div>

                  <div class="">to know the base pointers as well. How

                    do we make the base pointers </div>

                  <div class="">available in the copy routine? I suggest

                    we add them explicitly as </div>

                  <div class="">arguments during lowering. </div>

                  <div class=""><br class="">

                  </div>

                  <div class="">For example: </div>

                  <div class="">__llvm_memcpy_element_unordered_atomic_safepoint_1(</div>

                  <div class="">  dest_base, dest_derived, src_base,

                    src_derived, length)</div>

                  <div class=""><br class="">

                  </div>

                  <div class="">It will be up to RS4GC to do the new

                    lowering and prepare the arguments.</div>

                  <div class="">RS4GC knows how to compute base pointers

                    for a given derived pointer.</div>

                  <div class="">It also already does lowering for

                    deoptimize intrinsics by replacing </div>

                  <div class="">an intrinsic call with a symbol call. So

                    there is a precedent here.</div>

                  <div class=""><br class="">

                  </div>

                  <div class="">Other alternatives:</div>

                  <div class="">- Change

                    llvm.{memcpy|memmove}.element.unordered.atomic API

                    to accept </div>

                  <div class="">  base pointers + offsets instead of

                    derived pointers. This will </div>

                  <div class="">  require autoupgrade of old

                    representation. Changing API of a generic </div>

                  <div class="">  intrinsic to facilitate GC-specific

                    lowering doesn't look like the</div>

                  <div class="">  best idea. This will not work if we

                    want to do the same for non-atomic </div>

                  <div class="">  intrinsics.</div>

                  <div class="">- Teach GC infrastructure to record base

                    pointers for all derived </div>

                  <div class="">  pointer arguments. This looks like an

                    overkill for single use case.</div>

                  <div class=""><br class="">

                  </div>

                  <div class="">Here is the proposed implementation in a

                    single patch:</div>

                  <div class=""><a

                      href="https://reviews.llvm.org/D87954" class=""

                      moz-do-not-send="true">https://reviews.llvm.org/D87954</a></div>

                  <div class="">If there are no objections I will split

                    it into individual reviews and</div>

                  <div class="">add langref changes. </div>

                  <div class=""><br class="">

                  </div>

                  <div class="">Thoughts?</div>

                  <div class=""><br class="">

                  </div>

                  <div class="">Artur</div>

                  <div class=""><br class="">

                  </div>

                  <div class="">[1] An alternative approach would be to

                    make the frontend generate a </div>

                  <div class="">chunked copy loop with a safepoint

                    inside. The downsides are: </div>

                  <div class="">- It's harder for the optimizer to see

                    that this loop is just a copy</div>

                  <div class="">  of a range of bytes. </div>

                  <div class="">- It forces one particular lowering with

                    the chunked loop inlined in </div>

                  <div class="">  compiled code. We can't outline the

                    copy loop into the copy routine. </div>

                  <div class="">  With the intrinsic representation of a

                    chunked copy we can choose </div>

                  <div class="">  different lowering strategies if we

                    want. </div>

                  <div class="">- In our system we have to outline the

                    copy loop into the copy routine</div>

                  <div class="">  due to interactions with

                    deoptimization.</div>

                  <br class="">

                  <fieldset class="mimeAttachmentHeader"></fieldset>

                  <pre class="moz-quote-pre" wrap="">_______________________________________________

LLVM Developers mailing list

<a class="moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org" moz-do-not-send="true">llvm-dev@lists.llvm.org</a>

<a class="moz-txt-link-freetext" href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" moz-do-not-send="true">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a>

</pre>

                </blockquote>

              </div>

            </div>

          </blockquote>

        </div>

        <br class="">

      </div>

    </blockquote>

  </body>

</html>