<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
<div class="">TL;DR: a proposal to add a GC-parseable lowering for the element atomic </div>
<div class="">memcpy/memmove intrinsics, controlled by a new "requires-statepoint" </div>
<div class="">call attribute.</div>
<div class=""><br class="">
</div>
<div class="">Currently, llvm.{memcpy|memmove}.element.unordered.atomic calls are </div>
<div class="">treated as GC leaf functions (like most other intrinsics). As a </div>
<div class="">result, GC cannot occur while a copy operation is in progress. This can</div>
<div class="">have a negative effect on GC latency when large amounts of data are </div>
<div class="">copied. To avoid this problem, large copies can be </div>
<div class="">done in chunks with GC safepoints in between. We'd like to be able to </div>
<div class="">represent such a copy using the existing intrinsics [1].</div>
<div class=""><br class="">
</div>
<div class="">To that end, I'd like to propose a new attribute, "requires-statepoint", </div>
<div class="">for llvm.{memcpy|memmove}.element.unordered.atomic calls. </div>
<div class="">A call with this attribute will get a </div>
<div class="">different lowering, which makes it possible to have a GC safepoint </div>
<div class="">during the copy operation.</div>
<div class=""><br class="">
</div>
<div class="">There are three parts to the new lowering:</div>
<div class=""><br class="">
</div>
<div class="">1) Calls with the new attribute will be wrapped in a statepoint </div>
<div class="">by RewriteStatepointsForGC (RS4GC). This way the stack at these calls </div>
<div class="">will be GC-parseable. </div>
<div class=""><br class="">
</div>
<div class="">2) Currently these intrinsics are lowered to GC leaf calls to the symbols</div>
<div class="">__llvm_{memcpy|memmove}_element_unordered_atomic_<element_size>. </div>
<div class="">The calls with the new attribute will be lowered to calls to different </div>
<div class="">symbols, let's say</div>
<div class="">__llvm_{memcpy|memmove}_element_unordered_atomic_safepoint_<element_size>.</div>
<div class="">This way the runtime can provide copy implementations with safepoints.</div>
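<div class="">A minimal C sketch of the symbol selection described in 2), for illustration only. The helper lowered_symbol_name is hypothetical (it is not an LLVM API), and the "_safepoint_" infix follows the tentative name above, which is not final.</div>

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not an LLVM API): builds the runtime symbol that an
   element atomic memcpy/memmove call would be lowered to. Calls carrying the
   "requires-statepoint" attribute get the tentative "_safepoint_" infix.
   Returns the snprintf result (length of the full name, or negative on error). */
static int lowered_symbol_name(char *buf, size_t buf_size, bool is_memmove,
                               unsigned element_size, bool requires_statepoint) {
  return snprintf(buf, buf_size, "__llvm_%s_element_unordered_atomic_%s%u",
                  is_memmove ? "memmove" : "memcpy",
                  requires_statepoint ? "safepoint_" : "", element_size);
}
```

<div class="">For example, a memmove with element size 4 that carries the attribute would map to __llvm_memmove_element_unordered_atomic_safepoint_4, while the same call without it keeps the existing GC leaf symbol.</div>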
<div class=""><br class="">
</div>
<div class="">3) Currently, memcpy/memmove calls take derived pointers as arguments. </div>
<div class="">If we copy with safepoints, we might need to relocate the underlying </div>
<div class="">source/destination objects at a safepoint. To do this we need </div>
<div class="">to know the base pointers as well. How do we make the base pointers </div>
<div class="">available in the copy routine? I suggest we add them explicitly as </div>
<div class="">arguments during lowering. </div>
<div class=""><br class="">
</div>
<div class="">For example: </div>
<div class="">__llvm_memcpy_element_unordered_atomic_safepoint_1(</div>
<div class=""> dest_base, dest_derived, src_base, src_derived, length)</div>
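<div class="">To illustrate why the base pointers are needed, here is a rough C sketch of what a runtime might do in the 1-byte variant. Everything beyond the argument order is an assumption, not taken from the patch: the function name, the chunk size, and the poll callback parameter, which stands in for the runtime's own safepoint mechanism (it may move objects and updates the base pointers in place). The routine keeps offsets from the bases instead of raw derived pointers, so after a relocation it can recompute the derived pointers from the new bases.</div>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical safepoint hook: the GC may move the objects whose base
   pointers are passed in and writes the new addresses back into `bases`. */
typedef void (*safepoint_poll_fn)(void **bases, size_t count);

/* Assumed number of elements copied between safepoint polls. */
#define COPY_CHUNK_ELEMENTS 4096

/* Sketch of a 1-byte-element copy routine taking (dest_base, dest_derived,
   src_base, src_derived, length), plus a hypothetical poll callback. */
static void example_memcpy_element_atomic_safepoint_1(
    uint8_t *dest_base, uint8_t *dest_derived,
    uint8_t *src_base, uint8_t *src_derived,
    size_t length, safepoint_poll_fn poll) {
  /* Offsets survive relocation; raw derived pointers would go stale. */
  size_t dest_off = (size_t)(dest_derived - dest_base);
  size_t src_off = (size_t)(src_derived - src_base);
  size_t done = 0;

  while (done < length) {
    size_t chunk = length - done;
    if (chunk > COPY_CHUNK_ELEMENTS)
      chunk = COPY_CHUNK_ELEMENTS;

    /* Recompute derived pointers from the (possibly relocated) bases. */
    uint8_t *d = dest_base + dest_off + done;
    const uint8_t *s = src_base + src_off + done;
    for (size_t i = 0; i < chunk; i++) {
      /* Per-element relaxed atomics: at least as strong as "unordered". */
      uint8_t v = __atomic_load_n(&s[i], __ATOMIC_RELAXED);
      __atomic_store_n(&d[i], v, __ATOMIC_RELAXED);
    }
    done += chunk;

    /* Safepoint between chunks; only the base pointers are reported. */
    if (done < length && poll) {
      void *bases[2] = {dest_base, src_base};
      poll(bases, 2);
      dest_base = bases[0];
      src_base = bases[1];
    }
  }
}
```

<div class="">The design choice here is that only the bases are handed to the safepoint; the interior positions are carried as offsets, which is why the lowering needs to pass both base and derived pointers.</div>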
<div class=""><br class="">
</div>
<div class="">It will be up to RS4GC to do the new lowering and prepare the arguments.</div>
<div class="">RS4GC knows how to compute base pointers for a given derived pointer.</div>
<div class="">It also already lowers deoptimize intrinsics by replacing </div>
<div class="">the intrinsic call with a call to a symbol, so there is precedent here.</div>
<div class=""><br class="">
</div>
<div class="">Other alternatives:</div>
<div class="">- Change the llvm.{memcpy|memmove}.element.unordered.atomic API to accept </div>
<div class=""> base pointers + offsets instead of derived pointers. This would </div>
<div class=""> require auto-upgrading the old representation. Changing the API of a generic </div>
<div class=""> intrinsic to facilitate GC-specific lowering doesn't look like the</div>
<div class=""> best idea. It also wouldn't work if we want to do the same for the non-atomic </div>
<div class=""> intrinsics.</div>
<div class="">- Teach the GC infrastructure to record base pointers for all derived </div>
<div class=""> pointer arguments. This looks like overkill for a single use case.</div>
<div class=""><br class="">
</div>
<div class="">Here is the proposed implementation in a single patch:</div>
<div class=""><a href="https://reviews.llvm.org/D87954" class="">https://reviews.llvm.org/D87954</a></div>
<div class="">If there are no objections, I will split it into individual reviews and</div>
<div class="">add the LangRef changes. </div>
<div class=""><br class="">
</div>
<div class="">Thoughts?</div>
<div class=""><br class="">
</div>
<div class="">Artur</div>
<div class=""><br class="">
</div>
<div class="">[1] An alternative approach would be to make the frontend generate a </div>
<div class="">chunked copy loop with a safepoint inside. The downsides are: </div>
<div class="">- It's harder for the optimizer to see that this loop is just a copy</div>
<div class=""> of a range of bytes. </div>
<div class="">- It forces one particular lowering with the chunked loop inlined in </div>
<div class=""> compiled code. We can't outline the copy loop into the copy routine. </div>
<div class=""> With the intrinsic representation of a chunked copy we can choose </div>
<div class=""> different lowering strategies if we want. </div>
<div class="">- In our system we have to outline the copy loop into the copy routine</div>
<div class=""> due to interactions with deoptimization.</div>
</body>
</html>