[llvm-dev] GC-parseable element atomic memcpy/memmove

Artur Pilipenko via llvm-dev llvm-dev at lists.llvm.org
Fri Sep 18 16:51:16 PDT 2020

TLDR: a proposal to add a GC-parseable lowering for the element atomic
memcpy/memmove intrinsics, controlled by a new "requires-statepoint"
call attribute.

Currently llvm.{memcpy|memmove}.element.unordered.atomic calls are
considered GC leaf functions (like most other intrinsics). As a
result, GC cannot occur while a copy operation is in progress. This
can hurt GC latencies when large amounts of data are copied. To avoid
this problem, copying large amounts of data can be done in chunks with
GC safepoints in between. We'd like to be able to represent such a
chunked copy using the existing intrinsics [1].

For that I'd like to propose a new attribute,
"requires-statepoint", for llvm.{memcpy|memmove}.element.unordered.atomic
calls. This attribute on a call will result in a different lowering,
which makes it possible to have a GC safepoint during the copy
operation.
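Concretely, an annotated call might look like the following IR sketch
(the element size of 8 bytes is illustrative, and addrspace(1) is used
for GC-managed pointers as in the statepoint examples):

```llvm
declare void @llvm.memcpy.element.unordered.atomic.p1i8.p1i8.i64(
    i8 addrspace(1)* nocapture writeonly, i8 addrspace(1)* nocapture readonly,
    i64, i32 immarg)

; A copy that is allowed to safepoint mid-way:
call void @llvm.memcpy.element.unordered.atomic.p1i8.p1i8.i64(
    i8 addrspace(1)* align 8 %dest, i8 addrspace(1)* align 8 %src,
    i64 %len, i32 8) "requires-statepoint"
```

Calls without the attribute would keep the current GC leaf lowering
unchanged.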

There are three parts to the new lowering:

1) The calls with the new attribute will be wrapped into a statepoint
by RewriteStatepointsForGC (RS4GC). This way the stack at these calls
will be GC parseable.

2) Currently these intrinsics are lowered to GC leaf calls to the
__llvm_{memcpy|memmove}_element_unordered_atomic_<element_size> symbols.
The calls with the new attribute will be lowered to calls to different
symbols, let's say
__llvm_{memcpy|memmove}_element_unordered_atomic_safepoint_<element_size>.
This way the runtime can provide copy implementations with safepoints.

3) Currently memcpy/memmove calls take derived pointers as arguments.
If we copy with safepoints we might need to relocate the underlying
source/destination objects on a safepoint. In order to do this we need
to know the base pointers as well. How do we make the base pointers
available in the copy routine? I suggest we add them explicitly as
arguments during lowering.

For example:
  __llvm_memcpy_element_unordered_atomic_safepoint_<element_size>(
    dest_base, dest_derived, src_base, src_derived, length)
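Putting steps 2 and 3 together, the result of the new lowering might
look like this sketch (the symbol name and five-argument signature are
assumptions of this proposal, shown here for 4-byte elements):

```llvm
declare void @__llvm_memcpy_element_unordered_atomic_safepoint_4(
    i8 addrspace(1)*, i8 addrspace(1)*, i8 addrspace(1)*, i8 addrspace(1)*, i64)

; RS4GC would rewrite the attributed intrinsic call into (schematically):
call void @__llvm_memcpy_element_unordered_atomic_safepoint_4(
    i8 addrspace(1)* %dest_base, i8 addrspace(1)* %dest_derived,
    i8 addrspace(1)* %src_base, i8 addrspace(1)* %src_derived,
    i64 %len)
```

With the bases available, the copy routine can recompute the derived
pointers as base + (derived - base) after any safepoint that relocates
the underlying objects.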

It will be up to RS4GC to do the new lowering and prepare the arguments.
RS4GC knows how to compute base pointers for a given derived pointer.
It also already does lowering for deoptimize intrinsics by replacing
an intrinsic call with a symbol call. So there is a precedent here.

Other alternatives:
- Change the llvm.{memcpy|memmove}.element.unordered.atomic API to
  accept base pointers + offsets instead of derived pointers. This
  would require an autoupgrade of the old representation. Changing the
  API of a generic intrinsic to facilitate a GC-specific lowering
  doesn't look like the best idea. It also would not work if we wanted
  to do the same for the non-atomic memcpy/memmove intrinsics.
- Teach the GC infrastructure to record base pointers for all derived
  pointer arguments. This looks like overkill for a single use case.

Here is the proposed implementation in a single patch:
If there are no objections I will split it into individual reviews and
add langref changes.



[1] An alternative approach would be to make the frontend generate a
chunked copy loop with a safepoint inside. The downsides are:
- It's harder for the optimizer to see that this loop is just a copy
  of a range of bytes.
- It forces one particular lowering with the chunked loop inlined in
  compiled code. We can't outline the copy loop into the copy routine.
  With the intrinsic representation of a chunked copy we can choose
  different lowering strategies if we want.
- In our system we have to outline the copy loop into the copy routine
  due to interactions with deoptimization.