[llvm-dev] GC-parseable element atomic memcpy/memmove

Philip Reames via llvm-dev llvm-dev at lists.llvm.org
Wed Sep 30 13:08:21 PDT 2020


On 9/29/20 9:11 PM, Artur Pilipenko wrote:
> Thanks for the feedback.
>
> I think both of the suggestions are very reasonable. I’ll incorporate 
> them.
>
> Given there were no objections for two weeks, I’m going to go ahead 
> with posting individual patches for review.
>
> One small question inline:
>
>> On Sep 28, 2020, at 10:56 AM, Philip Reames 
>> <listmail at philipreames.com> wrote:
>>
>> In general, I am supportive of this direction.  It seems like an 
>> entirely reasonable solution.  I do have some comments below, but 
>> they're mostly of the "how do we generalize this?" variety.
>>
>>
>> First, let's touch on the attribute.
>>
>> My first concern is naming; I think the use of "statepoint" here is 
>> problematic, as the attribute doesn't relate to the lowering strategy 
>> needed (e.g. statepoints), but to the conceptual support (e.g. a 
>> safepoint).  This could be resolved by simply renaming it to 
>> require-safepoint.
>>
>> But that brings us to a broader point. We've chosen to build in the 
>> fact that intrinsics don't require safepoints.  If all we want is for some 
>> intrinsics *to* require safepoints, why isn't this simply a tweak to 
>> the existing code? callsGCLeafFunction already has a small list of 
>> intrinsics which can have safepoints.
>>
>> I think you can completely remove the need for this attribute by a) 
>> adding the atomic memcpy variants to the exclude list in 
>> callsGCLeafFunction, and b) putting the existing "gc-leaf-function" 
>> attribute on most of the copy calls the frontend generates.
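A minimal sketch of the decision logic this suggestion implies, written in C for illustration only. LLVM's real callsGCLeafFunction is C++ and works on IR call instructions; the CallSite struct, CalleeKind enum and calls_gc_leaf_function below are hypothetical stand-ins. The assumed rule: an explicit "gc-leaf-function" marker wins, the element atomic memcpy/memmove intrinsics are on the exclude list, and every other intrinsic remains a leaf.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of a call site (not LLVM's API). */
typedef enum {
  CALLEE_ORDINARY_FUNCTION,      /* a regular, non-intrinsic callee */
  CALLEE_MEMCPY_ELEMENT_ATOMIC,  /* llvm.memcpy.element.unordered.atomic */
  CALLEE_MEMMOVE_ELEMENT_ATOMIC, /* llvm.memmove.element.unordered.atomic */
  CALLEE_OTHER_INTRINSIC         /* any other intrinsic */
} CalleeKind;

typedef struct {
  CalleeKind callee;
  bool has_gc_leaf_attr; /* call is marked "gc-leaf-function" */
} CallSite;

/* Returns true if no safepoint is needed at this call. */
static bool calls_gc_leaf_function(const CallSite *call) {
  assert(call);
  /* An explicit marker from the frontend always wins. */
  if (call->has_gc_leaf_attr)
    return true;
  switch (call->callee) {
  case CALLEE_MEMCPY_ELEMENT_ATOMIC:
  case CALLEE_MEMMOVE_ELEMENT_ATOMIC:
    /* Exclude list: unmarked element atomic copies may take a safepoint. */
    return false;
  case CALLEE_OTHER_INTRINSIC:
    /* Most intrinsics never take a safepoint. */
    return true;
  case CALLEE_ORDINARY_FUNCTION:
  default:
    /* Ordinary calls may safepoint unless marked otherwise. */
    return false;
  }
}
```

Under this assumed scheme, the frontend marks the copies that must not safepoint with "gc-leaf-function" and leaves the large, chunkable copies unmarked, so no new attribute is needed.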
>>
>>
>> Second, let's discuss the signature for the runtime function.
>>
>> I think you should use a signature for the runtime call which takes 
>> base pointers and offsets, not base pointers and derived pointers.  
>> Why?  Because passing derived pointers as register arguments 
>> presumes that the runtime knows how to map a record in the stackmap 
>> to wherever the callee might have moved the argument.  Some runtimes 
>> may support this, others may not.  Given that the offset scheme is 
>> just as simple to implement, being considerate and minimizing the 
>> runtime support required seems worthwhile.
>>
>> On x86, the cost of a subtract (to produce the offset in the worst 
>> case), and an LEA (to produce the derived pointer again inside the 
>> runtime routine) is pretty minimal, particularly since the former is 
>> likely to be optimized away and the latter folded into the addressing 
>> mode.
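To make the cost argument concrete, here is a hypothetical C sketch of the two halves of the base+offset scheme (both function names are invented for illustration). The caller turns a derived pointer into an offset with one subtraction; the runtime rebuilds the derived pointer with one addition, which on x86 is typically an LEA or part of an addressing mode. Unlike the derived pointer, the offset stays valid if a moving GC relocates the object at a safepoint.

```c
#include <assert.h>
#include <stddef.h>

/* Caller side: one subtraction per pointer argument. The derived pointer
   is assumed to point into (or one past the end of) the object at base. */
static size_t offset_from_derived(const char *base, const char *derived) {
  assert(derived >= base);
  return (size_t)(derived - base);
}

/* Runtime side: one addition. Still correct if base changed at a
   safepoint, because the offset is relative to the object itself. */
static char *derived_from_offset(char *base, size_t offset) {
  return base + offset;
}
```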
>>
>>
>> Finally, it's also worth noting that some (but not all) GCs can 
>> convert from an interior derived pointer to the base of the 
>> containing object.  With the memcpy family we know that either the 
>> pointers are all interior derived, or the length must be zero.  However, 
>> not all GCs support this conversion, and thus we don't want to rely on it.
>>
> Do you think it makes sense to control this aspect of lowering 
> (derived pointers vs base+offset in memcpy args) using GCStrategy?
I would not bother.  The performance difference is tiny, and no one is 
to my knowledge using LLVM for such a use case.  If we have a reported 
regression, we can address it then.
>
> Artur
>>
>> Philip
>>
>>
>> On 9/18/20 4:51 PM, Artur Pilipenko via llvm-dev wrote:
>>> TLDR: a proposal to add GC-parseable lowering to element atomic
>>> memcpy/memmove intrinsics, controlled by a new "requires-statepoint"
>>> call attribute.
>>>
>>> Currently llvm.{memcpy|memmove}.element.unordered.atomic calls are
>>> considered GC leaf functions (like most other intrinsics). As a
>>> result, GC cannot occur while a copy operation is in progress. This can
>>> increase GC latencies when large amounts of data are copied. To avoid
>>> this problem, large copies can be done in chunks with GC safepoints in
>>> between. We'd like to be able to represent such a copy using the
>>> existing intrinsics [1].
>>>
>>> For that, I'd like to propose a new call attribute for
>>> llvm.{memcpy|memmove}.element.unordered.atomic calls:
>>> "requires-statepoint". This attribute on a call will result in a
>>> different lowering, which makes it possible to have a GC safepoint
>>> during the copy operation.
>>>
>>> There are three parts to the new lowering:
>>>
>>> 1) The calls with the new attribute will be wrapped into a statepoint
>>> by RewriteStatepointsForGC (RS4GC). This way the stack at the calls
>>> will be GC parseable.
>>>
>>> 2) Currently these intrinsics are lowered to GC leaf calls to the 
>>> symbols
>>> __llvm_{memcpy|memmove}_element_unordered_atomic_<element_size>.
>>> The calls with the new attribute will be lowered to calls to different
>>> symbols, let's say
>>> __llvm_{memcpy|memmove}_element_unordered_atomic_safepoint_<element_size>.
>>> This way the runtime can provide copy implementations with safepoints.
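The symbol choice in 2) is essentially a naming rule. A hypothetical C helper (not part of LLVM) that builds the runtime symbol from the operation, the element size, and whether the call requires a safepoint might look like this; the _safepoint_ infix is the placeholder name suggested above, not a settled ABI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Writes the runtime symbol for an element atomic copy into buf.
   op is "memcpy" or "memmove"; element_size is in bytes.
   Returns the length written (excluding NUL), or -1 on error/truncation. */
static int element_atomic_copy_symbol(char *buf, size_t buf_size,
                                      const char *op, unsigned element_size,
                                      bool requires_safepoint) {
  assert(buf && op);
  assert(strcmp(op, "memcpy") == 0 || strcmp(op, "memmove") == 0);
  int n = snprintf(buf, buf_size, "__llvm_%s_element_unordered_atomic_%s%u",
                   op, requires_safepoint ? "safepoint_" : "", element_size);
  if (n < 0 || (size_t)n >= buf_size)
    return -1;
  return n;
}
```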
>>>
>>> 3) Currently memcpy/memmove calls take derived pointers as arguments.
>>> If we copy with safepoints, we might need to relocate the underlying
>>> source/destination objects at a safepoint. In order to do this we need
>>> to know the base pointers as well. How do we make the base pointers
>>> available in the copy routine? I suggest we add them explicitly as
>>> arguments during lowering.
>>>
>>> For example:
>>> __llvm_memcpy_element_unordered_atomic_safepoint_1(
>>>   dest_base, dest_derived, src_base, src_derived, length)
>>>
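A hedged C sketch of what a runtime might do behind the proposed signature, for element size 1. Only the argument list comes from the proposal; the rest is an assumption for illustration: the _sketch suffix marks that this is not the real runtime symbol, the chunk size is arbitrary, and the safepoint is modeled as a hypothetical callback that receives the addresses of the base pointers so it can update them (a real runtime would locate them via the stackmap). The routine turns the derived pointers into offsets once, then re-derives them from the possibly relocated bases after each safepoint.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical safepoint hook: may move either object and must then
   update *dest_base / *src_base. */
typedef void (*safepoint_hook)(char **dest_base, char **src_base);

enum { CHUNK_SIZE = 4096 }; /* arbitrary number of elements per chunk */

static void memcpy_element_atomic_safepoint_1_sketch(
    char *dest_base, char *dest_derived, char *src_base, char *src_derived,
    size_t length, safepoint_hook safepoint) {
  /* With zero length the pointers need not be interior, so don't use them. */
  if (length == 0)
    return;
  assert(dest_derived >= dest_base && src_derived >= src_base);
  /* Offsets survive relocation; derived pointers do not. */
  size_t dest_off = (size_t)(dest_derived - dest_base);
  size_t src_off = (size_t)(src_derived - src_base);
  size_t done = 0;
  while (done < length) {
    size_t n = length - done;
    if (n > CHUNK_SIZE)
      n = CHUNK_SIZE;
    /* Re-derive from the current bases before touching memory. */
    char *d = dest_base + dest_off + done;
    const char *s = src_base + src_off + done;
    for (size_t i = 0; i < n; i++)
      d[i] = s[i]; /* element size 1: one byte per element */
    done += n;
    /* GC may run between chunks; no derived pointer is live across it. */
    if (done < length && safepoint)
      safepoint(&dest_base, &src_base);
  }
}
```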
>>> It will be up to RS4GC to do the new lowering and prepare the arguments.
>>> RS4GC knows how to compute base pointers for a given derived pointer.
>>> It also already does lowering for deoptimize intrinsics by replacing
>>> an intrinsic call with a symbol call. So there is a precedent here.
>>>
>>> Other alternatives:
>>> - Change the llvm.{memcpy|memmove}.element.unordered.atomic API to accept
>>>   base pointers + offsets instead of derived pointers. This would
>>>   require auto-upgrading the old representation. Changing the API of a
>>>   generic intrinsic to facilitate GC-specific lowering doesn't look like
>>>   the best idea, and it would not work if we want to do the same for
>>>   non-atomic intrinsics.
>>> - Teach the GC infrastructure to record base pointers for all derived
>>>   pointer arguments. This looks like overkill for a single use case.
>>>
>>> Here is the proposed implementation in a single patch:
>>> https://reviews.llvm.org/D87954
>>> If there are no objections, I will split it into individual reviews and
>>> add LangRef changes.
>>>
>>> Thoughts?
>>>
>>> Artur
>>>
>>> [1] An alternative approach would be to make the frontend generate a
>>> chunked copy loop with a safepoint inside. The downsides are:
>>> - It's harder for the optimizer to see that this loop is just a copy
>>>   of a range of bytes.
>>> - It forces one particular lowering with the chunked loop inlined in
>>>   compiled code. We can't outline the copy loop into the copy routine.
>>>   With the intrinsic representation of a chunked copy we can choose
>>>   different lowering strategies if we want.
>>> - In our system we have to outline the copy loop into the copy routine
>>>   due to interactions with deoptimization.
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>

