[cfe-dev] Clang builtins for C++20 STL features

Fri Dec 14 06:34:46 PST 2018

On Thu, Dec 13, 2018 at 2:45 PM JF Bastien <jfbastien at apple.com> wrote:

> On Tue, Dec 11, 2018 at 6:51 PM James Y Knight via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>> After considering this a bit, I'm not sure the __builtin_clear_padding
>> proposal is workable. Worse, I'm not sure that the standards proposal
>> itself is workable, either.
>>
>> 1. Regarding general implementability of P0528:
>>
>> Given a call to
>>   bool atomic_compare_exchange_strong(std::atomic<T>* obj, T* expected, T
>> desired)
>> we need to first, compare OBJ to EXPECTED -- ignoring padding. Then:
>> a. if they're equal, store DESIRED into the bytes pointed to by OBJ.
>> b. if they're not equal, copy the bytes pointed to by OBJ into the bytes
>> pointed to by EXPECTED.
>>
>> The problem is: how to do a comparison ignoring padding? We cannot
>> reasonably modify the comparison operation itself (it's either a hardware
>> instruction, or else a library routine taking only a size). The obvious
>> solution is to clear the padding on both sides of the comparison (thus
>> __builtin_clear_padding).
>>
>> We can clear the padding in EXPECTED, although we'll need to make a copy
>> first, as we're not allowed to store into it unless the compare-exchange
>> fails. But, we cannot modify OBJ except as part of the "exchange", when the
>> comparison has already succeeded.
>>
>> If we can ensure that the existing value in OBJ has always had its
>> padding cleared, we're okay. We can be fully in-control of all stores that
>> occur into a std::atomic, and, I believe that's true for C11 _Atomic types
>> as well. (it's unclear to me whether using memcpy to non-atomically
>> overwrite a C11 "_Atomic struct Foo" is allowable, but I'll assume that it
>> is not.)
>>
>> However, it seems infeasible to make this guarantee for std::atomic_ref,
>> as that can be created pointing to any existing object. Said existing
>> object may have its padding in a non-zero state. Of course, <
>> http://wg21.link/P0528r3> doesn't mention atomic_ref, but the atomic_ref
>> paper, <http://wg21.link/P0019r8> mentions this concern, and simply
>> states "We require that the resolution of padding bits follow [P0528r2]."
>>
>> It's also infeasible to ensure for the GCC __atomic_compare_exchange
>> builtin, as it also operates on arbitrary existing objects. (Of course,
>> this is irrelevant to the C++ standard.)
>>
>> I believe a workable alternative would be to generate an extra load of
>> OBJ into a temporary memory, copy only the non-padding bytes of EXPECTED on
>> top of that, and then do the compare_exchange operation of OBJ with the
>> temporary. Finally, copy the temporary back into EXPECTED upon a fail
>> result. While that seems like it could work, requiring an extra load before
>> compare-exchange operations is rather poor.
>>
>
> I suggest you synchronize with your company’s C++ representatives, who
> were present at the EWG discussion which led to this change. They were
> strongly in favor of this direction, and AFAIK it was understood that there
> would be a cost for structs with padding bits.
>

I'm not on the standards committee myself, and, while I am employed by
Google, I do not generally coordinate my llvm-dev emails with my company,
nor do I have a list of all Google employees who attend which standards
body meetings readily available. However, certainly some of said
representatives are on this list and can read this message, and could chime
in with their own opinions if they so desire. Or, if you tell me which
individuals you mean, I could contact them directly.

Was requiring that the previous value of the atomic be loaded during a
compare_exchange the understood implementation strategy in committee
discussions? From what I can tell from the outside, it seemed like people
were in favor of the change because it was thought it would be easy and
cheap to implement. Adding an extraneous load of the atomic value seems to
me a whole different matter in terms of cost, and not one I would expect
that people would've been readily willing to agree to.

Had it been discussed that the example in the R0 paper was not an
infinite-loop because padding is unspecified during object initialization
object copies, as the paper suggests is the case? But, that there is a
potential infinite loop only if the C++ standard allows the padding bits in
"expected" to arbitrarily modify themselves at any time between the
iterations of the loop, which, unfortunately, it would appear to allow?

Which leads into...
>>
>> 2. Do we really need to make this change?
>>
>> The initial paper gives a sample program in http://wg21.link/P0528r0 asserting that
>> it might infinite loop. That would be quite unfortunate.
>>
>> Repeating the example here, it's basically:
>> ```
>>   T desired = ...;
>>   T expected;
>>   while (
>>     !atomic->compare_exchange_strong(
>>       expected,
>>       desired // Padding bits added and removed here
>>   ));
>> ```
>>
>> However, that padding bits in DESIRED may be modified is not super
>> problematic. As long as padding bits in EXPECTED (passed by reference) are
>> not spuriously-modified, we should be fine.
>>
>> That is, in practice, I believe the loop should execute at most twice,
>> not infinitely. Upon the first failure, all the bits (including padding)
>> pointed to by ATOMIC are copied into the object pointed to by EXPECTED (a
>> reference parameter). At the next iteration of the loop, the call will
>> compare all the bits, including the padding, which was copied as-is from
>> OBJ. As nothing writes to EXPECTED between the two calls, the padding bits
>> will then remain unchanged, and therefore the comparison will succeed on
>> the second iteration.
>>
>> However, Richard Smith informs me that while the above is true in Clang,
>> apparently in theory it is broken -- nothing in the C++ standard prohibits
>> padding bits of an object from be discarded or modified arbitrarily, at any
>> time! That is, according to the spec (but not Clang), the below function
>> "foo" could arbitrarily return non-zero, randomly! (I used "my_mem*" simply
>> for demonstrating that even if memcpy/memcmp calls themselves are simply
>> function calls and not specially-handled, the problem remains.). One could
>> imagine this occurring in a non-pathologically-evil compiler, if it were to
>> copy EXPECTED into registers and then back into memory between the two
>> calls -- and while doing so, only copies the non-padding parts of the
>> object, for efficiency.
>>
>> ```
>> struct Foo { char x; /* padding exists here */ long y; };
>>
>> int foo(char *mem) {
>>   Foo expected;
>>   my_memcpy((char*)&expected, mem, sizeof(Foo));
>>   return my_memcmp((char*)&expected, mem, sizeof(Foo));
>> }
>> ```
>>
>> The C standard does appear to prohibit this behavior (C11 6.2.6.1/6),
>> specifying that the padding bytes get set to unspecified values only upon
>> writes to the object.
>>
>> In summary: in C++, it would appear that the motivating example can in
>> theory infinite-loop, but in Clang, and the C standard, it'd seem that it
>> cannot.
>>
>> Changing C++ to require the same semantics for padding stability as C
>> could be a reasonable alternative resolution to the original issue,
>> especially if no implementations actually make use of this allowance today.
>> You may still have a single spurious failure, but it would never
>> infinite-loop.
>>
>>
>> 3. Finally, if the decision is to go forward with P0528 anyways, and if
>> padding bits *can* arbitrarily change, then `__builtin_clear_padding(T*
>> ptr)` is not a theoretically sound interface, because the padding bits
>> could change again, after the call to clear them returns.
>>
>> We'd need to either implement the support within the atomic builtins
>> themselves (which, as noted, cannot be done within the existing GCC
>> builtins), or define some alternative interface.
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20181214/801e58cf/attachment.html>