[cfe-dev] Clang builtins for C++20 STL features

James Y Knight via cfe-dev cfe-dev at lists.llvm.org
Tue Dec 11 15:50:27 PST 2018


After considering this a bit, I'm not sure the __builtin_clear_padding
proposal is workable. Worse, I'm not sure that the standards proposal
itself is workable, either.

1. Regarding general implementability of P0528:

Given a call to
  bool atomic_compare_exchange_strong(std::atomic<T>* obj, T* expected, T
desired)
we need to first, compare OBJ to EXPECTED -- ignoring padding. Then:
a. if they're equal, store DESIRED into the bytes pointed to by OBJ.
b. if they're not equal, copy the bytes pointed to by OBJ into the bytes
pointed to by EXPECTED.

The problem is: how to do a comparison ignoring padding? We cannot
reasonably modify the comparison operation itself (it's either a hardware
instruction, or else a library routine taking only a size). The obvious
solution is to clear the padding on both sides of the comparison (thus
__builtin_clear_padding).

We can clear the padding in EXPECTED, although we'll need to make a copy
first, as we're not allowed to store into it unless the compare-exchange
fails. But, we cannot modify OBJ except as part of the "exchange", when the
comparison has already succeeded.

If we can ensure that the existing value in OBJ has always had its padding
cleared, we're okay. We can be fully in-control of all stores that occur
into a std::atomic, and, I believe that's true for C11 _Atomic types as
well. (it's unclear to me whether using memcpy to non-atomically overwrite
a C11 "_Atomic struct Foo" is allowable, but I'll assume that it is not.)

However, it seems infeasible to make this guarantee for std::atomic_ref, as
that can be created pointing to any existing object. Said existing object
may have its padding in a non-zero state. Of course, <
http://wg21.link/P0528r3> doesn't mention atomic_ref, but the atomic_ref
paper, <http://wg21.link/P0019r8> mentions this concern, and simply states
"We require that the resolution of padding bits follow [P0528r2]."

It's also infeasible to ensure for the GCC __atomic_compare_exchange
builtin, as it also operates on arbitrary existing objects. (Of course,
this is irrelevant to the C++ standard.)

I believe a workable alternative would be to generate an extra load of OBJ
into a temporary memory, copy only the non-padding bytes of EXPECTED on top
of that, and then do the compare_exchange operation of OBJ with the
temporary. Finally, copy the temporary back into EXPECTED upon a fail
result. While that seems like it could work, requiring an extra load before
compare-exchange operations is rather poor.

Which leads into...

2. Do we really need to make this change?

The initial paper gives a sample program in http://wg21.link/P0528r0
asserting that it might infinite loop. That would be quite unfortunate.

Repeating the example here, it's basically:
```
  T desired = ...;
  T expected;
  while (
    !atomic->compare_exchange_strong(
      expected,
      desired // Padding bits added and removed here
  ));
```

However, that padding bits in DESIRED may be modified is not super
problematic. As long as padding bits in EXPECTED (passed by reference) are
not spuriously-modified, we should be fine.

That is, in practice, I believe the loop should execute at most twice, not
infinitely. Upon the first failure, all the bits (including padding)
pointed to by ATOMIC are copied into the object pointed to by EXPECTED (a
reference parameter). At the next iteration of the loop, the call will
compare all the bits, including the padding, which was copied as-is from
OBJ. As nothing writes to EXPECTED between the two calls, the padding bits
will then remain unchanged, and therefore the comparison will succeed on
the second iteration.

However, Richard Smith informs me that while the above is true in Clang,
apparently in theory it is broken -- nothing in the C++ standard prohibits
padding bits of an object from be discarded or modified arbitrarily, at any
time! That is, according to the spec (but not Clang), the below function
"foo" could arbitrarily return non-zero, randomly! (I used "my_mem*" simply
for demonstrating that even if memcpy/memcmp calls themselves are simply
function calls and not specially-handled, the problem remains.). One could
imagine this occurring in a non-pathologically-evil compiler, if it were to
copy EXPECTED into registers and then back into memory between the two
calls -- and while doing so, only copies the non-padding parts of the
object, for efficiency.

```
struct Foo { char x; /* padding exists here */ long y; };

int foo(char *mem) {
  Foo expected;
  my_memcpy((char*)&expected, mem, sizeof(Foo));
  return my_memcmp((char*)&expected, mem, sizeof(Foo));
}
```

The C standard does appear to prohibit this behavior (C11 6.2.6.1/6),
specifying that the padding bytes get set to unspecified values only upon
writes to the object.

In summary: in C++, it would appear that the motivating example can in
theory infinite-loop, but in Clang, and the C standard, it'd seem that it
cannot.

Changing C++ to require the same semantics for padding stability as C could
be a reasonable alternative resolution to the original issue, especially if
no implementations actually make use of this allowance today. You may still
have a single spurious failure, but it would never infinite-loop.


3. Finally, if the decision is to go forward with P0528 anyways, and if
padding bits *can* arbitrarily change, then `__builtin_clear_padding(T*
ptr)` is not a theoretically sound interface, because the padding bits
could change again, after the call to clear them returns.

We'd need to either implement the support within the atomic builtins
themselves (which, as noted, cannot be done within the existing GCC
builtins), or define some alternative interface.


On Fri, Nov 30, 2018 at 4:13 PM JF Bastien <jfbastien at apple.com> wrote:

>
>
> On Nov 30, 2018, at 1:10 PM, James Y Knight <jyknight at google.com> wrote:
>
>
>
> On Fri, Nov 30, 2018 at 4:06 PM JF Bastien <jfbastien at apple.com> wrote:
>
>>
>>
>> On Nov 30, 2018, at 12:39 PM, Casey Carter via cfe-dev <
>> cfe-dev at lists.llvm.org> wrote:
>>
>> On Fri, Nov 30, 2018 at 12:35 PM James Y Knight via cfe-dev <
>> cfe-dev at lists.llvm.org> wrote:
>>
>>> On Thu, Nov 29, 2018, 9:53 PM Richard Smith via cfe-dev <
>>> cfe-dev at lists.llvm.org wrote:
>>>
>>>> * P0528R3 Atomic Compare-And-Exchange With Padding Bits
>>>>> We need compiler magic here, in some form. Billy O'Neal wrote to the
>>>>> C1XX team: "To implement the new atomic_ref as well as the change to
>>>>> compare the value representation of atomics only, the library needs a way
>>>>> to zero out the padding in arbitrary T, which we can't discover with
>>>>> library tech alone. We would like an intrinsic that accepts a
>>>>> trivially-copyable T and produces a copy with the padding zeroed, or takes
>>>>> a T* and zeros the padding inside that T, or similar."
>>>>>
>>>>
>>>> I think this should be done in-place in memory; producing a copy has
>>>> the problem that you're passing around a value of type T, and that might
>>>> permit the padding bits to become undefined again.
>>>>
>>>> void __builtin_clear_padding(T *ptr)
>>>> Effects: Set to zero all bits that are padding bits in the
>>>> representation of every value of type T.
>>>>
>>>
>>> Is the intent here that every value stored into a std::atomic (e.g. via
>>> store, exchange, or compare_exchange) would be passed through
>>> __builtin_clear_padding first, before being stored into the atomic object?
>>> And presumably the same would need to occur implicitly for C-style `_Atomic
>>> T`?
>>>
>>
>> Yes; see P0582R3 "The Curious Case of Padding Bits" (
>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0528r3.html)
>> for details. The intent is to make compare-exchange work for types with
>> padding bits by zeroing those padding bits before storing a value in an
>> atomic or comparing with the value stored in an atomic.
>>
>>
>> I think you rather want to zero out padding bits in the atomic, not in
>> the value, and you want to do it after storing into the atomic (unless
>> you’re doing zeroing, and then copying the non-padding elements one at a
>> time, in which case you could just memset instead of having a builtin).
>>
>
> Errr...certainly you can't do additional stores into an atomic after
> storing a new value into it, that would kinda ruin the "atomic" part.
>
>
> Haha yeah you’re totally right! Ignore me, no idea what I was thinking.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20181211/a317425c/attachment.html>


More information about the cfe-dev mailing list