[libcxx-commits] [PATCH] D96946: [libcxx][RFC] Unspecified behavior randomization in libcxx

Thu Feb 18 09:23:39 PST 2021

danlark added a comment.

In D96946#2571506 <https://reviews.llvm.org/D96946#2571506>, @ldionne wrote:

> First of all, thanks for writing a design document and explaining the purpose of your patch. Let me try to understand the use case a bit better.
>
> So basically, let's say you're trying to adopt a new implementation of `std::sort` internally. You change that, but then you realize that a bunch of tests break because they were relying on the exact order of equal elements after `std::sort`, which is really unspecified. But since those tests are failing, you can't use the new `std::sort` implementation until all the tests have been fixed. So, in order to make that transition easier and allow fixing tests one by one, you add this feature. You then turn it on locally for some projects/tests inside your organization, and you fix them one by one. Once you're done, you can switch the algorithm implementation, and turn off the randomization. Is that the idea?

Yes, that's a correct understanding, except that we would not switch the randomization off. For us it should be on by default in some debug builds, so that we catch new cases of stability dependencies. In release builds we are not going to do any randomization, but we can guarantee that the new sorting algorithm works as well as we can test.

> The main problem I see with this reasoning is that it doesn't really help libc++ as a project to migrate to a new algorithm. Libc++ is used by plenty of people who might be relying on the order of elements after calling `std::sort`, and they'll never know about this setting before we break their code. Are we comfortable with breaking such users? I mean, we're technically allowed to by the Standard, but still. I *think* I'd be fine with that, but this needs a bit of discussion.
>
> Technically speaking, there's a few things on my mind while reviewing this:
>
> 1. Is there a bad interaction because we instantiate some algorithms in the `dylib`. I checked and I don't think that's the case.
> 2. Is there a way we can achieve the same with some sort of sanitizer or tool external to libc++ instead? If we could do that, I think it would certainly deliver more value in the long term, and it would certainly be more powerful than manually-added checks in libc++.
> 3. How can we minimize the complexity we're adding to libc++? This is the Standard Library, and literally everybody wants to add their "very small tweak" to it. If we're not very aggressive about reducing complexity, it becomes a mess. For example, do you really need to be able to specify the seed? Can you explain why? And in case it almost always makes sense to specify a seed, can we make it mandatory instead? I'm trying to reduce the number of configurations we have to maintain.
> 4. If we do this, then does it make sense to group it with other features such as the Debug mode? It seems to me that this effort, iterator invalidation checking and other similar checks kind of fall into the same bucket of "things that turn silent misuses into loud failures so you diagnose them before they hit prod". If we decide to go ahead with your patch, I think it would be useful to have a wider vision for how we handle these sorts of additions in the future.
>
> Requesting changes so it shows up properly in the queue. Let's have a discussion.
>
> Thanks!

I think that at some point we should break users of algorithms such as sorting, because we are not compliant now and new algorithms keep evolving. Providing an option to migrate is possibly the best we can do.

1. No, no bad interaction
2. It is possible for the compiler to introduce some hooks before calling std::sort. However, I thought it would be easier to introduce an option that users can turn on if they ever wonder why their tests are failing after migrating to a new version of libc++. Ideally this would be another type of sanitizer, which I am scared to suggest, to be honest; patching libcxx seemed much easier and more relevant.
3. The idea for the seed was to have reproducible results no matter what. I am fine with removing it if you feel this is an unnecessary complication. Having a random seed per run was the initial idea, so that the tests become flaky and the users notice.
4. I am fine with moving it under debug. I did not do it right away because DEBUG mode is currently deterministic, and I thought a separate option was still good. If you are OK with moving it under debug, so am I.
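
To make the idea in the points above concrete, here is a minimal sketch of what "randomizing unspecified behavior" could look like from the outside. It is not the code in this patch: `randomized_sort`, `by_key` and `sample_sorted` are hypothetical names, and `std::mt19937_64` is used only for illustration. The range is shuffled with a seeded generator before `std::sort`, so the relative order of equal elements depends on the seed, while the result is still correctly sorted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper, not part of libc++: shuffle with a seeded generator,
// then sort. Elements that compare equal may end up in a different relative
// order for different seeds, which std::sort is allowed to do anyway.
template <class RandomIt, class Compare>
void randomized_sort(RandomIt first, RandomIt last, Compare comp, std::uint64_t seed) {
  std::mt19937_64 gen(seed);
  std::shuffle(first, last, gen);
  std::sort(first, last, comp);
  assert(std::is_sorted(first, last, comp));
}

// Compares only the key, so pairs with the same key are equivalent for sorting.
inline bool by_key(const std::pair<int, char>& a, const std::pair<int, char>& b) {
  return a.first < b.first;
}

// Sorts a small sample by key only; the payload order within a key is unspecified.
inline std::vector<std::pair<int, char>> sample_sorted(std::uint64_t seed) {
  std::vector<std::pair<int, char>> v = {{1, 'a'}, {0, 'x'}, {1, 'b'}, {1, 'c'}, {0, 'y'}};
  randomized_sort(v.begin(), v.end(), by_key, seed);
  return v;
}
```

A test that asserts on the payload order within one key would be relying on unspecified behavior. With a fixed seed the failure is reproducible; with a per-run seed the test becomes flaky, which is what makes users notice.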

================
Comment at: libcxx/include/algorithm:3113
+
+template <class _Dummy>
+class _LIBCPP_TYPE_VIS __random_unspecified_behavior_gen_
----------------
ldionne wrote:
> Why are we making this a template?
Because there is no algorithm.cpp, and I decided not to create one just for this feature. The template is needed to define `static const void* const __seed_static;` in the header: inline variables are only available from C++17, and before that the common trick was to make the variable a static data member of a class template, whose definition is allowed to live in a header.

See

```
template <class _Void>
const void* const __random_unspecified_behavior_gen_<_Void>::__seed_static =
    &__seed_static;
```
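
For readers unfamiliar with the trick, here is a self-contained sketch using non-reserved, hypothetical names (`seed_holder`, `address_seed`); it mirrors the shape of the definition above but is not the patch itself. A static data member of a class template can be defined in a header without violating the one-definition rule, and the member's own address serves as a value that can differ between runs when address space layout randomization is active.

```cpp
#include <cstdint>

// Hypothetical stand-in for the class in the patch. The template parameter
// exists only so that the static member can be defined in a header.
template <class Dummy = void>
struct seed_holder {
  static const void* const seed_static;
};

// Definition in the header: allowed for static data members of class
// templates, even before C++17 inline variables. The member is initialized
// with its own address.
template <class Dummy>
const void* const seed_holder<Dummy>::seed_static = &seed_static;

// Turns the address into an integer that could be fed to a generator.
inline std::uintptr_t address_seed() {
  return reinterpret_cast<std::uintptr_t>(seed_holder<>::seed_static);
}
```

Every translation unit that includes the header sees the same definition, and the linker keeps a single copy, so `address_seed()` returns one consistent value within a process.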

================
Comment at: libcxx/include/algorithm:3127-3132
+    result_type operator()() {
+      std::uint_fast64_t __oldstate = __state;
+      __state = __oldstate * 6364136223846793005ULL + __inc;
+      std::uint_fast32_t __xorshifted = ((__oldstate >> 18u) ^ __oldstate) >> 27u;
+      std::uint_fast32_t __rot = __oldstate >> 59u;
+      return (__xorshifted >> __rot) | (__xorshifted << ((-__rot) & 31));
----------------
ldionne wrote:
> I'm far from being a PRNG expert - can you explain what you're doing here?
It's a very simple PCG random generator. I have switched to a simpler linear congruential generator in order to reduce complexity.
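
For reference, the quoted lines follow the well-known PCG32 (XSH-RR) step: a 64-bit linear congruential state update, followed by an xorshift and a data-dependent rotation that produce a 32-bit output. Below is a standalone sketch; `pcg32_sketch` and its constructor are assumptions made for illustration, not the class from the patch. It uses fixed-width types rather than the `uint_fast` types so that the truncation to 32 bits does not depend on how wide the platform makes the fast types.

```cpp
#include <cstdint>

// Illustrative PCG32-style generator; the name and seeding scheme are
// assumptions and do not come from the patch.
struct pcg32_sketch {
  std::uint64_t state;
  std::uint64_t inc;  // stream selector, always odd

  explicit pcg32_sketch(std::uint64_t seed, std::uint64_t stream = 0)
      : state(0), inc((stream << 1u) | 1u) {
    (*this)();      // advance once from the zero state
    state += seed;  // mix in the seed
    (*this)();      // advance again so the first output depends on the seed
  }

  std::uint32_t operator()() {
    std::uint64_t old = state;
    // Linear congruential step on the 64-bit state.
    state = old * 6364136223846793005ULL + inc;
    // Output permutation: xorshift the high bits down, then rotate right by
    // an amount taken from the top 5 bits of the old state.
    std::uint32_t xorshifted = static_cast<std::uint32_t>(((old >> 18u) ^ old) >> 27u);
    std::uint32_t rot = static_cast<std::uint32_t>(old >> 59u);
    return (xorshifted >> rot) | (xorshifted << ((0u - rot) & 31u));
  }
};
```

Two generators constructed with the same seed and stream produce the same sequence, which is the property a user-specified seed relies on for reproducible runs.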

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96946/new/

https://reviews.llvm.org/D96946