[libcxx-commits] [PATCH] D130695: [libc++][ranges]Refactor `copy{, _backward}` and `move{, _backward}`

Nikolas Klauser via Phabricator via libcxx-commits <libcxx-commits at lists.llvm.org>
Tue Aug 30 12:05:11 PDT 2022


philnik added inline comments.


================
Comment at: libcxx/include/__algorithm/copy.h:54
 copy(_InputIterator __first, _InputIterator __last, _OutputIterator __result) {
-  return std::__copy(__first, __last, __result).second;
+  return std::__copy<_ClassicAlgPolicy>(__first, __last, __result).second;
 }
----------------
ldionne wrote:
> philnik wrote:
> > Why does `copy` get an `_AlgPolicy` now? AFAICT it's never used anywhere, which means it just produces duplicate code.
> For consistency.
> 
> To address the code size concern, I think what we should do is change `__copy` above like so:
> 
> ```
> template <class _AlgPolicy, class _InIter, class _Sent, class _OutIter>
> inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14 pair<_InIter, _OutIter>
> __copy(_InIter __first, _Sent __last, _OutIter __result) {
>   // NOTE: There is no difference between the ranges and the STL classic version of this algorithm,
>   //       so we always use _ClassicAlgPolicy to reduce code bloat.
>   return std::__dispatch_copy_or_move<__copy_loop<_ClassicAlgPolicy>, __copy_trivial<_ClassicAlgPolicy> >(
>       std::move(__first), std::move(__last), std::move(__result));
> }
> ```
This doesn't just affect `copy`, though. It also results in multiple instantiations of all the `set` algorithms. I really don't think this bit of consistency is worth that many extra instantiations.
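To make the code-size concern concrete, here is a minimal standalone sketch (not the actual libc++ headers; the policy types and the `__copy` body are reduced to illustrative stand-ins) of how an unused `_AlgPolicy` parameter still produces one instantiation per policy:
```
struct _ClassicAlgPolicy {};
struct _RangeAlgPolicy {};

// The policy parameter is never used in the body, yet every distinct policy
// argument produces a distinct instantiation (and, absent identical-code
// folding by the linker, a distinct copy of the code in the binary).
template <class _AlgPolicy, class _InIter, class _OutIter>
_OutIter __copy(_InIter __first, _InIter __last, _OutIter __result) {
  for (; __first != __last; ++__first, ++__result)
    *__result = *__first;
  return __result;
}

int* classic_copy(int* __f, int* __l, int* __out) { return __copy<_ClassicAlgPolicy>(__f, __l, __out); }
int* ranges_copy(int* __f, int* __l, int* __out) { return __copy<_RangeAlgPolicy>(__f, __l, __out); }
```
The same doubling happens in every algorithm that forwards its policy down to `__copy`, which is where the `set` algorithms come in.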


================
Comment at: libcxx/include/__algorithm/copy_move_common.h:34
+  const size_t __n = static_cast<size_t>(__last - __first);
+  std::memmove(__result, __first, __n * sizeof(_Out));
+
----------------
ldionne wrote:
> philnik wrote:
> > Why did you get rid of calling `__builtin_memmove` during constant evaluation?
> GCC didn't support it, which added a bunch of complexity. So now what we do instead is unwrap the iterators and use the loop version when we're inside constant evaluation. The performance during constant evaluation is less of a concern than performance at runtime.
AFAICT the only complexity added by GCC not supporting it is an extra `#ifndef _LIBCPP_COMPILER_GCC`. How does calling `__builtin_memmove` result in worse runtime performance? IMO we shouldn't massively pessimize constant-evaluation performance just to remove a few lines of code.

For example:
```
#include <cstddef>

constexpr void copy_arrays(int* begin, int* end, int* out) {
#ifdef USE_MEMMOVE
  __builtin_memmove(out, begin, static_cast<std::size_t>(end - begin) * sizeof(int)); // byte count, not element count
#else
  while (begin != end) {
    *out = *begin;
    ++begin;
    ++out;
  }
#endif
}

constexpr bool use_data() {
  int a[100000] = {};
  int b[100000] = {};
  ::copy_arrays(a, a + 100000, b);

  return true;
}

static_assert(use_data());
```
Without `__builtin_memmove` this takes 200ms to compile; with `__builtin_memmove` it takes 30ms.
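For reference, a rough sketch of the guarded-builtin approach described above (this is not the actual patch; the guard macro and function names below are stand-ins, whereas the real code would use `_LIBCPP_COMPILER_GCC` and libc++'s internal helpers):
```
#include <cstddef>
#include <cstring>

// Illustrative only: _Tp is assumed trivially copyable, as in the trivial-copy path.
template <class _Tp>
constexpr _Tp* copy_trivial(const _Tp* __first, const _Tp* __last, _Tp* __result) {
  const std::size_t __n = static_cast<std::size_t>(__last - __first);
  if (__builtin_is_constant_evaluated()) {
#ifndef SIMULATE_GCC // stand-in for "#ifndef _LIBCPP_COMPILER_GCC"
    // Clang evaluates __builtin_memmove in a single step during constant evaluation.
    __builtin_memmove(__result, __first, __n * sizeof(_Tp));
#else
    // Compilers without constexpr __builtin_memmove fall back to an element-wise loop.
    for (std::size_t __i = 0; __i != __n; ++__i)
      __result[__i] = __first[__i];
#endif
  } else {
    // At runtime, always use memmove.
    std::memmove(__result, __first, __n * sizeof(_Tp));
  }
  return __result + __n;
}
```
The runtime path is identical either way; the guard only decides whether constant evaluation gets the builtin or the loop.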


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130695/new/

https://reviews.llvm.org/D130695


