[libcxx-commits] [PATCH] D121485: [libc++] Fix performance inconsistency between map/set copy-assignment and copy-constructor

Fri May 5 12:46:51 PDT 2023

ldionne added inline comments.

================
Comment at: libcxx/benchmarks/map.bench.cpp:175
+  void run(benchmark::State& State) const {
+    auto Data = makeTestingSets(MapSize, Mode::Hit, Shuffle::None, 1);
+    auto& Map = Data.Maps.front();
----------------
EricWF wrote:
> var-const wrote:
> > Question: why are we using a different naming convention here (with capitalized variable `Name`s)?
> Probably because I started it because I was feeling funky. Not sure it matters much.
Yeah it's weird but I am staying consistent with the rest of the file.

================
Comment at: libcxx/benchmarks/map.bench.cpp:177
+    auto& Map = Data.Maps.front();
+    while (State.KeepRunningBatch(MapSize)) {
+#ifndef VALIDATE
----------------
EricWF wrote:
> I don't think this is the correct usage of `KeepRunningBatch`. 
This is what I see in GoogleBenchmark:

```
  // Returns true iff the benchmark should run n more iterations.
  // REQUIRES: 'n' > 0.
  // NOTE: A benchmark must not return from the test until KeepRunningBatch()
  // has returned false.
  // NOTE: KeepRunningBatch() may overshoot by up to 'n' iterations.
  //
  // Intended usage:
  //   while (state.KeepRunningBatch(1000)) {
  //     // process 1000 elements
  //   }
  bool KeepRunningBatch(IterationCount n);
```

You're right that it's a bit weird since we're only "processing" one element here (aka we're doing the assignment of a single map). However, if I look at how all these benchmarks were written, I think the intent is that we pretend that copying each element in the map is "one iteration". I guess that's done so that the output of Google Benchmark can show the time taken on average for each element given a map with N elements in it. That's what we do in all the benchmarks in this file.

Anyway, I switched all the constructor-related benchmarks to use the pattern used in `ConstructorMove`, which gave me the most sensical and the most consistent results.

================
Comment at: libcxx/benchmarks/map.bench.cpp:179
+#ifndef VALIDATE
+      std::map<uint64_t, int64_t> M;
+      M = Map;
----------------
EricWF wrote:
> What about the case where the map is already populated? And with different sizes?
Added a benchmark.

================
Comment at: libcxx/include/__tree:1098
     __tree(const __tree& __t);
-    __tree& operator=(const __tree& __t);
     template <class _ForwardIterator>
----------------
EricWF wrote:
> Mordante wrote:
> > var-const wrote:
> > > From the patch description: "In the future, we could instead reintroduce a copy assignment operator in
> > > tree and copy the whole tree without performing any comparison". Would it be worthwhile adding a TODO to that effect here?
> > +1 for the TODO.
> On another note: a copy assignment operator gets implicitly declared by the compiler. For readability lets make that implicit declaration explicit. 
See https://github.com/llvm/llvm-project/issues/62571

================
Comment at: libcxx/include/__tree:1098
     __tree(const __tree& __t);
-    __tree& operator=(const __tree& __t);
     template <class _ForwardIterator>
         void __assign_unique(_ForwardIterator __first, _ForwardIterator __last);
----------------
ldionne wrote:
> EricWF wrote:
> > Mordante wrote:
> > > var-const wrote:
> > > > From the patch description: "In the future, we could instead reintroduce a copy assignment operator in
> > > > tree and copy the whole tree without performing any comparison". Would it be worthwhile adding a TODO to that effect here?
> > > +1 for the TODO.
> > On another note: a copy assignment operator gets implicitly declared by the compiler. For readability lets make that implicit declaration explicit. 
> See https://github.com/llvm/llvm-project/issues/62571
Since there's a move constructor, it won't get declared. But I explicitly marked it as deleted.

================
Comment at: libcxx/include/set:594
         {
-            __tree_ = __s.__tree_;
+            if (this != _VSTD::addressof(__s)) {
+                __tree_.clear();
----------------
Mordante wrote:
> var-const wrote:
> > Question: it's `_VSTD` and not `std` for consistency within the file, right?
> These changes are quite unexpected when reading the patch description. Can you update the description?
> (Same for the multi(map|set).)
This is taken from the `__tree::operator=` code we were using before, so there are no changes really.

> Question: it's `_VSTD` and not `std` for consistency within the file, right?

Hmm, I'll use `std` anyway.

================
Comment at: libcxx/include/set:598
+                __tree_.__copy_assign_alloc(__s.__tree_);
+                insert(__s.begin(), __s.end());
+            }
----------------
EricWF wrote:
> What's the performance like if you call `__assign_unique(__s.begin(), __s.end())` instead?
It's better in case there's already elements in the tree, so I changed it to that instead. Good catch!

================
Comment at: libcxx/test/libcxx/containers/associative/perf.assignment_vs_construction.pass.cpp:3
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
----------------
EricWF wrote:
> var-const wrote:
> > A couple of questions:
> > - can/should we use Google Benchmark here?
> > - is this file compiled with optimizations enabled?
> Yes, this should be google benchmark.
I basically removed this test. The goal was to have a regression test that would catch when copy assignment and copy construction are not within a reasonable margin of each other, which we don't achieve with our current benchmarks. But I think it's fine to instead have just the benchmarks under `libcxx/benchmarks`.

================
Comment at: libcxx/test/libcxx/containers/associative/perf.assignment_vs_construction.pass.cpp:71
+    auto percentage = (std::labs((ssize_t)ctor_avg - (ssize_t)assignment_avg) / (double)ctor_avg) * 100.0;
+    assert(percentage < 10.0 && "Copy assignment and copy constructor have more than 10% difference");
+}
----------------
Mordante wrote:
> When this is the reason to not do it in Google Benchmark I think it's worthwhile to consider how we can test for timing regressions in the benchmarks. (This would even be good to look into regardless of this patch.)
Yes, I agree, however let's not block this patch on figuring out how to do performance regression testing.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D121485/new/

https://reviews.llvm.org/D121485