[libcxx-commits] [PATCH] D112152: [libc++] Unroll loop in std::find_if and define other std::find variants in terms of it

Tue Nov 2 08:26:35 PDT 2021

fhahn added a comment.

In D112152#3078256 <https://reviews.llvm.org/D112152#3078256>, @ldionne wrote:

> Re: all
>
> Thanks a lot for chiming in. Morally, I share everybody's views here -- this is a compiler issue and we shouldn't be working around it in the library. This was reported to me and @fhahn asked me to look into whether it might be possible to implement this in the library. So I did, but I agree I'd rather not check this in if the compiler can be fixed instead. I'll let Florian explain what a compiler fix would involve.

I think the main problem with optimising the code in LLVM at the moment is that we lose the information that all elements between `begin()` and `end()` are dereferenceable (simplified IR shown below), e.g.  the IR function could get passed a garbage end pointer and only exit via the early exit. In the original C++ source, we have stronger guarantees allowing for better optimisations.

Ideally we would vectorise the search loop, but I don't think that's possible without more information in the IR. While I agree it is unfortunate that the IR seems to be missing crucial information and the compiler should do better, I don't think there's an easy fix for now.

I'd like for the Clang + libc++ package to provide the best performance for our users and surgically modifying the libc++ implementation might be the easiest first step in that direction. As @ldionne  mentioned, we encountered use cases where this difference in `std::find` implementation is causing a 10% performance difference compared to GCC+libstdc++ for a largish application (*not* a `std::find` specific micro-benchmark).

As reviewers already pointed out, if we go with a source modification, the implementation should probably only kick in for data types that can be checked cheaply (e.g. integers, floats and pointers).

I am definitely also interested in working towards a way to provide deferenceability info for the `begin()` & `end()` range. (If there's already a way I missed, please let me know :)).

  define i8** @src(i8** %begin, i8** %end, i8* %tgt) {
  entry:
    %cmp.i.i.not8.i = icmp eq i8** %begin, %end
    br i1 %cmp.i.i.not8.i, label %exit, label %header

  header:
    %ptr.iv = phi i8** [ %ptr.iv.next, %latch ], [ %begin, %entry ]
    %lv = load i8*, i8** %ptr.iv, align 8
    %cmp = icmp eq i8* %lv, %tgt
    br i1 %cmp, label %exit, label %latch

  latch:
    %ptr.iv.next = getelementptr inbounds i8*, i8** %ptr.iv.1, i64 1
    %cmp.i.i.not.i = icmp eq i8** %ptr.iv.next, %end
    br i1 %cmp.i.i.not.i, label %exit, label %header

  exit:
    %res.lcssa = phi i8** [ %begin, %entry ], [ %end, %latch ], [ %ptr.iv, %header ]
    ret i8** %res.lcssa
  }

I also took a stab at adding a first set of benchmarks to `libcxx/benchmarks/algorithms.bench.cpp` , but something is not quite right I think. I'm not very familiar with the libc++ benchmarking system, so if anybody can spot any issues, please let me know.

  diff --git a/libcxx/benchmarks/algorithms.bench.cpp b/libcxx/benchmarks/algorithms.bench.cpp
  index 564d89d659cb..f0e610df6ba0 100644
  --- a/libcxx/benchmarks/algorithms.bench.cpp
  +++ b/libcxx/benchmarks/algorithms.bench.cpp
  @@ -309,6 +309,48 @@ struct PopHeap {
     };
   };

  +template <class ValueType, class Order>
  +struct FindMid {
  +  size_t Quantity;
  +
  +  void run(benchmark::State& state) const {
  +    runOpOnCopies<ValueType>(state, Quantity, Order(), BatchSize::CountElements, [](auto& Copy) {
  +      auto I = std::find(Copy.begin(), Copy.end(), Copy[Copy.size() / 2]);
  +      // Try to prevent the compiler from optimizing out std::find.
  +      benchmark::DoNotOptimize(I);
  +      benchmark::DoNotOptimize(Copy);
  +      std::swap(*I, *Copy.begin());
  +    });
  +  }
  +
  +  bool skip() const { return Order() == ::Order::Heap; }
  +
  +  std::string name() const {
  +    return "BM_FindMid" + ValueType::name() + Order::name() + "_" + std::to_string(Quantity);
  +  };
  +};
  +
  +template <class ValueType, class Order>
  +struct FindEnd {
  +  size_t Quantity;
  +
  +  void run(benchmark::State& state) const {
  +    runOpOnCopies<ValueType>(state, Quantity, Order(), BatchSize::CountElements, [](auto& Copy) {
  +      auto I = std::find(Copy.begin(), Copy.end(), Copy.back());
  +      // Try to prevent the compiler from optimizing out std::find.
  +      benchmark::DoNotOptimize(I);
  +      benchmark::DoNotOptimize(Copy);
  +      std::swap(*I, *Copy.begin());
  +    });
  +  }
  +
  +  bool skip() const { return Order() == ::Order::Heap; }
  +
  +  std::string name() const {
  +    return "BM_FindEnd" + ValueType::name() + Order::name() + "_" + std::to_string(Quantity);
  +  };
  +};
  +
   } // namespace

   int main(int argc, char** argv) {
  @@ -333,5 +375,7 @@ int main(int argc, char** argv) {
         Quantities);
     makeCartesianProductBenchmark<PushHeap, AllValueTypes, AllOrders>(Quantities);
     makeCartesianProductBenchmark<PopHeap, AllValueTypes>(Quantities);
  +  makeCartesianProductBenchmark<FindMid, AllValueTypes, AllOrders>(Quantities);
  +  makeCartesianProductBenchmark<FindEnd, AllValueTypes, AllOrders>(Quantities);
     benchmark::RunSpecifiedBenchmarks();
   }

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112152/new/

https://reviews.llvm.org/D112152