[libcxx-commits] [PATCH] D112152: [libc++] Unroll loop in std::find_if and define other std::find variants in terms of it
Florian Hahn via Phabricator via libcxx-commits
libcxx-commits at lists.llvm.org
Tue Nov 2 08:26:35 PDT 2021
fhahn added a comment.
In D112152#3078256 <https://reviews.llvm.org/D112152#3078256>, @ldionne wrote:
> Re: all
>
> Thanks a lot for chiming in. Morally, I share everybody's views here -- this is a compiler issue and we shouldn't be working around it in the library. This was reported to me and @fhahn asked me to look into whether it might be possible to implement this in the library. So I did, but I agree I'd rather not check this in if the compiler can be fixed instead. I'll let Florian explain what a compiler fix would involve.
I think the main problem with optimising the code in LLVM at the moment is that we lose the information that all elements between `begin()` and `end()` are dereferenceable (simplified IR shown below), e.g. the IR function could get passed a garbage end pointer and only exit via the early exit. In the original C++ source, we have stronger guarantees allowing for better optimisations.
Ideally we would vectorise the search loop, but I don't think that's possible without more information in the IR. While I agree it is unfortunate that the IR seems to be missing crucial information and the compiler should do better, I don't think there's an easy fix for now.
I'd like for the Clang + libc++ package to provide the best performance for our users and surgically modifying the libc++ implementation might be the easiest first step in that direction. As @ldionne mentioned, we encountered use cases where this difference in `std::find` implementation is causing a 10% performance difference compared to GCC+libstdc++ for a largish application (*not* a `std::find` specific micro-benchmark).
As reviewers already pointed out, if we go with a source modification, the implementation should probably only kick in for data types that can be checked cheaply (e.g. integers, floats and pointers).
I am definitely also interested in working towards a way to provide deferenceability info for the `begin()` & `end()` range. (If there's already a way I missed, please let me know :)).
define i8** @src(i8** %begin, i8** %end, i8* %tgt) {
entry:
%cmp.i.i.not8.i = icmp eq i8** %begin, %end
br i1 %cmp.i.i.not8.i, label %exit, label %header
header:
%ptr.iv = phi i8** [ %ptr.iv.next, %latch ], [ %begin, %entry ]
%lv = load i8*, i8** %ptr.iv, align 8
%cmp = icmp eq i8* %lv, %tgt
br i1 %cmp, label %exit, label %latch
latch:
%ptr.iv.next = getelementptr inbounds i8*, i8** %ptr.iv.1, i64 1
%cmp.i.i.not.i = icmp eq i8** %ptr.iv.next, %end
br i1 %cmp.i.i.not.i, label %exit, label %header
exit:
%res.lcssa = phi i8** [ %begin, %entry ], [ %end, %latch ], [ %ptr.iv, %header ]
ret i8** %res.lcssa
}
I also took a stab at adding a first set of benchmarks to `libcxx/benchmarks/algorithms.bench.cpp` , but something is not quite right I think. I'm not very familiar with the libc++ benchmarking system, so if anybody can spot any issues, please let me know.
diff --git a/libcxx/benchmarks/algorithms.bench.cpp b/libcxx/benchmarks/algorithms.bench.cpp
index 564d89d659cb..f0e610df6ba0 100644
--- a/libcxx/benchmarks/algorithms.bench.cpp
+++ b/libcxx/benchmarks/algorithms.bench.cpp
@@ -309,6 +309,48 @@ struct PopHeap {
};
};
+template <class ValueType, class Order>
+struct FindMid {
+ size_t Quantity;
+
+ void run(benchmark::State& state) const {
+ runOpOnCopies<ValueType>(state, Quantity, Order(), BatchSize::CountElements, [](auto& Copy) {
+ auto I = std::find(Copy.begin(), Copy.end(), Copy[Copy.size() / 2]);
+ // Try to prevent the compiler from optimizing out std::find.
+ benchmark::DoNotOptimize(I);
+ benchmark::DoNotOptimize(Copy);
+ std::swap(*I, *Copy.begin());
+ });
+ }
+
+ bool skip() const { return Order() == ::Order::Heap; }
+
+ std::string name() const {
+ return "BM_FindMid" + ValueType::name() + Order::name() + "_" + std::to_string(Quantity);
+ };
+};
+
+template <class ValueType, class Order>
+struct FindEnd {
+ size_t Quantity;
+
+ void run(benchmark::State& state) const {
+ runOpOnCopies<ValueType>(state, Quantity, Order(), BatchSize::CountElements, [](auto& Copy) {
+ auto I = std::find(Copy.begin(), Copy.end(), Copy.back());
+ // Try to prevent the compiler from optimizing out std::find.
+ benchmark::DoNotOptimize(I);
+ benchmark::DoNotOptimize(Copy);
+ std::swap(*I, *Copy.begin());
+ });
+ }
+
+ bool skip() const { return Order() == ::Order::Heap; }
+
+ std::string name() const {
+ return "BM_FindEnd" + ValueType::name() + Order::name() + "_" + std::to_string(Quantity);
+ };
+};
+
} // namespace
int main(int argc, char** argv) {
@@ -333,5 +375,7 @@ int main(int argc, char** argv) {
Quantities);
makeCartesianProductBenchmark<PushHeap, AllValueTypes, AllOrders>(Quantities);
makeCartesianProductBenchmark<PopHeap, AllValueTypes>(Quantities);
+ makeCartesianProductBenchmark<FindMid, AllValueTypes, AllOrders>(Quantities);
+ makeCartesianProductBenchmark<FindEnd, AllValueTypes, AllOrders>(Quantities);
benchmark::RunSpecifiedBenchmarks();
}
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D112152/new/
https://reviews.llvm.org/D112152
More information about the libcxx-commits
mailing list