[PATCH] D55427: [libcxx] Call __count_bool_true for bitset count
Adhemerval Zanella via Phabricator
reviews at reviews.llvm.org
Thu Dec 13 10:23:58 PST 2018
zatrazz added a comment.
In D55427#1329797 <https://reviews.llvm.org/D55427#1329797>, @mclow.lists wrote:
> In D55427#1325285 <https://reviews.llvm.org/D55427#1325285>, @zatrazz wrote:
> > This patch aims to give clang better information so it can inline
> > the __bit_reference count function used by std::bitset. The current clang
> > inliner cannot infer that the passed type will be used only to select
> > the optimized variant; it evaluates the type argument and type check as
> > a load plus compare (although later optimization phases correctly
> > optimize this out).
> I'm unclear on the magnitude of the improvement here.
> Are we talking a single load + compare instruction in the call to `std::count` ?
> Or something inside the loop?
> [ I'm pretty sure that the patch is correct now - but I don't understand how important it is ]
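To make the inliner point in the quoted text concrete, here is a simplified C++ sketch of the two call shapes. It is not libc++'s code. The names `popcount64`, `count_true_words`, `count_false_words`, `count_value`, `bitset_count_via_generic` and `bitset_count_direct` are hypothetical, and the bitset storage is assumed to be an array of 64-bit words. `count_value` stands in for a generic count that receives the value to count by const reference and branches on it. `bitset_count_direct` stands in for the patched call site, which calls the "count true" helper directly, so there is no value to load and compare.

```cpp
#include <cstddef>
#include <cstdint>

// Portable population count for one 64-bit word (a real implementation
// would use a compiler builtin or C++20 std::popcount).
inline std::size_t popcount64(std::uint64_t x) {
    std::size_t n = 0;
    while (x != 0) {
        x &= x - 1;  // clear the lowest set bit
        ++n;
    }
    return n;
}

// Hypothetical stand-in for a "count true bits" helper: counts set bits in
// the first nbits bits of a word-aligned buffer.
inline std::size_t count_true_words(const std::uint64_t* words, std::size_t nbits) {
    std::size_t n = 0;
    const std::size_t full = nbits / 64;
    for (std::size_t i = 0; i < full; ++i)
        n += popcount64(words[i]);
    const std::size_t rem = nbits % 64;
    if (rem != 0)
        n += popcount64(words[full] & ((std::uint64_t{1} << rem) - 1));
    return n;
}

// Hypothetical counterpart that counts clear bits.
inline std::size_t count_false_words(const std::uint64_t* words, std::size_t nbits) {
    return nbits - count_true_words(words, nbits);
}

// Generic shape: the value is passed by const reference, so before constant
// propagation the inliner sees a load of `value` plus a compare.
inline std::size_t count_value(const std::uint64_t* words, std::size_t nbits,
                               const bool& value) {
    if (value)
        return count_true_words(words, nbits);
    return count_false_words(words, nbits);
}

// Unpatched shape: bitset count goes through the generic entry point.
inline std::size_t bitset_count_via_generic(const std::uint64_t* words, std::size_t nbits) {
    return count_value(words, nbits, true);
}

// Patched shape: bitset count only ever counts `true`, so it calls the
// specialised helper directly and there is nothing to dispatch on.
inline std::size_t bitset_count_direct(const std::uint64_t* words, std::size_t nbits) {
    return count_true_words(words, nbits);
}
```

Both shapes return the same result; the difference is only in how much the inliner has to see through before it can tell the branch is dead.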
It is mainly to help the LLVM inliner generate better code for std::bitset count on aarch64. It helps
both runtime and code size: if the inliner decides that _VSTD::count should not be inlined, the
vectorizer will create both aligned and unaligned variants (which add both code size and runtime overhead).
For instance, on aarch64 the snippet:
#include <bitset>
int foo (std::bitset<256> &bt) { return bt.count (); }
Generates 844 bytes of .text, while with the patch it is just 112 bytes (because the vectorized code
can assume aligned input and generate only one code path).
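One way to picture the "one code path" point, assuming that "aligned" here means the counted range starts at bit 0 of a storage word (an interpretation, not something stated in the comment): a general count that accepts an arbitrary starting bit needs a masked path for a leading partial word, whereas a caller that always passes offset 0 lets that branch fold away once inlined. The names `popcount64`, `count_true_from` and `count_true_aligned` are hypothetical, and this is a scalar sketch, not the vectorized code discussed in the comment.

```cpp
#include <cstddef>
#include <cstdint>

// Portable population count for one 64-bit word.
inline std::size_t popcount64(std::uint64_t x) {
    std::size_t n = 0;
    while (x != 0) {
        x &= x - 1;  // clear the lowest set bit
        ++n;
    }
    return n;
}

// Hypothetical general form: counts set bits in [first_bit, first_bit + nbits)
// of a word array, where first_bit may land in the middle of a word.
inline std::size_t count_true_from(const std::uint64_t* words,
                                   std::size_t first_bit, std::size_t nbits) {
    std::size_t n = 0;
    words += first_bit / 64;
    const std::size_t offset = first_bit % 64;
    // Leading partial word: only needed when the range is not word-aligned.
    if (offset != 0 && nbits != 0) {
        const std::size_t avail = 64 - offset;  // 1..63 bits left in this word
        const std::size_t take = nbits < avail ? nbits : avail;
        const std::uint64_t mask = ((std::uint64_t{1} << take) - 1) << offset;
        n += popcount64(*words & mask);
        nbits -= take;
        ++words;
    }
    // Whole-word loop: the part a vectorizer would target.
    for (; nbits >= 64; nbits -= 64, ++words)
        n += popcount64(*words);
    // Trailing partial word, masked to its low nbits bits.
    if (nbits != 0)
        n += popcount64(*words & ((std::uint64_t{1} << nbits) - 1));
    return n;
}

// Hypothetical bitset-style caller: the start offset is the constant 0, so
// once count_true_from is inlined here the leading-word branch is dead code.
inline std::size_t count_true_aligned(const std::uint64_t* words, std::size_t nbits) {
    return count_true_from(words, 0, nbits);
}
```

The sketch only shows why a constant zero offset removes a code path; it does not reproduce the aarch64 byte counts quoted in the comment.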
As a side note, x86_64 is not affected, because there the cost analysis sees fewer instructions
being required and considers the template instantiation less costly.
More information about the libcxx-commits mailing list