[PATCH] D55427: [libcxx] Call __count_bool_true for bitset count

Adhemerval Zanella via Phabricator reviews at reviews.llvm.org
Thu Dec 13 10:23:58 PST 2018

zatrazz added a comment.

In D55427#1329797 <https://reviews.llvm.org/D55427#1329797>, @mclow.lists wrote:

> In D55427#1325285 <https://reviews.llvm.org/D55427#1325285>, @zatrazz wrote:
> > This patch aims to help clang with better information so it can inline
> >  __bit_reference count function usage for both std::bitset. The current clang
> >  inliner cannot infer that the passed type will be used only to select
> >  the optimized variant; it evaluates the type argument and type check as
> >  a load plus compare (although later optimization phases correctly
> >  optimize this out).
> I'm unclear on the magnitude of the improvement here.
>  Are we talking a single load + compare instruction in the call to `std::count` ?
>  Or something inside the loop?
> [ I'm pretty sure that the patch is correct now - but I don't understand how important it is ]

It is mainly to help the LLVM inliner generate better code for std::bitset count on aarch64. It helps
both runtime and code size: if the inliner decides that _VSTD::count should not be inlined,
the vectorizer creates both aligned and unaligned variants of the loop (which adds both code size and
runtime cost).
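
To make the difference concrete, here is a minimal sketch of the two call shapes. It is not the libc++ code: `count_true`, `count_false`, `count_generic`, `count_direct` and `check_paths_agree` are hypothetical stand-ins, and the word-array layout is an assumption made only for illustration. The generic entry point takes the value by const reference and branches on it, which is the load plus compare the inliner has to cost; the direct entry point has nothing to select.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an optimized "count set bits" helper.
// Portable popcount: clear the lowest set bit until the word is zero.
inline std::size_t count_true(const std::uint64_t* words, std::size_t n) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        std::uint64_t w = words[i];
        while (w != 0) {
            w &= w - 1;
            ++total;
        }
    }
    return total;
}

// Hypothetical stand-in for the "count clear bits" variant.
inline std::size_t count_false(const std::uint64_t* words, std::size_t n) {
    return n * 64 - count_true(words, n);
}

// Generic entry point: the value arrives by const reference, so before
// inlining the callee sees a load and a compare that select the variant.
template <class T>
std::size_t count_generic(const std::uint64_t* words, std::size_t n,
                          const T& value) {
    if (static_cast<bool>(value))
        return count_true(words, n);
    return count_false(words, n);
}

// Direct entry point: the caller already knows it wants the set bits,
// so there is no selection left for the inliner to reason about.
inline std::size_t count_direct(const std::uint64_t* words, std::size_t n) {
    return count_true(words, n);
}

// Usage sketch: both entry points must agree when counting set bits.
inline void check_paths_agree(const std::uint64_t* words, std::size_t n) {
    assert(count_generic(words, n, true) == count_direct(words, n));
}
```

Both paths return the same result; the only difference is how much work the inliner's cost model sees at the call site.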

For instance, on aarch64 the snippet:


#include <bitset>

int foo (std::bitset<256> &bt)
{
  return bt.count();
}

generates 844 bytes of text, while with the patch it is just 112 bytes (because the vectorization
code can assume aligned input and generate just one code path).

As a side note, x86_64 is not affected: the cost analysis sees fewer instructions
being required, and the template instantiation is less costly.


