[PATCH] D55427: [libcxx] Call __count_bool_true for bitset count
Adhemerval Zanella via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Dec 13 10:23:58 PST 2018
zatrazz added a comment.
In D55427#1329797 <https://reviews.llvm.org/D55427#1329797>, @mclow.lists wrote:
> In D55427#1325285 <https://reviews.llvm.org/D55427#1325285>, @zatrazz wrote:
>
> > This patch aims to help clang with better information so it can inline
> > __bit_reference count function usage for both std::bitset. Current clang
> > inliner cannot infer that the passed type will be used only to select
> > the optimized variant, it evaluates the type argument and type check as
> > a load plus compare (although later optimization phases correctly
> > optimize this out).
>
>
> I'm unclear on the magnitude of the improvement here.
> Are we talking a single load + compare instruction in the call to `std::count` ?
> Or something inside the loop?
>
> [ I'm pretty sure that the patch is correct now - but I don't understand how important it is ]
It is mainly to help the LLVM inliner generate better code for std::bitset count on aarch64. It
helps both runtime and code size: if the inliner decides that _VSTD::count should not be inlined,
the vectorizer creates both aligned and unaligned variants (which adds both code size and
runtime cost).
For instance, on aarch64 the snippet:
-
#include <bitset>
int foo (std::bitset<256> &bt)
{
  return bt.count();
}
-
Generates a text section of 844 bytes, while with the patch it is just 112 bytes (because the
vectorization code can assume aligned input and generate just one code path).
As a side note, x86_64 is not affected, because the inliner's cost analysis sees fewer
instructions being required and the template instantiation as less costly.
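To make the dispatch discussed above concrete, here is a minimal sketch of the two call shapes. It is not the libc++ code: the namespace, the function names, and the plain 64-bit word array are assumptions made for illustration only. The generic entry point receives the value by reference and branches on it, while the direct variant skips the selection entirely, which is roughly what the patch does for bitset count.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the internals discussed in this review.
// Names and the word-array layout are assumptions, not libc++ API.
namespace sketch {

// Count the set bits in `n` 64-bit words.
inline std::size_t count_bool_true(const std::uint64_t* words, std::size_t n) {
    std::size_t result = 0;
    for (std::size_t i = 0; i < n; ++i) {
        std::uint64_t w = words[i];
        while (w != 0) {
            w &= w - 1;  // clear the lowest set bit
            ++result;
        }
    }
    return result;
}

// Count the clear bits in `n` 64-bit words.
inline std::size_t count_bool_false(const std::uint64_t* words, std::size_t n) {
    return n * 64 - count_bool_true(words, n);
}

// Generic entry point: `value` arrives by const reference, so until this
// call is inlined the variant selection is a load plus a compare.
template <class T>
std::size_t count(const std::uint64_t* words, std::size_t n, const T& value) {
    if (static_cast<bool>(value))
        return count_bool_true(words, n);
    return count_bool_false(words, n);
}

// Shape before the patch: go through the generic entry point with `true`.
inline std::size_t bitset_count_before(const std::uint64_t* words, std::size_t n) {
    return count(words, n, true);
}

// Shape after the patch: call the "true" variant directly.
inline std::size_t bitset_count_after(const std::uint64_t* words, std::size_t n) {
    return count_bool_true(words, n);
}

}  // namespace sketch
```

Both shapes return the same result; the difference is only in how much work the inliner has to see through before it can treat the call as cheap.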
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D55427/new/
https://reviews.llvm.org/D55427