<html>

    <head>

      <base href="https://bugs.llvm.org/">

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW - It is slower to use std::string::operator+= with a char literal argument than a string literal"

   href="https://bugs.llvm.org/show_bug.cgi?id=47669">47669</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>It is slower to use std::string::operator+= with a char literal argument than a string literal

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>libc++

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>7.0

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>Linux

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>All Bugs

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedclangbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>pierre.tallotte@viacesi.fr

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvm-bugs@lists.llvm.org, mclow.lists@gmail.com

          </td>

        </tr></table>

      <p>

        <div>

        <pre>There is a clang-tidy check, performance-faster-string-find, that flags calls

to std::basic_string::find (and related methods) whose argument is a

single-character string literal. According to the check, passing a character

literal instead is more efficient.

However, I ran a benchmark and found that this holds only for small strings

(those for which the small string optimization is used).

Here is my code:

#include <benchmark/benchmark.h>

#include <string>

static void BM_string_literal(benchmark::State& state)

{

    std::string s;

    for (int i = 0; i < state.range(0); i++)

        s += 'a';

    s += 'b';

    benchmark::DoNotOptimize(s.data());

    benchmark::ClobberMemory();

    size_t pos;

    for (auto _ : state)

    {

        benchmark::DoNotOptimize(pos = s.find("b")); // "b" is a string literal; expected to be slower

        benchmark::ClobberMemory();

    }

}

BENCHMARK(BM_string_literal)->RangeMultiplier(2)->Range(8, 8<<10);

static void BM_char_literal(benchmark::State& state)

{

    std::string s;

    for (int i = 0; i < state.range(0); i++)

        s += 'a';

    s += 'b';

    benchmark::DoNotOptimize(s.data());

    benchmark::ClobberMemory();

    size_t pos;

    for (auto _ : state)

    {

        benchmark::DoNotOptimize(pos = s.find('b')); // 'b' is a char literal; expected to be faster

        benchmark::ClobberMemory();

    }

}

BENCHMARK(BM_char_literal)->RangeMultiplier(2)->Range(8, 8<<10);

BENCHMARK_MAIN();
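
As a sanity check on the benchmark itself, it can help to confirm that both
overloads return the same position on the haystacks used above, so that the
timings compare equivalent work. This is only a minimal sketch: make_haystack
and find_overloads_agree are hypothetical helper names, not part of the
benchmark or of any library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not in the original benchmark): builds the same
// haystack as the benchmarks, i.e. n copies of 'a' followed by a single 'b'.
inline std::string make_haystack(std::size_t n)
{
    std::string s(n, 'a');
    s += 'b';
    return s;
}

// Returns true when the string-literal and char-literal overloads of find
// both report the position of the trailing 'b', which is index n.
inline bool find_overloads_agree(std::size_t n)
{
    const std::string s = make_haystack(n);
    return s.find("b") == n && s.find('b') == n;
}
```

For example, find_overloads_agree(32) should hold, as should the same call for
every other size in the benchmark range.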

According to clang-tidy, I should prefer the code in BM_char_literal because it

is supposed to be faster. However, the benchmark results are the following:

Benchmark                                                  Time           CPU      Time Old      Time New       CPU Old       CPU New

[BM_string_literal vs. BM_char_literal]/8               -0.0760       -0.0760             9             8             9             8

[BM_string_literal vs. BM_char_literal]/16              -0.0757       -0.0767             9             8             9             8

[BM_string_literal vs. BM_char_literal]/32              +0.3812       +0.3809             4             5             4             5

[BM_string_literal vs. BM_char_literal]/64              +0.1609       +0.1602             4             5             4             5

[BM_string_literal vs. BM_char_literal]/128             +0.1946       +0.1944             4             5             4             5

[BM_string_literal vs. BM_char_literal]/256             +0.1616       +0.1623             6             6             6             6

[BM_string_literal vs. BM_char_literal]/512             +0.2225       +0.2211             7             9             7             9

[BM_string_literal vs. BM_char_literal]/1024            +0.1052       +0.1051            11            12            11            12

[BM_string_literal vs. BM_char_literal]/2048            +0.0789       +0.0781            18            20            18            20

[BM_string_literal vs. BM_char_literal]/4096            +0.0349       +0.0348            31            32            31            32

[BM_string_literal vs. BM_char_literal]/8192            +0.0053       +0.0042            56            57            56            57

A positive difference means BM_char_literal is slower than BM_string_literal.

The string literal version is therefore faster once the std::string is at least

32 characters long. I can reproduce these results consistently, so this is not

a variance issue.

Is clang-tidy wrong, is there a bug in libc++, or is my benchmark flawed

somewhere?

To reproduce my results, here are the commands I used (on Debian stable):

apt-get -y install clang libc++-dev libc++abi-dev git cmake python python-pip

git clone <a href="https://github.com/google/benchmark.git">https://github.com/google/benchmark.git</a>

git clone <a href="https://github.com/google/googletest.git">https://github.com/google/googletest.git</a> benchmark/googletest

pushd benchmark

cmake -E make_directory "build"

cmake -E chdir "build" cmake -DCMAKE_C_COMPILER=clang \
    -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_CXX_FLAGS="-stdlib=libc++" -DBENCHMARK_DOWNLOAD_DEPENDENCIES=ON ../

cmake --build "build" --config Release --target install

popd

pip install scipy

clang++ -stdlib=libc++ -O3 bench.cpp -lbenchmark -lpthread -o bench

./benchmark/tools/compare.py filters ./bench BM_string_literal BM_char_literal</pre>

        </div>

      </p>


    </body>

</html>