[llvm-bugs] [Bug 43096] New: std::basic_string loses the bottom bit of capacity for regular ABI little-endian and alternate ABI big-endian strings

via llvm-bugs llvm-bugs at lists.llvm.org
Thu Aug 22 16:58:02 PDT 2019


https://bugs.llvm.org/show_bug.cgi?id=43096

            Bug ID: 43096
           Summary: std::basic_string loses the bottom bit of capacity for
                    regular ABI little-endian and alternate ABI big-endian
                    strings
           Product: libc++
           Version: unspecified
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: All Bugs
          Assignee: unassignedclangbugs at nondot.org
          Reporter: richard-llvm at metafoo.co.uk
                CC: llvm-bugs at lists.llvm.org, mclow.lists at gmail.com

libc++'s std::basic_string reuses the low-order bit of its capacity field as an
'is long' marker when in regular ABI little-endian or alternate ABI big-endian
mode.

When sizeof(_CharT) <= 8, this is OK: every allocation is a multiple of 16
bytes, so the stored capacity (the allocated element count, including the
terminator) is always even, and we don't actually lose any data.
But when sizeof(_CharT) is 16, things go wrong:


#include <string>
#include <iostream>

struct big_character {
    char data[16];
};

int main() {
    for (int n = 0; n != 20; ++n) {
        std::basic_string<big_character> s;
        s.reserve(n);
        std::cout << "asked for " << n << " got " << s.capacity() << "\n";
    }
}


produces:


asked for 0 got 1
asked for 1 got 1
asked for 2 got 3
asked for 3 got 3
asked for 4 got 3
asked for 5 got 5
asked for 6 got 5
asked for 7 got 7
asked for 8 got 7
asked for 9 got 9
asked for 10 got 9
asked for 11 got 11
asked for 12 got 11
asked for 13 got 13
asked for 14 got 13
asked for 15 got 15
asked for 16 got 15
asked for 17 got 17
asked for 18 got 17
asked for 19 got 19


Note that for every even n greater than 2, reserve(n) leaves capacity() at
n - 1, so it fails to result in a capacity() >= n. Similarly:

    std::basic_string<big_character> s(6, big_character{});
    std::cout << "size is " << s.size()
              << ", capacity is " << s.capacity() << "\n";

prints:

size is 6, capacity is 5


If we want to avoid breaking ABI, it looks like we need to ensure that we
always allocate an even number of elements (including the terminator) in the
affected cases.
