[cfe-dev] Clang generates absurd amount of assembly for libc++ std::vector::emplace

Wed Jul 25 08:57:23 PDT 2018

The number of lines of assembly isn't really a good proxy for the performance of some code, mostly due to inlining. One piece of code may be many more lines of assembly simply because it's not calling large/complicated external functions. Even taken as a whole (including those external functions), the longer code might still be more efficient because it's more specialized: for example, two calls to one generic function may be inlined into two places, with each copy simplified/optimized a bit for its situation.
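To make the "inlined into two places and each one simplified" point concrete, here is a minimal sketch. The function names are hypothetical and not taken from libc++ or libstdc++, and whether a given compiler actually specializes each copy depends on the optimization level and its inlining heuristics.

```cpp
#include <cstddef>

// Hypothetical generic helper: sums n values, each scaled by factor.
inline double scaled_sum(const double* data, std::size_t n, double factor) {
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i] * factor;
    }
    return total;
}

// Call site 1: factor is the constant 1.0, so after inlining the
// compiler may be able to drop the multiplication entirely.
double plain_sum(const double* data, std::size_t n) {
    return scaled_sum(data, n, 1.0);
}

// Call site 2: n is the constant 4, so after inlining the compiler
// may fully unroll the loop into straight-line code.
double sum_of_four_doubled(const double* data) {
    return scaled_sum(data, 4, 2.0);
}
```

If both callers end up with their own inlined copy, the total amount of assembly can grow even though each copy does less work than a call to the generic version would.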

Yeah, you’re right; that was also pointed out to me by someone on one of the IRC channels I lurk on. A bit more investigation on Godbolt revealed that the difference could be due to loop unrolling. It was certainly a surprise to me, as I expected that libstdc++ and libc++ would have relatively similar implementations that would produce relatively similar outputs. Guess something about libc++’s implementation is a bit easier for Clang to inspect? In any case, let that be a lesson to me to be a bit more careful about drawing conclusions from code size.

That said, libc++ does have a bunch of forced inlining that's not for performance reasons, but for linkage reasons (to ensure that certain kinds of changes/updates to libc++ don't break existing compiled code/libraries). It's a tradeoff that not every user of libc++ needs to make & there are steps being taken to make that tradeoff more configurable/optional, so far as I understand it.
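As a rough illustration of forced inlining for linkage rather than speed, the sketch below marks member functions as hidden and always-inline so that no externally visible symbol is emitted for them. The macro name `MYLIB_INLINE_VISIBILITY` and the class `small_buffer` are hypothetical. My understanding is that libc++'s own macro has expanded to something along these lines, but that is an assumption and worth checking against the headers you actually have installed. The attribute syntax is specific to GCC and Clang.

```cpp
#include <cstddef>

// Hypothetical macro (not libc++'s): hide the symbol and force inlining,
// so callers never depend on an out-of-line copy exported by a library.
#if defined(__GNUC__) || defined(__clang__)
#  define MYLIB_INLINE_VISIBILITY \
      __attribute__((__visibility__("hidden"), __always_inline__))
#else
#  define MYLIB_INLINE_VISIBILITY
#endif

// Hypothetical fixed-capacity container used only for illustration.
template <class T>
class small_buffer {
public:
    MYLIB_INLINE_VISIBILITY
    std::size_t size() const { return size_; }

    // Appends value if there is room; silently ignores it otherwise.
    MYLIB_INLINE_VISIBILITY
    void push(const T& value) {
        if (size_ < capacity) {
            data_[size_++] = value;
        }
    }

private:
    static constexpr std::size_t capacity = 8;
    T data_[capacity] = {};
    std::size_t size_ = 0;
};
```

The tradeoff is that every caller carries its own copy of the function body, which can inflate the assembly at each call site even when the inliner's cost model would otherwise have preferred a call.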

Huh, that’s interesting. That isn’t what is happening here, though, right? I didn’t see anything that looks like that around the declarations/implementations of emplace() and friends.

On Mon, Jul 23, 2018 at 4:43 PM via cfe-dev <cfe-dev at lists.llvm.org> wrote:

Hello all,

Just a quick question to make sure I’m not missing something.

This program:

#include <vector>

void f(std::vector<double>& vec, double val) {
    // Insert val at the front of vec (the original snippet said "vals", which is undeclared).
    vec.emplace(std::cbegin(vec), val);
}

When compiled with trunk Clang on Godbolt with -O3 -march=haswell -std=c++17 -stdlib=libstdc++, 132 lines of assembly are produced. If -stdlib=libc++ is used, though, 638 (!) lines of assembly are produced. A few of those lines are due to f() itself, but it appears the vast majority are due to the implementation of emplace(). As a partial comparison, GCC trunk produced 136 lines of assembly, and seems to have partially inlined emplace(), leaving 94 lines of assembly for _M_realloc_insert.

I can sort of duplicate this on Debian sid, with libc++-dev 6.0.1-1 and clang++-7 (--version doesn’t appear to give a revision number, unfortunately?). Using libstdc++ results in 176 lines of assembly, and libc++ results in 803 lines of assembly (counted by wc -l).

Is this something to be worried about? I’m still rather new to performance-related work, so I’m working from a relatively simplistic view of what could be affecting performance. A 4x difference in what could be a commonly-used function seems rather unusual to me, though.
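One way to check whether the size difference matters is to time the operation rather than count lines. The following is a minimal sketch: the helper name `time_front_emplace` is hypothetical, and a real comparison would want repeated runs, warm-up, and a proper benchmarking harness.

```cpp
#include <chrono>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: emplaces `count` values at the front of `vec`
// and returns the elapsed wall-clock time in microseconds.
long long time_front_emplace(std::vector<double>& vec, std::size_t count) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (std::size_t i = 0; i < count; ++i) {
        // Same call as in f() above: insert before the first element.
        vec.emplace(std::cbegin(vec), static_cast<double>(i));
    }
    const auto stop = clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start)
        .count();
}
```

Building the same file once with -stdlib=libstdc++ and once with -stdlib=libc++ and comparing the returned times for a few values of count would give a more direct answer than the assembly line counts.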

Thanks,

Alex
