[LLVMdev] std::string

Sun Jan 20 08:39:14 PST 2013

On 20.01.2013, at 16:46, Krzysztof Parzyszek <kparzysz at codeaurora.org> wrote:

> On 1/19/2013 10:00 PM, Chris Lattner wrote:
>> On Jan 19, 2013, at 7:04 PM, Krzysztof Parzyszek <kparzysz at codeaurora.org> wrote:
>>> 
>>> Were the "small n" characteristics the main motivation?
>> 
>> It is one of the motivations.
> 
> What were the others?
> 
> The reason I ask is that STL comes all ready, with containers and algorithms.  They may not be optimal for every task, but they do their job and they are part of the standard.  There may be some price to pay in terms of performance/memory usage/etc. for a specific application, but overall it may be worth it.  Evidently, in case of LLVM, someone (you?) decided that having local set of containers is a better idea.  I simply want to understand the reasons behind this decision.
> 
> I quickly looked over the library section on containers in the C++03 standard and I didn't see any paragraphs regarding the allocation strategy for classes like "set" or "map".  The LLVM page seems to contain information that was based on some specific implementation (or several implementations), but was not mandated by the standard itself.

The containers in the STL are general purpose classes that are fast for a wide variety of different elements but don't excel on any special case. The LLVM classes are much more specialized.

For example DenseMap is a hash table that embeds both keys and values into a big table of memory. This gives nice cache behavior for simple pointer-to-pointer mappings which are common throughout LLVM while std::map wastes memory and has worse cache behavior. For big keys DenseMap wastes a ton of memory and becomes slow, std::map is better for those cases.

Another example: clang's lexing speed is directly bound by StringMap, a specialized hash table for string keys (std::map<std::string, …> is extremely slow in comparison). A fast lexer is a big deal when you want to embed clang into an IDE and still get responsive autocompletion.

Most of the containers in LLVM were motivated by cases where the standard containers were just too slow and needed a replacement. Some of them were closely related to a bad implementation decision in a particular version of the standard library, e.g. libstdc++'s std::string doesn't implement the small string optimization, making incremental building of strings slow, to make that use case faster SmallString was invented. Now that small std::strings are more commonly implemented we may want to retire it eventually.

- Ben