[LLVMdev] [POTENTIAL API CHANGE] Improving llvm::Value getName()/setName()

Wed Jul 22 20:54:25 PDT 2009

Following up on my introduction of the StringRef type, I wanted to
clean up the API for getting and setting the name of an llvm::Value.
There are a number of problems with the current API:

1. There are too many ...Name methods.

2. getName() returns an std::string, which is expensive.

3. getName() is frequently used to derive new names, which are passed
to setName(). Currently this uses std::string concatenation.

4. In a Release-Asserts build, we usually don't name values, but we
still end up with the overhead of creating and destroying strings we
either don't use, or don't care about.

Problem #1 can be addressed solely with the StringRef type, but it
requires either updating many clients (which frequently do something
like (getName() + ".not")), or making StringRef transparently convert
to an std::string, which hides inefficient code.

Chris and I discussed this problem and came up with what I think is a
nice and tidy solution, based on introducing a new type for handling
the results of string concatenation. I tried to do an extensive
evaluation of the results to make sure the approach was working &
worth it, and in true salesman style, I'm going to give you the
benefits first before the implementation! :)

On 64-bit Darwin, using gcc-4.2 to compile LLVM; and using
instcombine.bc (the .bc for InstructionCombining.bc) as a test case:

Code size:
-1% on opt (from 5315144 to 5249024)

Memory allocation:
-10% calls to malloc (from 944090 to 832970), for 'opt -f -o /dev/null
-std-compile-opts instcombine.bc'

Performance:
-1% for the preceeding opt command.

Compile Time:
Compile time for InstructionCombining.cpp got slightly faster,
although gcc's memory usage increased by ~15%.

----

The idea is to introduce a new class (Twine) which represents the
result of string concatenation. The class is essentially a lightweight
rope data structure, where the nodes are typed pointers to C++
temporaries. Thus a Twine doesn't require any heap allocation during
construction, and we can defer the conversion of integers into a
string until the result of the Twine is used -- in the case that the
result is never needed, the conversion is avoided completely.
Additionally, by deferring the generation of the concatenated string,
we reduce the amount of code that is generated at the place where the
concatenation is done and coalesce it into one place. Finally, we can
provide a special Twine value which always concatenates to form an
empty string -- llvm::Value can return this value when it has no name,
and any clients which derive names from this value will automatically
avoid doing unused string concatenation, uitostr conversion, etc.

As with StringRef, the main price we end up paying is increased
complexity and the potential for unsafe memory operations (the Twine
is very tied to the ability to keep pointers to memory it doesn't
own).

The nice part of this API change is that it does not impact the
majority of clients; the StringRef and Twine classes combine so that
clients can write code like 'setName(getName() + ".neg")' just as
before.

To keep API/high-level discussion separate from implementation
details, I will send the patches separately to llvm-commits for
review.

 - Daniel