[LLVMdev] Efficient Constant Uniquing

Wed Feb 1 19:56:07 PST 2012

Chris asked me to look into improving the current sad state of
ConstantUniqueMap, and by coincidence Meador Inge started working on the
same thing on his own initiative, so I'm writing down a concrete proposal
to make sure we're not working at cross purposes.

The ConstantUniqueMap template class is used to store unique instances of
ConstantStruct, ConstantVector, ConstantArray, ConstantExpr and InlineAsm
objects. As implemented, this class has a number of problems: For one
thing, the map keys contain a std::vector, meaning that just to do a find()
operation you need to allocate memory. Also, it keeps two copies of the
operands: One copy lives inside the Constant as part of the regular operand
list, and a second copy is stored as a std::vector as part of the map key.
So if you have a 1000-element constant array, that's actually 2000 *
sizeof(Constant*) worth of memory consumed.

Recently I added a new method to DenseMap, called find_as(), which allows
you to do find() operations using a lookup key that has a different C++
data type than the keys that are stored in the map. The requirement is that
these lookup keys must be equality-comparable with the regular keys, and
must hash to the same value - a custom DenseMapInfo can be created which
provides this.

My plan for ConstantUniqueMap was to replace the std::map with a DenseMap
whose keys were the Constant being uniqued, with the value field would be
unused. With find_as(), we can create a lookup key whose type is
std::pair<Type*, ArrayRef<Constant>>, allowing us to lookup the constants
by their contents. In other words, we can pass around an ArrayRef of the
operands and use that to find the constant, or to create a new constant if
it does not already exist.

There's a few wrinkles however. The first is that for large arrays and
structs, ConstantUniqueMap also manages an inverse map. The purpose of the
inverse map is for the case where you already have a pointer to the
constant in the map, but you for some reason need an iterator to the map
entry. An example of where this is needed is when you want to delete a
constant from the map. Without the inverse map, you would need to
re-calculate the hash value of the constant, which could be expensive if it
has a large number of operands.

The current implementation of the inverse map is std::map<Constant*,
Map::iterator>. This only works because std::maps iterators are not
invalidated by mutation, if we switch to a DenseMap this will no longer be
true.

There are a few different solutions for the inverse map problem:

a) Just get rid of the inverse map entirely. The need to lookup a constant
by pointer is fairly rare, and the cost of re-hashing a large array isn't
probably that significant compared to the cost savings of not having to
call malloc() on every key. Also, you only need to iterate over the operand
list for hashing - the equality comparison can still be address-based. This
is the approach I was going to take initially.

b) Save the hash value as part of the Constant. This means adding a new
field to every constant, but given that we've just saved a whopping load of
memory by not having to maintain a copy of the operands, this is probably a
win overall.

c) Have a separate unique map for the operand list. In other words, when
creating a constant, you first unique the operand list, which returns a
pointer. Then you construct a constant, which is now simply a 2-tuple of
(Type*,ArrayRef<Constant>*), and can be hashed very quickly. This means
that if two arrays or two structs have the same operands but different
types, they will both share the same operand list. The problem with this
approach is that there are a few cases where the operand arrays are
modified directly (such as when doing replaceAllUsesWith) and having to
un-share the array would complicate things. Also, this represents a fairly
major change to the way constants are represented, and I don't want to take
on such a large project.

Second wrinkle: Originally, I had thought that you could produce an
ArrayRef of the operands of a Constant by doing ArrayRef(CP->op_begin(),
CP->op_end()). However, the type of op_begin is Use*, not Constant*, and
casting a Use* to a Constant* is not a valid operation. Instead, one must
go through the CP->getOperand(i) interface. This is not a huge deal, except
that it means you have to write the hash algorithm twice: Once for the
lookup key (ArrayRef<Constant>) and once for the Constant itself.

What would be helpful in this case is if there was a standard incremental
hash utility in ADT. I noticed that there are a number of different hash
function implementations scattered throughout the LLVM code base, and it
would make sense to consolidate these in a Hashing.h header file. I have
something similar in Tart, see here for an
example<http://code.google.com/p/tart/source/browse/trunk/compiler/include/tart/Common/Hashing.h>
.

Third wrinkle: As mentioned earlier, occasionally the replaceAllUsesWith
operation will mutate the operand array directly. In these cases, the
constant needs to be removed from the uniquing map, and re-inserted at a
new location corresponding to the hash of its updated operands. The
existing code does a lot of subtle micro-optimizations which don't
translate over to the new representation scheme (for example, there's no
'insert_as' analogue of 'find_as'). My thinking was to focus on big wins
(getting rid of std::vector) and not try and preserve every
micro-optimization.

Fourth wrinkle: The current implementation does some tricky template stuff
to try and get it to work with five different types of uniqued constants,
even though ConstantArray is very different from InlineAsm for example. My
thinking was to split it into two classes, one for constants whose operands
could be represented as ArrayRef<Constant>, and one for the other two types.

-- 
-- Talin
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120201/d4cd86fb/attachment.html>