[LLVMdev] improving constant/type uniquing

Tue Feb 15 19:17:02 PST 2011

On Tue, Feb 15, 2011 at 5:21 PM, Chris Lattner <clattner at apple.com> wrote:

> <moving to llvmdev>
>
> On Feb 14, 2011, at 9:49 AM, Talin wrote:
>
> A few additional thoughts on PR1210:
>
> I can think of three approaches to addressing this issue. The first
> approach is the one outlined in PR1210, which is to have the key have
> enough knowledge of the internals of the Type to be able to use the internal
> type array as the data for the key.
>
> The second approach is similar, which is to factor out the type array as a
> separate, immutable object which is shared by both the key and the type
> object. That is, you first create the type array, use that to look up the
> type, and if it doesn't find it, clone the type array and use it to
> construct a new type object, then use the same type array as the key.
>
>
> This is effectively what FoldingSet does.  The actual bits are stored in
> the type if it exists.  If it doesn't exist, the input vector of element
> types is hashed by the profiling function, and then the hash value is looked
> up.  If there is no match, a type is created.  The profile is designed to
> typically live on the stack to avoid a heap allocation.
>
> Note that you can't use the same type array that you used for the lookup,
> because otherwise the get-or-insert function would sometimes have to take
> ownership of the type array depending on whether the type existed or not,
> which would be confusing.
>
>
> Right: the idea is that a lookup (which hits an already created value)
> should not do a heap allocation.
>
> Another way around the ownership problem is to allocate all type arrays as
> part of the memory pool associated with the LLVMContext, and then 'intern'
> the type arrays so that they can be compared via pointer rather than by
> value. The type arrays go away when the context does, so you don't have to
> worry about who is responsible for freeing it. The downside is that you now
> have to do two hash lookups for the type object's get-or-create method.
>
>
> I don't really understand what you mean by this.  You still need to hash on
> the identity of the type and the subclass of type still needs to have
> accessors for its elements.
>
> I'll try to explain it a different way. Imagine you have a single folding
set, independent of any type subclass, which takes as input an array of
types (the key), and returns a pointer to an immutable vector or tuple of
those types - much like an interned string. That pointer can then be used as
a key to get a StructType or a FunctionType or any other type which has more
than one type parameter, and all derived types now represent their type
parameters as a single pointer, either to a type or to a tuple. It also
means that type "{ int; int; int; }" shares the same type parameter array as
"int (int, int)", both pointing to a tuple of 3 ints. Each subtype still has
its own hash map that enables lookup of a subtype via it's type parameters,
however there's no need to dereference the tuple pointer during this second
lookup, you only need to compare the two pointers, not what they point to.

I'm not certain that this approach has any real benefit, however, given the
normal usage patterns of looking up types in LLVM. I was merely including it
for the sake of completeness... since normally subtypes are interned anyway,
and since most subtypes take only a single type parameter, the benefits of
separately interning the type parameter array probably aren't worth the
additional complexity.

-Chris
>

-- 
-- Talin
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110215/5ad168a8/attachment.html>