[LLVMdev] Testcases where GVN uses too much memory?
Owen Anderson
resistor at mac.com
Sat May 3 13:22:32 PDT 2014
On May 3, 2014, at 7:59 AM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
> On 2014 May 2, at 21:22, Owen Anderson <resistor at mac.com> wrote:
>
>>> Each element in `LeaderTable` was an intrusively linked list of
>>> `LeaderTableEntry`, with each element of the list costing 24B (assuming
>>> 64-bit pointers).
>>>
>>> - Remove the link from `LeaderTableEntry`, slimming it down to 16B.
>>>
>>> - Replace the linked list with a hand-rolled vector, with size and
>>> capacity each represented using `uint32_t`.
>>>
>>> If the list has a single entry, it still costs 24B and does not require
>>> a dereference. If it has more than one element, it mallocs an array and
>>> costs 24B plus 16B per entry.
>>>
>>> Since the entries will be next to each other in memory, we get a
>>> locality savings here.
>>
>> I don’t follow the logic here. My recollection is that the common case is a single element in the list, for which the data structure is already optimized: the head element (not a pointer, but the actual element) is stored in-line with the LeaderTable itself, requiring no additional allocation at all.
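(For reference, the structure under discussion looks roughly like this today; simplified from GVN.cpp, so details may be off:)

  // Current layout, simplified: the head entry lives in-line in the
  // DenseMap value; additional leaders are chained through separately
  // allocated nodes, 24B each on a 64-bit target.
  struct LeaderTableEntry {
    Value *Val;              // 8B
    const BasicBlock *BB;    // 8B
    LeaderTableEntry *Next;  // 8B
  };
  DenseMap<uint32_t, LeaderTableEntry> LeaderTable;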
>
> Right, it's already optimized for the common case. My goal was to leave
> the common case about the same, but improve the rare case. When GVN
> "blows up", is it just a really big common case, or are entries in
> `LeaderTable` with 3+ elements dominating the memory footprint?
>
> After this patch:
>
> - The memory footprint of the common case is unchanged: if there's
> only one element, then it's still 24B stored in-line with no
> additional allocation. However, runtime speed is probably slower
> because of the branch to check where the storage is. (How much
> slower?)
>
> - In the rare two element case, there's one allocation of 32B instead
> of one allocation of 24B. Depending on the malloc, this might be
> equivalent, or might be more expensive. Runtime speed to visit them
> should be dominated by the single dereference in both cases.
>
> - In the rare 3+ element case, there's 1 x N*16B allocation instead of
> (N-1) x 24B allocations. This is where the patch might add value
> (both memory footprint and runtime speed).
>
> But it's speculative enough that it needs supporting data.
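To make the alternative concrete, one plausible shape for what Duncan describes above is the following; the names here are invented for illustration and are not from an actual patch:

  // Hypothetical replacement: drop the Next link (16B per entry) and keep
  // a uint32_t size/capacity pair in the in-line 24B slot.
  struct LeaderEntry {         // 16B
    Value *Val;
    const BasicBlock *BB;
  };
  struct LeaderList {          // 24B, stored in-line in the DenseMap value
    union {
      LeaderEntry Single;      // Count == 1: no allocation, no dereference
      LeaderEntry *Entries;    // Count >= 2: one malloc'd array, 16B/entry
    };
    uint32_t Count;
    uint32_t Capacity;
  };
  DenseMap<uint32_t, LeaderList> LeaderTable;

That reproduces the arithmetic above: 24B in-line for a single leader, and 24B plus N x 16B in one contiguous allocation otherwise.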
I agree that it needs supporting data. It seems like it has the potential to be worthwhile, but at the very least we need a histogram of the frequencies of the various list lengths. Ideally, we want to measure both the memory footprint and the runtime impact of this change.
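Something along these lines, dumped at the end of a GVN run, should be enough to collect that histogram (hypothetical instrumentation, written against the current intrusive-list layout):

  // Count how many leader lists have each length. Needs <map> and
  // llvm/Support/raw_ostream.h; walks the existing LeaderTable.
  std::map<unsigned, unsigned> LengthHistogram;
  for (const auto &KV : LeaderTable) {
    if (!KV.second.Val)
      continue;               // skip empty slots
    unsigned Len = 0;
    for (const LeaderTableEntry *E = &KV.second; E; E = E->Next)
      ++Len;
    ++LengthHistogram[Len];
  }
  for (const auto &Bucket : LengthHistogram)
    errs() << "length " << Bucket.first << ": " << Bucket.second << "\n";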
—Owen