[PATCH] Devirtualize llvm::Value and all subclasses

Fri Jun 26 14:14:38 PDT 2015

On Fri, Jun 26, 2015 at 1:50 PM, Pete Cooper <peter_cooper at apple.com> wrote:

>
> On Jun 26, 2015, at 1:42 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
> Completely shooting from the hip, we have been seeing some slow-downs over
> the last week or so & there's at least a vague guess that it could be
> related to these pervasive data structure changes that have been going in
> within that window.
>
> Thats a reasonable guess.  I’d think the same.
>
>
> Have you been running/seen any compile-time perf changes in that same
> window?
>
> I have yeah.  We have the bitcode saved from the linker for
> verify_uselist_order with and without debug info and before/after
> optimizations.  I’ve been putting those through either opt/llc and
> measuring compile time.  It never strayed from noise TBH.
>

What about broader perf numbers? Not seeing any significant fluctuations in
the last 7-10 days?

One of the things I was thinking, if this is an issue, is that we could do
> something vtable-esque. Since we already have an enum to differentiate the
> various derived classes - could we have an array of manual vtables, indexed
> by that enum, and explicitly invoke through that? If the switch stuff is
> slowing things down - possible that it isn't, and the switch is cheaper
> than the indirect call, etc.
>
> I was about to reply ‘surely the optimizer will just make a jump table
> from the switch which is just as fast’.
>

Could be, I really don't know.

> However, then I thought about what that would look like, and its not quite
> what you’re suggesting.  The jump table optimization would only jump to a
> block which then does the call, but the call itself is then another jump.
>
> We couldn’t rely on it 100%, but I wonder if we can teach the optimizer to
> optimize jump tables to calls where every call has identical argument lists.
>
> Anyway, it probably wouldn’t help with the performance tracking for me to
> commit this (assuming someone gives it a LGTM), so i’ll hold off for now.
>

Yeah, until you/I/we/someone has actually reliable perf stuff that points
to any particular change or changes, I probably wouldn't start shooting in
the dark - just figured I'd bring it up in case it's something you're
already aware of or, upon looking, would become so.

- Dave

>
> Pete
>
>
> Just some idle thoughts.
>
> On Tue, Jun 16, 2015 at 4:46 PM, Pete Cooper <peter_cooper at apple.com>
> wrote:
>
>> Hi Nick
>>
>> As you were the last person to attempt this on PR889, I thought i’d see
>> how you like this series of patches.
>>
>> I’ve tried to devirtualize a small number of virtual methods at a time,
>> hence the bunch of patches.  Unfortunately the last one is pretty huge as
>> it handles all the places we used to call delete on Value’s and now need to
>> call a destroy method to dispatch to the correct subclass.  I’m not sure
>> there’s a great way to split the last one up.
>>
>> David, i’ve CCed you specifically to ask about how safe some of the
>> changes here are.  In particular, the way i’ve changed things is:
>> - If a subclass is final (and i’ve added final where I can), then I give
>> it its own delete method which just does ‘using User::operator delete’.  Is
>> this reasonable?  Do i need to handle any other kinds of delete like User
>> does?
>> - If a subclass is not able to be final, then i’ve added a destroy()
>> method and switch on the value kind to then call ‘delete
>> cast<SubClass>(this)’
>>
>> Thanks to Duncan for the tip on std::default_delete, i’ve made use of
>> this on various places which had a std::unique_ptr of say Instruction or
>> Value.  The default deleter for those calls destroy() instead of delete.
>> Deleter’s aren’t needed for final subclasses because I was able to give
>> them a delete method instead.
>>
>> I also extended helpers like DeleteContainerPointers to take a
>> std::default_delete so that we can forward to the destroy methods where
>> needed.
>>
>> I measured memory usage on the verify_uselist_order test case Duncan has
>> been using.  This contains debug info and is measured with 'llc
>> ~/verify-uselistorder.lto.opt.bc -o /dev/null --filetype=obj
>> -disable-verify’.  Prior to this change we consumed 814MB, and 800MB
>> after.  So almost 2% savings.  This is in line with what I was expecting as
>> the savings is 8 bytes per Value, but due to malloc alignment on Mac OS, we
>> only save that on an average of half of values.
>>
>> Finally, this is the order I did all the work locally.  If any of this
>> makes sense to be squashed or split, please let me know.  In the first
>> patch for example I create Constant.def, but I later change that to
>> Value.def.  This one should obviously be fixed to be Value.def from the
>> start.
>>
>> Cheers,
>> Pete
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150626/42f0d078/attachment.html>